# Qwen3.5-122B-A10B Claude-Distill
A version of Qwen/Qwen3.5-122B-A10B fine-tuned via knowledge distillation from Claude, using full-parameter fine-tuning.
## Training Data

The model was distilled from Claude outputs on the following datasets:
| Dataset | Samples | Description |
|---|---|---|
| Claude Opus 4.5 High Reasoning | 250 | High reasoning depth samples |
| Claude Opus 4.6 Reasoning | 9,633 | Math, logic puzzles, multi-step instructions with CoT |
| Claude Opus 4.6 High Reasoning | 757 | Coding and creative writing with adaptive reasoning |
| Claude Opus 4.6 Extended Reasoning | 500 | Extended reasoning across STEM and practical domains |
| Claude Opus 4.6 Extended Reasoning 887x | 887 | Tool calling, bullshit detection, multi-turn traces |
| Claude Sonnet & Opus 4.6 Reasoning | 524 | Natural human-written prompts from Reddit & Stack Overflow |
| Opus 4.6 Reasoning Filtered | 2,326 | Filtered reasoning traces (refusals removed) |
Total: ~14.9K samples
## Benchmark Results

For detailed benchmark results and the model architecture, please refer to the original Qwen3.5-122B-A10B model card.
## Quickstart

For the full usage guide, please refer to the original Qwen3.5-122B-A10B model card.
### Using with vLLM

```shell
vllm serve Kassadin88/Qwen3.5-122B-A10B-Claude-distill \
    --port 8000 \
    --tensor-parallel-size 8 \
    --max-model-len 262144 \
    --trust-remote-code \
    --reasoning-parser qwen3
```
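Once launched, the server exposes an OpenAI-compatible API. A minimal client sketch (hypothetical helper names, assuming the default `localhost:8000` from the command above; the request only succeeds while the server is running):

```python
import json
import urllib.request

MODEL = "Kassadin88/Qwen3.5-122B-A10B-Claude-distill"

def build_request(prompt: str, model: str = MODEL) -> dict:
    """Build an OpenAI-style /v1/chat/completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }

def query(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    """POST the payload to the running server and return the reply text."""
    payload = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# answer = query("Hello, how are you?")  # requires the server to be up
```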
### Using with SGLang

```shell
python -m sglang.launch_server \
    --model-path Kassadin88/Qwen3.5-122B-A10B-Claude-distill \
    --port 8000 \
    --tp-size 8 \
    --mem-fraction-static 0.8 \
    --context-length 262144 \
    --reasoning-parser qwen3
```
### Using with Hugging Face Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Kassadin88/Qwen3.5-122B-A10B-Claude-distill"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

# Prepare the model input
messages = [
    {"role": "user", "content": "Hello, how are you?"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)
# Strip the prompt tokens, keeping only the newly generated ones
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

# Decode the generated tokens
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
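Qwen3-style models emit their reasoning between `<think>` and `</think>` markers, which the `qwen3` reasoning parser separates server-side. When decoding locally, if those markers survive (e.g. by decoding with `skip_special_tokens=False`), a small helper — hypothetical, assuming that tag format — can split the reasoning trace from the final answer:

```python
def split_reasoning(text: str, marker: str = "</think>") -> tuple[str, str]:
    """Split decoded output into (reasoning, answer) at the first end-of-think marker."""
    if marker in text:
        reasoning, answer = text.split(marker, 1)
        return reasoning.replace("<think>", "").strip(), answer.strip()
    # No reasoning block found: treat the whole text as the answer
    return "", text.strip()
```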
## Citation

```bibtex
@misc{qwen3.5,
    title  = {{Qwen3.5}: Towards Native Multimodal Agents},
    author = {{Qwen Team}},
    month  = {February},
    year   = {2026},
    url    = {https://qwen.ai/blog?id=qwen3.5}
}
```