Qwen3.5-122B-A10B Claude-Distill

A fine-tuned version of Qwen/Qwen3.5-122B-A10B, trained via knowledge distillation from Claude using full-parameter fine-tuning.

Training Data

Distilled from Claude on the following datasets:

| Dataset | Samples | Description |
| --- | --- | --- |
| Claude Opus 4.5 High Reasoning | 250 | High reasoning depth samples |
| Claude Opus 4.6 Reasoning | 9,633 | Math, logic puzzles, multi-step instructions with CoT |
| Claude Opus 4.6 High Reasoning | 757 | Coding and creative writing with adaptive reasoning |
| Claude Opus 4.6 Extended Reasoning | 500 | Extended reasoning across STEM and practical domains |
| Claude Opus 4.6 Extended Reasoning 887x | 887 | Tool calling, bullshit detection, multi-turn traces |
| Claude Sonnet & Opus 4.6 Reasoning | 524 | Natural human-written prompts from Reddit & Stack Overflow |
| Opus 4.6 Reasoning Filtered | 2,326 | Filtered reasoning traces (refusals removed) |

Total: ~14.9K samples
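The distillation here is sequence-level: the teacher's (Claude's) responses, including their reasoning traces, become supervised fine-tuning targets for the student. A minimal sketch of what one such training record might look like, assuming a standard chat-messages schema (the field names and contents are illustrative, not the actual dataset format):

```python
# Hypothetical shape of one distillation sample. The assistant turn is a
# teacher (Claude) completion with its chain of thought inside
# <think>...</think> tags, used as the SFT target for the student model.
sample = {
    "messages": [
        {"role": "user", "content": "Is 1009 prime?"},
        {
            "role": "assistant",
            "content": (
                "<think>1009 is odd; digit sum 10, so not divisible by 3; "
                "checking primes up to sqrt(1009) ~ 31.8 finds no divisor."
                "</think>Yes, 1009 is prime."
            ),
        },
    ]
}

# Full-parameter fine-tuning would compute the loss only on the
# assistant turn, teaching the student to reproduce both the reasoning
# trace and the final answer.
```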

Benchmark Results

For detailed benchmark results and model architecture, please refer to the original Qwen3.5-122B-A10B model card.

Quickstart

For the full usage guide, please refer to the original Qwen3.5-122B-A10B model card.

Using with vLLM

vllm serve Kassadin88/Qwen3.5-122B-A10B-Claude-distill \
    --port 8000 \
    --tensor-parallel-size 8 \
    --max-model-len 262144 \
    --trust-remote-code \
    --reasoning-parser qwen3
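Once serving, the model is reachable through vLLM's OpenAI-compatible `/v1/chat/completions` endpoint. A minimal sketch of building such a request with only the standard library (the payload is constructed and printed here; the commented lines show the actual HTTP call, assuming the server above is listening on localhost:8000):

```python
# Build an OpenAI-style chat-completions payload for the vLLM server
# started above. Only stdlib json is needed to construct the request body.
import json

payload = {
    "model": "Kassadin88/Qwen3.5-122B-A10B-Claude-distill",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "max_tokens": 512,
}
body = json.dumps(payload).encode("utf-8")

# To actually send the request (requires the running server):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=body,
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])

print(body.decode("utf-8"))
```

With `--reasoning-parser qwen3`, the server separates the model's chain of thought into a `reasoning_content` field alongside `content` in each response message.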

Using with SGLang

python -m sglang.launch_server \
    --model-path Kassadin88/Qwen3.5-122B-A10B-Claude-distill \
    --port 8000 \
    --tp-size 8 \
    --mem-fraction-static 0.8 \
    --context-length 262144 \
    --reasoning-parser qwen3

Using with Hugging Face Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Kassadin88/Qwen3.5-122B-A10B-Claude-distill"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)

# prepare the model input
messages = [
    {"role": "user", "content": "Hello, how are you?"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# run generation
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# drop the prompt tokens, keeping only the newly generated ones
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

# decode the generated tokens
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
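Qwen3-style reasoning models emit their chain of thought inside `<think>...</think>` tags, which the `--reasoning-parser qwen3` flag above separates server-side. When decoding locally with Transformers, the same split can be done on the decoded string. A sketch (the sample string below is illustrative, not real model output):

```python
# Split a decoded response into its reasoning trace and final answer,
# assuming Qwen3-style <think>...</think> tags.
def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no closing tag."""
    head, sep, tail = text.partition("</think>")
    if not sep:  # no reasoning block present
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()

sample = "<think>The user is greeting me.</think>I'm doing well, thanks!"
reasoning, answer = split_reasoning(sample)
```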

Citation

@misc{qwen3.5,
    title  = {{Qwen3.5}: Towards Native Multimodal Agents},
    author = {{Qwen Team}},
    month  = {February},
    year   = {2026},
    url    = {https://qwen.ai/blog?id=qwen3.5}
}