# Qwen3.5-122B-A10B Claude-Distill
A version of Qwen/Qwen3.5-122B-A10B fine-tuned via knowledge distillation from Claude, using full-parameter fine-tuning.
## Training Data

The model was distilled from Claude outputs on the following datasets:
| Dataset | Samples | Description |
|---|---|---|
| Claude Opus 4.5 High Reasoning | 250 | High reasoning depth samples |
| Claude Opus 4.6 Reasoning | 9,633 | Math, logic puzzles, multi-step instructions with CoT |
| Claude Opus 4.6 High Reasoning | 757 | Coding and creative writing with adaptive reasoning |
| Claude Opus 4.6 Extended Reasoning | 500 | Extended reasoning across STEM and practical domains |
| Claude Opus 4.6 Extended Reasoning 887x | 887 | Tool calling, bullshit detection, multi-turn traces |
| Claude Sonnet & Opus 4.6 Reasoning | 524 | Natural human-written prompts from Reddit & Stack Overflow |
| Opus 4.6 Reasoning Filtered | 2,326 | Filtered reasoning traces (refusals removed) |
Total: ~14.9K samples
## Benchmark Results

For detailed benchmark results and the model architecture, please refer to the original Qwen3.5-122B-A10B model card.
## Quickstart

For the full usage guide, please refer to the original Qwen3.5-122B-A10B model card.
### Using with vLLM

```shell
vllm serve Kassadin88/Qwen3.5-122B-A10B-Claude-distill \
    --port 8000 \
    --tensor-parallel-size 8 \
    --max-model-len 262144 \
    --trust-remote-code \
    --reasoning-parser qwen3
```
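Once launched, the server exposes an OpenAI-compatible API. A minimal client sketch (hypothetical helper names, assuming the default `localhost:8000` from the command above; the request only succeeds while the server is running):

```python
import json
import urllib.request

MODEL = "Kassadin88/Qwen3.5-122B-A10B-Claude-distill"

def build_request(prompt: str, model: str = MODEL) -> dict:
    """Build an OpenAI-style /v1/chat/completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }

def query(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    """POST the payload to the running server and return the reply text."""
    payload = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# answer = query("Hello, how are you?")  # requires the server to be up
```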
### Using with SGLang

```shell
python -m sglang.launch_server \
    --model-path Kassadin88/Qwen3.5-122B-A10B-Claude-distill \
    --port 8000 \
    --tp-size 8 \
    --mem-fraction-static 0.8 \
    --context-length 262144 \
    --reasoning-parser qwen3
```
### Using with Hugging Face Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Kassadin88/Qwen3.5-122B-A10B-Claude-distill"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

# Prepare the model input
messages = [
    {"role": "user", "content": "Hello, how are you?"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)
# Strip the prompt tokens, keeping only the newly generated ones
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

# Decode the generated tokens
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
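Qwen3-style models emit their reasoning between `<think>` and `</think>` markers, which the `qwen3` reasoning parser separates server-side. When decoding locally, if those markers survive (e.g. by decoding with `skip_special_tokens=False`), a small helper — hypothetical, assuming that tag format — can split the reasoning trace from the final answer:

```python
def split_reasoning(text: str, marker: str = "</think>") -> tuple[str, str]:
    """Split decoded output into (reasoning, answer) at the first end-of-think marker."""
    if marker in text:
        reasoning, answer = text.split(marker, 1)
        return reasoning.replace("<think>", "").strip(), answer.strip()
    # No reasoning block found: treat the whole text as the answer
    return "", text.strip()
```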
## Citation

```bibtex
@misc{qwen3.5,
    title  = {{Qwen3.5}: Towards Native Multimodal Agents},
    author = {{Qwen Team}},
    month  = {February},
    year   = {2026},
    url    = {https://qwen.ai/blog?id=qwen3.5}
}
```