Introduction

[Figure: MMLU]

Qwen3-4B-Diversity is a fine-tuned language model based on Qwen/Qwen3-4B that has been trained on a diverse collection of high-quality reasoning datasets. This model combines knowledge distilled from various state-of-the-art AI systems to provide enhanced reasoning capabilities across multiple domains including mathematics, coding, general problem-solving, and multi-turn conversations.

Training Configuration

The model was trained with supervised fine-tuning, using parameter-efficient methods to keep compute requirements low. Key training parameters include:

| Parameter | Value |
|---|---|
| Number of Epochs | 2 |
| Context Length (tokens) | 40,960 |

Hardware and Resources

| Resource | Specification |
|---|---|
| GPU | A100-80GB |
| Training Duration | Approximately 17 hours |
| Estimated Cost | $27 to $30 |
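As a rough sanity check, the figures above imply an hourly rate of about $1.59 to $1.76. This is a back-of-the-envelope sketch that assumes the entire estimated cost is A100-80GB rental for the full run; the source does not break the cost down, so treat the result as indicative only.

```python
# Implied hourly GPU rate from the figures listed above.
# Assumption: the full estimated cost is GPU rental for the whole run.
training_hours = 17                    # "approximately 17 hours"
cost_low_usd, cost_high_usd = 27, 30   # estimated cost range

rate_low = cost_low_usd / training_hours
rate_high = cost_high_usd / training_hours

print(f"Implied rate: ${rate_low:.2f} to ${rate_high:.2f} per hour")
```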

Training Data

| Dataset | Rows Used | Source Model |
|---|---|---|
| ianncity/KIMI-K2.5-550000x (General-Distillation) | 1,000 | Kimi K2.5 |
| Jackrong/Qwen3.5-reasoning-700x | 633 | Qwen3.5 |
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 2,326 | Claude Opus 4.6 |
| TeichAI/claude-4.5-opus-high-reasoning-250x | 250 | Claude Opus 4.5 |
| TeichAI/gemini-3-pro-preview-high-reasoning-250x | 248 | Gemini 3 Pro |
| TeichAI/claude-haiku-4.5-high-reasoning-1700x | 1,688 | Claude Haiku 4.5 |
| TeichAI/gpt-5.2-high-reasoning-250x | 249 | GPT-5.2 |
| Roman1111111/gemini-3.1-pro-hard-high-reasoning | 3,150 | Gemini 3.1 Pro |
| Jackrong/glm-4.7-multiturn-CoT | 5,090 | GLM-4.7 |
| bmeyer2025/glm5-reasoning-traces | 1,744 | GLM-5 |
| TeichAI/claude-sonnet-4.5-high-reasoning-250x | 247 | Claude Sonnet 4.5 |
| TeichAI/deepseek-v3.2-speciale-openr1-math-3k | 3,317 | DeepSeek V3.2-Speciale |
| TeichAI/deepseek-v3.2-speciale-OpenCodeReasoning-3k | 2,953 | DeepSeek V3.2-Speciale |
| TeichAI/deepseek-v3.2-speciale-1000x | 991 | DeepSeek V3.2-Speciale |
| TeichAI/gpt-5-codex-1000x | 991 | GPT-5 Codex |
| Total | 24,877 | Combined diverse reasoning dataset |
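To see how the mix is weighted, the row counts listed above can be grouped by source model and summed. The sketch below only restates the listed numbers; the variable names are illustrative. It confirms the stated total of 24,877 and shows that the three DeepSeek V3.2-Speciale datasets together form the largest share, at roughly 29% of all rows.

```python
# Row counts copied from the training data listing, grouped by source model.
rows_by_source = {
    "Kimi K2.5": 1000,
    "Qwen3.5": 633,
    "Claude Opus 4.6": 2326,
    "Claude Opus 4.5": 250,
    "Gemini 3 Pro": 248,
    "Claude Haiku 4.5": 1688,
    "GPT-5.2": 249,
    "Gemini 3.1 Pro": 3150,
    "GLM-4.7": 5090,
    "GLM-5": 1744,
    "Claude Sonnet 4.5": 247,
    "DeepSeek V3.2-Speciale": 3317 + 2953 + 991,  # three separate datasets
    "GPT-5 Codex": 991,
}

total_rows = sum(rows_by_source.values())

# Print each source's share of the mix, largest first.
by_size = sorted(rows_by_source.items(), key=lambda kv: kv[1], reverse=True)
for source, rows in by_size:
    print(f"{source:<24} {rows:>6,} {rows / total_rows:6.1%}")
print(f"{'Total':<24} {total_rows:>6,}")
```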

Model Capabilities

This model excels in several key areas:

  1. Advanced Reasoning: The model can break down complex problems into steps and provide detailed reasoning processes.

  2. Mathematical Problem Solving: Enhanced capabilities for mathematical reasoning and problem-solving through dedicated math-focused datasets.

  3. Code Generation and Understanding: Improved coding abilities from multiple code-reasoning datasets including DeepSeek and GPT-5 Codex data.

  4. Multi-Turn Conversations: Better handling of extended dialogues and context-aware responses.

  5. Domain Versatility: Exposure to reasoning patterns from various AI systems provides flexibility across different domains and task types.

Usage

Quick Demo

For a quick demo at no cost, you can use Google Colab.

Ollama (Local)

# https://ollama.com/hadad/qwen3-4bd

# hadad/qwen3-4bd:Q8_0  |  4.3GB
# hadad/qwen3-4bd:BF16  |  8.1GB

# ollama pull hadad/qwen3-4bd:Q8_0

ollama run hadad/qwen3-4bd:Q8_0

If you use Ollama and need tool or function calling, use the OpenAI-compatible API that Ollama provides; it is the recommended and more capable route for this use case.

Refer to the Ollama documentation.
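A minimal sketch of a tool-calling request against Ollama's OpenAI-compatible endpoint is shown below. It assumes Ollama is running locally on its default port (11434) and that the `hadad/qwen3-4bd:Q8_0` tag has been pulled. The `get_weather` tool and both helper function names are hypothetical, chosen only for illustration, and whether the model actually emits a tool call depends on the chat template shipped with the Ollama build.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_CHAT_URL = "http://localhost:11434/v1/chat/completions"


def build_tool_request(prompt: str, model: str = "hadad/qwen3-4bd:Q8_0") -> dict:
    """Build an OpenAI-style chat request that offers one tool to the model."""
    # Hypothetical tool definition, used only for illustration.
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [weather_tool],
    }


def send_chat_request(payload: dict) -> dict:
    """POST the payload to the local Ollama server and return the parsed reply."""
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Build and inspect the request; sending it requires a running Ollama server.
payload = build_tool_request("What is the weather in Jakarta?")
print(json.dumps(payload, indent=2))
```

With the server running, `send_chat_request(payload)["choices"][0]["message"]` should hold either plain text or a `tool_calls` entry, following the OpenAI chat completions response shape.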

Python (Local)

# pip install transformers==4.56.2
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "hadadxyz/Qwen3-4B-Diversity"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
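# keep only the newly generated tokens (drop the prompt tokens)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# split the thinking content from the final answer
try:
    # find the position just after the last 151668 (</think>) token
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    # no </think> token found: treat everything as the final answer
    index = 0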

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
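The parsing step in the script above can be wrapped in a small reusable helper, which also makes it easy to check on toy input without loading the model. This is a sketch: the function name `split_thinking` is an assumption, and the token id 151668 for `</think>` is taken from the script above.

```python
END_THINK_TOKEN_ID = 151668  # </think>, as used in the script above


def split_thinking(
    output_ids: list[int], end_think_id: int = END_THINK_TOKEN_ID
) -> tuple[list[int], list[int]]:
    """Split generated token ids into (thinking_ids, answer_ids).

    The split point is just after the last end-of-thinking token. If that
    token is absent, everything is treated as the answer.
    """
    try:
        index = len(output_ids) - output_ids[::-1].index(end_think_id)
    except ValueError:
        index = 0
    return output_ids[:index], output_ids[index:]


# Toy ids for illustration only; real ids come from model.generate().
thinking_ids, answer_ids = split_thinking([11, 12, 151668, 21, 22])
print(thinking_ids, answer_ids)
```

Each half can then be passed to `tokenizer.decode(..., skip_special_tokens=True)` exactly as in the script.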

Inference Parameters

For optimal results, we recommend the following generation parameters:

Thinking

| Parameter | Recommended Value | Description |
|---|---|---|
| temperature | 0.6 | Controls randomness in generation |
| top_p | 0.95 | Nucleus sampling threshold |
| top_k | 20 | Top-k sampling parameter |
| min_p | 0 | Minimum probability threshold |

Non-Thinking

| Parameter | Recommended Value | Description |
|---|---|---|
| temperature | 0.7 | Controls randomness in generation |
| top_p | 0.8 | Nucleus sampling threshold |
| top_k | 20 | Top-k sampling parameter |
| min_p | 0 | Minimum probability threshold |
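One way to apply these recommendations with `transformers` is to keep both presets in a small helper and unpack the result into `model.generate`. The helper name `sampling_kwargs` is an assumption for illustration. `do_sample=True` is included because `generate` ignores the temperature and top-p/top-k settings when sampling is disabled.

```python
def sampling_kwargs(thinking: bool) -> dict:
    """Return the recommended sampling settings for the chosen mode."""
    if thinking:
        return {
            "do_sample": True,
            "temperature": 0.6,
            "top_p": 0.95,
            "top_k": 20,
            "min_p": 0.0,
        }
    return {
        "do_sample": True,
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 20,
        "min_p": 0.0,
    }


# Usage with the objects from the Python (Local) example:
# generated_ids = model.generate(
#     **model_inputs, max_new_tokens=32768, **sampling_kwargs(thinking=True)
# )
print(sampling_kwargs(thinking=True))
print(sampling_kwargs(thinking=False))
```

The `thinking` flag here should match the `enable_thinking` value passed to `apply_chat_template`, so that the sampling preset and the prompt mode agree.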

Citation

If you use this model in your research or applications, please cite both this model and the base model:

@misc{qwen3-4b-diversity,
  author = {hadadxyz},
  title  = {Qwen3-4B-Diversity},
  year   = {2026},
  url    = {https://huggingface.co/hadadxyz/Qwen3-4B-Diversity}
}

Acknowledgments

This model was made possible through the combination of multiple high-quality datasets from the community. We acknowledge and thank all dataset creators and the Qwen team for providing the excellent base model.
