# Harmonic-9B
A reasoning-focused fine-tune of Qwen 3.5 9B trained on structurally validated data where every row passes automated quality gates. No junk, no filler, no shallow traces.
The name comes from harmonic analysis of reasoning patterns - the structural signal that separates genuine thinking from surface-level chain-of-thought.
For the agentic tool-calling variant, see Harmonic-Hermes-9B (coming soon) - a Stage 2 fine-tune of this model on quality-filtered agent traces from DJLougen/hermes-agent-traces-filtered.
## Training Approach
799 curated rows. That's it. A small, precisely curated dataset instead of tens of thousands of unfiltered examples. The base model already has the knowledge from pretraining - the fine-tune teaches it a reasoning behavior pattern.
Every training row contains explicit self-correction ("wait, that's not right"), verification ("let me check by plugging back in"), and multi-path exploration ("alternatively, I could try..."). The data was generated from multiple frontier models and filtered through a custom structural quality pipeline that enforces reasoning depth, coherence, and flow patterns. 100% of rows pass all quality gates simultaneously.
## Training Data Quality
The reasoning data was curated using a custom structural process supervision pipeline. Key metrics:
| Metric | Value |
|---|---|
| Signal quality score | 78.7 mean (61.5 min, 90.0 max) |
| Thinking trace depth | 1,667 words average |
| Self-correction | 100% of rows (17.2 per row avg) |
| Verification | 100% of rows (10.3 per row avg) |
| Exploration | 100% of rows (6.3 per row avg) |
| Quality gate pass rate | 100% |
Every row was scored across multiple structural dimensions, and only rows passing all thresholds simultaneously were included. No rows were manually selected - the pipeline is fully automated and reproducible.
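The gate logic can be sketched as follows. The marker patterns and thresholds here are illustrative placeholders - the actual pipeline's patterns, scoring dimensions, and cutoffs are not published:

```python
import re

# Hypothetical marker patterns; the real pipeline's regexes are not published.
MARKERS = {
    "self_correction": re.compile(r"\b(wait|actually|that's not right)\b", re.IGNORECASE),
    "verification": re.compile(r"\b(let me check|verify|plugging back)\b", re.IGNORECASE),
    "exploration": re.compile(r"\b(alternatively|another approach|i could try)\b", re.IGNORECASE),
}

# Illustrative thresholds only; the real gates also enforce coherence and flow.
MIN_WORDS = 400
MIN_COUNTS = {"self_correction": 1, "verification": 1, "exploration": 1}

def passes_gates(trace: str) -> bool:
    """Return True only if a thinking trace clears every structural gate at once."""
    if len(trace.split()) < MIN_WORDS:
        return False
    return all(
        len(MARKERS[name].findall(trace)) >= MIN_COUNTS[name]
        for name in MARKERS
    )
```

The key property is the `all(...)`: a row must clear every gate simultaneously, so a long trace with no self-correction still fails.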
## How It Compares
We ran our structural quality analysis against every major public reasoning dataset used for Opus/Qwen distillation. The results:
| Dataset | Rows | Think Words | Self-Correction | Verification | Exploration | Signal Score | Gate Pass |
|---|---|---|---|---|---|---|---|
| Harmonic (ours) | 799 | 1,667 | 100% | 100% | 100% | 78.7 | 100% |
| Crownelius/Opus-3300x | 2,160 | 188 | 5.9% | 22.6% | 5.2% | 28.0 | 0.1% |
| nohurry/Opus-Filtered | 2,326 | 191 | 6.7% | 24.1% | 5.3% | 28.5 | 0.1% |
| TeichAI/Opus-250x | 250 | 323 | 17.2% | 26.8% | 6.8% | 24.6 | 0.4% |
| Jackrong/Qwen-700x | 633 | 6,653 | 97.5% | 97.6% | 69.8% | 75.6 | 22.7% |
| Bespoke-Stratos-17k | 16,710 | 1,322 | 88.2% | 72.7% | 59.7% | 71.7 | 49.0% |
| glaiveai/reasoning-20m | 22M+ | 799 | 64.1% | 41.4% | 37.3% | 46.2 | 12.8% |
| KingNish/reasoning-20k | 19,944 | 132 | 0.7% | 4.2% | 4.3% | 27.4 | 0.0% |
The popular Opus distillation datasets (Crownelius, nohurry, TeichAI) have quality gate pass rates under 1%. Their thinking traces average under 200 words with near-zero self-correction. Models trained on this data learn to produce short, shallow chain-of-thought that looks like reasoning but lacks the structural behaviors that make reasoning reliable.
Jackrong and Stratos are closer competitors but still fall short on consistency. Jackrong has massive traces (6,653 words avg) but only 22.7% pass the quality gate - the thinking is verbose but wanders. Stratos has decent markers but 49% of rows still fail, meaning half the gradient updates during training push the model toward shallow patterns.
Harmonic's data is smaller by design. Every row passes. Every gradient update reinforces genuine reasoning behavior.
## Reasoning Flow
Marker density measured across 20 equal segments of each thinking trace. The characteristic curve shows reasoning intensity building through the middle of the trace and peaking in the later segments as the model enters verification and self-correction before committing to an answer.
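The segment-density measurement above can be reproduced roughly as follows. This is a sketch: the marker pattern is an assumed placeholder, not the pipeline's actual regex:

```python
import re

# Assumed marker pattern for illustration; not the pipeline's published regex.
MARKER = re.compile(r"\b(wait|verify|alternatively|let me check)\b", re.IGNORECASE)

def marker_density(trace: str, n_segments: int = 20) -> list[float]:
    """Split a thinking trace into n equal word segments and return
    markers-per-word for each segment (trailing remainder words are dropped)."""
    words = trace.split()
    seg_len = max(1, len(words) // n_segments)
    densities = []
    for i in range(n_segments):
        segment_words = words[i * seg_len:(i + 1) * seg_len]
        if not segment_words:
            densities.append(0.0)
            continue
        hits = len(MARKER.findall(" ".join(segment_words)))
        densities.append(hits / len(segment_words))
    return densities
```

Plotting the 20 values for each trace gives the characteristic curve described above, with density rising toward the later verification-heavy segments.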
## Training Configuration
```yaml
base_model: Qwen/Qwen3.5-9B
dataset: 799 curated reasoning rows
epochs: 1
learning_rate: 1e-4
lr_scheduler: cosine
warmup_ratio: 0.1
max_seq_length: 8192
lora_rank: 32
lora_alpha: 32
dropout: 0.05
micro_batch_size: 1
gradient_accumulation_steps: 4
weight_decay: 0.01
```
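As a quick sanity check on these numbers (assuming a single GPU and that a final partial accumulation still triggers an optimizer step):

```python
import math

rows = 799                      # curated training rows
micro_batch_size = 1
gradient_accumulation_steps = 4
warmup_ratio = 0.1

# Effective batch size per optimizer step (single-GPU assumption).
effective_batch = micro_batch_size * gradient_accumulation_steps

# Optimizer steps for the single epoch; ceil keeps the final partial batch.
total_steps = math.ceil(rows / effective_batch)

# Cosine schedule warms up over the first 10% of steps.
warmup_steps = int(warmup_ratio * total_steps)
```

So the run takes roughly 200 optimizer steps at an effective batch size of 4, with about 20 warmup steps - small enough that the single epoch behaves like a light behavioral nudge rather than heavy retraining.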
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "DJLougen/Harmonic-9B", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("DJLougen/Harmonic-9B")
```
### Reasoning format
The model uses `<think>` blocks for reasoning:
```
<think>
The user is asking about X. Let me consider two approaches...
Approach 1: ...
Approach 2: ...
I'll go with Approach 1 because...
Wait, I need to be careful here - this assumes Y, which may not hold.
Let me verify by checking a special case...
Yes, that confirms the result.
</think>
[Final answer here]
```
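Downstream code usually wants the final answer without the trace. A minimal splitter, assuming exactly the tag format shown above (this helper is illustrative, not part of the model's tooling):

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(output: str) -> tuple[str, str]:
    """Return (thinking_trace, final_answer) from a model completion.
    If no <think> block is present, the trace is empty."""
    match = THINK_RE.search(output)
    if not match:
        return "", output.strip()
    trace = match.group(1).strip()
    answer = output[match.end():].strip()
    return trace, answer
```

The non-greedy `(.*?)` with `re.DOTALL` stops at the first closing tag, so a stray `</think>` later in the answer is left untouched.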
## Intended Use
- Reasoning tasks requiring genuine multi-step thinking
- Mathematical problem-solving with self-correction
- Code analysis and generation with structured verification
- General conversation (conversational ability preserved through training design)
- Base model for Stage 2 agentic fine-tuning
## Limitations
- 9B parameter model - not suitable for tasks requiring extensive world knowledge
- Reasoning traces can be verbose for simple questions
- Not optimized for tool calling - see Harmonic-Hermes-9B (coming soon) for agentic use
- Benchmark evaluation is ongoing
## Architecture
- Base: Qwen 3.5 9B (9.65B parameters)
- Training: LoRA fine-tuning, merged into base weights
- Precision: BF16
- Context: 8192 tokens
## License
Apache 2.0 - same as the base model. All training data comes from Apache 2.0 or MIT licensed sources. Commercial use is fully permitted.
## Links
- GGUF quantizations: DJLougen/Harmonic-9B-GGUF
- Agentic variant: Harmonic-Hermes-9B (coming soon)
- Filtered agent dataset: DJLougen/hermes-agent-traces-filtered