Harmonic-9B

A reasoning-focused fine-tune of Qwen 3.5 9B trained on structurally validated data where every row passes automated quality gates. No junk, no filler, no shallow traces.

The name comes from harmonic analysis of reasoning patterns - the structural signal that separates genuine thinking from surface-level chain-of-thought.

For the agentic tool-calling variant, see Harmonic-Hermes-9B (coming soon) - a Stage 2 fine-tune of this model on quality-filtered agent traces from DJLougen/hermes-agent-traces-filtered.

Training Approach

Pipeline

799 curated rows. That's it. A small, precisely curated dataset instead of tens of thousands of unfiltered examples. The base model already has the knowledge from pretraining - the fine-tune teaches it a reasoning behavior pattern.

Every training row contains explicit self-correction ("wait, that's not right"), verification ("let me check by plugging back in"), and multi-path exploration ("alternatively, I could try..."). The data was generated from multiple frontier models and filtered through a custom structural quality pipeline that enforces reasoning depth, coherence, and flow patterns. 100% of rows pass all quality gates simultaneously.
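The exact gate implementation is not published, but the marker classes above suggest a lexicon-based detector. A minimal sketch, assuming hypothetical phrase lists (the real pipeline's lexicons and scoring are not part of this card):

```python
import re

# Hypothetical marker phrase lists -- the actual pipeline's lexicons are not published.
MARKERS = {
    "self_correction": [r"\bwait\b", r"that's not right", r"\bactually\b", r"\bmistake\b"],
    "verification":    [r"let me check", r"plugging back in", r"\bverify\b", r"\bconfirms?\b"],
    "exploration":     [r"\balternatively\b", r"another approach", r"\bi could try\b"],
}

def count_markers(trace: str) -> dict:
    """Count occurrences of each structural marker class in a thinking trace."""
    text = trace.lower()
    return {
        name: sum(len(re.findall(p, text)) for p in patterns)
        for name, patterns in MARKERS.items()
    }

trace = (
    "Let me try substitution. Alternatively, I could try factoring. "
    "Wait, that's not right -- the sign flips. Let me check by plugging back in."
)
counts = count_markers(trace)
```

A real pipeline would use richer lexicons and likely weight markers by position, but the per-class counting shape is the same.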

Training Data Quality

The reasoning data was curated using a custom structural process supervision pipeline. Key metrics:

| Metric | Value |
|---|---|
| Signal quality score | 78.7 mean (61.5 min, 90.0 max) |
| Thinking trace depth | 1,667 words average |
| Self-correction | 100% of rows (17.2 per row avg) |
| Verification | 100% of rows (10.3 per row avg) |
| Exploration | 100% of rows (6.3 per row avg) |
| Quality gate pass rate | 100% |

Every row was scored across multiple structural dimensions and only rows passing all thresholds simultaneously were included. No rows were manually curated - the pipeline is fully automated and reproducible.
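The "all thresholds simultaneously" rule amounts to a conjunction of per-metric gates. A sketch of that filter logic; the threshold values below are illustrative placeholders, not the pipeline's actual cutoffs:

```python
# Hypothetical gate thresholds -- illustrative values, not the pipeline's real cutoffs.
GATES = {
    "signal_score":    lambda r: r["signal_score"] >= 60.0,
    "think_words":     lambda r: r["think_words"] >= 500,
    "self_correction": lambda r: r["self_correction"] >= 1,
    "verification":    lambda r: r["verification"] >= 1,
    "exploration":     lambda r: r["exploration"] >= 1,
}

def passes_all_gates(row: dict) -> bool:
    """A row is kept only if it clears every gate simultaneously."""
    return all(gate(row) for gate in GATES.values())

rows = [
    # A deep, well-marked trace vs. a short trace with no self-correction.
    {"signal_score": 78.7, "think_words": 1667, "self_correction": 17, "verification": 10, "exploration": 6},
    {"signal_score": 28.0, "think_words": 188,  "self_correction": 0,  "verification": 1,  "exploration": 0},
]
kept = [r for r in rows if passes_all_gates(r)]
```

The conjunction is what makes the filter strict: a row with a strong signal score but zero self-correction markers is still dropped.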

How It Compares

We ran our structural quality analysis against every major public reasoning dataset used for Opus/Qwen distillation. The results:

| Dataset | Rows | Think Words | Self-Correction | Verification | Exploration | Signal Score | Gate Pass |
|---|---|---|---|---|---|---|---|
| Harmonic (ours) | 799 | 1,667 | 100% | 100% | 100% | 78.7 | 100% |
| Crownelius/Opus-3300x | 2,160 | 188 | 5.9% | 22.6% | 5.2% | 28.0 | 0.1% |
| nohurry/Opus-Filtered | 2,326 | 191 | 6.7% | 24.1% | 5.3% | 28.5 | 0.1% |
| TeichAI/Opus-250x | 250 | 323 | 17.2% | 26.8% | 6.8% | 24.6 | 0.4% |
| Jackrong/Qwen-700x | 633 | 6,653 | 97.5% | 97.6% | 69.8% | 75.6 | 22.7% |
| Bespoke-Stratos-17k | 16,710 | 1,322 | 88.2% | 72.7% | 59.7% | 71.7 | 49.0% |
| glaiveai/reasoning-20m | 22M+ | 799 | 64.1% | 41.4% | 37.3% | 46.2 | 12.8% |
| KingNish/reasoning-20k | 19,944 | 132 | 0.7% | 4.2% | 4.3% | 27.4 | 0.0% |

The popular Opus distillation datasets (Crownelius, nohurry, TeichAI) have less than 1% quality gate pass rate. Their thinking traces average under 200 words with near-zero self-correction. Models trained on this data learn to produce short, shallow chain-of-thought that looks like reasoning but lacks the structural behaviors that make reasoning reliable.

Jackrong and Stratos are closer competitors but still fall short on consistency. Jackrong has massive traces (6,653 words avg) but only 22.7% pass the quality gate - the thinking is verbose but wanders. Stratos has decent markers but 51% of rows still fail, meaning roughly half the gradient updates during training push the model toward shallow patterns.

Harmonic's data is smaller by design. Every row passes. Every gradient update reinforces genuine reasoning behavior.

Reasoning Flow

Marker density measured across 20 equal segments of each thinking trace. The characteristic curve shows reasoning intensity building through the middle of the trace and peaking in the later segments as the model enters verification and self-correction before committing to an answer.

Training Configuration

base_model: Qwen/Qwen3.5-9B
dataset: 799 curated reasoning rows
epochs: 1
learning_rate: 1e-4
lr_scheduler: cosine
warmup_ratio: 0.1
max_seq_length: 8192
lora_rank: 32
lora_alpha: 32
dropout: 0.05
micro_batch_size: 1
gradient_accumulation_steps: 4
weight_decay: 0.01

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load in bf16 and spread layers across available devices.
model = AutoModelForCausalLM.from_pretrained(
    "DJLougen/Harmonic-9B", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("DJLougen/Harmonic-9B")

Reasoning format

The model uses <think> blocks for reasoning:

<think>
The user is asking about X. Let me consider two approaches...

Approach 1: ...
Approach 2: ...

I'll go with Approach 1 because...

Wait, I need to be careful here - this assumes Y, which may not hold.
Let me verify by checking a special case...

Yes, that confirms the result.
</think>

[Final answer here]
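Downstream code usually wants the final answer without the trace. A minimal sketch for splitting the two, using the <think> tags shown above (the helper name is illustrative):

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Separate the <think> trace from the final answer in model output.

    Falls back to an empty trace if no <think> block is present.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if not match:
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = output[match.end():].strip()
    return reasoning, answer

raw = "<think>\nApproach 1 works. Wait, let me verify... Yes.\n</think>\n\n42"
reasoning, answer = split_reasoning(raw)
```

Keeping the no-tag fallback matters in practice, since the model may skip the think block on trivial prompts.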

Intended Use

  • Reasoning tasks requiring genuine multi-step thinking
  • Mathematical problem-solving with self-correction
  • Code analysis and generation with structured verification
  • General conversation (conversational ability preserved through training design)
  • Base model for Stage 2 agentic fine-tuning

Limitations

  • 9B parameter model - not suitable for tasks requiring extensive world knowledge
  • Reasoning traces can be verbose for simple questions
  • Not optimized for tool calling - see Harmonic-Hermes-9B (coming soon) for agentic use
  • Benchmark evaluation is ongoing

Architecture

  • Base: Qwen 3.5 9B (9.65B parameters)
  • Training: LoRA fine-tuning, merged into base weights
  • Precision: BF16
  • Context: 8192 tokens

License

Apache 2.0 - same as the base model. All training data is from Apache 2.0 or MIT licensed sources. Fully commercial use permitted.
