LlamaTron RS1 MedThinker


A 1B-parameter medical reasoning model fine-tuned on 810K chain-of-thought samples, built to think through clinical cases step by step like a junior doctor.

[Image: LlamaTron RS1 MedThinker interface]

What is LlamaTron RS1 MedThinker?

Most medical language models are trained to retrieve answers. LlamaTron RS1 MedThinker was trained to reason through them.

Built on Meta's Llama 3.2 1B Instruct and fine-tuned on 810,374 medical chain-of-thought samples, MedThinker doesn't just return a diagnosis — it works through the differential, explains the pathophysiology behind its conclusion, gives dose-specific treatment steps, and tells you exactly when to escalate care.

The reasoning pattern mirrors that of a junior physician presenting a case: structured, explainable, and clinically grounded.


Demo

Model Output — Step-by-Step Clinical Reasoning



Output Format

Every response follows a strict four-part clinical structure:

DIAGNOSIS            Primary diagnosis with key differentials
REASONING            Clinical logic, pathophysiology, and decision trail
IMMEDIATE TREATMENT  Prioritised, dose-specific intervention steps
RED FLAGS            Escalation criteria and emergency triggers
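
Because the section order is fixed, outputs are easy to machine-check. A minimal validator sketch (the function name is illustrative, not part of the model's API):

```python
import re

# The four section headers every MedThinker response should contain, in order.
SECTIONS = ["DIAGNOSIS", "REASONING", "IMMEDIATE TREATMENT", "RED FLAGS"]

def has_structured_format(response: str) -> bool:
    """Return True if all four section headers appear, each at the start
    of a line, in the expected order."""
    positions = []
    for header in SECTIONS:
        m = re.search(rf"^{header}\b", response, flags=re.MULTILINE)
        if m is None:
            return False
        positions.append(m.start())
    return positions == sorted(positions)
```

A check like this is useful when batch-generating responses, so malformed completions can be retried rather than silently passed downstream.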

Example — Input:

3yo boy, barking cough, stridor, worse at night. Diagnosis and treatment?

Example — Output:

DIAGNOSIS: Viral croup (laryngotracheobronchitis).
Differentials: epiglottitis, foreign body aspiration, bacterial tracheitis.

REASONING: The classic triad of barking cough, inspiratory stridor, and nocturnal
worsening in a toddler is pathognomonic for croup. Caused by parainfluenza virus
producing subglottic edema. Nighttime worsening occurs due to supine positioning
increasing mucosal congestion. Age 6 months to 3 years is the peak incidence window.
Epiglottitis is deprioritised given the absence of drooling, tripod posturing,
or toxic appearance.

IMMEDIATE TREATMENT:
- Dexamethasone 0.6mg/kg PO/IM single dose (reduces subglottic edema within 2-6h)
- Nebulised epinephrine 5ml of 1:1000 if moderate-severe stridor present at rest
- Humidified cool air — supportive, reduces mucosal congestion
- Keep child calm — agitation significantly worsens stridor
- Oxygen via mask if SpO2 drops below 92%

RED FLAGS: Call emergency services immediately if stridor is present at rest and
not improving, child is drooling or cannot swallow, cyanosis appears, or the child
becomes exhausted. These indicate impending airway obstruction requiring intubation.

Training Details

| Parameter | Value |
|---|---|
| Base Model | meta-llama/Llama-3.2-1B-Instruct |
| Fine-Tune Method | LoRA SFT via Unsloth |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Target Modules | q, k, v, o, gate, up, down projections |
| Sequence Length | 2048 |
| Batch Size | 16 per device |
| Gradient Accumulation | 2 steps (effective batch size 32) |
| Learning Rate | 2e-4 |
| Optimizer | AdamW 8-bit |
| Precision | BF16 |
| Packing | Enabled |
| Hardware | NVIDIA RTX A6000 48GB |
| Framework | Unsloth + TRL |
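
As a sanity check on the adapter footprint: LoRA at rank r adds r * (d_in + d_out) parameters per targeted linear layer (an r x d_in A matrix plus a d_out x r B matrix). The layer shapes below are taken from Llama 3.2 1B's published config (hidden size 2048, intermediate size 8192, 16 decoder layers, grouped-query attention with a 512-dim KV projection) and are stated here as assumptions, not measurements from this checkpoint:

```python
R = 32  # LoRA rank used for this fine-tune

# (input_dim, output_dim) of each targeted projection in one decoder layer,
# per Llama 3.2 1B's config (assumed, see lead-in).
TARGET_SHAPES = {
    "q_proj":    (2048, 2048),
    "k_proj":    (2048, 512),   # grouped-query attention: 8 KV heads x 64
    "v_proj":    (2048, 512),
    "o_proj":    (2048, 2048),
    "gate_proj": (2048, 8192),
    "up_proj":   (2048, 8192),
    "down_proj": (8192, 2048),
}

def lora_params(rank: int, shapes: dict, n_layers: int = 16) -> int:
    """Total LoRA parameters: rank * (d_in + d_out) per targeted linear,
    summed over all targeted projections and decoder layers."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * n_layers

print(lora_params(R, TARGET_SHAPES))  # 22544384, i.e. ~22.5M trainable parameters
```

Roughly 22.5M trainable parameters against a 1B-parameter base, which is why this fits comfortably on a single A6000.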

Dataset

Trained on OpenMed/Medical-Reasoning-SFT-Trinity-Mini, generated using arcee-ai/Trinity-Mini.

| Metric | Value |
|---|---|
| Total Samples | 810,374 |
| Total Tokens | 1.52 billion |
| Reasoning Tokens | 977 million |
| Content Tokens | 542 million |
| Language | English |

Each sample contains two components: the content (the answer) and the reasoning_content (the chain-of-thought trace that produced it). Training on both means the model internalised not just medical knowledge, but the structured thinking process behind clinical decision-making.
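
A sketch of how one dataset row might be flattened into a training string. The field names (content, reasoning_content) come from the dataset; the exact template used during fine-tuning is an assumption for illustration:

```python
def build_training_text(question: str, reasoning_content: str, content: str) -> str:
    """Flatten one dataset row into a single assistant turn: the
    chain-of-thought trace first, then the final answer."""
    return (
        f"<|start_header_id|>user<|end_header_id|>\n{question}<|eot_id|>"
        f"<|start_header_id|>assistant<|end_header_id|>\n"
        f"{reasoning_content}\n\n{content}<|eot_id|>"
    )

sample = build_training_text(
    "3yo boy, barking cough, stridor, worse at night.",
    "Barking cough + stridor in a toddler points to croup...",
    "DIAGNOSIS: Viral croup (laryngotracheobronchitis).",
)
```

Placing the trace before the answer in the target text is what teaches the model to reason first and conclude second.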

Dataset credit: Maziyar P.


Quickstart

Installation

pip install torch transformers accelerate

Inference

import torch
from transformers import AutoTokenizer, LlamaForCausalLM

MODEL_PATH = "Rumiii/LlamaTron-RS1-MedThinker"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = LlamaForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()

FEW_SHOT = """CASE: 2yo girl, high fever, tugging right ear, irritable, not sleeping.

DIAGNOSIS: Acute otitis media (AOM). Differentials: otitis externa, teething.

REASONING: Unilateral ear tugging with fever and irritability in a toddler is the
classic AOM presentation. Peak incidence at 6mo-2yr due to horizontal Eustachian
tube anatomy impairing drainage.

IMMEDIATE TREATMENT:
- Amoxicillin 90mg/kg/day divided BID x 10 days
- Ibuprofen/paracetamol for pain and fever
- Re-evaluate in 48-72h if no improvement

RED FLAGS: Refer immediately if mastoid swelling, facial palsy, or no improvement
after 72h of antibiotics."""

def ask(question: str) -> str:
    # Supply the worked example as a prior user/assistant exchange so the
    # model reliably reproduces the four-part structured format.
    few_case, few_answer = FEW_SHOT.split("\n\n", 1)
    prompt = (
        f"<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n"
        f"You are LlamaTron RS1 MedThinker, a clinical medical assistant. "
        f"Always use the structured format shown.<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n{few_case}<|eot_id|>"
        f"<|start_header_id|>assistant<|end_header_id|>\n{few_answer}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\nCASE: {question}<|eot_id|>"
        f"<|start_header_id|>assistant<|end_header_id|>\n"
    )
    # The prompt already begins with <|begin_of_text|>, so skip the
    # tokenizer's own BOS to avoid a duplicated token.
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
    input_len = inputs["input_ids"].shape[1]
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=400,
            temperature=0.35,
            top_p=0.85,
            do_sample=True,
            repetition_penalty=1.2,
            pad_token_id=tokenizer.eos_token_id,
        )
    raw = tokenizer.decode(out[0][input_len:], skip_special_tokens=False)
    for stop in ["<|eot_id|>", "<|end_of_text|>", "<|start_header_id|>"]:
        if stop in raw:
            raw = raw[:raw.index(stop)]
    return raw.strip()

# Run
print(ask("68yo woman, chest pain radiating to left arm, diaphoresis, nausea. BP 90/60, HR 110."))

Important Notes on Inference

This model benefits significantly from few-shot prompting. Because the fine-tuning dataset emphasised reasoning content over instruction-following format, providing a single worked example in the prompt before your real question enforces the structured output reliably. The quickstart code above includes this pattern — do not remove the FEW_SHOT block.
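
If you prefer not to hand-assemble the special tokens, the same few-shot pattern can be expressed as a message list and rendered with `tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")`, assuming the checkpoint ships Llama 3's chat template. A sketch of the message construction:

```python
def build_messages(few_shot_case: str, few_shot_answer: str, question: str) -> list:
    """Same few-shot pattern as the quickstart, as chat messages: the worked
    example becomes a prior user/assistant turn before the real case."""
    system = ("You are LlamaTron RS1 MedThinker, a clinical medical assistant. "
              "Always use the structured format shown.")
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"CASE: {few_shot_case}"},
        {"role": "assistant", "content": few_shot_answer},
        {"role": "user", "content": f"CASE: {question}"},
    ]
```

The rendered prompt should be token-for-token equivalent to the manual string in the quickstart, and is less fragile if the template ever changes.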

Recommended inference parameters:

| Parameter | Value | Reason |
|---|---|---|
| temperature | 0.35 | Confident without hallucinating |
| top_p | 0.85 | Cuts low-probability tokens |
| repetition_penalty | 1.2 | Prevents reasoning loops |
| max_new_tokens | 400-512 | Sufficient for a full structured response |

Disclaimer

LlamaTron RS1 MedThinker is intended strictly for research and educational purposes. It is not a substitute for professional medical advice, clinical diagnosis, or treatment decisions. All outputs must be reviewed by a qualified medical professional before any clinical application. The authors accept no liability for decisions made on the basis of this model's outputs.


Citation

@misc{llamatron_rs1_medthinker_2026,
  title        = {LlamaTron RS1 MedThinker},
  author       = {Rumiii},
  year         = {2026},
  base_model   = {meta-llama/Llama-3.2-1B-Instruct},
  dataset      = {OpenMed/Medical-Reasoning-SFT-Trinity-Mini},
  method       = {LoRA SFT via Unsloth},
  url          = {https://huggingface.co/Rumiii/LlamaTron-RS1-MedThinker}
}