LlamaTron RS1 MedThinker


A 1B-parameter medical reasoning model fine-tuned on 810K chain-of-thought samples, built to think through clinical cases step by step like a junior doctor.

[Image: LlamaTron RS1 MedThinker interface]

What is LlamaTron RS1 MedThinker?

Most medical language models are trained to retrieve answers. LlamaTron RS1 MedThinker was trained to reason through them.

Built on Meta's Llama 3.2 1B Instruct and fine-tuned on 810,374 medical chain-of-thought samples, MedThinker doesn't just return a diagnosis — it works through the differential, explains the pathophysiology behind its conclusion, gives dose-specific treatment steps, and tells you exactly when to escalate care.

The reasoning pattern mirrors that of a junior physician presenting a case: structured, explainable, and clinically grounded.


Demo

Model Output — Step-by-Step Clinical Reasoning



Output Format

Every response follows a strict four-part clinical structure:

DIAGNOSIS            Primary diagnosis with key differentials
REASONING            Clinical logic, pathophysiology, and decision trail
IMMEDIATE TREATMENT  Prioritised, dose-specific intervention steps
RED FLAGS            Escalation criteria and emergency triggers
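
Because the section order is fixed, outputs are easy to machine-check. A minimal validator sketch (the function name is illustrative, not part of the model's API):

```python
import re

# The four section headers every MedThinker response should contain, in order.
SECTIONS = ["DIAGNOSIS", "REASONING", "IMMEDIATE TREATMENT", "RED FLAGS"]

def has_structured_format(response: str) -> bool:
    """Return True if all four section headers appear, each at the start
    of a line, in the expected order."""
    positions = []
    for header in SECTIONS:
        m = re.search(rf"^{header}\b", response, flags=re.MULTILINE)
        if m is None:
            return False
        positions.append(m.start())
    return positions == sorted(positions)
```

A check like this is useful when batch-generating responses, so malformed completions can be retried rather than silently passed downstream.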

Example — Input:

3yo boy, barking cough, stridor, worse at night. Diagnosis and treatment?

Example — Output:

DIAGNOSIS: Viral croup (laryngotracheobronchitis).
Differentials: epiglottitis, foreign body aspiration, bacterial tracheitis.

REASONING: The classic triad of barking cough, inspiratory stridor, and nocturnal
worsening in a toddler is pathognomonic for croup. Caused by parainfluenza virus
producing subglottic edema. Nighttime worsening occurs due to supine positioning
increasing mucosal congestion. Age 6 months to 3 years is the peak incidence window.
Epiglottitis is deprioritised given the absence of drooling, tripod posturing,
or toxic appearance.

IMMEDIATE TREATMENT:
- Dexamethasone 0.6mg/kg PO/IM single dose (reduces subglottic edema within 2-6h)
- Nebulised epinephrine 5ml of 1:1000 if moderate-severe stridor present at rest
- Humidified cool air — supportive, reduces mucosal congestion
- Keep child calm — agitation significantly worsens stridor
- Oxygen via mask if SpO2 drops below 92%

RED FLAGS: Call emergency services immediately if stridor is present at rest and
not improving, child is drooling or cannot swallow, cyanosis appears, or the child
becomes exhausted. These indicate impending airway obstruction requiring intubation.

Training Details

| Parameter | Value |
|---|---|
| Base Model | meta-llama/Llama-3.2-1B-Instruct |
| Fine-Tune Method | LoRA SFT via Unsloth |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Target Modules | q, k, v, o, gate, up, down projections |
| Sequence Length | 2048 |
| Batch Size | 16 per device |
| Gradient Accumulation | 2 steps (effective batch size 32) |
| Learning Rate | 2e-4 |
| Optimizer | AdamW 8-bit |
| Precision | BF16 |
| Packing | Enabled |
| Hardware | NVIDIA RTX A6000 48GB |
| Framework | Unsloth + TRL |
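
As a sanity check on the adapter footprint: LoRA at rank r adds r * (d_in + d_out) parameters per targeted linear layer (an r x d_in A matrix plus a d_out x r B matrix). The layer shapes below are taken from Llama 3.2 1B's published config (hidden size 2048, intermediate size 8192, 16 decoder layers, grouped-query attention with a 512-dim KV projection) and are stated here as assumptions, not measurements from this checkpoint:

```python
R = 32  # LoRA rank used for this fine-tune

# (input_dim, output_dim) of each targeted projection in one decoder layer,
# per Llama 3.2 1B's config (assumed, see lead-in).
TARGET_SHAPES = {
    "q_proj":    (2048, 2048),
    "k_proj":    (2048, 512),   # grouped-query attention: 8 KV heads x 64
    "v_proj":    (2048, 512),
    "o_proj":    (2048, 2048),
    "gate_proj": (2048, 8192),
    "up_proj":   (2048, 8192),
    "down_proj": (8192, 2048),
}

def lora_params(rank: int, shapes: dict, n_layers: int = 16) -> int:
    """Total LoRA parameters: rank * (d_in + d_out) per targeted linear,
    summed over all targeted projections and decoder layers."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * n_layers

print(lora_params(R, TARGET_SHAPES))  # 22544384, i.e. ~22.5M trainable parameters
```

Roughly 22.5M trainable parameters against a 1B-parameter base, which is why this fits comfortably on a single A6000.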

Dataset

Trained on OpenMed/Medical-Reasoning-SFT-Trinity-Mini, generated using arcee-ai/Trinity-Mini.

| Metric | Value |
|---|---|
| Total Samples | 810,374 |
| Total Tokens | 1.52 billion |
| Reasoning Tokens | 977 million |
| Content Tokens | 542 million |
| Language | English |

Each sample contains two components: the content (the answer) and the reasoning_content (the chain-of-thought trace that produced it). Training on both means the model internalised not just medical knowledge, but the structured thinking process behind clinical decision-making.
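
A sketch of how one dataset row might be flattened into a training string. The field names (content, reasoning_content) come from the dataset; the exact template used during fine-tuning is an assumption for illustration:

```python
def build_training_text(question: str, reasoning_content: str, content: str) -> str:
    """Flatten one dataset row into a single assistant turn: the
    chain-of-thought trace first, then the final answer."""
    return (
        f"<|start_header_id|>user<|end_header_id|>\n{question}<|eot_id|>"
        f"<|start_header_id|>assistant<|end_header_id|>\n"
        f"{reasoning_content}\n\n{content}<|eot_id|>"
    )

sample = build_training_text(
    "3yo boy, barking cough, stridor, worse at night.",
    "Barking cough + stridor in a toddler points to croup...",
    "DIAGNOSIS: Viral croup (laryngotracheobronchitis).",
)
```

Placing the trace before the answer in the target text is what teaches the model to reason first and conclude second.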

Dataset credit: Maziyar P.


Quickstart

Installation

pip install torch transformers accelerate

Inference

import torch
from transformers import AutoTokenizer, LlamaForCausalLM

MODEL_PATH = "Rumiii/LlamaTron-RS1-MedThinker"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = LlamaForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()

FEW_SHOT = """CASE: 2yo girl, high fever, tugging right ear, irritable, not sleeping.

DIAGNOSIS: Acute otitis media (AOM). Differentials: otitis externa, teething.

REASONING: Unilateral ear tugging with fever and irritability in a toddler is the
classic AOM presentation. Peak incidence at 6mo-2yr due to horizontal Eustachian
tube anatomy impairing drainage.

IMMEDIATE TREATMENT:
- Amoxicillin 90mg/kg/day divided BID x 10 days
- Ibuprofen/paracetamol for pain and fever
- Re-evaluate in 48-72h if no improvement

RED FLAGS: Refer immediately if mastoid swelling, facial palsy, or no improvement
after 72h of antibiotics."""

def ask(question: str) -> str:
    # Supply the worked example as a prior user/assistant exchange so the
    # model reliably reproduces the four-part structured format.
    few_case, few_answer = FEW_SHOT.split("\n\n", 1)
    prompt = (
        f"<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n"
        f"You are LlamaTron RS1 MedThinker, a clinical medical assistant. "
        f"Always use the structured format shown.<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n{few_case}<|eot_id|>"
        f"<|start_header_id|>assistant<|end_header_id|>\n{few_answer}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\nCASE: {question}<|eot_id|>"
        f"<|start_header_id|>assistant<|end_header_id|>\n"
    )
    # The prompt already begins with <|begin_of_text|>, so skip the
    # tokenizer's own BOS to avoid a duplicated token.
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
    input_len = inputs["input_ids"].shape[1]
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=400,
            temperature=0.35,
            top_p=0.85,
            do_sample=True,
            repetition_penalty=1.2,
            pad_token_id=tokenizer.eos_token_id,
        )
    raw = tokenizer.decode(out[0][input_len:], skip_special_tokens=False)
    for stop in ["<|eot_id|>", "<|end_of_text|>", "<|start_header_id|>"]:
        if stop in raw:
            raw = raw[:raw.index(stop)]
    return raw.strip()

# Run
print(ask("68yo woman, chest pain radiating to left arm, diaphoresis, nausea. BP 90/60, HR 110."))

Important Notes on Inference

This model benefits significantly from few-shot prompting. Because the fine-tuning dataset emphasised reasoning content over instruction-following format, providing a single worked example in the prompt before your real question enforces the structured output reliably. The quickstart code above includes this pattern — do not remove the FEW_SHOT block.
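
If you prefer not to hand-assemble the special tokens, the same few-shot pattern can be expressed as a message list and rendered with `tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")`, assuming the checkpoint ships Llama 3's chat template. A sketch of the message construction:

```python
def build_messages(few_shot_case: str, few_shot_answer: str, question: str) -> list:
    """Same few-shot pattern as the quickstart, as chat messages: the worked
    example becomes a prior user/assistant turn before the real case."""
    system = ("You are LlamaTron RS1 MedThinker, a clinical medical assistant. "
              "Always use the structured format shown.")
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"CASE: {few_shot_case}"},
        {"role": "assistant", "content": few_shot_answer},
        {"role": "user", "content": f"CASE: {question}"},
    ]
```

The rendered prompt should be token-for-token equivalent to the manual string in the quickstart, and is less fragile if the template ever changes.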

Recommended inference parameters:

| Parameter | Value | Reason |
|---|---|---|
| temperature | 0.35 | Confident without hallucinating |
| top_p | 0.85 | Cuts low-probability tokens |
| repetition_penalty | 1.2 | Prevents reasoning loops |
| max_new_tokens | 400-512 | Sufficient for a full structured response |

Disclaimer

LlamaTron RS1 MedThinker is intended strictly for research and educational purposes. It is not a substitute for professional medical advice, clinical diagnosis, or treatment decisions. All outputs must be reviewed by a qualified medical professional before any clinical application. The authors accept no liability for decisions made on the basis of this model's outputs.


Citation

@misc{llamatron_rs1_medthinker_2026,
  title        = {LlamaTron RS1 MedThinker},
  author       = {Rumiii},
  year         = {2026},
  base_model   = {meta-llama/Llama-3.2-1B-Instruct},
  dataset      = {OpenMed/Medical-Reasoning-SFT-Trinity-Mini},
  method       = {LoRA SFT via Unsloth},
  url          = {https://huggingface.co/Rumiii/LlamaTron-RS1-MedThinker}
}