---
library_name: transformers
tags:
- mdlm
- diffusion
license: apache-2.0
datasets:
- turkish-nlp-suite/InstrucTurca
language:
- tr
base_model:
- diffutron/DiffutronLM-0.3B-Alpaca
pipeline_tag: text-generation
---
# DiffutronLM-0.3B-Instruct

**Diffutron** is a parameter-efficient Masked Diffusion Language Model (MDLM) specifically designed for the Turkish language. Unlike standard autoregressive models that generate text one token at a time, Diffutron generates text by iteratively refining sequences in parallel, allowing for simultaneous consideration of the entire sentence context.

Despite its compact size of 307 million parameters, `DiffutronLM-0.3B-Instruct` achieves highly competitive performance against much larger, multi-billion-parameter autoregressive baselines on Turkish NLP benchmarks.

## 📌 Model Details

* **Model Type:** Masked Diffusion Language Model (MDLM)
* **Base Architecture:** `jhu-clsp/mmBERT-base` (Multilingual Encoder)
* **Language:** Turkish
* **Parameter Count:** 307M (0.3B)
* **Context Length:** 256 tokens (Instruct), 512 tokens (Base)
* **Training Libraries:** `dllm`, PyTorch

## 🚀 Architecture & Approach

Diffutron departs from traditional next-token prediction. It treats text generation as a discrete diffusion process:

1. **Forward Process:** Clean text is gradually corrupted into a sequence of `<mask>` tokens.
2. **Reverse Process:** The model learns to denoise the sequence globally, attending to visible context bi-directionally to predict the original tokens.

This non-autoregressive paradigm compresses linguistic knowledge efficiently, allowing this 0.3B model to punch significantly above its weight class.
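As a rough mental model, the toy sketch below (not the `dllm` implementation; the mask id, vocabulary size, 1-D sequence, and random stand-in denoiser are invented for illustration) shows one forward corruption step and a reverse loop that commits only the most confident predictions at each step, in the spirit of the low-confidence remasking used for inference later in this card.

```python
# Toy sketch of masked-diffusion generation. NOT the dllm implementation:
# MASK_ID, VOCAB and the random "denoiser" are placeholders for illustration only.
import torch

MASK_ID, VOCAB = 0, 32

def forward_corrupt(tokens: torch.Tensor, t: float) -> torch.Tensor:
    """Forward process: independently replace each token with <mask> with probability t."""
    noise = torch.rand(tokens.shape)
    return torch.where(noise < t, torch.full_like(tokens, MASK_ID), tokens)

def reverse_denoise(x: torch.Tensor, denoiser, steps: int = 8) -> torch.Tensor:
    """Reverse process: predict every masked position in parallel at each step and
    commit only the most confident predictions; low-confidence positions stay masked."""
    for step in range(steps):
        masked = x == MASK_ID
        if not masked.any():
            break
        probs = denoiser(x).softmax(dim=-1)      # (seq_len, VOCAB) per-position distributions
        probs[:, MASK_ID] = 0.0                  # never predict the mask token itself
        conf, pred = probs.max(dim=-1)           # confidence and argmax per position
        conf = torch.where(masked, conf, torch.full_like(conf, -1.0))
        k = max(1, int(masked.sum().item() * (step + 1) / steps))  # unmask a growing fraction
        keep = conf.topk(k).indices
        x = x.clone()
        x[keep] = pred[keep]
    return x

# Usage with a random stand-in denoiser (a real model would return token logits).
denoiser = lambda x: torch.randn(x.shape[0], VOCAB)
clean = torch.randint(1, VOCAB, (16,))
noisy = forward_corrupt(clean, t=0.7)
print(reverse_denoise(noisy, denoiser))
```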
## 📚 Training Pipeline

The model was developed through a resource-efficient, multi-stage training pipeline:

### 1. Continual Pre-training (CPT)

To adapt the multilingual backbone to Turkish without catastrophic forgetting, we employed a high-rank LoRA strategy (r=256, α=256) targeting all linear modules (Attention and MLP).

* **Data:** ~2 million sequences sourced from Havadis (news), Temiz-OSCAR (web), and Turkish Wikipedia.
* **Result:** Perplexity on the Bilkent Turkish Writings Dataset dropped significantly from 3.42 (base) to 2.75.
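For illustration, such a high-rank LoRA setup might look roughly like the sketch below using the `peft` library. This is an assumption-laden sketch, not the authors' training code: the `target_modules="all-linear"` shorthand, the dropout value, and the use of `AutoModelForMaskedLM` are guesses, and the actual diffusion training loop lives in `dllm` and is omitted.

```python
# Sketch only: a high-rank LoRA (r=256, alpha=256) over all linear modules,
# mirroring the CPT recipe described above. Not the authors' exact training code.
from transformers import AutoModelForMaskedLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "jhu-clsp/mmBERT-base"  # multilingual encoder backbone named above
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForMaskedLM.from_pretrained(base_id)

lora_cfg = LoraConfig(
    r=256,                        # high-rank adapter, as stated in the card
    lora_alpha=256,
    target_modules="all-linear",  # attention + MLP linear layers (peft shorthand; assumption)
    lora_dropout=0.0,             # assumption: dropout not reported in the card
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()
```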
### 2. Progressive Instruction-Tuning (SFT)

To unlock generative instruction-following capabilities, we utilized a two-stage supervised fine-tuning approach:

* **Stage 1 (General Adaptation):** Trained on `metunlp/LlamaTurk-Instruction-Set` for 20 epochs to establish fundamental instruction-following behaviors.
* **Stage 2 (Complex Specialization):** Trained on the nuanced `turkish-nlp-suite/InstrucTurca` dataset for 8 epochs with an increased batch size, enhancing the model's ability to handle intricate, domain-specific Turkish commands.
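Purely as a sketch of the data side of these two stages (split names and batch sizes are assumptions; the masked-diffusion SFT objective and trainer from `dllm` are not reproduced here):

```python
# Sketch only: the two SFT stages on the data side. Epoch counts come from the card;
# batch sizes are placeholders (the card only says Stage 2 uses a larger one).
from datasets import load_dataset

stage1 = {
    "dataset": load_dataset("metunlp/LlamaTurk-Instruction-Set", split="train"),
    "epochs": 20,
    "per_device_batch_size": 16,   # placeholder
}
stage2 = {
    "dataset": load_dataset("turkish-nlp-suite/InstrucTurca", split="train"),
    "epochs": 8,
    "per_device_batch_size": 32,   # placeholder for the "increased batch size"
}

for stage in (stage1, stage2):
    print(stage["dataset"], stage["epochs"])
    # ...run the masked-diffusion SFT objective from dllm over stage["dataset"]...
```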
## 📊 Evaluation Results

The model was evaluated on a representative subset of the **CETVEL Benchmark Suite**. DiffutronLM-0.3B (2nd Stage) demonstrates remarkable parameter efficiency, outperforming models up to 7x its size (e.g., Kumru-2B and TURNA-1.1B) on average scores.

| Benchmark | Diffutron-1st-Stage (0.3B) | Diffutron-2nd-Stage (0.3B) | TURNA (1.1B) | Kumru (2B) | Kanarya (2B) | Llama-3.2 (3B) | Trendyol (7B) | Aya-101 (13B) |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **Belebele_TR** | 22.22 | 27.00 | 22.56 | 29.00 | 28.11 | **55.78** | 36.22 | 22.89 |
| **EXAMS_TR** | 25.95 | 27.74 | 23.66 | **30.03** | **30.03** | 26.21 | 28.50 | 22.90 |
| **IronyTR** | 50.67 | **52.00** | 48.33 | 51.00 | 50.00 | 50.17 | 50.00 | **52.17** |
| **News_Cat** | 23.20 | 32.40 | 32.80 | 26.40 | 66.80 | 64.00 | **81.20** | 20.00 |
| **MNLI_TR** | 33.29 | 32.81 | 34.94 | **36.42** | 33.40 | 34.76 | 35.19 | 27.90 |
| **STS_TR** | 17.77 | **18.78** | 14.21 | 11.75 | 12.91 | 12.91 | 15.52 | 16.97 |
| **XCOPA_TR** | 53.80 | 52.00 | 55.80 | 54.00 | **64.20** | 54.60 | 61.00 | 59.60 |
| **Average** | 32.41 | **34.68** | 33.19 | 34.09 | 40.78 | 42.63 | **43.95** | 31.78 |
## 💻 Usage

Because Diffutron is a Masked Diffusion Language Model, it requires inference strategies distinct from standard causal generation. We recommend using the `dllm` library or custom generation loops tailored for discrete diffusion.

### 1. Install the dllm Library:

```bash
git clone https://github.com/Diffutron/dllm.git
cd dllm
pip install -e .
```
### 2. Chat via Interaction Mode:

```bash
python -u examples/bert/chat.py \
    --model_name_or_path "diffutron/DiffutronLM-0.3B-Instruct" \
    --chat True \
    --steps 64 \
    --max_new_tokens 64 \
    --temperature 0.1 \
    --block_length 32 \
    --repetition_penalty 1.2 \
    --remasking "low_confidence" \
    --stochastic_transfer False \
    --cfg_scale 0.0
```
For other inference modes, see the [dllm](https://github.com/Diffutron/dllm) library.
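If you prefer to write your own discrete-diffusion decoding loop instead, the checkpoint should be loadable with the standard `transformers` classes, since the card lists `transformers` as the library. The sketch below only loads the model and runs one parallel prediction pass; the choice of `AutoModelForMaskedLM`, the availability of `tokenizer.mask_token`, and the bare prompt formatting (no chat template) are assumptions, and the remasking schedule is left to you.

```python
# Sketch only: load the checkpoint for a custom discrete-diffusion decoding loop.
# The recommended low-confidence remasking decoder is provided by dllm; this just
# exposes the tokenizer, the mask token and per-position logits.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "diffutron/DiffutronLM-0.3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)
model.eval()

prompt = "Türkiye'nin başkenti neresidir?"
# Append masked positions for the reverse diffusion process to fill in
# (assumes the tokenizer defines a mask token).
masks = " ".join([tokenizer.mask_token] * 16)
inputs = tokenizer(prompt + " " + masks, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # (1, seq_len, vocab): one prediction per position
print(logits.shape)
```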
### Generation Parameters Used in the Paper

* **Longer contexts:** steps 128, temperature 0.1, block length 32, repetition penalty 1.2
* **Shorter contexts:** steps 64, remasking `low_confidence`, stochastic transfer `False`, CFG scale 0.0
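For example, the longer-context settings presumably map onto the same `chat.py` flags shown above; the flags not listed for that setting (remasking, stochastic transfer, CFG scale) are assumed to keep the values from the command in step 2.

```bash
# Assumed mapping of the paper's longer-context settings onto the chat.py flags above
python -u examples/bert/chat.py \
    --model_name_or_path "diffutron/DiffutronLM-0.3B-Instruct" \
    --chat True \
    --steps 128 \
    --temperature 0.1 \
    --block_length 32 \
    --repetition_penalty 1.2 \
    --remasking "low_confidence" \
    --stochastic_transfer False \
    --cfg_scale 0.0
```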
## ⚠️ Limitations

* **Multilingual Backbone:** Built upon a multilingual encoder rather than a native Turkish foundation model.
* **Context Window:** Restricted to a 256-token context window for generation, limiting its use in long-form summarization or document-level generation.
* **Data Nuances:** Instruction datasets rely heavily on translations or synthetic data, which may occasionally miss subtle cultural contexts.
## 📝 Citation

If you use Diffutron in your research, please cite our preprint:

```bibtex
@misc{diffutron2026,
  title={Diffutron: A Masked Diffusion Language Model for Turkish Language},
  author={Şuayp Talha Kocabay and Talha Rüzgar Akkuş},
  year={2026},
  eprint={2603.20466},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2603.20466},
}
```