OPDLM-8B

OPDLM-8B is a block diffusion language model (DLM) obtained by converting a pretrained autoregressive language model (ARLM), Qwen3-8B, into a DLM through on-policy distillation post-training.

Highlights

  • Converted, not pretrained from scratch: initialized from a strong pretrained ARLM, so it reuses what that model has already learned.
  • Training-efficient: conversion uses ~0.066B (~66M) tokens, vs. ~50B tokens for from-scratch DLM training (same base ARLM).
  • Inference-efficient: block diffusion lets several tokens within a block be decoded in parallel.
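For scale, the two stated budgets can be compared directly. Both figures are approximate, so the ratio is only a rough indication; the constant and function names are illustrative.

```python
# Token budgets as stated in this card, in billions of tokens (both approximate).
CONVERSION_TOKENS_B = 0.066
FROM_SCRATCH_TOKENS_B = 50.0

def budget_ratio(conversion_b: float, scratch_b: float) -> float:
    """How many times fewer tokens the conversion uses than from-scratch training."""
    return scratch_b / conversion_b

ratio = budget_ratio(CONVERSION_TOKENS_B, FROM_SCRATCH_TOKENS_B)
share = CONVERSION_TOKENS_B / FROM_SCRATCH_TOKENS_B  # fraction of the from-scratch budget

# prints: 66M tokens, ~758x fewer, 0.13% of the budget
print(f"{CONVERSION_TOKENS_B * 1000:.0f}M tokens, ~{ratio:.0f}x fewer, {share:.2%} of the budget")
```

In other words, the conversion uses roughly 0.13% of the from-scratch token budget.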

Model Details

  • Developed by: DIVE Lab, Texas A&M University
  • Base model: Qwen3-8B
  • Model type: Block diffusion language model (decoder-based)
  • Block size: 4
  • Parameters: ~8B
  • Weights: Safetensors, BF16
  • Language: English
  • License: MIT
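A block size of 4 means the sequence is processed in consecutive 4-token blocks. A common attention pattern for block diffusion models is bidirectional within a block and causal across blocks. The sketch below builds such a mask, assuming OPDLM-8B follows this pattern (the card only states the block size); `block_causal_mask` is a hypothetical helper, not part of any released code.

```python
from typing import List

def block_causal_mask(seq_len: int, block_size: int = 4) -> List[List[bool]]:
    """Return mask where mask[i][j] is True if query position i may attend to key j.

    Positions are grouped into consecutive blocks of `block_size`. Each position
    sees every position in its own block (bidirectional within the block) and
    every position in earlier blocks (causal across blocks).
    """
    mask = []
    for i in range(seq_len):
        block_i = i // block_size
        # Key j is visible if its block is the same as, or earlier than, block_i.
        mask.append([(j // block_size) <= block_i for j in range(seq_len)])
    return mask

if __name__ == "__main__":
    # 8 positions = 2 blocks of 4; "1" marks an allowed attention edge.
    for row in block_causal_mask(8, block_size=4):
        print("".join("1" if allowed else "." for allowed in row))
```

For 8 positions, the first four rows print `1111....` (the first block cannot see the second) and the last four print `11111111`.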

Training

  • Method: On-policy distillation from a frozen ARLM teacher into a block DLM student.
  • Conversion budget: ~0.066B tokens
  • Data: opdlm_train_data
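The card does not spell out the distillation objective. One common formulation of on-policy distillation samples sequences from the student, has the frozen teacher score the same positions, and minimizes a per-position divergence between the two token distributions; reverse KL, KL(student || teacher), is a frequent choice. The sketch below computes that loss on toy logits. The choice of reverse KL and all function names are assumptions, and both the student sampling step and any diffusion-specific masking are omitted.

```python
import math
from typing import List, Sequence

def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def reverse_kl(student_logits: Sequence[float], teacher_logits: Sequence[float]) -> float:
    """KL(student || teacher) at a single position."""
    log_p = log_softmax(student_logits)
    log_q = log_softmax(teacher_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def on_policy_distill_loss(
    student_logits_per_pos: Sequence[Sequence[float]],
    teacher_logits_per_pos: Sequence[Sequence[float]],
) -> float:
    """Hypothetical loss: mean reverse KL over positions of a student-generated sequence.

    On-policy means the positions come from a sequence the student sampled itself;
    the frozen teacher provides its distribution at those same positions.
    """
    if len(student_logits_per_pos) != len(teacher_logits_per_pos):
        raise ValueError("student and teacher must cover the same positions")
    losses = [reverse_kl(s, t) for s, t in zip(student_logits_per_pos, teacher_logits_per_pos)]
    return sum(losses) / len(losses)

if __name__ == "__main__":
    # Toy 3-token vocabulary, 2 positions: the student matches the teacher at
    # position 0 but is uniform at position 1, so only position 1 adds to the loss.
    student = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
    teacher = [[2.0, 0.5, -1.0], [1.0, 0.0, -1.0]]
    print(on_policy_distill_loss(student, teacher))
```

Reverse KL is mode-seeking: it penalizes the student for putting probability where the teacher puts little, which is one reason it is often paired with on-policy sampling.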

Evaluation

Benchmark       Score
MMLU            70.9
MMLU-Pro        53.7
GPQA-Diamond    36.1
IFEval          50.1
GSM8K           87.1
MATH500         71.2
AIME-24         14.7
AIME-25         12.4
HumanEval       59.8
MBPP            48.7

Decoding setting for these scores: static, one token per step.
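To show what "one token per step" means in forward passes, here is a toy block-wise decoding loop. It assumes confidence-based unmasking, a common choice in block diffusion decoders that the card does not confirm for OPDLM-8B; `toy_predict` is a hypothetical stand-in for one model forward pass, and `decode_blocks` is an illustrative helper.

```python
from typing import Callable, List, Optional, Tuple

MASK: Optional[str] = None  # marks a position that has not been decoded yet

def toy_predict(seq: List[Optional[str]]) -> List[Tuple[int, str, float]]:
    """Hypothetical stand-in for one DLM forward pass.

    Returns a (position, token, confidence) guess for every masked position.
    Earlier positions get higher confidence, so unmasking runs left to right.
    """
    return [(i, f"t{i}", 1.0 / (1 + i)) for i, tok in enumerate(seq) if tok is MASK]

def decode_blocks(
    prompt: List[str],
    num_blocks: int,
    block_size: int,
    tokens_per_step: int,
    predict: Callable[[List[Optional[str]]], List[Tuple[int, str, float]]] = toy_predict,
) -> Tuple[List[Optional[str]], int]:
    """Generate blocks left to right; return (sequence, number of forward passes)."""
    seq: List[Optional[str]] = list(prompt)
    passes = 0
    for _ in range(num_blocks):
        start = len(seq)
        seq.extend([MASK] * block_size)  # append a fully masked block
        while any(tok is MASK for tok in seq[start:]):
            # One forward pass; keep only guesses inside the current block.
            guesses = [g for g in predict(seq) if g[0] >= start]
            passes += 1
            if not guesses:
                raise RuntimeError("predictor returned no guesses for masked positions")
            guesses.sort(key=lambda g: g[2], reverse=True)  # most confident first
            for pos, tok, _conf in guesses[:tokens_per_step]:
                seq[pos] = tok  # commit the token; it stays fixed afterwards
    return seq, passes

if __name__ == "__main__":
    for k in (1, 2, 4):
        _, passes = decode_blocks(["p"], num_blocks=2, block_size=4, tokens_per_step=k)
        print(f"tokens_per_step={k}: {passes} forward passes for 8 tokens")
```

With two blocks of four tokens, static one-token-per-step decoding needs 8 forward passes, the same count as autoregressive decoding, while committing 2 or 4 tokens per step needs 4 or 2. The parallel-decoding speedup therefore depends on committing more than one token per step.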

Citation

[FILL: BibTeX once the paper/arXiv is up]