Check out dhara-250m, an experimental 250M-parameter language model inspired by the Nemotron Diffusion recipe. One set of weights supports three decoding modes: autoregressive, block-diffusion, and self-speculation.
It is small, easy to try, and meant for exploring diffusion-style decoding and latency tradeoffs in compact LMs.
Model: codelion/dhara-250m
Try the chat demo here: codelion/dhara-chat