deepseek-ai/DeepSeek-V4-Flash Text Generation • 158B • Updated 15 days ago • 2.29M • 1.17k
Article Mixture of Experts (MoEs) in Transformers +5 ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap • Feb 26 • 160
Article **NVIDIA Earth-2 Open Models Span the Whole Weather Stack** nvidia • Jan 26 • 36
Running • Unlocking On-Policy Distillation for Any Model Family • 105 • Visualize on-policy distillation for any model family
Article Diffusion Language Models: The New Paradigm ProCreations • Jun 10, 2025 • 48
Running on CPU Upgrade • Featured • The Smol Training Playbook • 3.18k • The secrets to building world-class LLMs