Article • KV Caching Explained: Optimizing Transformer Inference Efficiency • not-lain • Jan 30, 2025 • 338
Article • Train AI models with Unsloth and Hugging Face Jobs for FREE • burtenshaw, danielhanchen, shimmyshimmer, mlabonne, davanstrien, evalstate • Feb 20 • 103
Article • GGML and llama.cpp join HF to ensure the long-term progress of Local AI • ggerganov, ngxson, allozaur, lysandre, victor, julien-c • Feb 20 • 507
HuggingFaceTB/SmolLM2-135M-Instruct Text Generation • 0.1B • Updated Sep 22, 2025 • 1.3M • 327
Article • We Got Claude to Build CUDA Kernels and teach open models! • burtenshaw, evalstate, merve, pcuenq • Jan 28 • 156
Article • Nemotron ColEmbed V2: Raising the Bar for Multimodal Retrieval with ViDoRe V3’s Top Model • nvidia • Feb 4 • 28
Article • The Optimal Architecture for Small Language Models • codelion • Dec 26, 2025 • 121