Article: Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL, by aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, lvwerra
Paper: Communication Efficient LLM Pre-training with SparseLoCo (2508.15706), published Aug 21, 2025
Space: The Smol Training Playbook 📚: The secrets to building world-class LLMs