Article No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL toslali-ibm, mirinflim, qgallouedec, esnible, rganti, mudhakar • Jun 3, 2025 • 101
Article 🐯 Liger GRPO meets TRL shisahni, kashif, smohammadi, ShirinYamani, m0m0chen, liberty4321 • May 25, 2025 • 53
Article Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial open-r1 • Jan 31, 2025 • 51
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? Paper • 2411.16489 • Published Nov 25, 2024 • 45
Article Fixing Gradient Accumulation lysandre, ArthurZ, muellerzr, ydshieh, BenjaminB, pcuenq • Oct 16, 2024 • 66
MobileLLM Collection Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 49 items • Updated Mar 2 • 141
Korean Reward Modeling Collection Korean Datasets, Reward Models for RLHF • 15 items • Updated Mar 2 • 3
Octo-planner: On-device Language Model for Planner-Action Agents Paper • 2406.18082 • Published Jun 26, 2024 • 48
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch Paper • 2406.14563 • Published Jun 20, 2024 • 30
Function Calling v3 Collection Models fine-tuned for function-calling • 12 items • Updated Mar 2 • 21
Miqu-based Models Collection A collection of creative writing models based on the 'miqu-1-70b' model. • 2 items • Updated Mar 2 • 2
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3 Paper • 2405.00664 • Published May 1, 2024 • 20