DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published 10 days ago • 204
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss Paper • 2512.23447 • Published Dec 29, 2025 • 99
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies Paper • 2512.19673 • Published Dec 22, 2025 • 66
The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason Paper • 2505.22653 • Published May 28, 2025 • 43