Arbitrage: Efficient Reasoning via Advantage-Aware Speculation Paper • 2512.05033 • Published Dec 4, 2025 • 17
Let's (not) just put things in Context: Test-Time Training for Long-Context LLMs Paper • 2512.13898 • Published Dec 15, 2025 • 2
$V_1$: Unifying Generation and Self-Verification for Parallel Reasoners Paper • 2603.04304 • Published Mar 4 • 14
Reward Under Attack: Analyzing the Robustness and Hackability of Process Reward Models Paper • 2603.06621 • Published Feb 20
Squeeze Evolve: Unified Multi-Model Orchestration for Verifier-Free Evolution Paper • 2604.07725 • Published Apr 10
Learning, Fast and Slow: Towards LLMs That Adapt Continually Paper • 2605.12484 • Published 3 days ago • 12
The Art of Scaling Reinforcement Learning Compute for LLMs Paper • 2510.13786 • Published Oct 15, 2025 • 33
XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization Paper • 2508.10395 • Published Aug 14, 2025 • 42
Overcoming Simplicity Bias in Deep Networks using a Feature Sieve Paper • 2301.13293 • Published Jan 30, 2023
QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache Paper • 2502.10424 • Published Feb 5, 2025 • 1