Brick-Composer: Using MLLMs for Assembly with Diverse Bricks Paper • 2606.05445 • Published 3 days ago • 7
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints Paper • 2606.05622 • Published 2 days ago • 34
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published Apr 14 • 109
CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing Paper • 2605.02910 • Published May 6 • 22
UserRL: Training Interactive User-Centric Agent via Reinforcement Learning Paper • 2509.19736 • Published Sep 24, 2025 • 12
UserBench: An Interactive Gym Environment for User-Centric Agents Paper • 2507.22034 • Published Jul 29, 2025 • 30