Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories Paper • 2606.02060 • Published about 1 month ago • 57 • 9
CoVEBench: Can Video Editing Models Handle Complex Instructions? Paper • 2606.08415 • Published 25 days ago • 51
OmniCap-IF: Benchmarking and Improving Instruction Following Abilities for Omni-Video Captioning Paper • 2606.08572 • Published 25 days ago • 14
Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills Paper • 2606.07412 • Published 27 days ago • 12