JoyAI-VL-Interaction: Real-Time Vision-Language Interaction Intelligence Paper • 2606.14777 • Published 7 days ago • 168
EndoCoT: Scaling Endogenous Chain-of-Thought Reasoning in Diffusion Models Paper • 2603.12252 • Published Mar 12 • 12
DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing Paper • 2602.12205 • Published Feb 13 • 83
ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning Paper • 2512.05111 • Published Dec 4, 2025 • 50
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation Paper • 2512.03036 • Published Dec 2, 2025 • 22
Think Visually, Reason Textually: Vision-Language Synergy in ARC Paper • 2511.15703 • Published Nov 19, 2025 • 9
Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning Paper • 2510.27606 • Published Oct 31, 2025 • 31
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence Paper • 2510.24693 • Published Oct 28, 2025 • 19