Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments Paper • 2605.30280 • Published 6 days ago • 128
Qwen/Qwen3-VL-235B-A22B-Thinking Image-Text-to-Text • 236B • Updated Nov 26, 2025 • 8.76k • 396
ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering Paper • 2503.16867 • Published Mar 21, 2025 • 12