LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding • Paper • 2605.27365 • Published 4 days ago • 115
LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs? • Paper • 2605.08985 • Published 21 days ago • 22
GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF • Image-Text-to-Text • 35B • Updated 15 days ago • 2.42k • 3
WebWorld: A Large-Scale World Model for Web Agent Training • Paper • 2602.14721 • Published Feb 16 • 19