lmms-lab/LLaVA-OneVision-1.5-4B-Instruct
Image-Text-to-Text • 5B
Feeling and building multimodal intelligence.
A Simple Baseline for Streaming Video Understanding
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence