Article Compressing Time: A Comparative Study of Video VAEs in Diffusers Bekhouche • 1 day ago • 2
jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images Paper • 2412.08802 • Published Dec 11, 2024 • 7
V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13, 2025 • 219
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild Paper • 2603.17187 • Published Mar 17 • 140
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20, 2025 • 165
Article SigLIP 2: A better multilingual vision language encoder +1 ariG23498, merve, qubvel-hf • Feb 21, 2025 • 215
Real-time Vision Models Collection A collection of real-time detectors. • 20 items • Updated Feb 18 • 23
Post 2673 We have published an excellent paper on an Arabic CLIP model. Paper link: https://aclanthology.org/2024.arabicnlp-1.9/ More information is available on the website: https://arabic-clip.github.io/Arabic-CLIP/ All datasets, models, and the demo are published on Hugging Face: Arabic-Clip. The code is published on GitHub: https://github.com/Arabic-Clip/Arabic-CLIP
3D Gaussian Splatting for Real-Time Radiance Field Rendering Paper • 2308.04079 • Published Aug 8, 2023 • 203