From Frames to Clips: Efficient Key Clip Selection for Long-Form Video Understanding Paper • 2510.02262 • Published Oct 2, 2025 • 3
Article How to generate text: using different decoding methods for language generation with Transformers patrickvonplaten • Mar 1, 2020 • 298
Article Illustrating Reinforcement Learning from Human Feedback (RLHF) natolambert, LouisCastricato, lvwerra, Dahoas • Dec 9, 2022 • 414
Ring Attention with Blockwise Transformers for Near-Infinite Context Paper • 2310.01889 • Published Oct 3, 2023 • 13
Article Breaking resolution curse of vision-language models visheratin • Feb 24, 2024 • 22
Article Design choices for Vision Language Models in 2024 gigant • Apr 16, 2024 • 34
Article PaliGemma – Google's Cutting-Edge Open Vision Language Model merve, andsteing, pcuenq • May 14, 2024 • 287
Article ColPali: Efficient Document Retrieval with Vision Language Models manu • Jul 5, 2024 • 317