Article: CinePile 2.0 - making stronger datasets with adversarial refinement • By mfarre and 3 others • Oct 23, 2024 • 18
Article: TimeScope: How Long Can Your Video Large Multimodal Model Go? • By orrzohar and 3 others • 9 days ago • 30
Article: PaliGemma – Google's Cutting-Edge Open Vision Language Model • By merve and 2 others • May 14, 2024 • 263
V-JEPA 2 Collection: A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13 • 152
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation • Paper • 2506.03147 • Published Jun 3 • 58
Article: Vision Language Models (Better, Faster, Stronger) • By merve and 4 others • May 12 • 491
D-FINE Collection: State-of-the-art real-time object detection model with Apache 2.0 licence • 15 items • Updated May 5 • 55
Article: SigLIP 2: A better multilingual vision language encoder • By ariG23498 and 2 others • Feb 21 • 174
Article: FastRTC: The Real-Time Communication Library for Python • By freddyaboulton and 1 other • Feb 25 • 171
Article: Open-source DeepResearch – Freeing our search agents • By m-ric and 4 others • Feb 4 • 1.28k
Article: Build a Qwen 2.5 VL API endpoint with Hugging Face spaces and Docker! • By ariG23498 • Jan 29 • 19
Article: Welcome to Inference Providers on the Hub 🔥 • By julien-c and 6 others • Jan 28 • 485
Executable Code Actions Elicit Better LLM Agents • Paper • 2402.01030 • Published Feb 1, 2024 • 160
Article: The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about... • By srinivasbilla • Jan 20 • 70
Article: Train 400x faster Static Embedding Models with Sentence Transformers • By tomaarsen • Jan 15 • 199