Miquel Farré

mfarre

AI & ML interests

I like everything video

Recent Activity

new activity about 15 hours ago

tencent/HunyuanWorld-1: demo to test the model

upvoted an article 1 day ago

SmolLM3: smol, multilingual, long-context reasoner

upvoted an article 8 days ago

TimeScope: How Long Can Your Video Large Multimodal Model Go?

Organizations

upvoted an article 1 day ago

SmolLM3: smol, multilingual, long-context reasoner

Article • and 22 others • 23 days ago • 598

upvoted an article 8 days ago

TimeScope: How Long Can Your Video Large Multimodal Model Go?

Article • and 3 others • 8 days ago • 30

upvoted a paper 28 days ago

Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens

Paper • 2506.17218 • Published Jun 20 • 27

upvoted a paper about 1 month ago

Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance

Paper • 2502.06145 • Published Feb 10 • 17

upvoted an article 2 months ago

nanoVLM: The simplest repository to train your VLM in pure PyTorch

Article • and 6 others • May 21 • 197

upvoted an article 3 months ago

Vision Language Models (Better, Faster, Stronger)

Article • and 4 others • May 12 • 491

upvoted an article 4 months ago

Cohere on Hugging Face Inference Providers 🔥

Article • and 6 others • Apr 16 • 130

upvoted a paper 4 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 196

upvoted an article 5 months ago

SmolVLM2: Bringing Video Understanding to Every Device

Article • and 6 others • Feb 20 • 287

upvoted a collection 5 months ago

SmolVLM2 📺 Smallest video LM ever 🤏🏻

Collection • 11 items • Updated May 5 • 95

upvoted an article 6 months ago

SmolVLM Grows Smaller – Introducing the 250M & 500M Models!

Article • and 2 others • Jan 23 • 182

upvoted an article 7 months ago

Announcing NVIDIA Cosmos World Foundation Models

Article • and 1 other • Jan 7 • 26

upvoted a paper 8 months ago

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 147

upvoted a paper 9 months ago

LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding

Paper • 2410.17434 • Published Oct 22, 2024 • 30

upvoted an article 10 months ago

FineVideo: behind the scenes

Article • and 5 others • Sep 23, 2024 • 34

upvoted 2 articles 11 months ago

Docmatix - a huge dataset for Document Visual Question Answering

Article • and 1 other • Jul 18, 2024 • 74

Scaling robotics datasets with video encoding

Article • and 2 others • Aug 27, 2024 • 40

upvoted a paper 11 months ago

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 132
