Collections of multimodal (image+text) instruction fine-tuning datasets tailored for vision-language models such as LLaVA, Fuyu, or IDEFICS.
Victor Sanh PRO
VictorSanh
Recent Activity
upvoted a collection 13 days ago: Qwen3-VL
liked a model 27 days ago: Qwen/Qwen2-VL-7B-Instruct
liked a model about 1 month ago: Qwen/Qwen3-VL-30B-A3B-Instruct