Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 122
PixMo Collection A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 10 items • Updated Apr 30 • 76
view article Article Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders By thomwolf and 1 other • 21 days ago • 615
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent Paper • 2506.17612 • Published Jun 21 • 61
AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models Paper • 2506.19851 • Published Jun 24 • 58
ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation Paper • 2506.18095 • Published Jun 22 • 65
Light of Normals: Unified Feature Representation for Universal Photometric Stereo Paper • 2506.18882 • Published Jun 23 • 88
LongAnimation: Long Animation Generation with Dynamic Global-Local Memory Paper • 2507.01945 • Published 27 days ago • 74
view article Article nanoVLM: The simplest repository to train your VLM in pure PyTorch By ariG23498 and 6 others • May 21 • 196
view article Article (LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware By derekl35 and 4 others • Jun 19 • 79
ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development Paper • 2506.05010 • Published Jun 5 • 74
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation Paper • 2506.09790 • Published Jun 11 • 52
Sherlock: Self-Correcting Reasoning in Vision-Language Models Paper • 2505.22651 • Published May 28 • 51
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence Paper • 2505.23747 • Published May 29 • 68
PixelHacker: Image Inpainting with Structural and Semantic Consistency Paper • 2504.20438 • Published Apr 29 • 44
FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios Paper • 2505.03730 • Published May 6 • 28