✨ 20.3B / 3B active - MoE ✨ SOTA video understanding via 3D MRoPE + curriculum learning ✨ Real-time speech synthesis + dialect support ✨ Enhanced multimodal generation with ID & scene consistency
✨ Highly Customizable: Supports custom terms, domain prompts, and translation memory for accurate, context-aware results. ✨ Fast and affordable: $0.5 per million tokens.
Introducing Voxtral WebGPU: State-of-the-art audio transcription directly in your browser! 🤯 🗣️ Transcribe videos, meeting notes, songs and more 🔐 Runs on-device, meaning no data is sent to a server 🌎 Multilingual (8 languages) 🤗 Completely free (forever) & open source
That's right, we're running Mistral's new Voxtral-Mini-3B model 100% locally in-browser on WebGPU, powered by Transformers.js and ONNX Runtime Web! 🔥
Fast LoRA inference for Flux with Diffusers and PEFT 🚨
There are great materials that demonstrate how to optimize inference for popular image generation models, such as Flux. However, very few cover how to serve LoRAs fast, despite LoRAs being an integral part of how these models are adopted.
In our latest post, @BenjaminB and I show different techniques to optimize LoRA inference for the Flux family of models for image generation. Our recipe includes the use of:
1. torch.compile
2. Flash Attention 3 (when compatible)
3. Dynamic FP8 weight quantization (when compatible)
4. Hotswapping to avoid recompilation when swapping in new LoRAs 🤯
We have tested our recipe with Flux.1-Dev on both H100 and RTX 4090. We achieve at least a *2x speedup* on both GPUs. We believe our recipe is grounded in the reality of how LoRA-based use cases are generally served, so we hope it will be beneficial to the community 🤗
Even though our recipe was tested primarily with NVIDIA GPUs, it should also work with AMD GPUs.
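To give a concrete idea of how hotswapping and torch.compile fit together, here is a minimal sketch with Diffusers (assuming a recent diffusers release with LoRA hotswap support; the LoRA repo IDs are placeholders and exact argument names may differ between versions):

```python
# Minimal sketch: compile once, then hotswap LoRAs without recompiling.
# Assumes a recent diffusers + peft install; LoRA repo IDs are placeholders.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Allow LoRAs to be swapped later without triggering a torch.compile recompile.
pipe.enable_lora_hotswap(target_rank=128)

# Load the first LoRA, then compile the transformer once.
pipe.load_lora_weights("some-user/first-flux-lora")  # placeholder repo ID
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune")

image = pipe("a photo of a cat", num_inference_steps=28).images[0]

# Hotswap a second LoRA into the same adapter slot: the compiled graph is reused.
pipe.load_lora_weights(
    "some-user/second-flux-lora",  # placeholder repo ID
    hotswap=True,
    adapter_name="default_0",
)
image = pipe("a photo of a dog", num_inference_steps=28).images[0]
```

The key point is that compilation happens once, after the first LoRA is loaded; later LoRAs are swapped into the same slot, so the compiled graph does not have to be rebuilt.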
🧑🍳 In this @huggingface Cookbook notebook, we demonstrate how to align a multimodal model (VLM) using Mixed Preference Optimization (MPO) with trl.
💡 This recipe is powered by the new MPO support in trl, enabled through a recent upgrade to the DPO trainer!
We align the multimodal model using multiple optimization objectives (losses), guided by a preference dataset (chosen vs. rejected multimodal pairs).
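As a rough illustration of what this looks like in code, here is a minimal sketch (assuming a recent trl release where DPOConfig accepts a list of loss types with loss_weights; the model and dataset IDs below are illustrative placeholders, not necessarily the ones used in the notebook):

```python
# Minimal sketch of MPO via trl's DPOTrainer: a weighted mix of losses
# applied to a preference dataset of chosen vs. rejected multimodal pairs.
from datasets import load_dataset
from transformers import AutoModelForVision2Seq, AutoProcessor
from trl import DPOConfig, DPOTrainer

model_id = "HuggingFaceTB/SmolVLM-Instruct"  # placeholder VLM
model = AutoModelForVision2Seq.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder preference dataset, already formatted with images + prompt/chosen/rejected.
dataset = load_dataset("HuggingFaceH4/rlaif-v_formatted", split="train[:1%]")

training_args = DPOConfig(
    output_dir="vlm-mpo",
    # MPO mixes several objectives: preference (sigmoid), quality (bco_pair), generation (sft).
    loss_type=["sigmoid", "bco_pair", "sft"],
    loss_weights=[0.8, 0.2, 1.0],
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=processor,
)
trainer.train()
```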
Many VLMs claim to process hours of video. But can they follow the story?🤔 Today, we introduce TimeScope: The benchmark that separates true temporal understanding from marketing hype. Let's see how much VLMs really understand!⏳
We test three skills that matter for real-world use: 🔎 Localized Retrieval: Find a specific action. 🧩 Information Synthesis: Piece together scattered clues. 🏃 Fine-Grained Perception: Analyze detailed motion (e.g., count how many times a person swings an axe).
The results are in, and they're revealing. Only Gemini 2.5 Pro handles 1-hour-long videos. Performance drops sharply with duration, proving that long video understanding is still challenging. We've found the breaking points—now the community can start fixing them.📈
Want to learn more? TimeScope is 100% open-source. Benchmark your model and help us build the next generation of video AI.
✨ 480B total, 35B activated MoE ✨ Agentic Coding + Browser Use → Top code model performance ✨ 256K context (up to 1M via YaRN) for repo-scale understanding
⚡️ In this new @huggingface Cookbook recipe, I walk you through the process of fine-tuning a Visual Language Model (VLM) for Object Detection with Visual Grounding, using TRL.
🔍 Object detection typically involves detecting categories in images (e.g., vase).
By combining it with visual grounding, we add contextual understanding, so instead of detecting just "vase", we can detect the "middle vase" in an image.
VLMs are super powerful!
In this case, I use PaliGemma 2, which already supports object detection, and extend it to also handle visual grounding.
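For a rough idea of what grounded detection looks like at inference time, here is a minimal sketch (this is not the fine-tuning code from the recipe; the prompt follows PaliGemma's "detect ..." convention, and the image path is a placeholder):

```python
# Minimal sketch: grounded detection with PaliGemma 2 ("detect <phrase>" prompts
# return <loc....> tokens that encode a bounding box). Image path is a placeholder.
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma2-3b-pt-448"
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("cuda")
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("vases.jpg")          # placeholder image
prompt = "detect middle vase"            # grounded phrase instead of a bare category

inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda", torch.bfloat16)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)

# The decoded text contains <loc....> tokens with the box for "middle vase".
print(processor.decode(output[0], skip_special_tokens=False))
```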
🤔 Why this matters: When we use "free" online AI services, we're often the product. Our conversations become training data, our personal stories get "cooked into" models, and our privacy becomes a commodity. But there's an alternative path forward.
💡 The power shift is real: Local LLMs aren't just about privacy; they're about redistributing AI power away from a handful of tech giants. When individuals, organizations, and even entire nations can run their own models, we're democratizing access to AI capabilities.
🤗 At Hugging Face, we're proud to be at the center of this transformation. Our platform hosts the world's largest library of freely downloadable models, making cutting-edge AI accessible to everyone -- from researchers and developers to curious individuals who want to experiment on their laptops or even smartphones.
The technical barriers that once required $$$ server racks are crumbling. Today, anyone with basic computer skills can download a model, run it locally, and maintain complete control over their AI interactions. No sudden algorithm changes, no data harvesting, no corporate gatekeeping.
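To make that concrete, here is a minimal sketch of running an open model entirely on your own machine (the model ID is just one example of a model small enough for a laptop):

```python
# Minimal sketch: a local chat with a small open model via the transformers pipeline.
from transformers import pipeline

generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

messages = [{"role": "user", "content": "Explain what a LoRA adapter is in one sentence."}]
result = generator(messages, max_new_tokens=64)

# The reply never leaves your machine: no server, no data collection.
print(result[0]["generated_text"][-1]["content"])
```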
This is not just about technical convenience, but about technological sovereignty. When AI power is concentrated in a few hands, we risk creating new forms of digital dependency. Local models offer a path toward genuine AI literacy and independence.
🚀 The future of AI should be open, accessible, and in the hands of the many, not the few. What are your thoughts on AI democratization? Have you experimented with local models yet?
✨ instruction/reinforcement learning/reward model ✨ Supports 28 languages, bidirectional translation ✨ Optimized for deployment & inference: 7B with Mistral architecture ✨ Excels across domains: science, law, finance, literature & more