Recent Activity
ankits0052 authored a paper (4 months ago)
Post · 4840
What happened in AI in 2025?
We prepared the 2025 version of the HF AI Timeline Grid, highlighting open vs API-based model releases, and allowing you to browse and filter by access, modality, and release type!
Play with it here:
2025-ai-timeline/2025-ai-timeline
Here's my personal quarterly TL;DR:
1️⃣ Q1 – Learning to Reason
DeepSeek not only releases a top-notch reasoning model but also shows how to train one and compete with closed frontier models. OpenAI debuts Deep Research.
Significant milestones: DeepSeek R1 & R1-Zero, Qwen 2.5 VL, OpenAI Deep Research, Gemini 2.5 Pro (experimental)
2️⃣ Q2 – Multimodality and Coding
More LLMs embrace multimodality by default, and there's a surge in coding agents. Strong vision, audio, and generative models emerge.
Significant milestones: Llama 4, Qwen 3, Imagen 4, OpenAI Codex, Google Jules, Claude 4
3️⃣ Q3 – "Gold" rush, OpenAI opens up, the community goes bananas
Flagship models earn gold in math olympiads and on hard benchmarks. OpenAI releases strong open-source models, and Google releases the much-anticipated nano-banana for image generation and editing. Agentic workflows become commonplace.
Significant milestones: Gemini and OpenAI IMO Gold, gpt-oss, Gemini 2.5 Flash Image, Grok 4, Claude Sonnet 4.5
4️⃣ Q4 – Mistral returns, leaderboard hill-climbing
Mistral is back with updated model families. All labs release impressive models to wrap up the year!
Significant milestones: Claude Opus 4.5, DeepSeek Math V2, FLUX 2, GPT 5.1, Kimi K2 Thinking, Nano Banana Pro, GLM 4.7, Gemini 3, Mistral 3, MiniMax M2.1 🤯
Credits
NHLOCAL for the source data: https://github.com/NHLOCAL/AiTimeline
🫡 @reach-vb for the original idea, design, and recipe
@ariG23498 and yours truly for compiling and verifying the 2025 edition
🥳 Here's to 2026 – wishing it becomes the best year ever for open releases and on-device-first use cases! 🥂
eliebak submitted a paper to Daily Papers (5 months ago)
alozowski authored a paper (5 months ago)
Post · 12236
deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⬇️
> pretty insane that it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so grounding is better
> very efficient vision-tokens-to-performance ratio
> covers 100 languages
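if you want to poke at it yourself, here's a minimal loading sketch, assuming the model card's usual remote-code pattern (AutoModel plus a custom infer helper) – the helper name, prompt format, and file path are assumptions, so check the repo for the exact API:

```python
# Minimal sketch for trying DeepSeek-OCR locally (GPU assumed).
# The custom `infer` helper and the prompt format are assumptions based on
# the remote-code pattern in the model card; verify against the repo.
from transformers import AutoModel, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-OCR"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).eval().cuda()

# Ask the model to re-render a document page as markdown/HTML-like output.
result = model.infer(
    tokenizer,
    prompt="<image>\nConvert the document to markdown.",
    image_file="page.png",  # hypothetical local scan
)
print(result)
```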
maheshp9 authored 3 papers (7 months ago)
Post · 7012
large AI labs open-sourced a ton of models last week 🔥
here are a few picks; find even more here: merve/sep-16-releases-68d13ea4c547f02f95842f05 🤗
> IBM released a new Docling model with 258M params based on Granite (Apache 2.0) ibm-granite/granite-docling-258M
> Xiaomi released a 7B audio LM with base and instruct variants (MIT) XiaomiMiMo/mimo-audio-68cc7202692c27dae881cce0
> DecartAI released Lucy Edit, an open Nano Banana (NC) decart-ai/Lucy-Edit-Dev
> OpenGVLab released a family of agentic computer-use models (3B/7B/32B) with the dataset 💻 OpenGVLab/scalecua-68c912cf56f7ff4c8e034003
> Meituan LongCat released a thinking version of LongCat-Flash meituan-longcat/LongCat-Flash-Thinking
Post · 3537
IBM just released a small Swiss Army knife for document models: granite-docling-258M on Hugging Face 🔥
> not only a document converter – it can also do document question answering and understands multiple languages 🤯
> best part: released under the Apache 2.0 license, so you can use it in your commercial projects!
> it supports transformers, vLLM, and MLX from the get-go! 🤗 (minimal transformers sketch below)
> built on SigLIP2 & granite-165M
model: ibm-granite/granite-docling-258M
demo: ibm-granite/granite-docling-258m-demo
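since it runs on plain transformers, here's a minimal sketch of the standard image-text-to-text flow – the auto-class and chat-template usage follow the usual pattern for small VLMs, and the exact prompt wording is an assumption, so check the model card:

```python
# Minimal sketch: document conversion with granite-docling via transformers.
# The chat-template flow is the standard transformers VLM pattern; the exact
# prompt text is an assumption modeled on Docling-style usage.
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "ibm-granite/granite-docling-258M"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

image = Image.open("page.png")  # hypothetical scanned page
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Convert this page to docling."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

out = model.generate(**inputs, max_new_tokens=1024)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```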
Post · 1289
a ton of image/video generation models and LLMs from big labs 🔥
> Meta released facebook/mobilellm-r1-68c4597b104fac45f28f448e, smol LLMs for on-device use 💬
> Tencent released tencent/SRPO, a high-res image generation model, and tencent/POINTS-Reader, cutting-edge OCR
> ByteDance released bytedance-research/HuMo, video generation from any input ⏯️
find more models, datasets, and demos here: merve/sep-11-releases-68c7dbfa26bea8cd921fa0ac
Post · 1080
fan-favorite vision LM Florence-2 is now officially supported in transformers 🤗
find all the models in the florence-community org 🫡
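with official support, the trust_remote_code dance goes away – a minimal captioning sketch, assuming the checkpoints keep Florence-2's task-token prompts ("<CAPTION>", "<OD>", ...); the repo id and auto-class are assumptions, so check the org page:

```python
# Minimal sketch: Florence-2 captioning with native transformers support.
# The repo id is hypothetical and the auto-class is an assumption; the
# "<CAPTION>" task token comes from the original Florence-2 model card.
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

model_id = "florence-community/Florence-2-base"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

image = Image.open("cat.jpg")  # any local image
inputs = processor(text="<CAPTION>", images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```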
Post · 1878
past week was great for open LLMs 🔥 merve/sep-1-releases-68bede0e729c12597eefd050
> Google released google/embeddinggemma-300m, a new embedding model with 300M params
> a new update to Kimi-K2 just landed: moonshotai/Kimi-K2-Instruct-0905
> OpenBMB released a new version of MiniCPM with 8B params: openbmb/MiniCPM4.1-8B
also soooo many Qwen-Image & Kontext LoRAs dropped!
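embeddinggemma drops straight into the standard sentence-transformers API – a minimal sketch (the model also ships task-specific query/document prompts on the Hub, which this skips):

```python
# Minimal sketch: embeddings + similarity with EmbeddingGemma via the
# standard sentence-transformers API (>= 3.0 for model.similarity).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m")

docs = ["Kimi-K2 got a fresh update.", "MiniCPM 4.1 has 8B params."]
query = "Which model was updated?"

doc_emb = model.encode(docs)     # shape (2, dim)
query_emb = model.encode(query)  # shape (dim,)

# cosine-style similarity of the query against both docs
print(model.similarity(query_emb, doc_emb))
```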
Post · 3769
upgrade your transformers 🔥
it comes with insanely capable models like merve/sam2-66ac9deac6fca3bc5482fe30, microsoft/kosmos-2.5, and more 🫡
I built a notebook you can run on the free Colab T4 to walk through the API for the new models: merve/smol-vision
fine-tuning will follow soon!
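if you'd rather skim than open the notebook, here's a minimal point-prompt sketch for the new SAM2 support – the class names and checkpoint id mirror the SAM v1 API and transformers naming conventions, so treat them as assumptions and defer to the notebook:

```python
# Minimal sketch: point-prompted segmentation with SAM2 in transformers.
# Class names and the checkpoint id are assumptions mirroring the SAM v1
# API (SamProcessor/SamModel); the linked notebook has the exact usage.
import torch
from PIL import Image
from transformers import Sam2Processor, Sam2Model

model_id = "facebook/sam2.1-hiera-tiny"  # assumed checkpoint id
processor = Sam2Processor.from_pretrained(model_id)
model = Sam2Model.from_pretrained(model_id)

image = Image.open("street.jpg")  # any local image
points = [[[450, 600]]]           # one (x, y) click on the target object

inputs = processor(images=image, input_points=points, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

masks = outputs.pred_masks  # low-res masks; post-process per the notebook
```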
Post · 4453
Super excited to announce that our research team at Hugging Face will be doing an AMA on reddit r/LocalLLaMA.
Come ask any questions to the team behind SmolLM, FineWeb and more! And who knows, maybe thereβll be a shiny new release to talk about?
Thursday 4th September, 8AM–11AM PST 🤗
Post · 6328
large AI labs dropped so many open models last week 🔥 don't miss out on them
> Apple released on-device vision LMs apple/fastvlm-68ac97b9cd5cacefdd04872e & apple/mobileclip2-68ac947dcb035c54bcd20c47
> OpenGVLab released InternVL3.5, 32 new vision LMs with one based on gpt-oss! (OS) OpenGVLab/internvl35-68ac87bd52ebe953485927fb
> MSFT released a killer small TTS model (OS) microsoft/VibeVoice-1.5B
find more here: https://huggingface.co/collections/merve/august-29-releases-68b5a3754cfb8abf59e2b486
ankits0052 authored a paper (9 months ago)
Post · 6116
first vision language model built off openai/gpt-oss-20b just dropped! 🔥
InternVL3.5 comes with 32 models 🤯 pre-trained, fine-tuned, and aligned variants in various sizes: OpenGVLab/internvl35-68ac87bd52ebe953485927fb
comes with gpt-oss or Qwen3 as the LLM part ⬇️
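a minimal text-only smoke test, following the remote-code pattern from earlier InternVL model cards – the chat helper, dict-style generation config, and repo id are all assumptions here (image inputs need the card's preprocessing helper, omitted):

```python
# Minimal sketch: text-only chat with an InternVL3.5 checkpoint, following
# the model.chat remote-code pattern of earlier InternVL cards. The repo id
# and helper signature are assumptions; check the model card.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "OpenGVLab/InternVL3_5-8B"  # assumed repo id from the collection
model = AutoModel.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

generation_config = dict(max_new_tokens=64)
# pixel_values=None runs a pure-text turn; images need the card's helper.
response = model.chat(tokenizer, None, "Hello, who are you?", generation_config)
print(response)
```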
Post · 764
Motif 2.6B tech report is pretty insane – first time I've seen a model with differential attention and PolyNorm trained at scale!
> It's trained on 2.5T tokens, with a "data mixture schedule" that continuously adjusts the mixture over training.
> They use WSD with a "simple moving average", averaging the last 6 checkpoints every 8B tokens (see the sketch below).
> They trained on FineMath, FineWeb2, DCLM, and TxT360.
> Lots of detail on the finetuning data they used; for instance, they used EvolKit and did some "dataset fusion" to compress more knowledge into the data.
> They mention they also tried Normalized GPT, QK-Norm, and Cross-Layer Attention.
Motif-Technologies/Motif-2.6B
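the moving-average trick is simple enough to sketch – a minimal version, assuming plain single-file PyTorch state dicts (file names are hypothetical; real runs average sharded checkpoints):

```python
# Minimal sketch of "simple moving average" checkpoint averaging:
# uniformly average the parameters of the last k saved checkpoints.
# File names are hypothetical; assumes tensor-only state dicts.
import torch

ckpt_paths = [f"ckpt_step_{s}.pt" for s in range(1000, 7000, 1000)]  # last 6

avg_state = None
for path in ckpt_paths:
    state = torch.load(path, map_location="cpu")
    if avg_state is None:
        avg_state = {k: v.float().clone() for k, v in state.items()}
    else:
        for k, v in state.items():
            avg_state[k] += v.float()

# divide the running sums by the number of checkpoints
avg_state = {k: v / len(ckpt_paths) for k, v in avg_state.items()}
torch.save(avg_state, "ckpt_averaged.pt")
```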