Built from 7 TB of real Kaggle datasets + 20k notebooks, generating real code-execution traces using Qwen3-Coder and E2B. Training on this data dramatically improves a model's ability to execute code and analyze data.
We (@baptistecolle, @hannayukhymenko, @lvwerra) have created a novel synthetic data generation pipeline with efficient scaffolding that gives your coding agent a big performance boost after training 🔥
With the help of real Kaggle notebooks and datasets, we generate synthetic notebooks that analyze datasets and answer factual questions about them. We simulate a real code execution environment either by prompting LLMs or with the help of E2B sandboxes. The result is a dataset of 50k+ high-quality LLM-generated notebooks that can help your agent get better at data analysis and question answering.
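The execution side of the pipeline fits in a short loop. Here's a minimal sketch assuming the e2b_code_interpreter Sandbox API; generate_next_cell is a hypothetical stand-in for the Qwen3-Coder call that proposes each new notebook cell:

```python
# Minimal sketch of a generate-execute loop, assuming the e2b_code_interpreter
# Sandbox API; generate_next_cell is a hypothetical stand-in for Qwen3-Coder.
from e2b_code_interpreter import Sandbox

def generate_next_cell(cells: list[dict]) -> str:
    """Hypothetical: prompt Qwen3-Coder with the notebook-so-far."""
    raise NotImplementedError

def synthesize_notebook(seed_code: str, n_cells: int = 6) -> list[dict]:
    cells = []
    code = seed_code
    with Sandbox() as sandbox:                      # real execution environment
        for _ in range(n_cells):
            execution = sandbox.run_code(code)      # run the cell for real
            cells.append({
                "code": code,
                "stdout": "".join(execution.logs.stdout),
                "error": str(execution.error) if execution.error else None,
            })
            code = generate_next_cell(cells)        # model sees real exec traces
    return cells
```

Because each cell actually runs, the model's next cell is conditioned on real outputs and errors rather than hallucinated ones.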
Supercharge Apple's Shortcuts using Cloudflare Workers and Gemini within minutes (and for free, up to 1,500 requests per day) ☁️✨
Hello everyone! Last week, while experimenting for fun, I created an API that lets you easily access AI models (in this case, Google's) from the Shortcuts app, in order to analyze data from my apps and make the most of it thanks to the generative capabilities of advanced models.
It costs me nothing, and I think it might be good to share it so that others can build on it.
In README.md, you will find everything you need to get started and put your own microservice into production, which you can call from the app's HTTP request features.
All you need is a free Cloudflare account and an API key obtained from Google's AI Studio.
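To sanity-check your deployment before wiring it into a Shortcut, you can hit the Worker from Python. A minimal sketch, assuming a hypothetical endpoint URL and JSON contract (see README.md for the real ones):

```python
# Quick test of a deployed Worker; the URL and JSON fields below are
# placeholders, not the project's actual contract (check README.md).
import requests

resp = requests.post(
    "https://your-worker.your-subdomain.workers.dev",  # hypothetical endpoint
    json={"prompt": "Summarize this note: pick up groceries, call the bank"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # the Worker forwards the prompt to Gemini and returns the reply
```

A Shortcut then does the same thing with the "Get Contents of URL" action.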
Feel free to take a look and get back to me if you encounter any problems during deployment.
Released 17 production-ready adaptive text classifiers that learn from just 100 examples per class and continuously improve without retraining.
These models achieve 93% average accuracy across enterprise use cases like email routing, fraud detection, document classification, and support ticket categorization. Built on ModernBERT with prototype memory and elastic weight consolidation.
Key benefits: 90% cost reduction vs API solutions, 90-120ms local inference, dynamic class addition, and zero vendor lock-in.
All models are available under the adaptive-classifier organization. Install with pip install adaptive-classifier.
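For a feel of the workflow, here's a minimal sketch following the usage pattern from the adaptive-classifier README; the class names and example texts are made up, so double-check the API against the project docs:

```python
# Minimal usage sketch based on the adaptive-classifier README; labels and
# texts here are illustrative only.
from adaptive_classifier import AdaptiveClassifier

clf = AdaptiveClassifier("answerdotai/ModernBERT-base")

# Seed each class with a few examples (the released models used ~100 per class).
clf.add_examples(
    ["Invoice attached for order #1234", "I was charged twice this month"],
    ["billing", "billing"],
)
clf.add_examples(
    ["How do I reset my password?", "My account is locked"],
    ["account", "account"],
)

print(clf.predict("I can't log into my account"))  # ranked (label, score) pairs

# New classes can be added on the fly, without retraining from scratch.
clf.add_examples(["Your parcel is out for delivery"], ["shipping"])
```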
RAG is evolving fast, keeping pace with cutting-edge AI trends. Today it is becoming more agentic and smarter at navigating complex structures like hypergraphs.
GPT-4.1 dropped this week - and it puts OpenAI back in the race for coding & agentic leadership.
⚙️ API only: no ChatGPT toggle for this.
💻 Coding performance is back on par with Claude 3.7 Sonnet & Gemini 2.5 Pro (though Gemini still leads).
💸 Pricing:
• Full: $3.50 / 1M tokens
• Mini: $0.70 / 1M
• Nano: $0.17 / 1M
🏆 Gemini 2.5 Pro = best price/perf ($3.44 / 1M)
💵 Claude 3.5 Sonnet = $6 / 1M (!)
🧠 Not a "thinking" model.
📊 Mini shines on general reasoning tasks (e.g. GPQA), but only the full model holds up on SWE-bench Verified (GitHub issue solving).
New king of open VLMs: InternVL3 takes Qwen 2.5's crown! 👑
InternVL has been a wildly successful series of models, and the latest iteration has just taken back the crown thanks to its superior, natively multimodal vision training pipeline.
⚡️ Most vision language models (VLMs) these days are built like Frankenstein's monster: take a good text-only large language model (LLM) backbone and stitch a vision transformer (ViT) on top of it. Training is then sequential 🔢:
1. Freeze the LLM weights while you train the ViT alone to work with the LLM, then
2. Unfreeze all weights to train everything to work together.
The Shanghai AI Lab decided to challenge this paradigm with an approach they call "native". For each of their model sizes, they still start from a good LLM (mostly the Qwen-2.5 series; did I tell you I'm a huge fan of Qwen? ❤️) and stitch on the ViT, but they don't freeze anything: they train all weights together on interleaved text and image understanding data in a single pre-training phase.
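In (heavily simplified) code, the difference between the two recipes is just what you freeze and when. A toy PyTorch sketch, not InternVL3's actual architecture or training code:

```python
# Schematic contrast of the two recipes; TinyVLM is a toy stand-in.
import torch
import torch.nn as nn

class TinyVLM(nn.Module):
    """A ViT-like encoder stitched onto an LLM-like backbone."""
    def __init__(self):
        super().__init__()
        self.vit = nn.Linear(16, 8)   # placeholder vision encoder
        self.llm = nn.Linear(8, 8)    # placeholder language backbone

    def forward(self, x):
        return self.llm(self.vit(x))

vlm = TinyVLM()

# Conventional two-stage recipe: stage 1 freezes the LLM and aligns only
# the ViT; stage 2 unfreezes everything for joint training.
for p in vlm.llm.parameters():
    p.requires_grad = False
stage1_params = [p for p in vlm.parameters() if p.requires_grad]  # ViT only
for p in vlm.parameters():
    p.requires_grad = True                                        # stage 2

# "Native" recipe: no freezing at all; every weight trains from the start
# on interleaved text and image-understanding batches.
native_optimizer = torch.optim.AdamW(vlm.parameters(), lr=1e-5)
```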
They claim it results in more seamless interactions between modalities. And the results prove them right: they took the crown of top VLMs, at nearly all sizes, from their Qwen-2.5 parents. 👑
🚀 The DeepSeek R1 moment has come for GUI agents: rule-based reinforcement learning gives better results than SFT with 500x smaller datasets!
Traditionally (by which I mean "in the last few months"), GUI agents have been trained with supervised fine-tuning (SFT). This meant collecting huge datasets of screen captures from people using computers, and using these to fine-tune your model.
But last week, a new paper introduced UI-R1, applying DeepSeek's R1-style rule-based reinforcement learning (RL) specifically to GUI action prediction tasks. This is big news: with RL, maybe we could build good agents without the need for huge datasets.
UI-R1 uses a unified reward function that evaluates multiple responses from the model, optimizing via policy optimization algorithms like Group Relative Policy Optimization (GRPO).
Specifically, the reward function assesses three things (a minimal sketch follows below):
🎯 Action type accuracy: does the predicted action match the ground truth?
📍 Coordinate accuracy (specifically for clicks): is the predicted click within the correct bounding box?
📝 Output format: does the model clearly articulate both its reasoning and its final action?
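Here's what such a rule-based reward could look like in code. This is a hedged sketch of the three checks above, not the paper's implementation; the <think>/<answer> tags, field patterns, and reward weights are assumptions:

```python
# Illustrative rule-based reward in the spirit of UI-R1; response format,
# regexes, and weights are assumptions, not the paper's actual code.
import re

def gui_reward(response: str, gt_action: str, gt_bbox: tuple[int, int, int, int]) -> float:
    reward = 0.0
    # Output format: the model must show its reasoning, then a final action.
    m = re.search(r"<think>.+</think>\s*<answer>(.+)</answer>", response, re.S)
    if m is None:
        return 0.0
    reward += 0.5
    answer = m.group(1)
    # Action type accuracy: predicted action vs. ground truth.
    action = re.search(r"action:\s*(\w+)", answer)
    if action and action.group(1) == gt_action:
        reward += 1.0
    # Coordinate accuracy: for clicks, the point must land in the gt bounding box.
    point = re.search(r"\((\d+),\s*(\d+)\)", answer)
    if gt_action == "click" and point:
        x, y = int(point.group(1)), int(point.group(2))
        x0, y0, x1, y1 = gt_bbox
        if x0 <= x <= x1 and y0 <= y <= y1:
            reward += 1.0
    return reward
```

GRPO then samples several responses per prompt and pushes the policy toward the ones with above-average reward, so no learned reward model is needed.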
Using just 136 carefully selected mobile tasks (compared to 76,000 tasks for larger models like OS-Atlas), UI-R1 shows significant efficiency and improved performance:
📈 Boosted action prediction accuracy from 76% to 89% on AndroidControl.
📈 Outperformed larger, SFT-trained models (e.g., OS-Atlas-7B), demonstrating superior results with vastly fewer data points (136 tasks vs. 76K).
📈 Enhanced adaptability and generalization, excelling even in out-of-domain scenarios.
The paper tests this RL-based method only on low-level GUI tasks. Could it generalize to more complex interactions? 🧐
DeepGit: Your GitHub Gold Digger! 💰
Hey Hugging Face gang! Meet DeepGit, my open-source sidekick that rips through GitHub to snag repos that fit you. Done with dead-end searches? Me too. Built it with LangGraph and some dope tricks:
- Embeddings grab the good stuff (HF magic, baby!)
- Re-ranking nails the best picks
- Snoops docs, code, and buzz in one slick flow
- Drops a clean list of hidden gems 💎
Unearth that sneaky ML lib or Python gem: run python app.py or langgraph dev and boom! Peek it at https://github.com/zamalali/DeepGit. Fork it, tweak it, love it: Docker's in, HF vibes are strong. Drop a 🌟 or a crazy idea, I'm pumped to jam with you all! 💪
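If you're curious what DeepGit's embed-then-re-rank core looks like conceptually, here's a rough sketch of the idea (not DeepGit's actual code; model names and repo entries are illustrative):

```python
# Sketch of the two-stage retrieval idea: embed repo descriptions, shortlist
# by similarity, then re-rank with a cross-encoder. All entries illustrative.
from sentence_transformers import SentenceTransformer, CrossEncoder

embedder = SentenceTransformer("all-MiniLM-L6-v2")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

repos = {
    "zamalali/DeepGit": "Agentic workflow for deep semantic GitHub repo search",
    "someuser/misc-scripts": "Assorted shell utilities",  # illustrative entry
}
query = "library for semantic search over GitHub repositories"

# Stage 1: embedding retrieval grabs the good stuff.
names, docs = list(repos.keys()), list(repos.values())
sims = embedder.similarity(embedder.encode([query]), embedder.encode(docs))[0]
shortlist = sorted(zip(names, docs, sims.tolist()), key=lambda t: -t[2])[:10]

# Stage 2: re-ranking nails the best picks.
hits = reranker.rank(query, [doc for _, doc, _ in shortlist])
print([shortlist[h["corpus_id"]][0] for h in hits])
```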
We've all become experts at clicking "I agree" without a second thought. In my latest blog post, I explore why these traditional consent models are increasingly problematic in the age of generative AI.
I found three fundamental challenges:
- Scope problem: how can you know what you're agreeing to when AI could use your data in different ways?
- Temporality problem: once an AI system learns from your data, good luck trying to make it "unlearn" it.
- Autonomy trap: the data you share today could create systems that pigeonhole you tomorrow.
Individual users shouldn't bear all the responsibility, while big tech holds all the cards. We need better approaches to level the playing field, from collective advocacy and stronger technological safeguards to establishing "data fiduciaries" with a legal duty to protect our digital interests.
We are happy to release the OpenPII English Anonymiser, the most powerful open-source tool for redacting sensitive info from English text.
Fine-tuned from ModernBERT on 5.7 million+ PII examples, it clocks 99%+ accuracy across emails, dates, social security numbers, and more!
Why it's a big deal:
✅ Top-tier precision: 100% for passport numbers, 99.96% for emails*.
✅ Totally free: MIT license for personal or commercial use.
✅ No secrets: full metrics shared on Hugging Face.
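Redaction with a fine-tuned token classifier typically takes a few lines with transformers. A minimal sketch, with a placeholder model id since the post doesn't name the exact checkpoint:

```python
# Sketch of PII redaction via a token-classification pipeline; the model id
# below is a placeholder, not the actual OpenPII checkpoint name.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="your-org/openpii-english-anonymiser",  # placeholder id
    aggregation_strategy="simple",
)

text = "Contact Jane at jane.doe@example.com before 2025-01-31."
redacted = text
# Replace entities right-to-left so character offsets stay valid.
for ent in sorted(ner(text), key=lambda e: e["start"], reverse=True):
    redacted = redacted[: ent["start"]] + f"[{ent['entity_group']}]" + redacted[ent["end"] :]
print(redacted)  # e.g. "Contact [NAME] at [EMAIL] before [DATE]."
```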