Built from 7 TB of real Kaggle datasets + 20k notebooks, generating real code-execution traces using Qwen3-Coder and E2B. Training on this data dramatically improves a model's ability to execute code and analyze data.
We (@baptistecolle, @hannayukhymenko, @lvwerra) have created a novel synthetic data generation pipeline with efficient scaffolding that gives your coding agent a big performance boost after training 🔥
With the help of real Kaggle notebooks and datasets, we generate synthetic notebooks that analyze datasets and answer factual questions about them. We simulate a real code execution environment either by prompting LLMs or with the help of E2B sandboxes. The result is a dataset of 50k+ high-quality LLM-generated notebooks that can help your agent get better at data analysis and question answering.
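The execution side of the pipeline fits in a short loop. Here's a minimal sketch assuming the e2b_code_interpreter Sandbox API; generate_next_cell is a hypothetical stand-in for the Qwen3-Coder call that proposes each new notebook cell:

```python
# Minimal sketch of a generate-execute loop, assuming the e2b_code_interpreter
# Sandbox API; generate_next_cell is a hypothetical stand-in for Qwen3-Coder.
from e2b_code_interpreter import Sandbox

def generate_next_cell(cells: list[dict]) -> str:
    """Hypothetical: prompt Qwen3-Coder with the notebook-so-far."""
    raise NotImplementedError

def synthesize_notebook(seed_code: str, n_cells: int = 6) -> list[dict]:
    cells = []
    code = seed_code
    with Sandbox() as sandbox:                      # real execution environment
        for _ in range(n_cells):
            execution = sandbox.run_code(code)      # run the cell for real
            cells.append({
                "code": code,
                "stdout": "".join(execution.logs.stdout),
                "error": str(execution.error) if execution.error else None,
            })
            code = generate_next_cell(cells)        # model sees real exec traces
    return cells
```

Because each cell actually runs, the model's next cell is conditioned on real outputs and errors rather than hallucinated ones.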
Supercharge Apple's Shortcuts using Cloudflare Workers and Gemini within minutes (and for free, up to 1,500 requests per day) ☁️✨
Hello everyone! Last week, while experimenting for fun, I created an API that lets you easily access AI models (in this case, Google's) from the Shortcuts app, in order to analyze data from my apps and make the most of it thanks to the generative capabilities of advanced models.
It costs me nothing, and I think it might be good to share it so that others can build on it.
In README.md, you will find everything you need to get started and put your own microservice into production, which you can call from the app's HTTP request features.
All you need is a free Cloudflare account and an API key obtained from Google's AI Studio.
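To sanity-check your deployment before wiring it into a Shortcut, you can hit the Worker from Python. A minimal sketch, assuming a hypothetical endpoint URL and JSON contract (see README.md for the real ones):

```python
# Quick test of a deployed Worker; the URL and JSON fields below are
# placeholders, not the project's actual contract (check README.md).
import requests

resp = requests.post(
    "https://your-worker.your-subdomain.workers.dev",  # hypothetical endpoint
    json={"prompt": "Summarize this note: pick up groceries, call the bank"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # the Worker forwards the prompt to Gemini and returns the reply
```

A Shortcut then does the same thing with the "Get Contents of URL" action.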
Feel free to take a look and get back to me if you encounter any problems during deployment.
Released 17 production-ready adaptive text classifiers that learn from just 100 examples per class and continuously improve without retraining.
These models achieve 93% average accuracy across enterprise use cases like email routing, fraud detection, document classification, and support ticket categorization. Built on ModernBERT with prototype memory and elastic weight consolidation.
Key benefits: 90% cost reduction vs API solutions, 90-120ms local inference, dynamic class addition, and zero vendor lock-in.
All models are available under the adaptive-classifier organization. Install with pip install adaptive-classifier.
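For a feel of the workflow, here's a minimal sketch following the usage pattern from the adaptive-classifier README; the class names and example texts are made up, so double-check the API against the project docs:

```python
# Minimal usage sketch based on the adaptive-classifier README; labels and
# texts here are illustrative only.
from adaptive_classifier import AdaptiveClassifier

clf = AdaptiveClassifier("answerdotai/ModernBERT-base")

# Seed each class with a few examples (the released models used ~100 per class).
clf.add_examples(
    ["Invoice attached for order #1234", "I was charged twice this month"],
    ["billing", "billing"],
)
clf.add_examples(
    ["How do I reset my password?", "My account is locked"],
    ["account", "account"],
)

print(clf.predict("I can't log into my account"))  # ranked (label, score) pairs

# New classes can be added on the fly, without retraining from scratch.
clf.add_examples(["Your parcel is out for delivery"], ["shipping"])
```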
RAG is evolving fast, keeping pace with cutting-edge AI trends. Today it is becoming more agentic and smarter at navigating complex structures like hypergraphs.
GPT-4.1 dropped this week - and it puts OpenAI back in the race for coding & agentic leadership.
⚙️ API only: no ChatGPT toggle for this.
💻 Coding performance is back on par with Claude 3.7 Sonnet & Gemini 2.5 Pro (though Gemini still leads).
💸 Pricing:
• Full: $3.50 / 1M tokens
• Mini: $0.70 / 1M
• Nano: $0.17 / 1M
🏆 Gemini 2.5 Pro = best price/perf ($3.44 / 1M)
💵 Claude 3.5 Sonnet = $6 / 1M (!)
🧠 Not a "thinking" model.
📊 Mini shines on general reasoning tasks (e.g. GPQA), but only the full model holds up on SWE-bench Verified (GitHub issue solving).
New king of open VLMs: InternVL3 takes Qwen 2.5's crown! 👑
InternVL has been a wildly successful series of models, and the latest iteration has just taken back the crown thanks to its superior, natively multimodal vision training pipeline.
⚡️ Most vision language models (VLMs) these days are built like Frankenstein's monster: take a good text-only large language model (LLM) backbone and stitch a vision transformer (ViT) on top of it. Training is then sequential 🔢:
1. Freeze the LLM weights while you train the ViT alone to work with the LLM, then
2. Unfreeze all weights to train everything to work together.
The Shanghai AI Lab decided to challenge this paradigm with an approach they call "native". For each of their model sizes, they still start from a good LLM (mostly the Qwen-2.5 series; did I tell you I'm a huge fan of Qwen? ❤️) and stitch on the ViT, but they don't freeze anything: they train all weights together on interleaved text and image understanding data in a single pre-training phase.
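In (heavily simplified) code, the difference between the two recipes is just what you freeze and when. A toy PyTorch sketch, not InternVL3's actual architecture or training code:

```python
# Schematic contrast of the two recipes; TinyVLM is a toy stand-in.
import torch
import torch.nn as nn

class TinyVLM(nn.Module):
    """A ViT-like encoder stitched onto an LLM-like backbone."""
    def __init__(self):
        super().__init__()
        self.vit = nn.Linear(16, 8)   # placeholder vision encoder
        self.llm = nn.Linear(8, 8)    # placeholder language backbone

    def forward(self, x):
        return self.llm(self.vit(x))

vlm = TinyVLM()

# Conventional two-stage recipe: stage 1 freezes the LLM and aligns only
# the ViT; stage 2 unfreezes everything for joint training.
for p in vlm.llm.parameters():
    p.requires_grad = False
stage1_params = [p for p in vlm.parameters() if p.requires_grad]  # ViT only
for p in vlm.parameters():
    p.requires_grad = True                                        # stage 2

# "Native" recipe: no freezing at all; every weight trains from the start
# on interleaved text and image-understanding batches.
native_optimizer = torch.optim.AdamW(vlm.parameters(), lr=1e-5)
```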
They claim it results in more seamless interactions between modalities. And the results prove them right: they took the crown of top VLMs, at nearly all sizes, from their Qwen-2.5 parents. 👑
🚀 The DeepSeek R1 moment has come for GUI agents: rule-based reinforcement learning gives better results than SFT with 500x smaller datasets!
Traditionally (by which I mean "in the last few months"), GUI agents have been trained with supervised fine-tuning (SFT). This meant collecting huge datasets of screen captures from people using computers, and using these to fine-tune your model.
But last week, a new paper introduced UI-R1, applying DeepSeek's R1-style rule-based reinforcement learning (RL) specifically to GUI action prediction tasks. This is big news: with RL, maybe we could build good agents without the need for huge datasets.
UI-R1 uses a unified reward function that evaluates multiple responses from the model, optimizing via policy optimization algorithms like Group Relative Policy Optimization (GRPO).
Specifically, the reward function assesses three things (a minimal sketch follows below):
🎯 Action type accuracy: does the predicted action match the ground truth?
📍 Coordinate accuracy (specifically for clicks): is the predicted click within the correct bounding box?
📝 Output format: does the model clearly articulate both its reasoning and its final action?
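Here's what such a rule-based reward could look like in code. This is a hedged sketch of the three checks above, not the paper's implementation; the <think>/<answer> tags, field patterns, and reward weights are assumptions:

```python
# Illustrative rule-based reward in the spirit of UI-R1; response format,
# regexes, and weights are assumptions, not the paper's actual code.
import re

def gui_reward(response: str, gt_action: str, gt_bbox: tuple[int, int, int, int]) -> float:
    reward = 0.0
    # Output format: the model must show its reasoning, then a final action.
    m = re.search(r"<think>.+</think>\s*<answer>(.+)</answer>", response, re.S)
    if m is None:
        return 0.0
    reward += 0.5
    answer = m.group(1)
    # Action type accuracy: predicted action vs. ground truth.
    action = re.search(r"action:\s*(\w+)", answer)
    if action and action.group(1) == gt_action:
        reward += 1.0
    # Coordinate accuracy: for clicks, the point must land in the gt bounding box.
    point = re.search(r"\((\d+),\s*(\d+)\)", answer)
    if gt_action == "click" and point:
        x, y = int(point.group(1)), int(point.group(2))
        x0, y0, x1, y1 = gt_bbox
        if x0 <= x <= x1 and y0 <= y <= y1:
            reward += 1.0
    return reward
```

GRPO then samples several responses per prompt and pushes the policy toward the ones with above-average reward, so no learned reward model is needed.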
Using just 136 carefully selected mobile tasks (compared to 76,000 tasks for larger models like OS-Atlas), UI-R1 shows significant efficiency and improved performance:
📈 Boosted action prediction accuracy from 76% to 89% on AndroidControl.
📈 Outperformed larger, SFT-trained models (e.g., OS-Atlas-7B), demonstrating superior results with vastly fewer data points (136 tasks vs. 76K).
📈 Enhanced adaptability and generalization, excelling even in out-of-domain scenarios.
The paper tests this RL-based method only on low-level GUI tasks. Could it generalize to more complex interactions? 🧐
DeepGit: Your GitHub Gold Digger! 💰
Hey Hugging Face gang! Meet DeepGit, my open-source sidekick that rips through GitHub to snag repos that fit you. Done with dead-end searches? Me too. Built it with LangGraph and some dope tricks:
- Embeddings grab the good stuff (HF magic, baby!)
- Re-ranking nails the best picks
- Snoops docs, code, and buzz in one slick flow
- Drops a clean list of hidden gems 💎
Unearth that sneaky ML lib or Python gem: run python app.py or langgraph dev and boom! Peek it at https://github.com/zamalali/DeepGit. Fork it, tweak it, love it: Docker's in, HF vibes are strong. Drop a 🌟 or a crazy idea, I'm pumped to jam with you all! 💪
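If you're curious what DeepGit's embed-then-re-rank core looks like conceptually, here's a rough sketch of the idea (not DeepGit's actual code; model names and repo entries are illustrative):

```python
# Sketch of the two-stage retrieval idea: embed repo descriptions, shortlist
# by similarity, then re-rank with a cross-encoder. All entries illustrative.
from sentence_transformers import SentenceTransformer, CrossEncoder

embedder = SentenceTransformer("all-MiniLM-L6-v2")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

repos = {
    "zamalali/DeepGit": "Agentic workflow for deep semantic GitHub repo search",
    "someuser/misc-scripts": "Assorted shell utilities",  # illustrative entry
}
query = "library for semantic search over GitHub repositories"

# Stage 1: embedding retrieval grabs the good stuff.
names, docs = list(repos.keys()), list(repos.values())
sims = embedder.similarity(embedder.encode([query]), embedder.encode(docs))[0]
shortlist = sorted(zip(names, docs, sims.tolist()), key=lambda t: -t[2])[:10]

# Stage 2: re-ranking nails the best picks.
hits = reranker.rank(query, [doc for _, doc, _ in shortlist])
print([shortlist[h["corpus_id"]][0] for h in hits])
```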
We've all become experts at clicking "I agree" without a second thought. In my latest blog post, I explore why these traditional consent models are increasingly problematic in the age of generative AI.
I found three fundamental challenges:
- Scope problem: how can you know what you're agreeing to when AI could use your data in different ways?
- Temporality problem: once an AI system learns from your data, good luck trying to make it "unlearn" it.
- Autonomy trap: the data you share today could create systems that pigeonhole you tomorrow.
Individual users shouldn't bear all the responsibility, while big tech holds all the cards. We need better approaches to level the playing field, from collective advocacy and stronger technological safeguards to establishing "data fiduciaries" with a legal duty to protect our digital interests.
We are happy to release the OpenPII English Anonymiser, the most powerful open-source tool for redacting sensitive info from English text.
Fine-tuned from ModernBERT on 5.7 million+ PII examples, it clocks 99%+ accuracy across emails, dates, social security numbers, and more!
Why it's a big deal:
✅ Top-tier precision: 100% for passport numbers, 99.96% for emails*.
✅ Totally free: MIT license for personal or commercial use.
✅ No secrets: full metrics shared on Hugging Face.
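Redaction with a fine-tuned token classifier typically takes a few lines with transformers. A minimal sketch, with a placeholder model id since the post doesn't name the exact checkpoint:

```python
# Sketch of PII redaction via a token-classification pipeline; the model id
# below is a placeholder, not the actual OpenPII checkpoint name.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="your-org/openpii-english-anonymiser",  # placeholder id
    aggregation_strategy="simple",
)

text = "Contact Jane at jane.doe@example.com before 2025-01-31."
redacted = text
# Replace entities right-to-left so character offsets stay valid.
for ent in sorted(ner(text), key=lambda e: e["start"], reverse=True):
    redacted = redacted[: ent["start"]] + f"[{ent['entity_group']}]" + redacted[ent["end"] :]
print(redacted)  # e.g. "Contact [NAME] at [EMAIL] before [DATE]."
```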