Upload folder using huggingface_hub

- 02_evaluate-hub-model.md +83 -0
- 03_publish-hub-dataset.md +74 -0
- 04_sft-finetune-hub.md +37 -0
- README.md +101 -1
02_evaluate-hub-model.md
ADDED
# Week 1: Evaluate a Hub Model

**Goal:** Add evaluation results to model cards across the Hub. Together, we're building a distributed leaderboard of open source model performance.

> [!NOTE]
> Bonus XP for contributing to the leaderboard application. Open a PR [on the Hub](https://huggingface.co/spaces/humanitys-last-hackathon/distributed-leaderboard/discussions) or [on GitHub](https://github.com/huggingface/skills/blob/main/apps/evals-leaderboard/app.py) to get your XP.

## Why This Matters

Model cards without evaluation data are hard to compare. By adding structured eval results to `model-index` metadata, we make models searchable, sortable, and easier to choose between. Your contributions power leaderboards and help the community find the best models for their needs. And because the work is distributed, every result you add is shared with the whole community.
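For reference, `model-index` entries live in the model card's YAML front matter. A minimal sketch of the shape (the model, dataset, and metric values here are illustrative placeholders, not output from the skill):

```yaml
model-index:
- name: model-name
  results:
  - task:
      type: text-generation
    dataset:
      name: MMLU
      type: cais/mmlu
    metrics:
    - type: accuracy
      value: 68.2
      name: MMLU (5-shot)
```

The skill's scripts write this structure for you; you normally won't edit it by hand.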
## The Skill

Use `hf_model_evaluation/` for this quest. Key capabilities:

- Extract evaluation tables from existing README content
- Import benchmark scores from Artificial Analysis
- Run your own evals with inspect-ai on HF Jobs
- Update model-index metadata (Papers with Code compatible)

```bash
# Preview what would be extracted
python hf_model_evaluation/scripts/evaluation_manager.py extract-readme \
  --repo-id "model-author/model-name" --dry-run
```
## XP Tiers

### 🟢 Starter – 50 XP

**Extract evaluation results from one benchmark and update its model card.**

1. Pick a Hub model without evaluation data from *trending models* on the Hub
2. Use the skill to extract or add a benchmark score
3. Create a PR (or push directly if you own the model)

**What counts:** One model, one dataset, metric visible in model card metadata.

### 🟠 Standard – 100 XP

**Import scores from third-party benchmarks like Artificial Analysis.**

1. Find a model with benchmark data on external sites
2. Use `import-aa` to fetch scores from the Artificial Analysis API
3. Create a PR with properly attributed evaluation data

**What counts:** Previously undocumented benchmark scores and merged PRs.

```bash
AA_API_KEY="your-key" python hf_model_evaluation/scripts/evaluation_manager.py import-aa \
  --creator-slug "anthropic" --model-name "claude-sonnet-4" \
  --repo-id "target/model" --create-pr
```
### 🟦 Advanced – 200 XP

**Run your own evaluation with inspect-ai and publish results.**

1. Choose an eval task (MMLU, GSM8K, HumanEval, etc.)
2. Run the evaluation on HF Jobs infrastructure
3. Update the model card with your results and methodology

**What counts:** Original eval run and merged PR.

```bash
HF_TOKEN=$HF_TOKEN hf jobs uv run hf_model_evaluation/scripts/inspect_eval_uv.py \
  --flavor a10g-small --secret HF_TOKEN=$HF_TOKEN \
  -- --model "meta-llama/Llama-2-7b-hf" --task "mmlu"
```

## Tips

- Always use `--dry-run` first to preview changes before pushing
- Check for transposed tables, where models are rows and benchmarks are columns
- Be careful with PRs for models you don't own – most maintainers appreciate eval contributions, but be respectful
- Manually validate the extracted scores and close PRs if needed

## Resources

- [SKILL.md](../hf_model_evaluation/SKILL.md) – Full skill documentation
- [Example Usage](../hf_model_evaluation/examples/USAGE_EXAMPLES.md) – Worked examples
- [Metric Mapping](../hf_model_evaluation/examples/metric_mapping.json) – Standard metric types
03_publish-hub-dataset.md
ADDED
# Week 2: Publish a Hub Dataset

Create and share high-quality datasets on the Hub. Good data is the foundation of good models – help the community by contributing datasets others can train on.

## Why This Matters

The best open source models are built on openly available datasets. By publishing well-documented, properly structured datasets, you're directly enabling the next generation of model development. Quality matters more than quantity.

## The Skill

Use `hf_dataset_creator/` for this quest. Key capabilities:

- Initialize dataset repos with proper structure
- Multi-format support: chat, classification, QA, completion, tabular
- Template-based validation for data quality
- Streaming uploads without downloading entire datasets

```bash
# Quick setup with a template
python hf_dataset_creator/scripts/dataset_manager.py quick_setup \
  --repo_id "your-username/dataset-name" --template chat
```
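The exact row schema for each format lives in `hf_dataset_creator/templates/`. As a rough sketch, a chat-format row usually follows the common `messages` convention (a list of role/content turns) – the field names below are an assumption, so check the template the skill validates against:

```python
import json

# One conversational example in the widely used "messages" convention.
# Field names are an assumption -- check hf_dataset_creator/templates/
# for the schema the skill actually validates against.
row = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What does the --dry-run flag do?"},
        {"role": "assistant", "content": "It previews changes without pushing them."},
    ]
}

# add_rows takes JSON, so serialize a list of rows before passing --rows_json.
rows_json = json.dumps([row])
```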
## XP Tiers

### 🟢 Starter – 50 XP

**Upload a small, clean dataset with a complete dataset card.**

1. Create a dataset with ≤1,000 rows
2. Write a dataset card covering: license, splits, and data provenance
3. Upload to the Hub under the hackathon organization (or your own account)

**What counts:** Clean data, clear documentation, proper licensing.

```bash
python hf_dataset_creator/scripts/dataset_manager.py init \
  --repo_id "humanitys-last-hackathon/your-dataset-name"

python hf_dataset_creator/scripts/dataset_manager.py add_rows \
  --repo_id "humanitys-last-hackathon/your-dataset-name" \
  --template classification \
  --rows_json "$(cat your_data.json)"
```
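A dataset card starts with YAML front matter. A minimal sketch covering the machine-readable fields (the values here are placeholders; splits and provenance are described in the card's prose body):

```yaml
---
license: apache-2.0
language:
- en
size_categories:
- n<1K
---
```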
### 🟠 Standard – 100 XP

**Publish a conversational dataset with a complete dataset card.**

1. Create a dataset with ≤1,000 rows
2. Write a dataset card covering: license and splits
3. Upload to the Hub under the hackathon organization

**What counts:** Clean data, clear documentation, proper licensing.

### 🟦 Advanced – 200 XP

**Translate a dataset into multiple languages and publish it on the Hub.**

1. Find a dataset on the Hub
2. Translate the dataset into multiple languages
3. Publish the translated datasets on the Hub under the hackathon organization

**What counts:** Translated datasets and merged PRs.
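One way to structure the translation step – sketched over plain dicts, with `translate` as a stub you would swap for a real MT model or API (nothing here is prescribed by the skill):

```python
# Minimal sketch: translate each row's text field, keeping the original
# alongside for provenance. `translate` is a placeholder stub -- swap in
# a real MT model or API before publishing.
def translate(text: str, target_lang: str) -> str:
    # Placeholder: a real implementation would call an MT model here.
    return f"[{target_lang}] {text}"

def translate_rows(rows: list[dict], target_lang: str) -> list[dict]:
    return [
        {
            **row,
            "text": translate(row["text"], target_lang),
            "source_text": row["text"],
            "language": target_lang,
        }
        for row in rows
    ]

rows = [{"text": "Hello, world."}]
translated = translate_rows(rows, "fr")
```

With the `datasets` library, a per-row function like this plugs straight into `Dataset.map`.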
## Resources

- [SKILL.md](../hf_dataset_creator/SKILL.md) – Full skill documentation
- [Templates](../hf_dataset_creator/templates/) – JSON templates for each format
- [Examples](../hf_dataset_creator/examples/) – Sample data and system prompts

---

**Next Quest:** [Supervised Fine-Tuning](04_sft-finetune-hub.md)
04_sft-finetune-hub.md
ADDED
# Week 3: Supervised Fine-Tuning on the Hub

Fine-tune and share models on the Hub. Take a base model, train it on your data, and publish the result for the community to use.

## Why This Matters

Fine-tuning is how we adapt foundation models to specific tasks. By sharing fine-tuned models – along with your training methodology – you're giving the community ready-to-use solutions and reproducible recipes they can learn from.

## The Skill

Use `hf-llm-trainer/` for this quest. Key capabilities:

- **SFT** (Supervised Fine-Tuning) – Standard instruction tuning
- **DPO** (Direct Preference Optimization) – Alignment from preference data
- **GRPO** (Group Relative Policy Optimization) – Online RL training
- Cloud GPU training on HF Jobs – no local setup required
- Trackio integration for real-time monitoring
- GGUF conversion for local deployment

Your coding agent uses `hf_jobs()` to submit training scripts directly to HF infrastructure.
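SFT expects conversational training data. A sketch of the usual prep step, turning prompt/response pairs into the `messages` layout that chat-style SFT trainers typically consume (the field names follow common convention, not a schema taken from this skill – see the SFT example script for the exact format):

```python
# Convert instruction/response pairs into the chat "messages" layout that
# SFT trainers commonly consume. Field names follow convention; check the
# skill's train_sft_example.py for the exact schema it uses.
def to_chat_format(pairs: list[dict]) -> list[dict]:
    return [
        {
            "messages": [
                {"role": "user", "content": p["prompt"]},
                {"role": "assistant", "content": p["response"]},
            ]
        }
        for p in pairs
    ]

examples = to_chat_format(
    [{"prompt": "Define SFT.", "response": "Supervised fine-tuning."}]
)
```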
## XP Tiers

We'll announce the XP tiers for this quest soon.

## Resources

- [SKILL.md](../hf-llm-trainer/SKILL.md) – Full skill documentation
- [SFT Example](../hf-llm-trainer/scripts/train_sft_example.py) – Production SFT template
- [DPO Example](../hf-llm-trainer/scripts/train_dpo_example.py) – Production DPO template
- [GRPO Example](../hf-llm-trainer/scripts/train_grpo_example.py) – Production GRPO template
- [Training Methods](../hf-llm-trainer/references/training_methods.md) – Method selection guide
- [Hardware Guide](../hf-llm-trainer/references/hardware_guide.md) – GPU selection

---

**All quests complete?** Head back to [01_start.md](01_start.md) for the full schedule and leaderboard info.
README.md
CHANGED
sdk: static
pinned: false
---
# Humanity's Last Hackathon (of 2025)

Welcome to our hackathon!

Whether you're a tooled-up ML engineer, a classicist NLP dev, or an AGI-pilled vibe coder, this hackathon is going to be hard work! We're going to take the latest and greatest coding agents and use them to level up open source AI. After all, **why use December to relax and spend time with loved ones, when you can solve AI for all humanity?** Jokes aside, this hackathon is not about learning skills from zero or breaking things down into their simplest components. It's about collaborating, shipping, and making a difference for the open source community.

## What We're Building

Over four weeks, we're using coding agents to level up the open source AI ecosystem:

- **Week 1** – Evaluate models and build a distributed leaderboard
- **Week 2** – Create high-quality datasets for the community
- **Week 3** – Fine-tune and share models on the Hub
- **Week 4** – Sprint to the finish line together

Every contribution earns XP. Top contributors make the leaderboard. Winners get prizes!

Here's the schedule:

| Date | Event | Link |
|------|-------|------|
| Dec 2 (Mon) | Week 1 Quest Released | [Evaluate a Hub Model](02_evaluate-hub-model.md) |
| Dec 4 (Wed) | Livestream 1 | TBA |
| Dec 9 (Mon) | Week 2 Quest Released | [Publish a Hub Dataset](03_publish-hub-dataset.md) |
| Dec 11 (Wed) | Livestream 2 | TBA |
| Dec 16 (Mon) | Week 3 Quest Released | [Supervised Fine-Tuning](04_sft-finetune-hub.md) |
| Dec 18 (Wed) | Livestream 3 | TBA |
| Dec 23 (Mon) | Week 4 Community Sprint | TBA |
| Dec 31 (Tue) | Hackathon Ends | TBA |
## Getting Started

### 1. Join the Organization

Join [humanitys-last-hackathon](https://huggingface.co/organizations/humanitys-last-hackathon/share/KrqrmBxkETjvevFbfkXeezcyMbgMjjMaOp) on Hugging Face. This is where your contributions will be tracked and reflected on the leaderboard.

### 2. Set Up Your Coding Agent

Use whatever coding agent you prefer:

- **Claude Code** – `claude` in your terminal
- **Codex** – `codex` CLI
- **Gemini CLI** – `gemini` in your terminal
- **Cursor / Windsurf** – IDE-based agents
- **Open source** – aider, continue, etc.

The skills in this repo work with any agent that can read markdown instructions and run Python scripts. To install the skills, follow the instructions in the [README](../README.md).

### 3. Get Your HF Token

Most quests require a Hugging Face token with write access:

```bash
# mac/linux
curl -LsSf https://hf.co/cli/install.sh | bash

# windows
powershell -ExecutionPolicy ByPass -c "irm https://hf.co/cli/install.ps1 | iex"

# Login (creates and stores your token)
hf auth login
```

This will set your `HF_TOKEN` environment variable.

### 4. Clone the Skills Repo

```bash
git clone https://github.com/huggingface/skills.git
cd skills
```
Point your coding agent at the relevant configuration. Check the [README](../README.md) for instructions on how to use the skills with your coding agent.

## Your First Quest

**Week 1 is live!** Head to [02_evaluate-hub-model.md](02_evaluate-hub-model.md) to start evaluating models and climb the leaderboard.

## Earning XP

Each quest has three tiers:

| Tier | What it takes | XP |
|------|---------------|-----|
| 🟢 | Complete the basics | 50-75 XP |
| 🟠 | Go deeper with more features | 100-125 XP |
| 🟦 | Ship something impressive | 200-225 XP |

You can complete multiple tiers, and you can complete the same quest multiple times with different models/datasets/spaces.

## Getting Help

- [Discord](https://discord.com/channels/879548962464493619/1442881667986624554) – Join the Hugging Face Discord for real-time help
- [Livestreams](https://www.youtube.com/@HuggingFace/streams) – Weekly streams with walkthroughs and Q&A
- [Issues](https://github.com/huggingface/skills/issues) – Open an issue in this repo if you're stuck

To join the hackathon, join the organization on the Hub and set up your coding agent.

Ready? Let's ship some AI. 🚀