burtenshaw (HF Staff) committed · Commit 5506306 · verified · 1 parent: d1c1dfd

Upload folder using huggingface_hub

02_evaluate-hub-model.md ADDED

# Week 1: Evaluate a Hub Model

**Goal:** Add evaluation results to model cards across the Hub. Together, we're building a distributed leaderboard of open source model performance.

> [!NOTE]
> Bonus XP for contributing to the leaderboard application. Open a PR [on the Hub](https://huggingface.co/spaces/humanitys-last-hackathon/distributed-leaderboard/discussions) or [on GitHub](https://github.com/huggingface/skills/blob/main/apps/evals-leaderboard/app.py) to get your XP.

## Why This Matters

Model cards without evaluation data are hard to compare. By adding structured eval results to `model-index` metadata, we make models searchable, sortable, and easier to choose between. Your contributions power leaderboards and help the community find the best models for their needs. And because the work is distributed across many contributors, the evaluation results are shared with the whole community rather than kept in one place.

## The Skill

Use `hf_model_evaluation/` for this quest. Key capabilities:

- Extract evaluation tables from existing README content
- Import benchmark scores from Artificial Analysis
- Run your own evals with inspect-ai on HF Jobs
- Update `model-index` metadata (Papers with Code compatible)

```bash
# Preview what would be extracted
python hf_model_evaluation/scripts/evaluation_manager.py extract-readme \
  --repo-id "model-author/model-name" --dry-run
```
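
If you haven't worked with `model-index` metadata before, the snippet below is a minimal sketch of the structure the Hub expects, written with `huggingface_hub` rather than the skill's scripts. The repo ID, dataset, and score are placeholders.

```python
# Minimal sketch of model-index metadata, assuming placeholder repo and scores.
# The skill's scripts build this structure for you from extracted or imported results.
from huggingface_hub import metadata_update

metadata = {
    "model-index": [
        {
            "name": "model-name",
            "results": [
                {
                    "task": {"type": "text-generation"},
                    "dataset": {"name": "MMLU", "type": "cais/mmlu"},
                    "metrics": [
                        {"type": "accuracy", "value": 62.5, "name": "MMLU (5-shot)"}
                    ],
                }
            ],
        }
    ]
}

# create_pr=True opens a pull request instead of committing to main directly.
metadata_update("model-author/model-name", metadata, repo_type="model", create_pr=True)
```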

## XP Tiers

### 🐢 Starter — 50 XP

**Extract evaluation results from one benchmark and update its model card.**

1. Pick a Hub model without evaluation data from the *trending models* list on the Hub (a quick way to check is sketched below)
2. Use the skill to extract or add a benchmark score
3. Create a PR (or push directly if you own the model)

**What counts:** One model, one dataset, metric visible in the model card metadata.
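
To confirm a candidate model really has no structured results yet, you can inspect its card data with `huggingface_hub`; a quick sketch (the repo ID is a placeholder):

```python
# Quick check for existing structured eval results on a model card.
from huggingface_hub import ModelCard

card = ModelCard.load("model-author/model-name")
if card.data.eval_results:
    print("Already has model-index results:", card.data.eval_results)
else:
    print("No structured eval results yet, so a good candidate.")
```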

### 🍕 Standard — 100 XP

**Import scores from third-party benchmarks like Artificial Analysis.**

1. Find a model with benchmark data on external sites
2. Use `import-aa` to fetch scores from the Artificial Analysis API
3. Create a PR with properly attributed evaluation data

**What counts:** Benchmark scores the model card previously lacked, plus a merged PR.

```bash
AA_API_KEY="your-key" python hf_model_evaluation/scripts/evaluation_manager.py import-aa \
  --creator-slug "anthropic" --model-name "claude-sonnet-4" \
  --repo-id "target/model" --create-pr
```

### 🦁 Advanced — 200 XP

**Run your own evaluation with inspect-ai and publish the results.**

1. Choose an eval task (MMLU, GSM8K, HumanEval, etc.)
2. Run the evaluation on HF Jobs infrastructure
3. Update the model card with your results and methodology

**What counts:** An original eval run and a merged PR.

```bash
HF_TOKEN=$HF_TOKEN hf jobs uv run hf_model_evaluation/scripts/inspect_eval_uv.py \
  --flavor a10g-small --secret HF_TOKEN=$HF_TOKEN \
  -- --model "meta-llama/Llama-2-7b-hf" --task "mmlu"
```

## Tips

- Always use `--dry-run` first to preview changes before pushing.
- Check for transposed tables where models are rows and benchmarks are columns.
- Be careful with PRs for models you don't own — most maintainers appreciate eval contributions, but be respectful.
- Manually validate the extracted scores and close PRs if needed.

## Resources

- [SKILL.md](../hf_model_evaluation/SKILL.md) — Full skill documentation
- [Example Usage](../hf_model_evaluation/examples/USAGE_EXAMPLES.md) — Worked examples
- [Metric Mapping](../hf_model_evaluation/examples/metric_mapping.json) — Standard metric types

03_publish-hub-dataset.md ADDED

# Week 2: Publish a Hub Dataset

Create and share high-quality datasets on the Hub. Good data is the foundation of good models—help the community by contributing datasets others can train on.

## Why This Matters

The best open source models are built on openly available datasets. By publishing well-documented, properly structured datasets, you're directly enabling the next generation of model development. Quality matters more than quantity.

## The Skill

Use `hf_dataset_creator/` for this quest. Key capabilities:

- Initialize dataset repos with proper structure
- Multi-format support: chat, classification, QA, completion, tabular
- Template-based validation for data quality
- Streaming uploads without downloading entire datasets

```bash
# Quick setup with a template
python hf_dataset_creator/scripts/dataset_manager.py quick_setup \
  --repo_id "your-username/dataset-name" --template chat
```

## XP Tiers

### 🐢 Starter — 50 XP

**Upload a small, clean dataset with a complete dataset card.**

1. Create a dataset with ≤1,000 rows
2. Write a dataset card covering the license, splits, and data provenance
3. Upload it to the Hub under the hackathon organization (or your own account)

**What counts:** Clean data, clear documentation, proper licensing.

```bash
python hf_dataset_creator/scripts/dataset_manager.py init \
  --repo_id "humanitys-last-hackathon/your-dataset-name"

python hf_dataset_creator/scripts/dataset_manager.py add_rows \
  --repo_id "humanitys-last-hackathon/your-dataset-name" \
  --template classification \
  --rows_json "$(cat your_data.json)"
```

### 🍕 Standard — 100 XP

**Publish a conversational dataset with a complete dataset card.**

1. Create a conversational dataset with ≤1,000 rows
2. Write a dataset card covering the license and splits
3. Upload it to the Hub under the hackathon organization

**What counts:** Clean data, clear documentation, proper licensing.
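
If you prefer to assemble the conversational data in Python, here is a minimal sketch with the `datasets` library; the repo ID and rows are placeholders, and the skill's chat template adds validation that this shortcut skips.

```python
# Minimal sketch: push a small conversational dataset with the datasets library.
# Repo ID and rows are placeholders.
from datasets import Dataset

rows = [
    {
        "messages": [
            {"role": "user", "content": "What does SFT stand for?"},
            {"role": "assistant", "content": "Supervised fine-tuning."},
        ]
    },
    # ... more rows, up to ~1,000 for this tier
]

Dataset.from_list(rows).push_to_hub("humanitys-last-hackathon/your-dataset-name")
```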

### 🦁 Advanced — 200 XP

**Translate a dataset into multiple languages and publish it on the Hub.**

1. Find a dataset on the Hub
2. Translate the dataset into multiple languages (one possible approach is sketched below)
3. Publish the translated datasets on the Hub under the hackathon organization

**What counts:** Translated datasets and merged PRs.
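
There is no single required approach to translation. As one hedged sketch, you could map a translation helper over the source text and publish each language as its own config; the source dataset, column name, and `translate()` function below are placeholders you would supply.

```python
# Hedged sketch: translate one text column and publish each language as a config.
# The source dataset, column name, and translate() helper are placeholders.
from datasets import load_dataset

def translate(text: str, target_lang: str) -> str:
    # Placeholder: call your translation model or API here.
    raise NotImplementedError

source = load_dataset("source-org/source-dataset", split="train")

for lang in ["fr", "de", "es"]:
    translated = source.map(lambda row: {"text": translate(row["text"], lang)})
    translated.push_to_hub(
        "humanitys-last-hackathon/source-dataset-multilingual", config_name=lang
    )
```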

## Resources

- [SKILL.md](../hf_dataset_creator/SKILL.md) — Full skill documentation
- [Templates](../hf_dataset_creator/templates/) — JSON templates for each format
- [Examples](../hf_dataset_creator/examples/) — Sample data and system prompts

---

**Next Quest:** [Supervised Fine-Tuning](04_sft-finetune-hub.md)

04_sft-finetune-hub.md ADDED

# Week 3: Supervised Fine-Tuning on the Hub

Fine-tune and share models on the Hub. Take a base model, train it on your data, and publish the result for the community to use.

## Why This Matters

Fine-tuning is how we adapt foundation models to specific tasks. By sharing fine-tuned models—along with your training methodology—you're giving the community ready-to-use solutions and reproducible recipes they can learn from.

## The Skill

Use `hf-llm-trainer/` for this quest. Key capabilities:

- **SFT** (Supervised Fine-Tuning) — Standard instruction tuning
- **DPO** (Direct Preference Optimization) — Alignment from preference data
- **GRPO** (Group Relative Policy Optimization) — Online RL training
- Cloud GPU training on HF Jobs—no local setup required
- Trackio integration for real-time monitoring
- GGUF conversion for local deployment

Your coding agent uses `hf_jobs()` to submit training scripts directly to HF infrastructure.
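
The skill's example scripts are the reference implementation, but for orientation, here is a rough sketch of what an SFT run with TRL looks like. The base model, dataset, and Hub repo are placeholders; the skill's scripts layer HF Jobs submission and Trackio monitoring on top.

```python
# Rough sketch of an SFT run with TRL; model, dataset, and hub repo are placeholders.
# See the skill's train_sft_example.py for the full, production-ready recipe.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("humanitys-last-hackathon/your-dataset-name", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",   # base model to fine-tune
    train_dataset=dataset,                # chat-formatted rows (a "messages" column)
    args=SFTConfig(
        output_dir="sft-output",
        num_train_epochs=1,
        per_device_train_batch_size=4,
        push_to_hub=True,                 # publish the fine-tuned model on the Hub
        hub_model_id="your-username/your-sft-model",
    ),
)
trainer.train()
```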

## XP Tiers

We'll announce the XP tiers for this quest soon.

## Resources

- [SKILL.md](../hf-llm-trainer/SKILL.md) — Full skill documentation
- [SFT Example](../hf-llm-trainer/scripts/train_sft_example.py) — Production SFT template
- [DPO Example](../hf-llm-trainer/scripts/train_dpo_example.py) — Production DPO template
- [GRPO Example](../hf-llm-trainer/scripts/train_grpo_example.py) — Production GRPO template
- [Training Methods](../hf-llm-trainer/references/training_methods.md) — Method selection guide
- [Hardware Guide](../hf-llm-trainer/references/hardware_guide.md) — GPU selection

---

**All quests complete?** Head back to [01_start.md](01_start.md) for the full schedule and leaderboard info.

README.md CHANGED

# Humanity's Last Hackathon (of 2025)

Welcome to our hackathon!

Whether you're a tooled-up ML engineer, a classical NLP dev, or an AGI-pilled vibe coder, this hackathon is going to be hard work! We're going to take the latest and greatest coding agents and use them to level up open source AI. After all, **why use December to relax and spend time with loved ones, when you can solve AI for all humanity?** Jokes aside, this hackathon is not about learning skills from zero or breaking things down into their simplest components. It's about collaborating, shipping, and making a difference for the open source community.

## What We're Building

Over four weeks, we're using coding agents to level up the open source AI ecosystem:

- **Week 1** — Evaluate models and build a distributed leaderboard
- **Week 2** — Create high-quality datasets for the community
- **Week 3** — Fine-tune and share models on the Hub
- **Week 4** — Sprint to the finish line together

Every contribution earns XP. Top contributors make the leaderboard. Winners get prizes!

Here's the schedule:

| Date | Event | Link |
|------|-------|------|
| Dec 2 (Tue) | Week 1 Quest Released | [Evaluate a Hub Model](02_evaluate-hub-model.md) |
| Dec 4 (Thu) | Livestream 1 | TBA |
| Dec 9 (Tue) | Week 2 Quest Released | [Publish a Hub Dataset](03_publish-hub-dataset.md) |
| Dec 11 (Thu) | Livestream 2 | TBA |
| Dec 16 (Tue) | Week 3 Quest Released | [Supervised Fine-Tuning](04_sft-finetune-hub.md) |
| Dec 18 (Thu) | Livestream 3 | TBA |
| Dec 23 (Tue) | Week 4 Community Sprint | TBA |
| Dec 31 (Wed) | Hackathon Ends | TBA |

## Getting Started

### 1. Join the Organization

Join [humanitys-last-hackathon](https://huggingface.co/organizations/humanitys-last-hackathon/share/KrqrmBxkETjvevFbfkXeezcyMbgMjjMaOp) on Hugging Face. This is where your contributions are tracked and surfaced on the leaderboard.

### 2. Set Up Your Coding Agent

Use whatever coding agent you prefer:

- **Claude Code** — `claude` in your terminal
- **Codex** — `codex` CLI
- **Gemini CLI** — `gemini` in your terminal
- **Cursor / Windsurf** — IDE-based agents
- **Open source** — aider, Continue, etc.

The skills in this repo work with any agent that can read markdown instructions and run Python scripts. To install the skills, follow the instructions in the [README](../README.md).

### 3. Get Your HF Token

Most quests require a Hugging Face token with write access:

```bash
# macOS / Linux
curl -LsSf https://hf.co/cli/install.sh | bash

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://hf.co/cli/install.ps1 | iex"

# Log in (creates and stores your token)
hf auth login
```

This stores your token locally for the `hf` CLI and `huggingface_hub`; scripts that read `HF_TOKEN` directly will also need it exported as an environment variable.
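
To confirm the login worked, a quick sanity check with `huggingface_hub` (nothing hackathon-specific):

```python
# Sanity check: confirm huggingface_hub can find your token.
from huggingface_hub import whoami

print(whoami()["name"])  # should print your Hub username
```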

### 4. Clone the Skills Repo

```bash
git clone https://github.com/huggingface/skills.git
cd skills
```

Point your coding agent at the relevant configuration. Check the [README](../README.md) for instructions on how to use the skills with your coding agent.

## Your First Quest

**Week 1 is live!** Head to [02_evaluate-hub-model.md](02_evaluate-hub-model.md) to start evaluating models and climb the leaderboard.

## Earning XP

Each quest has three tiers:

| Tier | What it takes | XP |
|------|---------------|-----|
| 🐢 | Complete the basics | 50-75 XP |
| 🍕 | Go deeper with more features | 100-125 XP |
| 🦁 | Ship something impressive | 200-225 XP |

You can complete multiple tiers, and you can complete the same quest multiple times with different models/datasets/spaces.

## Getting Help

- [Discord](https://discord.com/channels/879548962464493619/1442881667986624554) — Join the Hugging Face Discord for real-time help
- [Livestreams](https://www.youtube.com/@HuggingFace/streams) — Weekly streams with walkthroughs and Q&A
- [Issues](https://github.com/huggingface/skills/issues) — Open an issue in this repo if you're stuck

To join the hackathon, join the organization on the Hub and set up your coding agent.

Ready? Let's ship some AI. 🚀