burtenshaw (HF Staff) committed · Commit 5506306 · verified · 1 parent: d1c1dfd

Upload folder using huggingface_hub

02_evaluate-hub-model.md ADDED

# Week 1: Evaluate a Hub Model

**Goal:** Add evaluation results to model cards across the Hub. Together, we're building a distributed leaderboard of open source model performance.

> [!NOTE]
> Bonus XP for contributing to the leaderboard application. Open a PR [on the Hub](https://huggingface.co/spaces/humanitys-last-hackathon/distributed-leaderboard/discussions) or [on GitHub](https://github.com/huggingface/skills/blob/main/apps/evals-leaderboard/app.py) to get your XP.

## Why This Matters

Model cards without evaluation data are hard to compare. By adding structured eval results to `model-index` metadata, we make models searchable, sortable, and easier to choose between. Your contributions power leaderboards and help the community find the best models for their needs. And because the work is distributed across many contributors, the evaluation results are shared with the whole community rather than kept in one place.

## The Skill

Use `hf_model_evaluation/` for this quest. Key capabilities:

- Extract evaluation tables from existing README content
- Import benchmark scores from Artificial Analysis
- Run your own evals with inspect-ai on HF Jobs
- Update `model-index` metadata (Papers with Code compatible)

```bash
# Preview what would be extracted
python hf_model_evaluation/scripts/evaluation_manager.py extract-readme \
  --repo-id "model-author/model-name" --dry-run
```
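
If you haven't worked with `model-index` metadata before, the snippet below is a minimal sketch of the structure the Hub expects, written with `huggingface_hub` rather than the skill's scripts. The repo ID, dataset, and score are placeholders.

```python
# Minimal sketch of model-index metadata, assuming placeholder repo and scores.
# The skill's scripts build this structure for you from extracted or imported results.
from huggingface_hub import metadata_update

metadata = {
    "model-index": [
        {
            "name": "model-name",
            "results": [
                {
                    "task": {"type": "text-generation"},
                    "dataset": {"name": "MMLU", "type": "cais/mmlu"},
                    "metrics": [
                        {"type": "accuracy", "value": 62.5, "name": "MMLU (5-shot)"}
                    ],
                }
            ],
        }
    ]
}

# create_pr=True opens a pull request instead of committing to main directly.
metadata_update("model-author/model-name", metadata, repo_type="model", create_pr=True)
```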

## XP Tiers

### 🐢 Starter — 50 XP

**Extract evaluation results from one benchmark and update its model card.**

1. Pick a Hub model without evaluation data from the *trending models* list on the Hub (a quick way to check is sketched below)
2. Use the skill to extract or add a benchmark score
3. Create a PR (or push directly if you own the model)

**What counts:** One model, one dataset, metric visible in the model card metadata.
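
To confirm a candidate model really has no structured results yet, you can inspect its card data with `huggingface_hub`; a quick sketch (the repo ID is a placeholder):

```python
# Quick check for existing structured eval results on a model card.
from huggingface_hub import ModelCard

card = ModelCard.load("model-author/model-name")
if card.data.eval_results:
    print("Already has model-index results:", card.data.eval_results)
else:
    print("No structured eval results yet, so a good candidate.")
```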

### 🍕 Standard — 100 XP

**Import scores from third-party benchmarks like Artificial Analysis.**

1. Find a model with benchmark data on external sites
2. Use `import-aa` to fetch scores from the Artificial Analysis API
3. Create a PR with properly attributed evaluation data

**What counts:** Benchmark scores the model card previously lacked, plus a merged PR.

```bash
AA_API_KEY="your-key" python hf_model_evaluation/scripts/evaluation_manager.py import-aa \
  --creator-slug "anthropic" --model-name "claude-sonnet-4" \
  --repo-id "target/model" --create-pr
```

### 🦁 Advanced — 200 XP

**Run your own evaluation with inspect-ai and publish the results.**

1. Choose an eval task (MMLU, GSM8K, HumanEval, etc.)
2. Run the evaluation on HF Jobs infrastructure
3. Update the model card with your results and methodology

**What counts:** An original eval run and a merged PR.

```bash
HF_TOKEN=$HF_TOKEN hf jobs uv run hf_model_evaluation/scripts/inspect_eval_uv.py \
  --flavor a10g-small --secret HF_TOKEN=$HF_TOKEN \
  -- --model "meta-llama/Llama-2-7b-hf" --task "mmlu"
```

## Tips

- Always use `--dry-run` first to preview changes before pushing.
- Check for transposed tables where models are rows and benchmarks are columns.
- Be careful with PRs for models you don't own — most maintainers appreciate eval contributions, but be respectful.
- Manually validate the extracted scores and close PRs if needed.

## Resources

- [SKILL.md](../hf_model_evaluation/SKILL.md) — Full skill documentation
- [Example Usage](../hf_model_evaluation/examples/USAGE_EXAMPLES.md) — Worked examples
- [Metric Mapping](../hf_model_evaluation/examples/metric_mapping.json) — Standard metric types

03_publish-hub-dataset.md ADDED

# Week 2: Publish a Hub Dataset

Create and share high-quality datasets on the Hub. Good data is the foundation of good models—help the community by contributing datasets others can train on.

## Why This Matters

The best open source models are built on openly available datasets. By publishing well-documented, properly structured datasets, you're directly enabling the next generation of model development. Quality matters more than quantity.

## The Skill

Use `hf_dataset_creator/` for this quest. Key capabilities:

- Initialize dataset repos with proper structure
- Multi-format support: chat, classification, QA, completion, tabular
- Template-based validation for data quality
- Streaming uploads without downloading entire datasets

```bash
# Quick setup with a template
python hf_dataset_creator/scripts/dataset_manager.py quick_setup \
  --repo_id "your-username/dataset-name" --template chat
```

## XP Tiers

### 🐢 Starter — 50 XP

**Upload a small, clean dataset with a complete dataset card.**

1. Create a dataset with ≤1,000 rows
2. Write a dataset card covering the license, splits, and data provenance
3. Upload it to the Hub under the hackathon organization (or your own account)

**What counts:** Clean data, clear documentation, proper licensing.

```bash
python hf_dataset_creator/scripts/dataset_manager.py init \
  --repo_id "humanitys-last-hackathon/your-dataset-name"

python hf_dataset_creator/scripts/dataset_manager.py add_rows \
  --repo_id "humanitys-last-hackathon/your-dataset-name" \
  --template classification \
  --rows_json "$(cat your_data.json)"
```

### 🍕 Standard — 100 XP

**Publish a conversational dataset with a complete dataset card.**

1. Create a conversational dataset with ≤1,000 rows
2. Write a dataset card covering the license and splits
3. Upload it to the Hub under the hackathon organization

**What counts:** Clean data, clear documentation, proper licensing.
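
If you prefer to assemble the conversational data in Python, here is a minimal sketch with the `datasets` library; the repo ID and rows are placeholders, and the skill's chat template adds validation that this shortcut skips.

```python
# Minimal sketch: push a small conversational dataset with the datasets library.
# Repo ID and rows are placeholders.
from datasets import Dataset

rows = [
    {
        "messages": [
            {"role": "user", "content": "What does SFT stand for?"},
            {"role": "assistant", "content": "Supervised fine-tuning."},
        ]
    },
    # ... more rows, up to ~1,000 for this tier
]

Dataset.from_list(rows).push_to_hub("humanitys-last-hackathon/your-dataset-name")
```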

### 🦁 Advanced — 200 XP

**Translate a dataset into multiple languages and publish it on the Hub.**

1. Find a dataset on the Hub
2. Translate the dataset into multiple languages (one possible approach is sketched below)
3. Publish the translated datasets on the Hub under the hackathon organization

**What counts:** Translated datasets and merged PRs.
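
There is no single required approach to translation. As one hedged sketch, you could map a translation helper over the source text and publish each language as its own config; the source dataset, column name, and `translate()` function below are placeholders you would supply.

```python
# Hedged sketch: translate one text column and publish each language as a config.
# The source dataset, column name, and translate() helper are placeholders.
from datasets import load_dataset

def translate(text: str, target_lang: str) -> str:
    # Placeholder: call your translation model or API here.
    raise NotImplementedError

source = load_dataset("source-org/source-dataset", split="train")

for lang in ["fr", "de", "es"]:
    translated = source.map(lambda row: {"text": translate(row["text"], lang)})
    translated.push_to_hub(
        "humanitys-last-hackathon/source-dataset-multilingual", config_name=lang
    )
```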

## Resources

- [SKILL.md](../hf_dataset_creator/SKILL.md) — Full skill documentation
- [Templates](../hf_dataset_creator/templates/) — JSON templates for each format
- [Examples](../hf_dataset_creator/examples/) — Sample data and system prompts

---

**Next Quest:** [Supervised Fine-Tuning](04_sft-finetune-hub.md)

04_sft-finetune-hub.md ADDED

# Week 3: Supervised Fine-Tuning on the Hub

Fine-tune and share models on the Hub. Take a base model, train it on your data, and publish the result for the community to use.

## Why This Matters

Fine-tuning is how we adapt foundation models to specific tasks. By sharing fine-tuned models—along with your training methodology—you're giving the community ready-to-use solutions and reproducible recipes they can learn from.

## The Skill

Use `hf-llm-trainer/` for this quest. Key capabilities:

- **SFT** (Supervised Fine-Tuning) — Standard instruction tuning
- **DPO** (Direct Preference Optimization) — Alignment from preference data
- **GRPO** (Group Relative Policy Optimization) — Online RL training
- Cloud GPU training on HF Jobs—no local setup required
- Trackio integration for real-time monitoring
- GGUF conversion for local deployment

Your coding agent uses `hf_jobs()` to submit training scripts directly to HF infrastructure.
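
The skill's example scripts are the reference implementation, but for orientation, here is a rough sketch of what an SFT run with TRL looks like. The base model, dataset, and Hub repo are placeholders; the skill's scripts layer HF Jobs submission and Trackio monitoring on top.

```python
# Rough sketch of an SFT run with TRL; model, dataset, and hub repo are placeholders.
# See the skill's train_sft_example.py for the full, production-ready recipe.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("humanitys-last-hackathon/your-dataset-name", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",   # base model to fine-tune
    train_dataset=dataset,                # chat-formatted rows (a "messages" column)
    args=SFTConfig(
        output_dir="sft-output",
        num_train_epochs=1,
        per_device_train_batch_size=4,
        push_to_hub=True,                 # publish the fine-tuned model on the Hub
        hub_model_id="your-username/your-sft-model",
    ),
)
trainer.train()
```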

## XP Tiers

We'll announce the XP tiers for this quest soon.

## Resources

- [SKILL.md](../hf-llm-trainer/SKILL.md) — Full skill documentation
- [SFT Example](../hf-llm-trainer/scripts/train_sft_example.py) — Production SFT template
- [DPO Example](../hf-llm-trainer/scripts/train_dpo_example.py) — Production DPO template
- [GRPO Example](../hf-llm-trainer/scripts/train_grpo_example.py) — Production GRPO template
- [Training Methods](../hf-llm-trainer/references/training_methods.md) — Method selection guide
- [Hardware Guide](../hf-llm-trainer/references/hardware_guide.md) — GPU selection

---

**All quests complete?** Head back to [01_start.md](01_start.md) for the full schedule and leaderboard info.

README.md CHANGED

# Humanity's Last Hackathon (of 2025)

Welcome to our hackathon!

Whether you're a tooled-up ML engineer, a classical NLP dev, or an AGI-pilled vibe coder, this hackathon is going to be hard work! We're going to take the latest and greatest coding agents and use them to level up open source AI. After all, **why use December to relax and spend time with loved ones, when you can solve AI for all humanity?** Jokes aside, this hackathon is not about learning skills from zero or breaking things down into their simplest components. It's about collaborating, shipping, and making a difference for the open source community.

## What We're Building

Over four weeks, we're using coding agents to level up the open source AI ecosystem:

- **Week 1** — Evaluate models and build a distributed leaderboard
- **Week 2** — Create high-quality datasets for the community
- **Week 3** — Fine-tune and share models on the Hub
- **Week 4** — Sprint to the finish line together

Every contribution earns XP. Top contributors make the leaderboard. Winners get prizes!

Here's the schedule:

| Date | Event | Link |
|------|-------|------|
| Dec 2 (Tue) | Week 1 Quest Released | [Evaluate a Hub Model](02_evaluate-hub-model.md) |
| Dec 4 (Thu) | Livestream 1 | TBA |
| Dec 9 (Tue) | Week 2 Quest Released | [Publish a Hub Dataset](03_publish-hub-dataset.md) |
| Dec 11 (Thu) | Livestream 2 | TBA |
| Dec 16 (Tue) | Week 3 Quest Released | [Supervised Fine-Tuning](04_sft-finetune-hub.md) |
| Dec 18 (Thu) | Livestream 3 | TBA |
| Dec 23 (Tue) | Week 4 Community Sprint | TBA |
| Dec 31 (Wed) | Hackathon Ends | TBA |

## Getting Started

### 1. Join the Organization

Join [humanitys-last-hackathon](https://huggingface.co/organizations/humanitys-last-hackathon/share/KrqrmBxkETjvevFbfkXeezcyMbgMjjMaOp) on Hugging Face. This is where your contributions are tracked and surfaced on the leaderboard.

### 2. Set Up Your Coding Agent

Use whatever coding agent you prefer:

- **Claude Code** — `claude` in your terminal
- **Codex** — `codex` CLI
- **Gemini CLI** — `gemini` in your terminal
- **Cursor / Windsurf** — IDE-based agents
- **Open source** — aider, Continue, etc.

The skills in this repo work with any agent that can read markdown instructions and run Python scripts. To install the skills, follow the instructions in the [README](../README.md).

### 3. Get Your HF Token

Most quests require a Hugging Face token with write access:

```bash
# macOS / Linux
curl -LsSf https://hf.co/cli/install.sh | bash

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://hf.co/cli/install.ps1 | iex"

# Log in (creates and stores your token)
hf auth login
```

This stores your token locally for the `hf` CLI and `huggingface_hub`; scripts that read `HF_TOKEN` directly will also need it exported as an environment variable.
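
To confirm the login worked, a quick sanity check with `huggingface_hub` (nothing hackathon-specific):

```python
# Sanity check: confirm huggingface_hub can find your token.
from huggingface_hub import whoami

print(whoami()["name"])  # should print your Hub username
```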

### 4. Clone the Skills Repo

```bash
git clone https://github.com/huggingface/skills.git
cd skills
```

Point your coding agent at the relevant configuration. Check the [README](../README.md) for instructions on how to use the skills with your coding agent.

## Your First Quest

**Week 1 is live!** Head to [02_evaluate-hub-model.md](02_evaluate-hub-model.md) to start evaluating models and climb the leaderboard.

## Earning XP

Each quest has three tiers:

| Tier | What it takes | XP |
|------|---------------|-----|
| 🐢 | Complete the basics | 50-75 XP |
| 🍕 | Go deeper with more features | 100-125 XP |
| 🦁 | Ship something impressive | 200-225 XP |

You can complete multiple tiers, and you can complete the same quest multiple times with different models/datasets/spaces.

## Getting Help

- [Discord](https://discord.com/channels/879548962464493619/1442881667986624554) — Join the Hugging Face Discord for real-time help
- [Livestreams](https://www.youtube.com/@HuggingFace/streams) — Weekly streams with walkthroughs and Q&A
- [Issues](https://github.com/huggingface/skills/issues) — Open an issue in this repo if you're stuck

To join the hackathon, join the organization on the Hub and set up your coding agent.

Ready? Let's ship some AI. 🚀