Development
Design Decisions
We deliberately keep the leaderboard in a single Space for simplicity. To keep the Gradio UI interactive while models are being evaluated, we run evaluations with multiprocessing instead of in a separate Space. Leaderboard entries are persisted in a Hugging Face Dataset to avoid paying for persistent storage. Tasks are deliberately ephemeral.
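The multiprocessing approach can be sketched roughly as follows. This is a minimal illustration and not the code in app/tasks.py: `evaluate_model`, `start_evaluation`, the model name, and the placeholder score are all hypothetical stand-ins for the real evaluation logic.

```python
import multiprocessing as mp
import time


def evaluate_model(model_id: str, results) -> None:
    """Stand-in for a slow evaluation; puts (model_id, score) on `results`."""
    time.sleep(0.1)  # pretend to run inference on the test set
    score = len(model_id) / 100  # placeholder number, not a real metric
    results.put((model_id, score))


def start_evaluation(model_id: str):
    """Start the evaluation in a child process and return immediately."""
    results = mp.Queue()
    worker = mp.Process(target=evaluate_model, args=(model_id, results), daemon=True)
    worker.start()
    return worker, results


if __name__ == "__main__":
    # The guard is required on platforms that spawn a fresh interpreter.
    worker, results = start_evaluation("hypothetical/model")
    print("Caller is free while the worker runs:", worker.is_alive())
    print(results.get(timeout=30))  # blocks only in this demo
    worker.join()
```

Because `start_evaluation` returns as soon as the child process has started, a UI callback that calls it would not block on the evaluation itself. The result is read from the queue before `join()` so the worker is never left waiting to flush its output.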
Local Setup
Prerequisites
- Python 3.10
- Git
- A love for speech recognition! 🤗
Quick Installation
- Make sure git-lfs is installed (https://git-lfs.com)
git lfs install
- Clone this repository:
git clone https://huggingface.co/spaces/KoelLabs/IPA-Transcription-EN
- Set up your environment:
# Create a virtual environment with Python 3.10
python3.10 -m venv venv
# Activate the virtual environment
. ./venv/bin/activate
# use `deactivate` to exit out of it
# Install the required dependencies
pip install -r requirements_lock.txt
# Log in with an HF_TOKEN that has access to your backing dataset (set in app/hf.py) and to any models you want to be able to run
huggingface-cli login
- Launch the leaderboard:
. ./scripts/run-dev.sh # development mode (auto-reloads)
. ./scripts/run-prod.sh # production mode (no auto-reloads)
- Visit http://localhost:7860 in your browser and see the magic! ✨
Adding New Datasets
The source datasets are pre-processed into a single dataset stored in `app/data/test` with three columns: `audio` (16 kHz), `ipa`, and `dataset` (the original source). This is done by the `scripts/sample_test_set.py` script. To add new datasets, add them to this script. Beware that existing leaderboard entries will then need to be recalculated. You can do this locally by accessing the dataset corresponding to `LEADERBOARD_ID`, which is set in `app/hf.py`.
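When adding a dataset, a quick sanity check of the three-column layout can catch mistakes early. The sketch below is an assumption-laden illustration, not part of the repository: `check_row` is a hypothetical helper, and it assumes each row is a plain dict whose `audio` value is a dict carrying a `sampling_rate` entry.

```python
EXPECTED_COLUMNS = {"audio", "ipa", "dataset"}
EXPECTED_SAMPLING_RATE = 16_000  # the test set stores 16 kHz audio


def check_row(row: dict) -> list[str]:
    """Return a list of problems found in one row (empty if it looks fine)."""
    problems = []
    missing = EXPECTED_COLUMNS - set(row)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    audio = row.get("audio")
    # Assumption: audio is a dict with a "sampling_rate" entry.
    if isinstance(audio, dict) and audio.get("sampling_rate") != EXPECTED_SAMPLING_RATE:
        problems.append(f"unexpected sampling rate: {audio.get('sampling_rate')}")
    ipa = row.get("ipa")
    if not isinstance(ipa, str) or not ipa:
        problems.append("ipa must be a non-empty string")
    return problems


example = {"audio": {"array": [0.0], "sampling_rate": 44_100}, "ipa": "", "dataset": "demo"}
for problem in check_row(example):
    print(problem)
```

Running a check like this over a few rows before regenerating the test set is cheaper than discovering a wrong sampling rate after leaderboard entries have been recalculated.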
Adding/Removing Dependencies
- Activate the virtual environment with `. ./venv/bin/activate`
- Add the dependency to `requirements.txt` (or remove it)
- Make sure you have no unused dependencies with `pipx run deptry .` (if necessary, `python -m pip install pipx`)
- Run `pip install -r requirements.txt`
- Freeze the dependencies with `pip freeze > requirements_lock.txt`
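After freezing, it can be useful to confirm that every package named in requirements.txt also appears in requirements_lock.txt. The following is a rough sketch of such a check, not a script shipped with this repository: `requirement_names` and `missing_from_lock` are hypothetical helpers, the package names in the demo are made up, and the parser only handles simple `name` and `name<op>version` lines.

```python
import re


def requirement_names(text: str) -> set[str]:
    """Extract normalized package names from requirements-style text."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            # Normalize names: lowercase, runs of "-", "_", "." become one "-"
            names.add(re.sub(r"[-_.]+", "-", match.group(0)).lower())
    return names


def missing_from_lock(requirements: str, lock: str) -> set[str]:
    """Names listed in the requirements text but absent from the lock text."""
    return requirement_names(requirements) - requirement_names(lock)


# Made-up demo input; real use would read the two files instead.
demo_requirements = "Example_Pkg>=1.0\nother.pkg\n"
demo_lock = "example-pkg==1.2.3\n"
print(missing_from_lock(demo_requirements, demo_lock))  # prints {'other-pkg'}
```

In practice the two strings would come from reading `requirements.txt` and `requirements_lock.txt`; an empty result means the lock file covers every direct dependency.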
Forking Into Your Own Leaderboard
- Navigate to the Space, click the three dots on the right, and select `Duplicate this Space`.
- Modify the `LEADERBOARD_ID` in `app/hf.py` to point to a dataset that you own and that the new Space can use to store data. You don't need to create the dataset, but if you do, it should be empty.
- Open the settings of your new Space and add a new secret named `HF_TOKEN`. You can create the token in the access token settings of your Hugging Face account. It only needs read access to all models you want to add to the leaderboard and write access to the private backing dataset specified by `LEADERBOARD_ID`.
- Submit some models and enjoy!
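Space secrets are exposed to the running app as environment variables, so a forked leaderboard can fail early with a readable message when `HF_TOKEN` is missing. The snippet below is only a sketch under that assumption: `require_hf_token` is a hypothetical helper, and whether app/hf.py reads the token this way is not confirmed here.

```python
import os


def require_hf_token(env=os.environ) -> str:
    """Return the HF_TOKEN value or raise a readable error if it is unset."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Add it as a secret in the Space settings "
            "or export it in your shell before launching."
        )
    return token


# Demo with an explicit mapping so the example does not depend on your shell.
print(require_hf_token({"HF_TOKEN": "hf_placeholder"}))
```

Passing the environment as a parameter keeps the check easy to exercise without touching the real process environment.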
File Structure
The two most important files are `app/app.py` for the main Gradio UI and `app/tasks.py` for the background tasks that evaluate models.
IPA-Transcription-EN/
├── README.md                  # General information about the leaderboard
├── CONTRIBUTING.md            # Contribution guidelines
├── DEVELOPMENT.md             # Development setup and design decisions
├── requirements.txt           # Python dependencies
├── requirements_lock.txt      # Locked dependencies
├── scripts                    # Helper scripts
│   ├── sample_test_set.py     # Compute the combined test set
│   ├── run-prod.sh            # Run the leaderboard in production mode
│   └── run-dev.sh             # Run the leaderboard in development mode
├── venv                       # Virtual environment
├── app/                       # All application code lives here
│   ├── data/                  # Phoneme transcription test set
│   ├── app.py                 # Main Gradio UI
│   ├── hf.py                  # Interface with the Hugging Face API
│   ├── inference.py           # Model inference
│   ├── metrics.py             # Evaluation metrics
│   └── tasks.py               # Background tasks for model evaluation
└── img/                       # Images for README and other documentation