---
title: SATE
emoji: ⚡
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
short_description: Speech Annotation and Transcription Enhancer
---

# SATE: Speech Annotation and Transcription Enhancer (MVP)

This is the **Minimum Viable Product (MVP)** version of **SATE**, a unified pipeline framework that integrates audio segmentation, speaker diarization, transcription, and linguistic annotation into a single application.

---

## Overview

- **Main Entry**: `main_socket.py`
- **Input**: An entire audio file (`.mp3`, `.wav`, etc.)
- **Output**: Word-level timestamped transcription with annotations such as pauses, repetitions, filler words, mispronunciations, and syllables.
- **Preprocessing**:
  - Audio segmentation
  - Speaker diarization
  - Transcription using CrisperWhisper
- **Annotation**:
  - Pause
  - Repetition
  - Filler Words
  - Syllable Structure
  - Mispronunciation Sequence (requires the PLM container)
- **Feature Extraction**

---

## Getting Started

#### Installation

##### 1. Clone the repo

```bash
git clone https://github.com/SwenHou/SATE.git
```

##### 2. Install packages

```bash
conda env create -f environment_sate_0.11.yml
```

##### 3. Start the Inference API on Your Local Computer

Set your Hugging Face token:

```bash
export HF_TOKEN=
```

Start the API:

```bash
python main_socket.py
```

#### Usage

##### 1. Get Annotations

```bash
curl -X POST http://localhost:7860/process \
  -F "audio_file=@" \
  -F "device=cuda" \
  -F "pause_threshold=0.25"
```

The annotation file is also available in `SATE/session_data/`.

---

## 🐳 Use Docker

### 1. Build Docker Image

In the `Dockerfile`, delete `ENV HF_HOME=/data/.huggingface` and add `ENV HF_TOKEN=`.

Run the following command in the project root directory:

```bash
docker build -t sate_0.11 .
```

### 2. Run the Docker Container

```bash
docker run --gpus all -it --rm \
  -p 7860:7860 \
  sate_0.11
```

### 3. Usage

Usage is the same as with the local API, but the annotation file is deleted when the container exits.

```bash
curl -X POST http://localhost:7860/process \
  -F "audio_file=@" \
  -F "device=cuda" \
  -F "pause_threshold=0.25"
```

---

## 🤗 Use API from Hugging Face Spaces

```bash
curl -X POST https://Sven33-SATE.hf.space/process \
  -F "audio_file=@" \
  -F "device=cuda" \
  -F "pause_threshold=0.25"
```

##### Hugging Face Space URL: `https://huggingface.co/spaces/Sven33/SATE`

Due to Hugging Face's GPU scheduling latency, the first request after a cold start takes around 5-8 minutes. If the Space receives no requests within five minutes of startup, it goes back into sleep mode. For a 10-minute audio sample, inference on a T4 small GPU takes under two minutes.
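
The `curl` requests in this README can also be sent from Python. The following is a minimal sketch using only the standard library; it is not part of the SATE codebase. The function names `build_multipart` and `request_annotations` are hypothetical, the 900-second timeout is an assumption chosen to cover the cold-start delay on Hugging Face Spaces, and the response is returned as raw text because the exact response format is not described here.

```python
import io
import uuid
import urllib.request
from pathlib import Path


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns a tuple (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    # Plain form fields, e.g. device and pause_threshold.
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        buf.write(f"{value}\r\n".encode())
    # The uploaded audio file.
    buf.write(f"--{boundary}\r\n".encode())
    buf.write(
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'.encode()
    )
    buf.write(b"Content-Type: application/octet-stream\r\n\r\n")
    buf.write(file_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"


def request_annotations(audio_path, base_url="http://localhost:7860",
                        device="cuda", pause_threshold=0.25, timeout=900):
    """POST an audio file to the /process endpoint and return the body as text.

    The form field names mirror the curl examples in this README.
    The timeout value is an assumption, not a documented requirement.
    """
    path = Path(audio_path)
    body, content_type = build_multipart(
        {"device": device, "pause_threshold": pause_threshold},
        "audio_file", path.name, path.read_bytes(),
    )
    req = urllib.request.Request(
        f"{base_url}/process",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Example (hypothetical file name; requires a running SATE server):
# print(request_annotations("sample.wav"))
# print(request_annotations("sample.wav", base_url="https://Sven33-SATE.hf.space"))
```

Passing `base_url` switches between the local server, the Docker container, and the Hugging Face Space without changing the rest of the call.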