---
title: SATE
emoji: ⚡
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
short_description: Speech Annotation and Transcription Enhancer
---


# SATE: Speech Annotation and Transcription Enhancer (MVP)

This is the **Minimum Viable Product (MVP)** version of **SATE**, a unified pipeline framework that integrates audio segmentation, speaker diarization, transcription, and linguistic annotation into a single application.

---

## Overview

- **Main Entry**: `main_socket.py`
- **Input**: Entire audio file (`.mp3`, `.wav`, etc.)
- **Output**: Word-level timestamped transcription with annotations such as pauses, repetitions, filler words, mispronunciations and syllables.

- **Preprocessing**:
  - Audio segmentation
  - Speaker diarization
  - Transcription using CrisperWhisper

- **Annotation**:
  - Pause
  - Repetition
  - Filler Words
  - Syllable Structure
  - Mispronunciation Sequence (requires the PLM container)

- **Feature Extraction**

---


## Getting Started

#### Installation

##### 1. Clone the repo
```bash
git clone https://github.com/SwenHou/SATE.git
cd SATE
```
##### 2. Install packages
```bash
conda env create -f environment_sate_0.11.yml
conda activate <env_name>  # use the environment name defined in environment_sate_0.11.yml
```
##### 3. Start the Inference API on Your Local Machine
Set up your Hugging Face token:
```bash
export HF_TOKEN=<your_token_here>
```
Start API:
```bash
python main_socket.py
```
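Because the API expects `HF_TOKEN` in the environment, a small wrapper can check for it before launching so a missing token is reported up front. This is an illustrative sketch only: `require_hf_token` and `start_sate` are hypothetical helper names, not part of the repository, and it assumes you run it from the project root with the conda environment active.

```bash
# Hypothetical helper (not part of the SATE repo): report a missing HF_TOKEN
# before the server starts instead of discovering it later.
require_hf_token() {
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "error: HF_TOKEN is not set; run: export HF_TOKEN=<your_token_here>" >&2
    return 1
  fi
  return 0
}

# Hypothetical launcher: start the local API only when the token check passes.
start_sate() {
  require_hf_token || return 1
  python main_socket.py
}
```

After sourcing the snippet, run `start_sate` instead of `python main_socket.py`.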
#### Usage
##### 1. Get Annotations

```bash
curl -X POST http://localhost:7860/process \
  -F "audio_file=@<your local path to audio file>" \
  -F "device=cuda" \
  -F "pause_threshold=0.25"
```
A copy of the annotation file is also saved in `SATE/session_data/`.
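To annotate many recordings, the same request can be repeated for every file in a folder. A minimal sketch, assuming the server from step 3 is listening on `localhost:7860` and that the response body carries the annotations (the note above says the file is *also* saved in `session_data/`, which suggests it is). `annotate_dir` and `SATE_URL` are hypothetical names; the third argument is passed straight through as the `device` form field.

```bash
# Hypothetical helper: POST every .wav/.mp3 file in a directory to the local
# SATE API and save each HTTP response body as <audio name>.response.
SATE_URL=${SATE_URL:-http://localhost:7860/process}

annotate_dir() {
  in_dir=$1          # directory containing audio files
  out_dir=$2         # where response bodies are written
  device=${3:-cuda}  # sent as the "device" form field
  mkdir -p "$out_dir" || return 1
  rc=0
  for f in "$in_dir"/*.wav "$in_dir"/*.mp3; do
    [ -e "$f" ] || continue   # skip glob patterns that matched nothing
    name=$(basename "$f")
    echo "processing $name" >&2
    # --fail makes curl exit non-zero on HTTP errors so failures are reported
    if ! curl -sS --fail -o "$out_dir/$name.response" \
        -X POST "$SATE_URL" \
        -F "audio_file=@$f" \
        -F "device=$device" \
        -F "pause_threshold=0.25"; then
      echo "failed: $name" >&2
      rc=1
    fi
  done
  return $rc
}
```

For example, `annotate_dir ./recordings ./annotations` processes every supported file in `./recordings` and returns non-zero if any request failed.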

---


## 🐳 Use Docker

### 1. Build Docker Image
In the `Dockerfile`, delete `ENV HF_HOME=/data/.huggingface` and add `ENV HF_TOKEN=<your_token_here>`.
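The Dockerfile edit can also be scripted. This sketch swaps the `HF_HOME` line for the `HF_TOKEN` line in place; `patch_dockerfile` is a hypothetical helper, and it assumes the line appears exactly as written above and that the token contains no `|` or `&` characters (which `sed` would misinterpret).

```bash
# Hypothetical helper: replace `ENV HF_HOME=/data/.huggingface` with
# `ENV HF_TOKEN=<token>`. Writes through a temp file so it behaves the same
# with GNU and BSD sed (no reliance on `sed -i`).
patch_dockerfile() {
  file=${1:-Dockerfile}
  token=$2
  [ -n "$token" ] || { echo "usage: patch_dockerfile <Dockerfile> <token>" >&2; return 1; }
  # Refuse to proceed if the expected line is missing
  grep -q '^ENV HF_HOME=/data/\.huggingface$' "$file" || {
    echo "ENV HF_HOME line not found in $file" >&2
    return 1
  }
  sed "s|^ENV HF_HOME=/data/\\.huggingface\$|ENV HF_TOKEN=$token|" "$file" > "$file.tmp" \
    && mv "$file.tmp" "$file"
}
```

Usage: `patch_dockerfile Dockerfile "$HF_TOKEN"` from the project root.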

Run the following command in the project root directory:

```bash
docker build -t sate_0.11 .
```

### 2. Run the Docker Container
```bash
docker run --gpus all -it --rm \
  -p 7860:7860 \
  sate_0.11
```

### 3. Usage
Usage is the same as with the local API, except that the annotation file saved in `session_data/` is lost when the container exits, because the container is started with `--rm`.

```bash
curl -X POST http://localhost:7860/process \
  -F "audio_file=@<your local path to audio file>" \
  -F "device=cuda" \
  -F "pause_threshold=0.25"
```
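To keep the annotation files, one option is a Docker bind mount of a host directory over the container's `session_data/` folder. This is a sketch under an explicit assumption: the container path `/app/session_data` is a guess, so check the `WORKDIR` in the `Dockerfile` and set the hypothetical `SATE_CONTAINER_DIR` variable to match. `run_sate_persistent` is likewise a hypothetical helper name.

```bash
# Hypothetical helper: run the SATE image with a host directory bind-mounted
# over session_data/, so annotation files survive after the container exits.
# ASSUMPTION: the app lives in /app inside the container; adjust to the WORKDIR.
SATE_CONTAINER_DIR=${SATE_CONTAINER_DIR:-/app}

run_sate_persistent() {
  host_dir=${1:-"$PWD/session_data"}
  mkdir -p "$host_dir" || return 1
  # Bind mounts need an absolute host path; resolve relative paths here
  host_dir=$(cd "$host_dir" && pwd) || return 1
  docker run --gpus all -it --rm \
    -p 7860:7860 \
    -v "$host_dir:$SATE_CONTAINER_DIR/session_data" \
    sate_0.11
}
```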


---


## 🤗 Use API from Hugging Face Spaces

```bash
curl -X POST https://Sven33-SATE.hf.space/process \
  -F "audio_file=@<your local path to audio file>" \
  -F "device=cuda" \
  -F "pause_threshold=0.25"
```
##### Hugging Face Space URL: `https://huggingface.co/spaces/Sven33/SATE`

Because of GPU scheduling latency on Hugging Face, the first request after the Space starts takes about 5-8 minutes. If the Space receives no requests within five minutes of starting up, it goes back to sleep.
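To ride out the cold start without re-running the command by hand, the request can be wrapped in a retry loop. A minimal sketch, assuming the Space answers with something other than HTTP 200 (or refuses the connection) while it is still waking up. `annotate_remote`, `SATE_SPACE_URL`, and `SATE_RETRY_DELAY` are hypothetical names, and the default of 10 attempts spaced 60 seconds apart is a guess sized to the 5-8 minute startup.

```bash
# Hypothetical helper: POST an audio file to the SATE Space, retrying while
# the Space wakes up. Anything other than HTTP 200 (including connection
# failures, which curl reports as 000) counts as "not ready yet".
SATE_SPACE_URL=${SATE_SPACE_URL:-https://Sven33-SATE.hf.space}
SATE_RETRY_DELAY=${SATE_RETRY_DELAY:-60}

annotate_remote() {
  audio=$1
  out=$2
  max_attempts=${3:-10}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -w prints only the HTTP status code; the response body goes to $out
    code=$(curl -sS -o "$out" -w '%{http_code}' \
      -X POST "$SATE_SPACE_URL/process" \
      -F "audio_file=@$audio" \
      -F "device=cuda" \
      -F "pause_threshold=0.25") || code=000
    if [ "$code" = "200" ]; then
      return 0
    fi
    # Sleep only if another attempt follows
    if [ "$attempt" -lt "$max_attempts" ]; then
      echo "attempt $attempt/$max_attempts: HTTP $code, retrying in ${SATE_RETRY_DELAY}s" >&2
      sleep "$SATE_RETRY_DELAY"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

Each retry re-uploads the audio file, so with long recordings a longer delay and fewer attempts waste less bandwidth.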

For a 10-minute audio sample, inference on a T4 small GPU typically takes under two minutes.