---
title: SATE
emoji: ⚡
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
short_description: Speech Annotation and Transcription Enhancer
---
# SATE: Speech Annotation and Transcription Enhancer (MVP)
This is the **Minimum Viable Product (MVP)** version of **SATE**, a unified pipeline framework that integrates audio segmentation, speaker diarization, transcription, and linguistic annotation into a single application.
---
## Overview
- **Main Entry**: `main_socket.py`
- **Input**: Entire audio file (`.mp3`, `.wav`, etc.)
- **Output**: Word-level timestamped transcription with annotations such as pauses, repetitions, filler words, mispronunciations, and syllables.
- **Preprocessing**:
  - Audio segmentation
  - Speaker diarization
  - Transcription using Crisper Whisper
- **Annotation**:
  - Pause
  - Repetition
  - Filler Words
  - Syllable Structure
  - Mispronunciation Sequence (PLM container is needed)
- **Feature Extraction**
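As a rough illustration of the pause annotation, the sketch below flags gaps between consecutive words in a word-level timestamped transcript. It is not SATE's implementation: the `(text, start, end)` tuple layout, the `find_pauses` name, the use of seconds, and the `>=` comparison are all assumptions made for this example. Only the `pause_threshold` name and its `0.25` value come from the `/process` request used in this README.

```python
from typing import List, Tuple

# Hypothetical word record: (text, start time in seconds, end time in seconds).
Word = Tuple[str, float, float]


def find_pauses(words: List[Word], pause_threshold: float = 0.25) -> List[Tuple[float, float]]:
    """Return (pause_start, pause_end) for every gap of at least pause_threshold."""
    pauses = []
    # Walk over each pair of neighbouring words and measure the silence between them.
    for (_, _, prev_end), (_, next_start, _) in zip(words, words[1:]):
        gap = next_start - prev_end
        if gap >= pause_threshold:
            pauses.append((prev_end, next_start))
    return pauses


# Example: only the 0.5 s gap between "um" and "yes" counts as a pause.
sample = [("so", 0.0, 0.2), ("um", 0.3, 0.5), ("yes", 1.0, 1.3)]
detected = find_pauses(sample, pause_threshold=0.25)
```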
---
## Getting Started
#### Installation
##### 1. Clone the repo
```bash
git clone https://github.com/SwenHou/SATE.git
```
##### 2. Install packages
```bash
conda env create -f environment_sate_0.11.yml
```
##### 3. Start the inference API on your local machine
Set your Hugging Face token:
```bash
export HF_TOKEN=<your_token_here>
```
Start the API:
```bash
python main_socket.py
```
#### Usage
##### 1. Get Annotations
```bash
curl -X POST http://localhost:7860/process \
-F "audio_file=@<your local path to audio file>" \
-F "device=cuda" \
-F "pause_threshold=0.25"
```
The annotation file is also saved in `SATE/session_data/`.
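The same request can be sent from Python. This is a minimal sketch using only the standard library. `build_multipart` and `annotate` are hypothetical helper names, and because the response format is not documented here, `annotate` simply returns the raw response bytes. With the API running locally, `annotate("sample.wav")` would post a file named `sample.wav` from the current directory.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns a (body_bytes, content_type_header) pair.
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    # Plain form fields, equivalent to curl's -F "name=value".
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # The file part, equivalent to curl's -F "audio_file=@path".
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n".encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def annotate(audio_path, url="http://localhost:7860/process",
             device="cuda", pause_threshold=0.25, timeout=600):
    """POST an audio file to the /process endpoint and return the raw response."""
    path = Path(audio_path)
    body, content_type = build_multipart(
        {"device": device, "pause_threshold": pause_threshold},
        "audio_file",
        path.name,
        path.read_bytes(),
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # The timeout is an assumption; long recordings may need more.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```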
---
## 🐳 Use Docker
### 1. Build Docker Image
In the `Dockerfile`, delete `ENV HF_HOME=/data/.huggingface` and add `ENV HF_TOKEN=<your_token_here>`.
Then run the following command in the project root directory:
```bash
docker build -t sate_0.11 .
```
### 2. Run the Docker Container
```bash
docker run --gpus all -it --rm \
-p 7860:7860 \
sate_0.11
```
### 3. Usage
Usage is the same as with the local API, but the annotation file is deleted when the container exits.
```bash
curl -X POST http://localhost:7860/process \
-F "audio_file=@<your local path to audio file>" \
-F "device=cuda" \
-F "pause_threshold=0.25"
```
---
## 🤗 Use API from Hugging Face Spaces
```bash
curl -X POST https://Sven33-SATE.hf.space/process \
-F "audio_file=@<your local path to audio file>" \
-F "device=cuda" \
-F "pause_threshold=0.25"
```
##### Hugging Face Space URL: `https://huggingface.co/spaces/Sven33/SATE`
Due to Hugging Face's GPU scheduling latency, the first request after a cold start takes around 5-8 minutes. If there are no requests within five minutes of startup, the service goes back to sleep.
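Because of this startup delay, a client may want to retry for a while instead of failing on the first attempt. The helper below is one possible sketch. `call_with_cold_start_retry` is a hypothetical name, the 600-second budget and 30-second interval are assumptions chosen to cover the 5-8 minute startup, and how a sleeping Space actually responds is not specified here, so the sketch assumes failures surface as `OSError` (which includes `urllib.error.URLError` and socket timeouts).

```python
import time


def call_with_cold_start_retry(send, max_wait_s=600.0, interval_s=30.0,
                               sleep=time.sleep, clock=time.monotonic):
    """Call send() until it succeeds or max_wait_s has elapsed.

    send is any zero-argument callable that performs the request.
    sleep and clock are injectable so the helper can be tested without waiting.
    """
    deadline = clock() + max_wait_s
    while True:
        try:
            return send()
        except OSError as error:
            # Give up if another wait would run past the deadline.
            if clock() + interval_s > deadline:
                raise TimeoutError("service did not respond within the wait budget") from error
            sleep(interval_s)
```

In practice `send` would wrap the actual HTTP request to the Space, for example a function that posts the audio file and returns the response body.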
For a 10-minute audio sample, inference on a T4 small GPU takes roughly two minutes or less.
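That figure corresponds to a real-time factor (processing time divided by audio duration) of at most about 0.2. The short calculation below makes the arithmetic explicit. Scaling it linearly to other durations is an assumption for rough planning only, not a measured result, and the function names are hypothetical.

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration; lower is faster."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


def estimate_processing_s(audio_s: float, rtf: float) -> float:
    """Rough estimate that assumes processing time scales linearly with duration."""
    return audio_s * rtf


# From the figure above: about 2 minutes of processing for 10 minutes of audio.
RTF_UPPER_BOUND = real_time_factor(processing_s=2 * 60, audio_s=10 * 60)

# Under the linear assumption, a 30-minute recording would take about 6 minutes.
thirty_minute_estimate_s = estimate_processing_s(30 * 60, RTF_UPPER_BOUND)
```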