# Dataset Preparation Interface for Fine-tuning Whisper
A web-based interface for preparing audio datasets to fine-tune OpenAI's Whisper model. It helps you record, manage, and organize voice clips together with their transcriptions, with support for cloud storage and authentication.
## Features
- User authentication via Pocketbase
- Cloud storage support (Hugging Face Datasets)
- Multi-language support with native names
- Modern Material Design recording interface
- CSV transcript file support
- Session-based recording workflow
- Advanced recording controls
- Keyboard shortcuts for efficiency
- Progress tracking and navigation
- Local and cloud metadata management
- Responsive, mobile-friendly UI
## Getting Started
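1. Create a transcript CSV file with your content. A single-language file uses a `transcript` column:
```csv
transcript
"First sentence to record"
"Second sentence to record"
```
For multi-language support, use one `transcript_{lang_code}` column per language:
```csv
transcript_en,transcript_es
"English sentence","Spanish sentence"
```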
2. Start the Flask application:
```bash
python app.py
```
3. Access the interface:
```
http://localhost:5000
```
## Usage
1. **Authentication**
   - Sign in with your Google account
2. **Session Setup**
   - Upload your transcript CSV
   - Select the language and recording location
   - Enter speaker details
   - Click "Start Session"
3. **Recording**
   - Use the on-screen controls or keyboard shortcuts:
     - `R`: Start/stop recording
     - `Space`: Play recording
     - `Enter`: Save recording
     - `Backspace`: Re-record
     - `←`: Previous transcript
     - `→`: Skip current transcript
   - Jump to a specific transcript by its row number
   - Adjust the transcript font size as needed
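The navigation controls above behave like a cursor over the rows of the transcript CSV. The sketch below models that cursor; `RecordingSession` and its methods are hypothetical names for illustration, not the app's actual code, and it assumes row numbers shown to the user are 1-based.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingSession:
    """Hypothetical cursor over transcript rows for one recording session."""

    transcripts: list[str]
    index: int = 0  # 0-based position of the current transcript
    skipped: set[int] = field(default_factory=set)

    @property
    def current(self) -> str:
        return self.transcripts[self.index]

    @property
    def progress(self) -> str:
        # e.g. "3/10": what a progress indicator could display
        return f"{self.index + 1}/{len(self.transcripts)}"

    def previous(self) -> str:
        # "←": step back one row, stopping at the first transcript
        self.index = max(self.index - 1, 0)
        return self.current

    def skip(self) -> str:
        # "→": remember the skipped row and advance, stopping at the last one
        self.skipped.add(self.index)
        self.index = min(self.index + 1, len(self.transcripts) - 1)
        return self.current

    def jump_to_row(self, row: int) -> str:
        # Row numbers are assumed to be 1-based in the UI
        if not 1 <= row <= len(self.transcripts):
            raise ValueError(f"row must be between 1 and {len(self.transcripts)}")
        self.index = row - 1
        return self.current
```

Keeping skipped rows in a set makes it easy to revisit them at the end of a session without re-reading the CSV.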
## Data Storage
Recordings are stored in language-specific directories:
```
datasets/
├── en/
│   ├── audio/
│   │   ├── {user_prefix}_{YYYYMMDD_HHMMSS}.wav
│   │   └── ...
│   └── en.parquet    # English recordings metadata
├── es/
│   ├── audio/
│   │   ├── {user_prefix}_{YYYYMMDD_HHMMSS}.wav
│   │   └── ...
│   └── es.parquet    # Spanish recordings metadata
└── stats.json        # Global recording statistics
```
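The layout above can be expressed as a small path helper. This is a sketch under the assumption that paths are derived purely from the language code, a user prefix, and the recording time; `recording_paths` and the sample prefix `a1b2c3` are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def recording_paths(root: Path, lang: str, user_prefix: str, when: datetime) -> dict[str, Path]:
    """Hypothetical helper: where a new recording and its metadata would live."""
    lang_dir = root / lang
    # {user_prefix}_{YYYYMMDD_HHMMSS}.wav, matching the tree above
    filename = f"{user_prefix}_{when.strftime('%Y%m%d_%H%M%S')}.wav"
    return {
        "audio": lang_dir / "audio" / filename,
        "metadata": lang_dir / f"{lang}.parquet",  # one metadata file per language
        "stats": root / "stats.json",              # shared across all languages
    }


paths = recording_paths(Path("datasets"), "es", "a1b2c3", datetime(2024, 5, 1, 14, 30, 5))
print(paths["audio"])  # datasets/es/audio/a1b2c3_20240501_143005.wav
```

Before writing a file, `paths["audio"].parent.mkdir(parents=True, exist_ok=True)` would create the language directory on first use.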
## Technical Details
### Audio Recording
- Browser Recording Format: 48 kHz mono WebM
- Storage Format: 16-bit mono WAV
- Maximum Duration: 30 seconds
- Audio Processing: WebM-to-WAV conversion with sample-rate adjustment
- Channels: 1 (mono)
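The conversion and the format constraints above can be sketched with the `ffmpeg` command-line tool and Python's `wave` module. The target sample rate is an assumption: this README does not state it, so the sketch uses 16 kHz, the rate Whisper's feature extractor expects. The function names are hypothetical, and the app may convert audio differently.

```python
import subprocess
import wave
from pathlib import Path

MAX_DURATION_S = 30.0        # limit stated in this README
TARGET_SAMPLE_RATE = 16_000  # assumption: 16 kHz input for Whisper


def ffmpeg_webm_to_wav(src: Path, dst: Path, sample_rate: int = TARGET_SAMPLE_RATE) -> list[str]:
    """Build an ffmpeg command producing mono, 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",           # overwrite dst if it already exists
        "-i", str(src),
        "-ac", "1",               # one channel (mono)
        "-ar", str(sample_rate),  # resample from the browser's 48 kHz
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM samples
        str(dst),
    ]


def convert(src: Path, dst: Path) -> None:
    # Needs the ffmpeg binary on PATH; raises CalledProcessError on failure
    subprocess.run(ffmpeg_webm_to_wav(src, dst), check=True, capture_output=True)


def validate_wav(path: Path, max_duration_s: float = MAX_DURATION_S) -> float:
    """Check the storage-format constraints and return the duration in seconds."""
    with wave.open(str(path), "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        duration = wav.getnframes() / wav.getframerate()
    if duration > max_duration_s:
        raise ValueError(f"recording is {duration:.1f}s, limit is {max_duration_s}s")
    return duration
```

Validating after conversion, rather than trusting the browser, means a clip that slipped past the client-side 30-second limit is still rejected before it reaches the dataset.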
### Data Management
- Metadata Organization:
  - `stats.json`: Global recording statistics
  - `{language_code}.parquet`: Language-specific metadata files
- File Naming: `{user_id_prefix}_{YYYYMMDD_HHMMSS}.{format}`
- Unicode Handling: NFC normalization of text
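NFC normalization matters because the same visible text can be encoded in more than one way (for example, `é` as one code point or as `e` plus a combining accent), which would otherwise produce duplicate or mismatched metadata. The sketch below shows the normalization step and a `stats.json` update; the stats schema (`recordings`, `total_seconds` per language) and both function names are hypothetical, since this README does not document the file's contents.

```python
import json
import unicodedata
from pathlib import Path


def normalize_transcript(text: str) -> str:
    """Compose characters (NFC) so equivalent strings compare equal."""
    return unicodedata.normalize("NFC", text)


def bump_stats(stats_path: Path, lang: str, duration_s: float) -> dict:
    """Increment per-language counters in stats.json (hypothetical schema)."""
    stats = json.loads(stats_path.read_text(encoding="utf-8")) if stats_path.exists() else {}
    entry = stats.setdefault(lang, {"recordings": 0, "total_seconds": 0.0})
    entry["recordings"] += 1
    entry["total_seconds"] += duration_s
    # ensure_ascii=False keeps native-script text readable in the file
    stats_path.write_text(json.dumps(stats, ensure_ascii=False, indent=2), encoding="utf-8")
    return stats
```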
### Authentication
- Provider: Pocketbase with Google OAuth
- Session Management: Server-side Flask sessions
### Languages
- Support: 74 languages with native names
- Codes: ISO 639-1 standard
- CSV Format:
  - Single language: `transcript` column
  - Multi-language: one `transcript_{lang_code}` column per language (e.g. `transcript_en`)
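Both CSV layouts can be read with the standard `csv` module. In this minimal sketch, a language-specific `transcript_{lang_code}` column takes priority and a plain `transcript` column is the fallback. That precedence, and the names `load_transcripts` and `available_languages`, are assumptions, not the app's documented behavior.

```python
import csv
import io


def available_languages(csv_text: str) -> list[str]:
    """List the language codes found in transcript_{lang_code} headers."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [c[len("transcript_"):] for c in header if c.startswith("transcript_")]


def load_transcripts(csv_text: str, lang: str) -> list[str]:
    """Return the non-empty transcripts for `lang` from either CSV layout."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    column = f"transcript_{lang}"
    if column not in columns:
        if "transcript" not in columns:
            raise ValueError(f"no 'transcript' or '{column}' column in {columns}")
        column = "transcript"  # single-language file
    # Short rows yield None for missing fields, so guard before strip()
    return [row[column] for row in reader if row[column] and row[column].strip()]


multi = 'transcript_en,transcript_es\n"English sentence","Spanish sentence"\n'
print(available_languages(multi))     # ['en', 'es']
print(load_transcripts(multi, "es"))  # ['Spanish sentence']
```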
### Upload Management
- Queue System: Background worker thread
- Status Tracking: Real-time upload status polling
- Error Handling: Automatic retries with timeout
- Progress Updates: Toast notifications
- Temporary Storage: `./temp` folder for intermediate conversion files
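A queue drained by one background worker thread, with a status map that a polling endpoint can read, is a common way to build the upload pipeline described above. The sketch below uses only the standard library. `UploadQueue` and the `upload` callable are hypothetical, and the retry count and linear backoff are illustrative. Enforcing a per-request timeout is left to the `upload` callable (for example, through its HTTP client's timeout setting).

```python
import queue
import threading
import time
from typing import Callable


class UploadQueue:
    """Sketch: background upload worker with per-file status and bounded retries."""

    def __init__(self, upload: Callable[[str], None], max_attempts: int = 3, backoff_s: float = 1.0):
        self._upload = upload
        self._max_attempts = max_attempts
        self._backoff_s = backoff_s
        self._jobs = queue.Queue()
        self._status: dict[str, str] = {}
        self._lock = threading.Lock()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, path: str) -> None:
        self._set(path, "queued")  # set before enqueueing so the worker always wins
        self._jobs.put(path)

    def status(self, path: str) -> str:
        # The value a status-polling endpoint would return to the browser
        with self._lock:
            return self._status.get(path, "unknown")

    def wait(self) -> None:
        self._jobs.join()  # block until every submitted job has finished

    def _set(self, path: str, state: str) -> None:
        with self._lock:
            self._status[path] = state

    def _worker(self) -> None:
        while True:
            path = self._jobs.get()
            try:
                for attempt in range(1, self._max_attempts + 1):
                    self._set(path, f"uploading (attempt {attempt})")
                    try:
                        self._upload(path)
                        self._set(path, "done")
                        break
                    except Exception:
                        if attempt == self._max_attempts:
                            self._set(path, "failed")
                        else:
                            time.sleep(self._backoff_s * attempt)  # linear backoff
            finally:
                self._jobs.task_done()
```

A single worker keeps uploads ordered and avoids hammering the storage backend. The lock matters because request handlers read the status map while the worker writes it.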
### Frontend Features
- Keyboard Shortcuts: Recording and navigation
- Real-time Status: Progress tracking and notifications
### Security
- Authentication Required: All routes except static assets and the login page
- File Validation: MIME type and extension checking
- Secure Context: HTTPS recommended
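The two request checks above reduce to small predicates. In a Flask app, the auth check would typically run in an `app.before_request` hook. This sketch is framework-agnostic. The route prefixes and the allowed extension-to-MIME table, including accepting `application/vnd.ms-excel` for CSV files, are assumptions and are not taken from the app.

```python
from pathlib import PurePosixPath

PUBLIC_PATHS = {"/login"}       # hypothetical login route
PUBLIC_PREFIXES = ("/static/",)  # static assets stay public

# Assumed allow-list: extension -> acceptable base MIME types
ALLOWED_UPLOADS = {
    ".webm": {"audio/webm", "video/webm"},
    ".csv": {"text/csv", "application/vnd.ms-excel"},
}


def requires_auth(path: str) -> bool:
    """True if a request to `path` must carry an authenticated session."""
    return path not in PUBLIC_PATHS and not path.startswith(PUBLIC_PREFIXES)


def is_allowed_upload(filename: str, mime_type: str) -> bool:
    """Require both a known extension and a matching MIME type."""
    ext = PurePosixPath(filename.lower()).suffix
    base_type = mime_type.split(";", 1)[0].strip().lower()  # drop e.g. ";codecs=opus"
    return base_type in ALLOWED_UPLOADS.get(ext, set())
```

The browser supplies the MIME type, so it can be spoofed. These checks work as a first filter, and the server-side WAV conversion remains the real test of whether a file is valid audio.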
### Performance
- Upload Queue: Asynchronous processing
- Audio Conversion: Server-side processing
- Session Caching: Browser storage optimization
- Progress Tracking: Real-time websocket updates
## Browser Support
- Chrome (recommended)
- Brave
- Edge
- Safari
## Known Limitations
- Requires microphone permissions
- Internet connection required
- Maximum recording duration: 30 seconds
- File size limits depend on the storage backend