# Dataset Preparation Interface for Fine-tuning Whisper

A web-based interface for preparing audio datasets to fine-tune OpenAI's Whisper model. The tool helps you record, manage, and organize voice recordings together with their transcriptions, and supports cloud storage and user authentication.

## Features

- 🔐 User authentication via Pocketbase
- ☁️ Cloud storage support (Hugging Face Datasets)
- 🌐 Multi-language support with native names
- 🎤 Modern Material Design recording interface
- 📝 CSV transcript file support
- 🎯 Session-based recording workflow
- 🔄 Advanced recording controls
- ⌨️ Keyboard shortcuts for efficiency
- 📊 Progress tracking and navigation
- 💾 Local and cloud metadata management
- 🎨 Responsive, mobile-friendly UI

## Getting Started

1. Create a transcript CSV file with your content:

```csv
transcript
"First sentence to record"
"Second sentence to record"

# For multi-language support:
transcript_en,transcript_es
"English sentence","Spanish sentence"
```

2. Start the Flask application:

```bash
python app.py
```

3. Access the interface:

```
http://localhost:5000
```

## Usage

1. **Authentication**

   - Sign in using your Google account
2. **Session Setup**

   - Upload your transcript CSV
   - Select language and recording location
   - Enter speaker details
   - Click "Start Session"
3. **Recording**

   - Use on-screen controls or keyboard shortcuts:
     - `R`: Start recording / Stop recording
     - `Space`: Play recording
     - `Enter`: Save recording
     - `Backspace`: Re-record
     - `←`: Previous transcript
     - `→`: Skip current
   - Navigate using row numbers
   - Adjust transcript font size as needed

## Data Storage

Recordings are stored in language-specific directories:

- **Storage**:

  ```
  datasets/
  ├── en/
  │   ├── audio/
  │   │   ├── {user_prefix}_{YYYYMMDD_HHMMSS}.wav
  │   │   └── ...
  │   └── en.parquet         # English recordings metadata
  ├── es/
  │   ├── audio/
  │   │   ├── {user_prefix}_{YYYYMMDD_HHMMSS}.wav
  │   │   └── ...
  │   └── es.parquet         # Spanish recordings metadata
  └── stats.json             # Global recording statistics
  ```

## Technical Details

### Audio Recording

- Browser Recording Format: 48kHz mono WebM
- Storage Format: 16-bit mono WAV
- Maximum Duration: 30 seconds
- Audio Processing: WebM -> WAV conversion with sample rate adjustment
- Channels: 1 (mono)
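
The WebM to WAV step listed above could be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes `ffmpeg` is installed on the server, and the 16 kHz target rate is an assumption (the list only says "sample rate adjustment"; 16 kHz is the rate Whisper works with). The function names are hypothetical.

```python
import subprocess

# Assumption: the README does not state the target rate; 16 kHz matches Whisper's input.
TARGET_SAMPLE_RATE = 16_000
MAX_DURATION_SECONDS = 30  # maximum recording duration stated in this README


def build_ffmpeg_command(src, dst, sample_rate=TARGET_SAMPLE_RATE):
    """Return the ffmpeg argument list for a WebM -> 16-bit mono WAV conversion."""
    return [
        "ffmpeg",
        "-y",                             # overwrite the output file if it exists
        "-i", str(src),                   # input WebM recording from the browser
        "-t", str(MAX_DURATION_SECONDS),  # cap the output duration
        "-ac", "1",                       # one channel (mono)
        "-ar", str(sample_rate),          # resample to the target rate
        "-c:a", "pcm_s16le",              # 16-bit little-endian PCM
        str(dst),
    ]


def convert_webm_to_wav(src, dst):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
    return dst
```

Keeping the command construction separate from the `subprocess.run` call makes the arguments easy to inspect without invoking `ffmpeg`.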

### Data Management

- Metadata Organization:
  - `stats.json`: Global recording statistics
  - `{language_code}.parquet`: Language-specific metadata files
- File Naming: `{user_id_prefix}_{YYYYMMDD_HHMMSS}.{format}`
- Unicode Handling: NFC normalization for text
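
As a rough illustration of the naming and normalization rules above, the sketch below builds a recording path and NFC-normalizes a transcript. The function names and the 8-character prefix length are assumptions for the example; the README does not say how long the user ID prefix is.

```python
import unicodedata
from datetime import datetime
from pathlib import Path


def normalize_transcript(text):
    """Apply Unicode NFC normalization so equivalent strings compare equal."""
    return unicodedata.normalize("NFC", text)


def recording_path(root, language, user_id, when, fmt="wav", prefix_length=8):
    """Build {root}/{language}/audio/{user_id_prefix}_{YYYYMMDD_HHMMSS}.{format}.

    prefix_length is a hypothetical choice, not taken from the project.
    """
    prefix = user_id[:prefix_length]
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return Path(root) / language / "audio" / f"{prefix}_{stamp}.{fmt}"


# "e" followed by a combining acute accent (U+0301) becomes the single code point U+00E9.
composed = normalize_transcript("e\u0301")
path = recording_path("datasets", "en", "abcdef1234567", datetime(2024, 1, 2, 3, 4, 5))
```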



### Authentication

- Provider: Pocketbase with Google OAuth
- Session Management: Server-side Flask sessions

### Languages

- Support: 74 languages with native names
- Codes: ISO 639-1 standard
- CSV Format:
  - Single language: `transcript` column
  - Multi-language: `transcript_${lang_code}` columns
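
The two CSV layouts above could be read with a helper along these lines. This is a sketch under stated assumptions: `load_transcripts` is a hypothetical name, and the lookup order (language-specific column first, then the plain `transcript` column) is a guess at reasonable behavior rather than the application's documented logic.

```python
import csv
import io


def load_transcripts(csv_text, language):
    """Return the transcript sentences for one language from CSV text.

    Assumed lookup order: `transcript_<language>` first, then `transcript`.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    column = f"transcript_{language}"
    if column not in fields:
        column = "transcript"
    if column not in fields:
        raise ValueError(f"no transcript column found for language {language!r}")
    # Skip rows where the chosen column is missing or empty.
    return [row[column] for row in reader if row.get(column)]


multi = 'transcript_en,transcript_es\n"English sentence","Spanish sentence"\n'
spanish = load_transcripts(multi, "es")
```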



### Upload Management

- Queue System: Background worker thread
- Status Tracking: Real-time upload status polling
- Error Handling: Automatic retries with timeout
- Progress Updates: Toast notifications
- Temporary Storage: ./temp folder for conversions
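
A queue with a background worker, pollable status, and retries might look roughly like the sketch below. The class name, status strings, and retry parameters are all hypothetical; the upload itself is passed in as a plain callable, and per-upload timeouts are left to that callable rather than implemented here.

```python
import queue
import threading
import time


class UploadQueue:
    """Minimal background upload queue (illustrative, not the project's code)."""

    def __init__(self, upload_fn, max_retries=3, retry_delay=1.0):
        self._upload_fn = upload_fn      # callable that uploads one file path
        self._max_retries = max_retries
        self._retry_delay = retry_delay
        self._jobs = queue.Queue()
        self._status = {}
        self._lock = threading.Lock()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, job_id, path):
        self._set_status(job_id, "queued")
        self._jobs.put((job_id, path))

    def status(self, job_id):
        """Value a status-polling endpoint could return for this job."""
        with self._lock:
            return self._status.get(job_id, "unknown")

    def wait(self):
        """Block until every submitted job has been processed."""
        self._jobs.join()

    def _set_status(self, job_id, value):
        with self._lock:
            self._status[job_id] = value

    def _worker(self):
        while True:
            job_id, path = self._jobs.get()
            self._set_status(job_id, "uploading")
            try:
                for attempt in range(1, self._max_retries + 1):
                    try:
                        self._upload_fn(path)
                        self._set_status(job_id, "done")
                        break
                    except Exception:
                        if attempt == self._max_retries:
                            self._set_status(job_id, "failed")
                        else:
                            time.sleep(self._retry_delay)
            finally:
                self._jobs.task_done()
```

A request handler would call `submit()` and return immediately, while the frontend polls an endpoint backed by `status()` to drive its notifications.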



### Frontend Features

- Keyboard Shortcuts: Recording and navigation
- Real-time Status: Progress tracking and notifications

### Security

- Authentication Required: All routes except static/login
- File Validation: MIME type and extension checking
- Secure Context: HTTPS recommended
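
The MIME type and extension check mentioned above could be sketched as a small allow-list lookup. The table contents and the name `is_allowed_upload` are assumptions for illustration; the actual set of accepted types is not listed in this README.

```python
from pathlib import Path

# Hypothetical allow-list: extension -> MIME types accepted for it.
ALLOWED_UPLOADS = {
    ".webm": {"audio/webm", "video/webm"},
    ".wav": {"audio/wav", "audio/x-wav", "audio/wave"},
    ".csv": {"text/csv", "application/vnd.ms-excel"},
}


def is_allowed_upload(filename, mime_type):
    """Accept a file only if its extension and declared MIME type agree."""
    extension = Path(filename).suffix.lower()
    # Drop parameters such as ";codecs=opus" before comparing.
    base_type = mime_type.split(";", 1)[0].strip().lower()
    return base_type in ALLOWED_UPLOADS.get(extension, set())
```

Because the declared MIME type comes from the client, a check like this is a first filter only; the server-side audio conversion would still reject files that are not valid audio.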



### Performance

- Upload Queue: Asynchronous processing
- Audio Conversion: Server-side processing
- Session Caching: Browser storage optimization
- Progress Tracking: Real-time websocket updates

## Browser Support

- Chrome (recommended)
- Brave
- Edge
- Safari

## Known Limitations

- Requires microphone permissions
- Internet connection needed
- Maximum recording duration: 30 seconds
- File size limits based on storage backend