# Comprehensive Logging Guide
The Video Transcription Service now includes detailed step-by-step logging to help you monitor and debug transcription progress.
## 🎯 **What You Can Track**
### Complete Transcription Journey
- ✅ File upload and validation
- ✅ Video processing steps
- ✅ Whisper model loading
- ✅ Audio extraction progress
- ✅ Transcription inference
- ✅ Results and cleanup
- ✅ Error handling and debugging
### Real-time Progress Monitoring
- Processing times for each step
- File sizes and durations
- Language detection
- Text length and previews
- ⚠️ Warnings and errors
## **Quick Start**
### Basic Logging (Default)
```bash
python main.py
```
### Debug Mode (Detailed Logs)
```bash
DEBUG=true python main.py
```
### Log to File
```bash
LOG_TO_FILE=true python main.py
```
### Combined (Debug + File)
```bash
DEBUG=true LOG_TO_FILE=true python main.py
```
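The service's own setup code is not shown in this guide. As a minimal sketch, assuming `DEBUG` and `LOG_TO_FILE` are read as case-insensitive `true` strings and that the format matches the sample output later in this guide, the two variables could map onto Python's `logging` module roughly like this (`env_flag` and `configure_logging` are hypothetical names, not part of the service):

```python
import logging
import os
from datetime import datetime

# Assumed format, chosen to match the sample log lines in this guide.
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def env_flag(name, environ=None):
    """Return True when the variable is set to 'true' (case-insensitive)."""
    environ = os.environ if environ is None else environ
    return environ.get(name, "").strip().lower() == "true"


def configure_logging(environ=None, now=None):
    """Configure the root logger from DEBUG and LOG_TO_FILE (assumed semantics)."""
    level = logging.DEBUG if env_flag("DEBUG", environ) else logging.INFO
    handlers = [logging.StreamHandler()]
    log_path = None
    if env_flag("LOG_TO_FILE", environ):
        stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
        log_path = f"transcription_service_{stamp}.log"
        handlers.append(logging.FileHandler(log_path, encoding="utf-8"))
    # force=True replaces any handlers installed by an earlier call.
    logging.basicConfig(level=level, format=LOG_FORMAT, datefmt=DATE_FORMAT,
                        handlers=handlers, force=True)
    return level, log_path
```

Under these assumptions, `DEBUG=true LOG_TO_FILE=true` would log at DEBUG level to both the console and a timestamped file.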
## **Real-time Monitoring**
### Monitor Service Health
```bash
python log_monitor.py test
```
### Upload and Monitor Video
```bash
python log_monitor.py upload video.mp4
```
### Monitor Existing Transcription
```bash
python log_monitor.py monitor 123
```
## **Sample Log Output**
### Service Startup
```
2024-01-15 10:30:00 - main - INFO - Starting Video Transcription Service
2024-01-15 10:30:00 - main - INFO - ==================================================
2024-01-15 10:30:00 - main - INFO - Service Configuration:
2024-01-15 10:30:00 - main - INFO - 🤖 Whisper Model: base
2024-01-15 10:30:00 - main - INFO - Max File Size: 100MB
2024-01-15 10:30:00 - main - INFO - Cleanup Interval: 3.5 hours
2024-01-15 10:30:00 - main - INFO - 🚦 Rate Limit: 10 requests/minute
2024-01-15 10:30:00 - main - INFO - Host: 0.0.0.0:8000
2024-01-15 10:30:00 - main - INFO - Supported Formats: .mp4, .avi, .mov, .mkv, .wmv, .flv, .webm, .m4v
2024-01-15 10:30:00 - main - INFO - ==================================================
```
### File Upload Process
```
2024-01-15 10:30:15 - main - INFO - Starting transcription request for file: video.mp4
2024-01-15 10:30:15 - main - INFO - Language specified: auto-detect
2024-01-15 10:30:15 - main - INFO - Validating file: video.mp4
2024-01-15 10:30:15 - main - INFO - File extension: .mp4
2024-01-15 10:30:15 - main - INFO - ✅ File format validation passed: .mp4
2024-01-15 10:30:15 - main - INFO - Reading file content for size validation...
2024-01-15 10:30:15 - main - INFO - File size: 25.34MB (max: 100MB)
2024-01-15 10:30:15 - main - INFO - ✅ File size validation passed: 25.34MB
```
### Storage Operations
```
2024-01-15 10:30:15 - storage - INFO - Creating new transcription entry with ID: 1
2024-01-15 10:30:15 - storage - INFO - Language: auto-detect
2024-01-15 10:30:15 - storage - INFO - ✅ Transcription 1 created successfully
2024-01-15 10:30:15 - storage - INFO - Total active transcriptions: 1
```
### Video Processing
```
2024-01-15 10:30:15 - transcription_service - INFO - 🎬 Starting video transcription for ID: 1
2024-01-15 10:30:15 - transcription_service - INFO - Video size: 25.34MB
2024-01-15 10:30:15 - transcription_service - INFO - Language: auto-detect
2024-01-15 10:30:15 - transcription_service - INFO - Updating status to PROCESSING for ID: 1
```
### Model Loading (First Time)
```
2024-01-15 10:30:15 - transcription_service - INFO - 🤖 Loading Whisper model: base
2024-01-15 10:30:15 - transcription_service - INFO - 📥 This may take 30-60 seconds for first-time download...
2024-01-15 10:30:45 - transcription_service - INFO - ✅ Whisper model loaded successfully in 30.2 seconds
```
### Audio Extraction
```
2024-01-15 10:30:45 - transcription_service - INFO - 🎵 Extracting audio from video for transcription 1
2024-01-15 10:30:45 - transcription_service - INFO - Creating temporary video file...
2024-01-15 10:30:45 - transcription_service - INFO - Temporary files created - Video: /tmp/xyz.tmp, Audio: /tmp/abc.wav
2024-01-15 10:30:45 - transcription_service - INFO - 🎵 Running FFmpeg to extract audio...
2024-01-15 10:30:45 - transcription_service - INFO - 🔧 Configuring FFmpeg for audio extraction...
2024-01-15 10:30:45 - transcription_service - INFO - - Codec: PCM 16-bit
2024-01-15 10:30:45 - transcription_service - INFO - - Channels: 1 (mono)
2024-01-15 10:30:45 - transcription_service - INFO - - Sample rate: 16kHz
2024-01-15 10:30:48 - transcription_service - INFO - ✅ FFmpeg audio extraction completed
2024-01-15 10:30:48 - transcription_service - INFO - ✅ Audio extraction successful - Size: 8.45MB
2024-01-15 10:30:48 - transcription_service - INFO - ✅ Audio extraction completed in 3.1 seconds
```
### Transcription Process
```
2024-01-15 10:30:48 - transcription_service - INFO - 🗣️ Starting audio transcription for ID 1
2024-01-15 10:30:48 - transcription_service - INFO - 🗣️ Starting Whisper transcription...
2024-01-15 10:30:48 - transcription_service - INFO - 🎵 Audio file: /tmp/abc.wav
2024-01-15 10:30:48 - transcription_service - INFO - Language: auto-detect
2024-01-15 10:30:48 - transcription_service - INFO - ⚡ Running transcription in background thread...
2024-01-15 10:30:48 - transcription_service - INFO - 🤖 Preparing Whisper transcription options...
2024-01-15 10:30:48 - transcription_service - INFO - Language: auto-detect
2024-01-15 10:30:48 - transcription_service - INFO - 🎯 Starting Whisper model inference...
2024-01-15 10:31:15 - transcription_service - INFO - ✅ Whisper inference completed in 27.3 seconds
2024-01-15 10:31:15 - transcription_service - INFO - Text length: 1247 characters
2024-01-15 10:31:15 - transcription_service - INFO - Detected language: en
2024-01-15 10:31:15 - transcription_service - INFO - ⏱️ Audio duration: 180.50 seconds
2024-01-15 10:31:15 - transcription_service - INFO - Text preview: Hello, welcome to this video tutorial where we'll be discussing...
```
### Completion
```
2024-01-15 10:31:15 - transcription_service - INFO - ✅ Transcription completed in 27.3 seconds
2024-01-15 10:31:15 - transcription_service - INFO - 💾 Saving transcription results for ID 1
2024-01-15 10:31:15 - storage - INFO - Updated transcription 1
2024-01-15 10:31:15 - storage - INFO - Status changed: processing → completed
2024-01-15 10:31:15 - storage - INFO - Text updated: Hello, welcome to this video tutorial where we'll...
2024-01-15 10:31:15 - transcription_service - INFO - 🧹 Cleaning up temporary audio file
2024-01-15 10:31:15 - transcription_service - INFO - Transcription 1 completed successfully in 60.2 seconds total
```
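Every line in the samples above follows the same `timestamp - logger - level - message` layout. A minimal sketch of splitting one line into those fields, assuming that layout holds for all service output (`LogRecord` and `parse_log_line` are illustrative names, not part of the service):

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Layout assumed from the sample output: "<timestamp> - <logger> - <LEVEL> - <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - "
    r"(?P<logger>\S+) - (?P<level>[A-Z]+) - (?P<message>.*)$"
)


class LogRecord(NamedTuple):
    timestamp: datetime
    logger: str
    level: str
    message: str


def parse_log_line(line: str) -> Optional[LogRecord]:
    """Split one log line into fields; return None if it does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    return LogRecord(
        timestamp=datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S"),
        logger=match["logger"],
        level=match["level"],
        message=match["message"],
    )


record = parse_log_line(
    "2024-01-15 10:30:15 - storage - INFO - Total active transcriptions: 1"
)
print(record.logger, record.level, record.message)
# storage INFO Total active transcriptions: 1
```

Returning `None` for non-matching lines lets a caller skip tracebacks or other multi-line output without raising.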
## 🔧 **Log Levels**
### INFO (Default)
- Service startup/shutdown
- Request processing
- Status updates
- Completion messages
### DEBUG (Detailed)
- File validation details
- Temporary file paths
- FFmpeg configuration
- Model loading progress
- Memory usage info
### WARNING
- Large file warnings
- Performance issues
- Non-critical errors
### ERROR
- Processing failures
- File format issues
- System errors
- Transcription failures
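Because the level name appears as its own field in each line, a saved log can be filtered by severity after the fact. The sketch below assumes the ` - LEVEL - ` layout from the sample output and relies on the numeric values that Python's `logging` module assigns to the standard level names; `lines_at_or_above` is a hypothetical helper.

```python
import logging
import re

# Matches the level field as it appears in the sample output.
LEVEL_RE = re.compile(r" - (DEBUG|INFO|WARNING|ERROR|CRITICAL) - ")


def lines_at_or_above(lines, minimum="WARNING"):
    """Yield log lines whose level is at least `minimum`."""
    threshold = getattr(logging, minimum)  # e.g. logging.WARNING == 30
    for line in lines:
        match = LEVEL_RE.search(line)
        if match and getattr(logging, match.group(1)) >= threshold:
            yield line.rstrip("\n")
```

Passing an open log file as `lines` works because file objects iterate line by line, for example `list(lines_at_or_above(open(path, encoding="utf-8"), "ERROR"))`.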
## **Log Files**
When `LOG_TO_FILE=true`, logs are saved to:
```
transcription_service_YYYYMMDD_HHMMSS.log
```
Example: `transcription_service_20240115_103000.log`
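Since the timestamp is embedded in the file name, the most recent log can be picked out by name alone. A small sketch, assuming the files sit in the directory the service was started from (this guide does not say where they are written); `log_start_time` and `latest_log_file` are hypothetical helpers:

```python
import re
from datetime import datetime
from pathlib import Path

NAME_RE = re.compile(r"^transcription_service_(\d{8}_\d{6})\.log$")


def log_start_time(name):
    """Parse the start time out of a log file name, or return None."""
    match = NAME_RE.match(name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
    except ValueError:  # digits present, but not a real date or time
        return None


def latest_log_file(directory="."):
    """Return the Path of the newest log file in `directory`, or None."""
    dated = []
    for path in Path(directory).iterdir():
        started = log_start_time(path.name)
        if path.is_file() and started is not None:
            dated.append((started, path))
    return max(dated, key=lambda item: item[0])[1] if dated else None


print(log_start_time("transcription_service_20240115_103000.log"))
# 2024-01-15 10:30:00
```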
## 🛠️ **Troubleshooting with Logs**
### Common Issues and Log Patterns
**1. NumPy Compatibility Error**
```
ERROR - A module that was compiled using NumPy 1.x cannot be run in NumPy 2.2.6
```
**Solution:** Run `python fix_numpy.py`
**2. FFmpeg Not Found**
```
ERROR - FFmpeg audio extraction failed: [Errno 2] No such file or directory: 'ffmpeg'
```
**Solution:** Install FFmpeg for your OS
**3. File Too Large**
```
ERROR - File too large: 150.5MB > 100MB
```
**Solution:** Compress the video or increase the limit in `config.py`
**4. Model Loading Issues**
```
ERROR - Failed to load Whisper model: [Errno 28] No space left on device
```
**Solution:** Free up disk space or use smaller model
**5. Memory Issues**
```
ERROR - Process killed (signal 9)
```
**Solution:** Use smaller files or increase available memory
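The patterns above can also be checked mechanically. The sketch below scans log lines for a distinctive substring from each of the five errors and returns the matching suggestion. The substrings are taken from the examples in this section, so the exact wording in a real log may differ; `KNOWN_ISSUES` and `diagnose` are hypothetical names.

```python
# (substring to look for, suggested fix), both copied from the examples above.
KNOWN_ISSUES = [
    ("compiled using NumPy 1.x", "Run `python fix_numpy.py`"),
    ("No such file or directory: 'ffmpeg'", "Install FFmpeg for your OS"),
    ("File too large", "Compress video or increase limit in config.py"),
    ("No space left on device", "Free up disk space or use smaller model"),
    ("Process killed (signal 9)", "Use smaller files or increase available memory"),
]


def diagnose(lines):
    """Return (line, suggestion) pairs for ERROR lines matching a known issue."""
    findings = []
    for line in lines:
        if "ERROR" not in line:
            continue
        for needle, suggestion in KNOWN_ISSUES:
            if needle in line:
                findings.append((line.strip(), suggestion))
                break  # at most one suggestion per line
    return findings
```

Plain substring checks are used instead of regular expressions because several of the messages contain characters such as `[` and `(` that would otherwise need escaping.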
## 🎯 **Performance Monitoring**
### Key Metrics to Watch
- **Model Loading Time**: Should be 15-60 seconds (first time only)
- **Audio Extraction**: Usually 1-5 seconds per minute of video
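- **Transcription Speed**: Varies by model and content; inference typically takes 0.1-0.5x the audio duration (the sample run above took 27.3 seconds for 180.50 seconds of audio, about 0.15x)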
- **Memory Usage**: Monitor for large files
- **Active Transcriptions**: Track concurrent processing
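The transcription speed ratio can be derived from two lines that the sample output shows for each job: `Whisper inference completed in ... seconds` and `Audio duration: ... seconds`. A minimal sketch, assuming both lines appear once per job with that wording and in that order (`real_time_factors` is a hypothetical helper):

```python
import re

INFERENCE_RE = re.compile(r"Whisper inference completed in ([\d.]+) seconds")
DURATION_RE = re.compile(r"Audio duration: ([\d.]+) seconds")


def real_time_factors(lines):
    """Yield inference_time / audio_duration for each job found in `lines`."""
    inference = None
    for line in lines:
        found = INFERENCE_RE.search(line)
        if found:
            inference = float(found.group(1))
            continue
        found = DURATION_RE.search(line)
        if found and inference is not None:
            duration = float(found.group(1))
            if duration > 0:
                yield inference / duration
            inference = None  # wait for the next job's inference line


sample = [
    "2024-01-15 10:31:15 - transcription_service - INFO - Whisper inference completed in 27.3 seconds",
    "2024-01-15 10:31:15 - transcription_service - INFO - Audio duration: 180.50 seconds",
]
print([round(factor, 2) for factor in real_time_factors(sample)])
# [0.15]
```

A factor below 1.0 means the job finished faster than the audio would take to play. If several jobs interleave in one log, the pairing would need to key on the transcription ID instead of line order.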
### Optimization Tips
- Use `tiny` model for faster processing
- Compress videos before upload
- Monitor memory usage with large files
- Use DEBUG mode to identify bottlenecks
## **Integration Examples**
### Parse Logs Programmatically
```python
import re


def parse_transcription_logs(log_file):
    """Print the ID and total duration of each completed transcription."""
    pattern = re.compile(r'Transcription (\d+) completed.*in ([\d.]+) seconds')
    # Read as UTF-8 because the log messages contain emoji.
    with open(log_file, 'r', encoding='utf-8') as f:
        for line in f:
            if 'Transcription' in line and 'completed successfully' in line:
                # Extract transcription ID and time
                match = pattern.search(line)
                if match:
                    tid, duration = match.groups()
                    print(f"ID {tid}: {duration}s")
```
### Monitor API Programmatically
```python
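import time

import requests


def monitor_service():
    """Poll the health endpoint and print the active transcription count."""
    while True:
        try:
            # A timeout keeps the loop from hanging if the service stalls.
            response = requests.get('http://localhost:8000/health', timeout=10)
            response.raise_for_status()
            health = response.json()
            print(f"Active: {health.get('active_transcriptions', 0)}")
            time.sleep(30)
        except Exception as e:
            # Back off longer when the service is unreachable or unhealthy.
            print(f"Service down: {e}")
            time.sleep(60)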
```
---
**With comprehensive logging, you now have complete visibility into your transcription service!**