# Comprehensive Logging Guide
The Video Transcription Service now includes detailed step-by-step logging to help you monitor and debug transcription progress.
## 🎯 **What You Can Track**
### Complete Transcription Journey
- ✅ File upload and validation
- ✅ Video processing steps
- ✅ Whisper model loading
- ✅ Audio extraction progress
- ✅ Transcription inference
- ✅ Results and cleanup
- ✅ Error handling and debugging
### Real-time Progress Monitoring
- 📊 Processing times for each step
- 📁 File sizes and durations
- 🌐 Language detection
- 📝 Text length and previews
- ⚠️ Warnings and errors
## 🚀 **Quick Start**
### Basic Logging (Default)
```bash
python main.py
```
### Debug Mode (Detailed Logs)
```bash
DEBUG=true python main.py
```
### Log to File
```bash
LOG_TO_FILE=true python main.py
```
### Combined (Debug + File)
```bash
DEBUG=true LOG_TO_FILE=true python main.py
```
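How `main.py` reads these variables is not shown in this guide. As a rough sketch, assuming the service uses Python's standard `logging` module, treats the value `true` as "enabled", and uses the line format and file name documented elsewhere in this guide, the setup could look like the following. `env_flag` and `configure_logging` are hypothetical names, not functions from the project.

```python
import logging
import os
from datetime import datetime

# Format inferred from the sample output in this guide (an assumption).
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def env_flag(name, environ=os.environ):
    """Return True when the variable is set to 'true' (case-insensitive)."""
    return environ.get(name, "").strip().lower() == "true"


def configure_logging(environ=os.environ):
    """Hypothetical setup: DEBUG raises verbosity, LOG_TO_FILE adds a file handler."""
    level = logging.DEBUG if env_flag("DEBUG", environ) else logging.INFO
    handlers = [logging.StreamHandler()]
    if env_flag("LOG_TO_FILE", environ):
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        handlers.append(logging.FileHandler(f"transcription_service_{stamp}.log"))
    # force=True replaces any handlers configured earlier (Python 3.8+).
    logging.basicConfig(level=level, format=LOG_FORMAT, datefmt=DATE_FORMAT,
                        handlers=handlers, force=True)
    return level
```

Passing the environment as a parameter keeps the function easy to exercise without touching the real process environment.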
## 📊 **Real-time Monitoring**
### Monitor Service Health
```bash
python log_monitor.py test
```
### Upload and Monitor Video
```bash
python log_monitor.py upload video.mp4
```
### Monitor Existing Transcription
```bash
python log_monitor.py monitor 123
```
## 📋 **Sample Log Output**
### Service Startup
```
2024-01-15 10:30:00 - main - INFO - 🚀 Starting Video Transcription Service
2024-01-15 10:30:00 - main - INFO - ==================================================
2024-01-15 10:30:00 - main - INFO - 📋 Service Configuration:
2024-01-15 10:30:00 - main - INFO - 🤖 Whisper Model: base
2024-01-15 10:30:00 - main - INFO - 📏 Max File Size: 100MB
2024-01-15 10:30:00 - main - INFO - 🕒 Cleanup Interval: 3.5 hours
2024-01-15 10:30:00 - main - INFO - 🚦 Rate Limit: 10 requests/minute
2024-01-15 10:30:00 - main - INFO - 🌐 Host: 0.0.0.0:8000
2024-01-15 10:30:00 - main - INFO - 📁 Supported Formats: .mp4, .avi, .mov, .mkv, .wmv, .flv, .webm, .m4v
2024-01-15 10:30:00 - main - INFO - ==================================================
```
### File Upload Process
```
2024-01-15 10:30:15 - main - INFO - 🚀 Starting transcription request for file: video.mp4
2024-01-15 10:30:15 - main - INFO - 🌐 Language specified: auto-detect
2024-01-15 10:30:15 - main - INFO - 📁 Validating file: video.mp4
2024-01-15 10:30:15 - main - INFO - 🔍 File extension: .mp4
2024-01-15 10:30:15 - main - INFO - ✅ File format validation passed: .mp4
2024-01-15 10:30:15 - main - INFO - 📊 Reading file content for size validation...
2024-01-15 10:30:15 - main - INFO - 📏 File size: 25.34MB (max: 100MB)
2024-01-15 10:30:15 - main - INFO - ✅ File size validation passed: 25.34MB
```
### Storage Operations
```
2024-01-15 10:30:15 - storage - INFO - 📝 Creating new transcription entry with ID: 1
2024-01-15 10:30:15 - storage - INFO - 🌐 Language: auto-detect
2024-01-15 10:30:15 - storage - INFO - ✅ Transcription 1 created successfully
2024-01-15 10:30:15 - storage - INFO - 📊 Total active transcriptions: 1
```
### Video Processing
```
2024-01-15 10:30:15 - transcription_service - INFO - 🎬 Starting video transcription for ID: 1
2024-01-15 10:30:15 - transcription_service - INFO - 📊 Video size: 25.34MB
2024-01-15 10:30:15 - transcription_service - INFO - 🌐 Language: auto-detect
2024-01-15 10:30:15 - transcription_service - INFO - 📝 Updating status to PROCESSING for ID: 1
```
### Model Loading (First Time)
```
2024-01-15 10:30:15 - transcription_service - INFO - 🤖 Loading Whisper model: base
2024-01-15 10:30:15 - transcription_service - INFO - 📥 This may take 30-60 seconds for first-time download...
2024-01-15 10:30:45 - transcription_service - INFO - ✅ Whisper model loaded successfully in 30.2 seconds
```
### Audio Extraction
```
2024-01-15 10:30:45 - transcription_service - INFO - 🎵 Extracting audio from video for transcription 1
2024-01-15 10:30:45 - transcription_service - INFO - 📁 Creating temporary video file...
2024-01-15 10:30:45 - transcription_service - INFO - 📁 Temporary files created - Video: /tmp/xyz.tmp, Audio: /tmp/abc.wav
2024-01-15 10:30:45 - transcription_service - INFO - 🎵 Running FFmpeg to extract audio...
2024-01-15 10:30:45 - transcription_service - INFO - 🔧 Configuring FFmpeg for audio extraction...
2024-01-15 10:30:45 - transcription_service - INFO - - Codec: PCM 16-bit
2024-01-15 10:30:45 - transcription_service - INFO - - Channels: 1 (mono)
2024-01-15 10:30:45 - transcription_service - INFO - - Sample rate: 16kHz
2024-01-15 10:30:48 - transcription_service - INFO - ✅ FFmpeg audio extraction completed
2024-01-15 10:30:48 - transcription_service - INFO - ✅ Audio extraction successful - Size: 8.45MB
2024-01-15 10:30:48 - transcription_service - INFO - ✅ Audio extraction completed in 3.1 seconds
```
### Transcription Process
```
2024-01-15 10:30:48 - transcription_service - INFO - 🗣️ Starting audio transcription for ID 1
2024-01-15 10:30:48 - transcription_service - INFO - 🗣️ Starting Whisper transcription...
2024-01-15 10:30:48 - transcription_service - INFO - 🎵 Audio file: /tmp/abc.wav
2024-01-15 10:30:48 - transcription_service - INFO - 🌐 Language: auto-detect
2024-01-15 10:30:48 - transcription_service - INFO - ⚡ Running transcription in background thread...
2024-01-15 10:30:48 - transcription_service - INFO - 🤖 Preparing Whisper transcription options...
2024-01-15 10:30:48 - transcription_service - INFO - 🌐 Language: auto-detect
2024-01-15 10:30:48 - transcription_service - INFO - 🎯 Starting Whisper model inference...
2024-01-15 10:31:15 - transcription_service - INFO - ✅ Whisper inference completed in 27.3 seconds
2024-01-15 10:31:15 - transcription_service - INFO - 📝 Text length: 1247 characters
2024-01-15 10:31:15 - transcription_service - INFO - 🌐 Detected language: en
2024-01-15 10:31:15 - transcription_service - INFO - ⏱️ Audio duration: 180.50 seconds
2024-01-15 10:31:15 - transcription_service - INFO - 📄 Text preview: Hello, welcome to this video tutorial where we'll be discussing...
```
### Completion
```
2024-01-15 10:31:15 - transcription_service - INFO - ✅ Transcription completed in 27.3 seconds
2024-01-15 10:31:15 - transcription_service - INFO - 💾 Saving transcription results for ID 1
2024-01-15 10:31:15 - storage - INFO - 📝 Updated transcription 1
2024-01-15 10:31:15 - storage - INFO - 🔄 Status changed: processing → completed
2024-01-15 10:31:15 - storage - INFO - 📄 Text updated: Hello, welcome to this video tutorial where we'll...
2024-01-15 10:31:15 - transcription_service - INFO - 🧹 Cleaning up temporary audio file
2024-01-15 10:31:15 - transcription_service - INFO - 🎉 Transcription 1 completed successfully in 60.2 seconds total
```
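Every line in the samples above follows the shape `timestamp - logger - LEVEL - message`. A minimal sketch for splitting a line into those fields is shown below. It assumes the timestamp has no millisecond part (as in the samples) and that only the first three ` - ` separators are structural, since the message itself can contain ` - `. `LogRecord` and `parse_line` are illustrative names.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    timestamp: datetime
    logger: str
    level: str
    message: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Split one log line into its four fields, or return None if it does not fit."""
    parts = line.rstrip("\n").split(" - ", 3)  # at most 3 splits: message stays whole
    if len(parts) != 4:
        return None
    try:
        timestamp = datetime.strptime(parts[0], "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None
    return LogRecord(timestamp, parts[1], parts[2], parts[3])


# Usage: elapsed wall-clock time between two log lines
start = parse_line("2024-01-15 10:30:15 - main - INFO - Starting transcription request")
end = parse_line("2024-01-15 10:31:15 - transcription_service - INFO - Completed")
elapsed = (end.timestamp - start.timestamp).total_seconds()
```

Because the log timestamps have one-second resolution, differences computed this way are coarser than the durations the service prints itself.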
## 🔧 **Log Levels**
### INFO (Default)
- Service startup/shutdown
- Request processing
- Status updates
- Completion messages
### DEBUG (Detailed)
- File validation details
- Temporary file paths
- FFmpeg configuration
- Model loading progress
- Memory usage info
### WARNING
- Large file warnings
- Performance issues
- Non-critical errors
### ERROR
- Processing failures
- File format issues
- System errors
- Transcription failures
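Assuming the service relies on Python's standard `logging` module (the line format suggests this, but it is not confirmed here), levels are cumulative: a logger set to INFO also emits WARNING and ERROR, and only DEBUG adds the detailed messages. The sketch below demonstrates that behaviour with a throwaway logger; the messages are placeholders, not real service output.

```python
import io
import logging


def emit_at_level(level):
    """Log one message per level and return the lines that pass the threshold."""
    stream = io.StringIO()
    logger = logging.getLogger(f"level_demo_{level}")
    logger.setLevel(level)
    logger.propagate = False  # keep the demo output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))
    logger.handlers = [handler]

    logger.debug("Temporary file paths")
    logger.info("Status update")
    logger.warning("Large file warning")
    logger.error("Processing failure")
    return stream.getvalue().splitlines()


info_lines = emit_at_level(logging.INFO)    # DEBUG message is filtered out
debug_lines = emit_at_level(logging.DEBUG)  # all four messages appear
```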
## 📁 **Log Files**
When `LOG_TO_FILE=true`, logs are saved to:
```
transcription_service_YYYYMMDD_HHMMSS.log
```
Example: `transcription_service_20240115_103000.log`
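Since the timestamp is embedded in the file name, the most recent log can be located without opening any files. The sketch below assumes the logs are written to a single directory and follow the naming pattern above exactly. `log_timestamp` and `latest_log` are hypothetical helpers.

```python
from datetime import datetime
from pathlib import Path

PREFIX = "transcription_service_"


def log_timestamp(path):
    """Return the datetime encoded in a log file name, or None if it does not match."""
    stem = Path(path).stem
    if not stem.startswith(PREFIX):
        return None
    try:
        return datetime.strptime(stem[len(PREFIX):], "%Y%m%d_%H%M%S")
    except ValueError:
        return None


def latest_log(directory="."):
    """Return the newest matching log file in the directory, or None if there is none."""
    candidates = [p for p in Path(directory).glob(PREFIX + "*.log")
                  if log_timestamp(p) is not None]
    return max(candidates, key=log_timestamp, default=None)
```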
## 🛠️ **Troubleshooting with Logs**
### Common Issues and Log Patterns
**1. NumPy Compatibility Error**
```
ERROR - A module that was compiled using NumPy 1.x cannot be run in NumPy 2.2.6
```
**Solution:** Run `python fix_numpy.py`
**2. FFmpeg Not Found**
```
ERROR - FFmpeg audio extraction failed: [Errno 2] No such file or directory: 'ffmpeg'
```
**Solution:** Install FFmpeg for your OS
**3. File Too Large**
```
ERROR - File too large: 150.5MB > 100MB
```
**Solution:** Compress video or increase limit in config.py
**4. Model Loading Issues**
```
ERROR - Failed to load Whisper model: [Errno 28] No space left on device
```
**Solution:** Free up disk space or use smaller model
**5. Memory Issues**
```
ERROR - Process killed (signal 9)
```
**Solution:** Use smaller files or increase available memory
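The patterns above can also be checked automatically. The following sketch scans log lines for the five error signatures listed in this section and pairs each hit with its suggested fix. The regular expressions are derived from the example messages only, so real messages that are worded differently will not match. `KNOWN_ISSUES` and `diagnose` are illustrative names.

```python
import re

# (pattern, suggestion) pairs taken from the troubleshooting list above
KNOWN_ISSUES = [
    (re.compile(r"compiled using NumPy 1\.x"), "Run `python fix_numpy.py`"),
    (re.compile(r"No such file or directory: 'ffmpeg'"), "Install FFmpeg for your OS"),
    (re.compile(r"File too large"), "Compress the video or increase the limit in config.py"),
    (re.compile(r"No space left on device"), "Free up disk space or use a smaller model"),
    (re.compile(r"killed \(signal 9\)"), "Use smaller files or increase available memory"),
]


def diagnose(lines):
    """Return (line, suggestion) pairs for ERROR lines that match a known pattern."""
    findings = []
    for line in lines:
        if "ERROR" not in line:
            continue
        for pattern, suggestion in KNOWN_ISSUES:
            if pattern.search(line):
                findings.append((line.strip(), suggestion))
                break  # first matching pattern wins
    return findings
```

A typical call would be `diagnose(open(path, encoding="utf-8"))`, with `path` pointing at a file produced by `LOG_TO_FILE=true`.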
## 🎯 **Performance Monitoring**
### Key Metrics to Watch
- **Model Loading Time**: Should be 15-60 seconds (first time only)
- **Audio Extraction**: Usually 1-5 seconds per minute of video
- **Transcription Speed**: Varies by model and content; processing typically takes 0.1-0.5x the audio duration
- **Memory Usage**: Monitor for large files
- **Active Transcriptions**: Track concurrent processing
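As a worked example, the two ratios can be computed from the sample run shown earlier in this guide: 27.3 seconds of Whisper inference and 3.1 seconds of audio extraction for 180.50 seconds of audio. The function names are illustrative.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Processing time as a fraction of the audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def seconds_per_minute(processing_seconds, audio_seconds):
    """Processing seconds spent per minute of audio."""
    return real_time_factor(processing_seconds, audio_seconds) * 60


rtf = real_time_factor(27.3, 180.50)           # about 0.15x real-time
extraction = seconds_per_minute(3.1, 180.50)   # about 1.03 seconds per minute
print(f"Transcription: {rtf:.2f}x real-time")
print(f"Extraction: {extraction:.2f} s per minute of audio")
```

Both values fall inside the ranges listed above, so this run would count as healthy.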
### Optimization Tips
- Use `tiny` model for faster processing
- Compress videos before upload
- Monitor memory usage with large files
- Use DEBUG mode to identify bottlenecks
## 📊 **Integration Examples**
### Parse Logs Programmatically
```python
import re


def parse_transcription_logs(log_file):
    """Print the ID and total duration of each successfully completed transcription."""
    pattern = re.compile(r'Transcription (\d+) completed successfully in ([\d.]+) seconds')
    # Log lines contain emoji, so read the file as UTF-8
    with open(log_file, 'r', encoding='utf-8') as f:
        for line in f:
            # Extract transcription ID and total time from completion lines
            match = pattern.search(line)
            if match:
                tid, duration = match.groups()
                print(f"ID {tid}: {duration}s")
```
### Monitor API Programmatically
```python
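import requests
import time


def monitor_service(url='http://localhost:8000/health'):
    """Poll the health endpoint forever and print the active transcription count."""
    while True:
        try:
            # A timeout keeps the loop from hanging if the service stops responding
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            health = response.json()
            print(f"Active: {health.get('active_transcriptions', 0)}")
            time.sleep(30)
        except (requests.RequestException, ValueError) as e:
            # Connection errors, timeouts, non-2xx responses, or invalid JSON
            print(f"Service down: {e}")
            time.sleep(60)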
```
---
**With comprehensive logging, you now have complete visibility into your transcription service! 🎉**