Spaces:

shivam98
/

tubeMate

Sleeping

App Files Files Community

ShivamPansuriya commited on Jun 16

Commit

74708f4

1 Parent(s): c927402

Add application file

Browse files

Files changed (37) hide show

DEPLOYMENT.md +277 -0
Dockerfile +41 -0
HF_DEPLOYMENT_SUMMARY.md +221 -0
HF_MIGRATION_GUIDE.md +301 -0
LOGGING_GUIDE.md +277 -0
QUICKSTART.md +168 -0
README.md +365 -10
README_HF.md +154 -0
RESTART_TROUBLESHOOTING.md +295 -0
app.py +343 -0
config.py +34 -0
deploy_to_hf.py +190 -0
example_client.py +166 -0
fix_numpy.py +130 -0
hf_api_client.py +255 -0
hf_spaces_deploy/.gitignore +14 -0
hf_spaces_deploy/README.md +154 -0
hf_spaces_deploy/app.py +343 -0
hf_spaces_deploy/config.py +34 -0
hf_spaces_deploy/logging_config.py +136 -0
hf_spaces_deploy/models.py +34 -0
hf_spaces_deploy/requirements.txt +14 -0
hf_spaces_deploy/restart_handler.py +165 -0
hf_spaces_deploy/storage.py +158 -0
hf_spaces_deploy/transcription_service.py +304 -0
log_monitor.py +195 -0
logging_config.py +136 -0
main.py +295 -0
models.py +34 -0
requirements.txt +14 -0
restart_handler.py +165 -0
setup.py +148 -0
start.py +113 -0
start_robust.py +155 -0
storage.py +158 -0
test_api.py +130 -0
transcription_service.py +304 -0

DEPLOYMENT.md ADDED Viewed

	@@ -0,0 +1,277 @@

+# Deployment Guide
+This guide covers deploying the Video Transcription Service to Render.com's free tier.
+## Prerequisites
+1. **GitHub Account**: Your code needs to be in a GitHub repository
+2. **Render Account**: Sign up at [render.com](https://render.com) (free)
+3. **Git**: Installed on your local machine
+## Step-by-Step Deployment
+### 1. Prepare Your Repository
+```bash
+# Initialize git repository (if not already done)
+git init
+# Add all files
+git add .
+# Commit changes
+git commit -m "Initial commit - Video Transcription Service"
+# Add your GitHub repository as remote
+git remote add origin https://github.com/yourusername/your-repo-name.git
+# Push to GitHub
+git push -u origin main
+```
+### 2. Deploy to Render
+1. **Go to Render Dashboard**
+   - Visit [dashboard.render.com](https://dashboard.render.com)
+   - Sign in with your GitHub account
+2. **Create New Web Service**
+   - Click "New +" button
+   - Select "Web Service"
+   - Choose "Build and deploy from a Git repository"
+3. **Connect Repository**
+   - Select your GitHub repository
+   - Click "Connect"
+4. **Configure Service**
+   - **Name**: `video-transcription-service` (or your preferred name)
+   - **Environment**: `Docker`
+   - **Region**: Choose closest to your users
+   - **Branch**: `main`
+   - **Dockerfile Path**: `./Dockerfile`
+5. **Advanced Settings**
+   - **Plan**: Free (automatically selected)
+   - **Environment Variables**: None needed (auto-configured)
+   - **Health Check Path**: `/health`
+   - **Auto-Deploy**: Yes (recommended)
+6. **Deploy**
+   - Click "Create Web Service"
+   - Render will start building your service
+### 3. Monitor Deployment
+1. **Build Process**
+   - Watch the build logs in real-time
+   - First build takes 5-10 minutes (installing dependencies)
+   - Look for "Build successful" message
+2. **Deployment Status**
+   - Service will show "Live" when ready
+   - Initial startup may take 30-60 seconds (loading AI model)
+3. **Test Your Service**
+   - Your service URL: `https://your-service-name.onrender.com`
+   - API docs: `https://your-service-name.onrender.com/docs`
+   - Health check: `https://your-service-name.onrender.com/health`
+## Configuration Details
+### Automatic Configuration
+The service is pre-configured for Render's free tier:
+- **Port**: Automatically uses `$PORT` environment variable
+- **Memory**: Optimized for 512MB limit
+- **CPU**: Efficient processing for shared CPU
+- **Storage**: No persistent storage (in-memory only)
+- **Health Checks**: Configured at `/health` endpoint
+### Free Tier Limitations
+**Resource Limits:**
+- 512MB RAM
+- Shared CPU
+- 750 hours/month (service sleeps after 15min inactivity)
+- No persistent storage
+**Service Behavior:**
+- **Cold Starts**: 30-60 seconds after sleep
+- **File Size**: 100MB maximum per video
+- **Processing**: Sequential (one video at a time)
+- **Storage**: 3.5 hours maximum per transcription
+## Troubleshooting
+### Common Build Issues
+1. **Out of Memory During Build**
+   ```
+   Error: Process killed (out of memory)
+   ```
+   - This is rare but can happen with large dependencies
+   - Try pushing smaller commits
+   - Contact Render support if persistent
+2. **FFmpeg Installation Failed**
+   ```
+   E: Unable to locate package ffmpeg
+   ```
+   - Check Dockerfile has correct apt-get commands
+   - Ensure base image is correct (python:3.11-slim)
+3. **Python Package Installation Failed**
+   ```
+   ERROR: Could not install packages
+   ```
+   - Check requirements.txt syntax
+   - Ensure all package names are correct
+   - Try removing version pins if needed
+### Runtime Issues
+1. **Service Won't Start**
+   - Check runtime logs in Render dashboard
+   - Look for Python import errors
+   - Verify all dependencies are installed
+2. **Health Check Failing**
+   ```
+   Health check failed
+   ```
+   - Service might be taking too long to start
+   - Check if Whisper model is loading correctly
+   - Verify `/health` endpoint is accessible
+3. **Out of Memory at Runtime**
+   ```
+   Process killed (signal 9)
+   ```
+   - Large video files can cause this
+   - Reduce MAX_FILE_SIZE in config.py
+   - Use smaller Whisper model (tiny instead of base)
+4. **Slow Processing**
+   - First request loads AI model (30-60 seconds)
+   - Subsequent requests are faster
+   - Consider using smaller model for speed
+### Service Sleeping
+**Free Tier Behavior:**
+- Service sleeps after 15 minutes of inactivity
+- First request after sleep takes 30-60 seconds
+- This is normal for free tier
+**Solutions:**
+- Upgrade to paid plan for always-on service
+- Use external monitoring to keep service awake
+- Inform users about potential cold start delays
+## Monitoring and Maintenance
+### Logs
+Access logs in Render dashboard:
+1. Go to your service
+2. Click "Logs" tab
+3. Monitor for errors and performance
+### Metrics
+Monitor service health:
+- Response times
+- Error rates
+- Memory usage
+- Active transcriptions
+### Updates
+Deploy updates automatically:
+1. Push changes to GitHub
+2. Render auto-deploys from main branch
+3. Monitor deployment in dashboard
+## Scaling Considerations
+### Free Tier Optimization
+**Current Setup:**
+- Single instance
+- 512MB RAM
+- Shared CPU
+- In-memory storage
+**Optimization Tips:**
+- Use smaller Whisper model for speed
+- Implement request queuing
+- Add request size validation
+- Monitor memory usage
+### Upgrade Path
+**Paid Plans Offer:**
+- More RAM (1GB+)
+- Dedicated CPU
+- Always-on service
+- Multiple instances
+- Persistent storage options
+## Security
+### Current Security Features
+- Rate limiting (10 requests/minute)
+- File size validation
+- File type validation
+- No persistent file storage
+- Automatic cleanup
+### Additional Security (Optional)
+- API key authentication
+- HTTPS only (automatic on Render)
+- Request logging
+- IP whitelisting
+- CORS configuration
+## Support
+### Getting Help
+1. **Render Support**
+   - Free tier includes community support
+   - Check Render documentation
+   - Use Render community forum
+2. **Service Issues**
+   - Check service logs first
+   - Verify configuration
+   - Test with smaller files
+3. **API Issues**
+   - Use `/docs` endpoint for testing
+   - Check request format
+   - Verify file types and sizes
+### Useful Commands
+```bash
+# Test your deployed service
+curl https://your-service.onrender.com/health
+# Upload test video
+curl -X POST "https://your-service.onrender.com/transcribe" \
+  -F "[email protected]"
+# Check transcription status
+curl "https://your-service.onrender.com/transcribe/1"
+```
+---
+**Your service is now live and ready to transcribe videos! 🎉**
+Share your service URL with users or integrate it into your applications.

Dockerfile ADDED Viewed

	@@ -0,0 +1,41 @@

+# Use Python 3.11 slim image for better performance
+FROM python:3.11-slim
+# Set working directory
+WORKDIR /app
+# Install system dependencies
+RUN apt-get update && apt-get install -y \
+    ffmpeg \
+    git \
+    && rm -rf /var/lib/apt/lists/*
+# Copy requirements first for better caching
+COPY requirements.txt .
+# Install Python dependencies with NumPy compatibility fix
+RUN pip install --no-cache-dir "numpy<2.0.0" && \
+    pip install --no-cache-dir -r requirements.txt
+# Set environment variables for optimal performance
+ENV WHISPER_MODEL=tiny
+ENV MODEL_PRELOAD=true
+ENV DEBUG=false
+ENV PYTHONUNBUFFERED=1
+# Copy application code
+COPY . .
+# Create non-root user for security
+RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
+USER appuser
+# Expose port
+EXPOSE 8000
+# Health check
+HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
+    CMD python -c "import requests; requests.get('http://localhost:8000/health')"
+# Run the application with robust startup
+CMD ["python", "start_robust.py"]

HF_DEPLOYMENT_SUMMARY.md ADDED Viewed

	@@ -0,0 +1,221 @@

+# 🎉 Hugging Face Spaces Deployment - Complete Solution
+Your Video Transcription Service is now ready for deployment to Hugging Face Spaces with **full API compatibility** and enhanced features!
+## ✅ **What You Get**
+### **🌐 Dual Interface**
+- **Beautiful Gradio Web UI** - User-friendly interface for manual uploads
+- **Full REST API** - Programmatic access identical to your current FastAPI service
+- **Simultaneous Access** - Both interfaces work at the same time
+### **🚀 Enhanced Features**
+- **Higher Resource Limits** - 16GB RAM vs 512MB on Render
+- **Better Performance** - Dedicated CPU cores
+- **Larger File Support** - Up to 200MB videos
+- **GPU Option Available** - For heavy workloads
+- **Community Integration** - Easy sharing and discovery
+### **🔧 Preserved Functionality**
+- ✅ All existing API endpoints (`/api/transcribe`, `/api/transcribe/{id}`, `/api/health`)
+- ✅ Multiple video format support
+- ✅ Language detection/specification
+- ✅ Progress tracking and logging
+- ✅ Error handling
+- ✅ Automatic cleanup after 3-4 hours
+- ✅ Rate limiting and validation
+## 📁 **Deployment Package Ready**
+All files are prepared in `hf_spaces_deploy/`:
+```
+hf_spaces_deploy/
+├── app.py                    # Gradio + FastAPI hybrid interface
+├── requirements.txt          # HF Spaces optimized dependencies
+├── README.md                 # HF Spaces documentation with API examples
+├── config.py                 # HF-optimized configuration
+├── models.py                 # Data models
+├── storage.py                # Storage management
+├── transcription_service.py  # Core transcription logic
+├── logging_config.py         # Logging configuration
+└── restart_handler.py        # Performance optimization
+```
+## 🚀 **Quick Deployment Steps**
+### **1. Create Hugging Face Space**
+- Go to https://huggingface.co/spaces
+- Click "Create new Space"
+- Name: `video-transcription`
+- SDK: **Gradio**
+- Visibility: **Public** (for API access)
+### **2. Deploy via Git**
+```bash
+cd hf_spaces_deploy
+git init
+git add .
+git commit -m "Deploy Video Transcription Service"
+git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
+git push -u origin main
+```
+### **3. Wait for Build**
+- Monitor logs in HF Spaces dashboard
+- Build takes 5-10 minutes
+- Model downloads automatically
+## 🌐 **API Compatibility Confirmed**
+### **Identical Endpoints**
+Your existing API calls work unchanged:
+```python
+# OLD (Render.com)
+BASE_URL = "https://your-service.onrender.com"
+# NEW (HF Spaces) - Just change the URL!
+BASE_URL = "https://username-spacename.hf.space"
+# All endpoints remain the same:
+POST /api/transcribe
+GET /api/transcribe/{id}
+GET /api/health
+```
+### **Example API Usage**
+```python
+import requests
+# Upload video (same as before)
+with open('video.mp4', 'rb') as f:
+    response = requests.post(
+        'https://username-spacename.hf.space/api/transcribe',
+        files={'file': f},
+        data={'language': 'en'}
+    )
+transcription_id = response.json()['id']
+# Check status (same as before)
+result = requests.get(f'https://username-spacename.hf.space/api/transcribe/{transcription_id}')
+print(result.json())
+```
+### **Enhanced API Client**
+Use the new HF-optimized client:
+```python
+from hf_api_client import HFTranscriptionClient
+client = HFTranscriptionClient("https://username-spacename.hf.space")
+result = client.transcribe_and_wait("video.mp4")
+print(result['text'])
+```
+## 🎯 **Key Advantages**
+| Feature | Render.com | Hugging Face Spaces |
+|---------|------------|-------------------|
+| **Memory** | 512MB | 16GB (32GB upgrade) |
+| **CPU** | Shared | 2-8 vCPU dedicated |
+| **File Size** | 100MB | 200MB |
+| **Interface** | API only | Gradio + API |
+| **GPU** | None | T4 available |
+| **Community** | Limited | Built-in sharing |
+| **Reliability** | Cold starts | Better uptime |
+## 📊 **Testing Your Deployment**
+### **Web Interface Test**
+1. Visit: `https://username-spacename.hf.space`
+2. Upload a test video
+3. Verify transcription works
+4. Check status updates
+### **API Test**
+```bash
+# Health check
+curl "https://username-spacename.hf.space/api/health"
+# Upload test
+curl -X POST "https://username-spacename.hf.space/api/transcribe" \
+  -F "[email protected]" \
+  -F "language=en"
+# Status check
+curl "https://username-spacename.hf.space/api/transcribe/1"
+```
+### **Python Client Test**
+```bash
+python hf_api_client.py https://username-spacename.hf.space test_video.mp4
+```
+## 🔧 **Performance Optimization**
+### **Hardware Options**
+- **CPU basic** (free) - 2 vCPU, 16GB RAM
+- **CPU upgrade** ($0.05/hour) - 8 vCPU, 32GB RAM
+- **GPU T4** ($0.60/hour) - For heavy workloads
+### **Model Selection**
+```python
+# Environment variables in Space settings:
+WHISPER_MODEL=tiny    # Fastest (39MB)
+WHISPER_MODEL=base    # Balanced (74MB) - Default
+WHISPER_MODEL=small   # Best quality (244MB)
+```
+## 🎉 **Migration Benefits**
+### **Immediate Improvements**
+- ✅ **32x More Memory** (16GB vs 512MB)
+- ✅ **Dedicated CPU** vs shared
+- ✅ **2x Larger Files** (200MB vs 100MB)
+- ✅ **Beautiful Web Interface** + API
+- ✅ **Better Reliability** and uptime
+- ✅ **Community Features** and sharing
+### **Future Possibilities**
+- 🚀 **GPU Acceleration** for faster processing
+- 📈 **Scaling Options** with better hardware
+- 🌐 **Community Integration** and discovery
+- 🔧 **Advanced Features** with HF ecosystem
+## 📋 **Next Steps**
+1. **Deploy to HF Spaces** using the prepared files
+2. **Test both interfaces** (web + API)
+3. **Update your applications** with new URLs
+4. **Monitor performance** and optimize as needed
+5. **Share with community** if desired
+## 🎯 **Success Criteria**
+Your migration is successful when:
+- [ ] ✅ Web interface loads and works
+- [ ] ✅ API endpoints respond correctly
+- [ ] ✅ Video transcription completes successfully
+- [ ] ✅ Both small and large files process
+- [ ] ✅ Multiple concurrent requests work
+- [ ] ✅ Error handling functions properly
+- [ ] ✅ Automatic cleanup operates
+- [ ] ✅ Performance meets or exceeds Render.com
+---
+## 🎊 **Congratulations!**
+You now have a **production-ready Video Transcription Service** on Hugging Face Spaces with:
+- 🌐 **Beautiful Gradio interface** for users
+- 🔗 **Full API compatibility** for applications
+- 🚀 **Enhanced performance** and reliability
+- 📈 **Scalability options** for growth
+- 🎯 **All existing features** preserved and improved
+**Your service will be live at: `https://username-spacename.hf.space`**
+**Ready to deploy? Follow the steps in `HF_MIGRATION_GUIDE.md`! 🚀**

HF_MIGRATION_GUIDE.md ADDED Viewed

	@@ -0,0 +1,301 @@

+# 🚀 Hugging Face Spaces Migration Guide
+Complete guide to migrate your Video Transcription Service from Render.com to Hugging Face Spaces with enhanced features and API access.
+## 🎯 **Why Hugging Face Spaces?**
+### **Advantages over Render.com:**
+- ✅ **Higher Resource Limits**: More memory and CPU
+- ✅ **Better Performance**: Optimized for ML workloads
+- ✅ **Free GPU Access**: Available for intensive tasks
+- ✅ **Gradio Integration**: Beautiful web interface
+- ✅ **Community Features**: Easy sharing and discovery
+- ✅ **Persistent Storage**: Better file handling
+- ✅ **API + Web Interface**: Both available simultaneously
+## 📋 **Pre-Migration Checklist**
+- [ ] Hugging Face account created
+- [ ] Git installed locally
+- [ ] Python environment ready
+- [ ] Test video files prepared
+- [ ] Current service functionality documented
+## 🛠️ **Step 1: Prepare Deployment Files**
+Run the automated preparation script:
+```bash
+python deploy_to_hf.py
+```
+This creates a `hf_spaces_deploy/` directory with all necessary files:
+- `app.py` - Gradio + FastAPI hybrid interface
+- `requirements.txt` - HF Spaces optimized dependencies
+- `README.md` - HF Spaces documentation
+- `config.py` - HF-optimized configuration
+- All supporting modules
+## 🌐 **Step 2: Create Hugging Face Space**
+1. **Go to Hugging Face Spaces**
+   - Visit: https://huggingface.co/spaces
+   - Click "Create new Space"
+2. **Configure Your Space**
+   - **Name**: `video-transcription` (or your choice)
+   - **SDK**: Select "Gradio"
+   - **Hardware**: Start with "CPU basic" (free)
+   - **Visibility**: Public (for API access) or Private
+3. **Create Space**
+   - Click "Create Space"
+   - Note your Space URL: `https://username-spacename.hf.space`
+## 📤 **Step 3: Deploy to Hugging Face Spaces**
+### **Option A: Git Deployment (Recommended)**
+```bash
+cd hf_spaces_deploy
+git init
+git add .
+git commit -m "Initial deployment of Video Transcription Service"
+git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
+git push -u origin main
+```
+### **Option B: Web Upload**
+1. Go to your Space page
+2. Click "Files" tab
+3. Upload all files from `hf_spaces_deploy/`
+4. Ensure `app.py` is in the root directory
+## ⏳ **Step 4: Monitor Deployment**
+1. **Check Build Logs**
+   - Go to "Logs" tab in your Space
+   - Monitor the build process (5-10 minutes)
+   - Look for successful model download
+2. **Expected Log Output**
+   ```
+   🚀 Starting Video Transcription Service on Hugging Face Spaces
+   🤖 Loading Whisper model for Hugging Face Spaces...
+   ✅ Model 'base' preloaded in 45.2 seconds
+   🚀 Starting FastAPI service...
+   Running on local URL: http://0.0.0.0:7860
+   ```
+3. **Troubleshoot Issues**
+   - Build failures: Check requirements.txt
+   - Memory issues: Switch to "CPU upgrade" hardware
+   - Model loading issues: Try `WHISPER_MODEL=tiny`
+## ✅ **Step 5: Test Your Deployment**
+### **Test Web Interface**
+1. **Visit Your Space**
+   - URL: `https://username-spacename.hf.space`
+   - Should see Gradio interface
+2. **Upload Test Video**
+   - Use "Upload & Transcribe" tab
+   - Select a small test video (< 50MB)
+   - Choose language or use auto-detect
+   - Click "Start Transcription"
+3. **Check Results**
+   - Note the transcription ID
+   - Use "Check Status" tab to monitor progress
+   - Verify transcription completes successfully
+### **Test API Functionality**
+1. **Health Check**
+   ```bash
+   curl "https://username-spacename.hf.space/api/health"
+   ```
+2. **Upload Video via API**
+   ```bash
+   curl -X POST "https://username-spacename.hf.space/api/transcribe" \
+     -F "file=@test_video.mp4" \
+     -F "language=en"
+   ```
+3. **Check Status via API**
+   ```bash
+   curl "https://username-spacename.hf.space/api/transcribe/1"
+   ```
+4. **Use Python Client**
+   ```bash
+   python hf_api_client.py https://username-spacename.hf.space test_video.mp4
+   ```
+## 🔧 **Step 6: Optimize Performance**
+### **Hardware Upgrades**
+If you experience performance issues:
+1. **Go to Space Settings**
+2. **Hardware → Upgrade**
+3. **Options:**
+   - CPU basic (free) - 2 vCPU, 16GB RAM
+   - CPU upgrade ($0.05/hour) - 8 vCPU, 32GB RAM
+   - GPU T4 small ($0.60/hour) - For heavy workloads
+### **Model Optimization**
+Adjust model size based on your needs:
+```python
+# In Space settings, add environment variable:
+WHISPER_MODEL=tiny    # Fastest, good quality
+WHISPER_MODEL=base    # Balanced (default)
+WHISPER_MODEL=small   # Better quality, slower
+```
+## 📊 **Step 7: Compare Features**
+| Feature | Render.com | Hugging Face Spaces |
+|---------|------------|-------------------|
+| **Memory** | 512MB | 16GB (basic) / 32GB (upgrade) |
+| **CPU** | Shared | 2-8 vCPU dedicated |
+| **Storage** | Ephemeral | Persistent |
+| **GPU** | None | T4 available |
+| **Interface** | API only | Gradio + API |
+| **Community** | Limited | Built-in sharing |
+| **Cost** | Free tier limited | More generous free tier |
+## 🔄 **Step 8: Migration Validation**
+### **Functionality Checklist**
+- [ ] ✅ Web interface loads correctly
+- [ ] ✅ Video upload works (multiple formats)
+- [ ] ✅ Language detection/selection works
+- [ ] ✅ Transcription processing completes
+- [ ] ✅ Results display correctly
+- [ ] ✅ API endpoints respond correctly
+- [ ] ✅ Status checking works
+- [ ] ✅ Error handling functions
+- [ ] ✅ Automatic cleanup operates
+- [ ] ✅ Logging provides good visibility
+### **Performance Validation**
+- [ ] ✅ Model loads within 2-3 minutes
+- [ ] ✅ First transcription completes successfully
+- [ ] ✅ Subsequent transcriptions are faster
+- [ ] ✅ Large files (up to 200MB) process correctly
+- [ ] ✅ Multiple concurrent requests handled
+- [ ] ✅ Memory usage stays within limits
+## 🌐 **Step 9: Update Your Applications**
+### **Update API Endpoints**
+Replace your Render.com URLs:
+```python
+# Old Render.com URL
+OLD_URL = "https://your-service.onrender.com"
+# New HF Spaces URL
+NEW_URL = "https://username-spacename.hf.space"
+# API endpoints remain the same:
+# POST /api/transcribe
+# GET /api/transcribe/{id}
+# GET /api/health
+```
+### **Update Client Code**
+```python
+# Use the new HF API client
+from hf_api_client import HFTranscriptionClient
+client = HFTranscriptionClient("https://username-spacename.hf.space")
+result = client.transcribe_and_wait("video.mp4")
+```
+## 🎉 **Step 10: Go Live**
+### **Share Your Space**
+1. **Make Public** (if desired)
+   - Space Settings → Visibility → Public
+2. **Add to Profile**
+   - Pin to your HF profile
+   - Add description and tags
+3. **Share URL**
+   - Web interface: `https://username-spacename.hf.space`
+   - API base: `https://username-spacename.hf.space/api`
+### **Monitor Usage**
+- Check Space analytics
+- Monitor resource usage
+- Review user feedback
+- Update documentation as needed
+## 🔧 **Troubleshooting**
+### **Common Issues**
+1. **Build Fails**
+   ```
+   Solution: Check requirements.txt, ensure all dependencies are compatible
+   ```
+2. **Model Loading Timeout**
+   ```
+   Solution: Upgrade to CPU upgrade hardware or use WHISPER_MODEL=tiny
+   ```
+3. **API Not Accessible**
+   ```
+   Solution: Ensure Space is Public and FastAPI is running on port 7860
+   ```
+4. **Memory Issues**
+   ```
+   Solution: Upgrade hardware or reduce MAX_FILE_SIZE in config
+   ```
+## 📞 **Support Resources**
+- **HF Spaces Documentation**: https://huggingface.co/docs/hub/spaces
+- **Gradio Documentation**: https://gradio.app/docs/
+- **Community Forum**: https://discuss.huggingface.co/
+- **Your Space Logs**: Available in Space dashboard
+## 🎯 **Next Steps**
+After successful migration:
+1. **Decommission Render.com** service
+2. **Update documentation** with new URLs
+3. **Notify users** of the migration
+4. **Monitor performance** and optimize as needed
+5. **Consider GPU upgrade** for heavy workloads
+---
+**🎉 Congratulations! Your Video Transcription Service is now running on Hugging Face Spaces with enhanced capabilities and better performance!**
+**Key Benefits Achieved:**
+- ✅ Higher resource limits
+- ✅ Beautiful Gradio web interface
+- ✅ Full API compatibility maintained
+- ✅ Better community integration
+- ✅ More reliable performance
+- ✅ Future GPU upgrade path

LOGGING_GUIDE.md ADDED Viewed

	@@ -0,0 +1,277 @@

+# Comprehensive Logging Guide
+The Video Transcription Service now includes detailed step-by-step logging to help you monitor and debug transcription progress.
+## 🎯 **What You Can Track**
+### Complete Transcription Journey
+- ✅ File upload and validation
+- ✅ Video processing steps
+- ✅ Whisper model loading
+- ✅ Audio extraction progress
+- ✅ Transcription inference
+- ✅ Results and cleanup
+- ✅ Error handling and debugging
+### Real-time Progress Monitoring
+- 📊 Processing times for each step
+- 📏 File sizes and durations
+- 🌐 Language detection
+- 📝 Text length and previews
+- ⚠️ Warnings and errors
+## 🚀 **Quick Start**
+### Basic Logging (Default)
+```bash
+python main.py
+```
+### Debug Mode (Detailed Logs)
+```bash
+DEBUG=true python main.py
+```
+### Log to File
+```bash
+LOG_TO_FILE=true python main.py
+```
+### Combined (Debug + File)
+```bash
+DEBUG=true LOG_TO_FILE=true python main.py
+```
+## 📊 **Real-time Monitoring**
+### Monitor Service Health
+```bash
+python log_monitor.py test
+```
+### Upload and Monitor Video
+```bash
+python log_monitor.py upload video.mp4
+```
+### Monitor Existing Transcription
+```bash
+python log_monitor.py monitor 123
+```
+## 📋 **Sample Log Output**
+### Service Startup
+```
+2024-01-15 10:30:00 - main - INFO - 🚀 Starting Video Transcription Service
+2024-01-15 10:30:00 - main - INFO - ==================================================
+2024-01-15 10:30:00 - main - INFO - 📋 Service Configuration:
+2024-01-15 10:30:00 - main - INFO -    🤖 Whisper Model: base
+2024-01-15 10:30:00 - main - INFO -    📏 Max File Size: 100MB
+2024-01-15 10:30:00 - main - INFO -    🕒 Cleanup Interval: 3.5 hours
+2024-01-15 10:30:00 - main - INFO -    🚦 Rate Limit: 10 requests/minute
+2024-01-15 10:30:00 - main - INFO -    🌐 Host: 0.0.0.0:8000
+2024-01-15 10:30:00 - main - INFO -    📁 Supported Formats: .mp4, .avi, .mov, .mkv, .wmv, .flv, .webm, .m4v
+2024-01-15 10:30:00 - main - INFO - ==================================================
+```
+### File Upload Process
+```
+2024-01-15 10:30:15 - main - INFO - 🚀 Starting transcription request for file: video.mp4
+2024-01-15 10:30:15 - main - INFO - 🌐 Language specified: auto-detect
+2024-01-15 10:30:15 - main - INFO - 📁 Validating file: video.mp4
+2024-01-15 10:30:15 - main - INFO - 🔍 File extension: .mp4
+2024-01-15 10:30:15 - main - INFO - ✅ File format validation passed: .mp4
+2024-01-15 10:30:15 - main - INFO - 📊 Reading file content for size validation...
+2024-01-15 10:30:15 - main - INFO - 📏 File size: 25.34MB (max: 100MB)
+2024-01-15 10:30:15 - main - INFO - ✅ File size validation passed: 25.34MB
+```
+### Storage Operations
+```
+2024-01-15 10:30:15 - storage - INFO - 📝 Creating new transcription entry with ID: 1
+2024-01-15 10:30:15 - storage - INFO - 🌐 Language: auto-detect
+2024-01-15 10:30:15 - storage - INFO - ✅ Transcription 1 created successfully
+2024-01-15 10:30:15 - storage - INFO - 📊 Total active transcriptions: 1
+```
+### Video Processing
+```
+2024-01-15 10:30:15 - transcription_service - INFO - 🎬 Starting video transcription for ID: 1
+2024-01-15 10:30:15 - transcription_service - INFO - 📊 Video size: 25.34MB
+2024-01-15 10:30:15 - transcription_service - INFO - 🌐 Language: auto-detect
+2024-01-15 10:30:15 - transcription_service - INFO - 📝 Updating status to PROCESSING for ID: 1
+```
+### Model Loading (First Time)
+```
+2024-01-15 10:30:15 - transcription_service - INFO - 🤖 Loading Whisper model: base
+2024-01-15 10:30:15 - transcription_service - INFO - 📥 This may take 30-60 seconds for first-time download...
+2024-01-15 10:30:45 - transcription_service - INFO - ✅ Whisper model loaded successfully in 30.2 seconds
+```
+### Audio Extraction
+```
+2024-01-15 10:30:45 - transcription_service - INFO - 🎵 Extracting audio from video for transcription 1
+2024-01-15 10:30:45 - transcription_service - INFO - 📁 Creating temporary video file...
+2024-01-15 10:30:45 - transcription_service - INFO - 📁 Temporary files created - Video: /tmp/xyz.tmp, Audio: /tmp/abc.wav
+2024-01-15 10:30:45 - transcription_service - INFO - 🎵 Running FFmpeg to extract audio...
+2024-01-15 10:30:45 - transcription_service - INFO - 🔧 Configuring FFmpeg for audio extraction...
+2024-01-15 10:30:45 - transcription_service - INFO -    - Codec: PCM 16-bit
+2024-01-15 10:30:45 - transcription_service - INFO -    - Channels: 1 (mono)
+2024-01-15 10:30:45 - transcription_service - INFO -    - Sample rate: 16kHz
+2024-01-15 10:30:48 - transcription_service - INFO - ✅ FFmpeg audio extraction completed
+2024-01-15 10:30:48 - transcription_service - INFO - ✅ Audio extraction successful - Size: 8.45MB
+2024-01-15 10:30:48 - transcription_service - INFO - ✅ Audio extraction completed in 3.1 seconds
+```
+### Transcription Process
+```
+2024-01-15 10:30:48 - transcription_service - INFO - 🗣️ Starting audio transcription for ID 1
+2024-01-15 10:30:48 - transcription_service - INFO - 🗣️ Starting Whisper transcription...
+2024-01-15 10:30:48 - transcription_service - INFO - 🎵 Audio file: /tmp/abc.wav
+2024-01-15 10:30:48 - transcription_service - INFO - 🌐 Language: auto-detect
+2024-01-15 10:30:48 - transcription_service - INFO - ⚡ Running transcription in background thread...
+2024-01-15 10:30:48 - transcription_service - INFO - 🤖 Preparing Whisper transcription options...
+2024-01-15 10:30:48 - transcription_service - INFO - 🌐 Language: auto-detect
+2024-01-15 10:30:48 - transcription_service - INFO - 🎯 Starting Whisper model inference...
+2024-01-15 10:31:15 - transcription_service - INFO - ✅ Whisper inference completed in 27.3 seconds
+2024-01-15 10:31:15 - transcription_service - INFO - 📝 Text length: 1247 characters
+2024-01-15 10:31:15 - transcription_service - INFO - 🌐 Detected language: en
+2024-01-15 10:31:15 - transcription_service - INFO - ⏱️ Audio duration: 180.50 seconds
+2024-01-15 10:31:15 - transcription_service - INFO - 📄 Text preview: Hello, welcome to this video tutorial where we'll be discussing...
+```
+### Completion
+```
+2024-01-15 10:31:15 - transcription_service - INFO - ✅ Transcription completed in 27.3 seconds
+2024-01-15 10:31:15 - transcription_service - INFO - 💾 Saving transcription results for ID 1
+2024-01-15 10:31:15 - storage - INFO - 📝 Updated transcription 1
+2024-01-15 10:31:15 - storage - INFO - 🔄 Status changed: processing → completed
+2024-01-15 10:31:15 - storage - INFO - 📄 Text updated: Hello, welcome to this video tutorial where we'll...
+2024-01-15 10:31:15 - transcription_service - INFO - 🧹 Cleaning up temporary audio file
+2024-01-15 10:31:15 - transcription_service - INFO - 🎉 Transcription 1 completed successfully in 60.2 seconds total
+```
+## 🔧 **Log Levels**
+### INFO (Default)
+- Service startup/shutdown
+- Request processing
+- Status updates
+- Completion messages
+### DEBUG (Detailed)
+- File validation details
+- Temporary file paths
+- FFmpeg configuration
+- Model loading progress
+- Memory usage info
+### WARNING
+- Large file warnings
+- Performance issues
+- Non-critical errors
+### ERROR
+- Processing failures
+- File format issues
+- System errors
+- Transcription failures
+## 📁 **Log Files**
+When `LOG_TO_FILE=true`, logs are saved to:
+```
+transcription_service_YYYYMMDD_HHMMSS.log
+```
+Example: `transcription_service_20240115_103000.log`
+## 🛠️ **Troubleshooting with Logs**
+### Common Issues and Log Patterns
+**1. NumPy Compatibility Error**
+```
+ERROR - A module that was compiled using NumPy 1.x cannot be run in NumPy 2.2.6
+```
+**Solution:** Run `python fix_numpy.py`
+**2. FFmpeg Not Found**
+```
+ERROR - FFmpeg audio extraction failed: [Errno 2] No such file or directory: 'ffmpeg'
+```
+**Solution:** Install FFmpeg for your OS
+**3. File Too Large**
+```
+ERROR - File too large: 150.5MB > 100MB
+```
+**Solution:** Compress video or increase limit in config.py
+**4. Model Loading Issues**
+```
+ERROR - Failed to load Whisper model: [Errno 28] No space left on device
+```
+**Solution:** Free up disk space or use smaller model
+**5. Memory Issues**
+```
+ERROR - Process killed (signal 9)
+```
+**Solution:** Use smaller files or increase available memory
+## 🎯 **Performance Monitoring**
+### Key Metrics to Watch
+- **Model Loading Time**: Should be 15-60 seconds (first time only)
+- **Audio Extraction**: Usually 1-5 seconds per minute of video
+- **Transcription Speed**: Varies by model and content (typically 0.1-0.5x real-time)
+- **Memory Usage**: Monitor for large files
+- **Active Transcriptions**: Track concurrent processing
+### Optimization Tips
+- Use `tiny` model for faster processing
+- Compress videos before upload
+- Monitor memory usage with large files
+- Use DEBUG mode to identify bottlenecks
+## 📊 **Integration Examples**
+### Parse Logs Programmatically
+```python
+import re
+from datetime import datetime
+def parse_transcription_logs(log_file):
+    with open(log_file, 'r') as f:
+        for line in f:
+            if 'Transcription' in line and 'completed successfully' in line:
+                # Extract transcription ID and time
+                match = re.search(r'Transcription (\d+) completed.*in ([\d.]+) seconds', line)
+                if match:
+                    tid, duration = match.groups()
+                    print(f"ID {tid}: {duration}s")
+```
+### Monitor API Programmatically
+```python
+import requests
+import time
+def monitor_service():
+    while True:
+        try:
+            response = requests.get('http://localhost:8000/health')
+            health = response.json()
+            print(f"Active: {health.get('active_transcriptions', 0)}")
+            time.sleep(30)
+        except Exception as e:
+            print(f"Service down: {e}")
+            time.sleep(60)
+```
+---
+**With comprehensive logging, you now have complete visibility into your transcription service! 🎉**

QUICKSTART.md ADDED Viewed

	@@ -0,0 +1,168 @@

+# Quick Start Guide
+Get your Video Transcription Service running in 5 minutes!
+## 🚀 Option 1: Automated Setup (Recommended)
+```bash
+# 1. Run the setup script
+python setup.py
+# 2. Activate virtual environment
+# Windows:
+venv\Scripts\activate
+# macOS/Linux:
+source venv/bin/activate
+# 3. Start the service (robust startup prevents restarts)
+python start_robust.py
+```
+## 🛠️ Option 2: Manual Setup
+```bash
+# 1. Create virtual environment
+python -m venv venv
+# 2. Activate virtual environment
+# Windows:
+venv\Scripts\activate
+# macOS/Linux:
+source venv/bin/activate
+# 3. Install dependencies
+pip install -r requirements.txt
+# 4. Install FFmpeg
+# Windows: Download from https://ffmpeg.org/download.html
+# macOS: brew install ffmpeg
+# Linux: sudo apt-get install ffmpeg
+# 5. Start the service
+python start_robust.py  # Prevents restarts
+# OR
+python main.py         # Standard startup
+```
+## 🧪 Test Your Service
+### Option A: Web Interface
+1. Open http://localhost:8000/docs
+2. Click "Try it out" on POST /transcribe
+3. Upload a video file
+4. Copy the returned ID
+5. Use GET /transcribe/{id} to check status
+### Option B: Command Line
+```bash
+# Test with example client
+python example_client.py your_video.mp4
+# Or test the API directly
+python test_api.py your_video.mp4
+# Monitor transcription progress in real-time
+python log_monitor.py upload your_video.mp4
+```
+### Option C: cURL
+```bash
+# Upload video
+curl -X POST "http://localhost:8000/transcribe" \
+  -F "file=@your_video.mp4" \
+  -F "language=en"
+# Check status (replace 1 with your ID)
+curl "http://localhost:8000/transcribe/1"
+```
+## 🌐 Deploy to Render.com
+```bash
+# 1. Push to GitHub
+git init
+git add .
+git commit -m "Initial commit"
+git remote add origin https://github.com/yourusername/your-repo.git
+git push -u origin main
+# 2. Go to render.com
+# 3. Create new Web Service
+# 4. Connect your GitHub repo
+# 5. Deploy!
+```
+## 📋 What You Get
+- **Free transcription** using OpenAI Whisper
+- **No API limits** - completely free
+- **Multiple formats** - MP4, AVI, MOV, etc.
+- **Auto language detection** or specify language
+- **REST API** with automatic documentation
+- **Rate limiting** and error handling
+- **Ready for production** deployment
+## 🔧 Configuration
+Edit `config.py` to customize:
+- File size limits
+- Supported formats
+- Whisper model size
+- Rate limiting
+- Cleanup intervals
+## 📊 Monitoring & Logging
+**Enable detailed logging:**
+```bash
+DEBUG=true python main.py
+```
+**Monitor transcription progress:**
+```bash
+# Test service
+python log_monitor.py test
+# Upload and monitor
+python log_monitor.py upload video.mp4
+# Monitor existing transcription
+python log_monitor.py monitor 123
+```
+**Log to file:**
+```bash
+LOG_TO_FILE=true python main.py
+```
+## 📖 Need Help?
+- **Full documentation**: See README.md
+- **Deployment guide**: See DEPLOYMENT.md
+- **API docs**: http://localhost:8000/docs (when running)
+- **Health check**: http://localhost:8000/health
+## 🎯 Common Issues
+**"Service keeps restarting"**
+- Run: `python start_robust.py` for automatic optimization
+- See: [RESTART_TROUBLESHOOTING.md](RESTART_TROUBLESHOOTING.md)
+**"NumPy compatibility error"**
+- Run: `python fix_numpy.py` to fix automatically
+**"FFmpeg not found"**
+- Install FFmpeg for your OS (see setup instructions)
+**"File too large"**
+- Default limit is 100MB (configurable in config.py)
+**"Service sleeping on Render"**
+- Free tier sleeps after 15min inactivity (normal behavior)
+**"Slow first request"**
+- AI model loads on first use (30-60 seconds)
+---
+**Ready to transcribe? Your service is now running at http://localhost:8000! 🎉**

README.md CHANGED Viewed

@@ -1,12 +1,367 @@
----
-title: TubeMate
-emoji: 💻
-colorFrom: green
-colorTo: indigo
-sdk: gradio
-sdk_version: 5.34.0
-app_file: app.py
-pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Video Transcription Service
+A free, production-ready video transcription service built with FastAPI and OpenAI Whisper. Designed for deployment on Render.com's free tier with no transcription limits.
+## Features
+- 🎥 **Multiple Video Formats**: Supports MP4, AVI, MOV, MKV, WMV, FLV, WebM, M4V
+- 🗣️ **Free Speech-to-Text**: Uses OpenAI Whisper (completely free, no API limits)
+- 🌐 **REST API**: Simple endpoints for uploading and retrieving transcriptions
+- ⚡ **Async Processing**: Non-blocking transcription for better performance
+- 🛡️ **Rate Limiting**: Built-in protection against abuse
+- 🧹 **Auto Cleanup**: Automatic removal of old transcriptions (3.5 hours)
+- 📝 **Auto Documentation**: Interactive API docs at `/docs`
+- 🚀 **Render Ready**: Optimized for Render.com free tier deployment
+## Quick Start
+### Local Development
+1. **Clone and Setup**
+   ```bash
+   git clone <your-repo-url>
+   cd transcriber
+   python -m venv venv
+   source venv/bin/activate  # On Windows: venv\Scripts\activate
+   pip install -r requirements.txt
+   ```
+2. **Install FFmpeg**
+   - **Windows**: Download from https://ffmpeg.org/download.html
+   - **macOS**: `brew install ffmpeg`
+   - **Linux**: `sudo apt-get install ffmpeg`
+3. **Run the Service**
+   ```bash
+   # Robust startup (recommended - prevents restarts)
+   python start_robust.py
+   # Or standard startup
+   python main.py
+   ```
+4. **Access the API**
+   - Service: http://localhost:8000
+   - Documentation: http://localhost:8000/docs
+   - Health Check: http://localhost:8000/health
+### Logging and Monitoring
+The service provides comprehensive step-by-step logging to track transcription progress:
+**Enable Debug Logging:**
+```bash
+DEBUG=true python main.py
+```
+**Enable File Logging:**
+```bash
+LOG_TO_FILE=true python main.py
+```
+**Sample Log Output:**
+```
+2024-01-15 10:30:00 - main - INFO - 🚀 Starting transcription request for file: video.mp4
+2024-01-15 10:30:00 - main - INFO - 🌐 Language specified: auto-detect
+2024-01-15 10:30:00 - main - INFO - 📁 Validating file: video.mp4
+2024-01-15 10:30:00 - main - INFO - 🔍 File extension: .mp4
+2024-01-15 10:30:00 - main - INFO - ✅ File format validation passed: .mp4
+2024-01-15 10:30:00 - main - INFO - 📊 Reading file content for size validation...
+2024-01-15 10:30:00 - main - INFO - 📏 File size: 25.3MB (max: 100MB)
+2024-01-15 10:30:00 - main - INFO - ✅ File size validation passed: 25.3MB
+2024-01-15 10:30:00 - storage - INFO - 📝 Creating new transcription entry with ID: 1
+2024-01-15 10:30:00 - transcription_service - INFO - 🎬 Starting video transcription for ID: 1
+2024-01-15 10:30:00 - transcription_service - INFO - 🤖 Loading Whisper model: base
+2024-01-15 10:30:15 - transcription_service - INFO - ✅ Whisper model loaded successfully in 15.2 seconds
+2024-01-15 10:30:15 - transcription_service - INFO - 🎵 Extracting audio from video for transcription 1
+2024-01-15 10:30:18 - transcription_service - INFO - ✅ Audio extraction completed in 3.1 seconds
+2024-01-15 10:30:18 - transcription_service - INFO - 🗣️ Starting audio transcription for ID 1
+2024-01-15 10:30:45 - transcription_service - INFO - ✅ Transcription completed in 27.3 seconds
+2024-01-15 10:30:45 - transcription_service - INFO - 📝 Transcribed text length: 1247 characters
+2024-01-15 10:30:45 - transcription_service - INFO - 🌐 Detected language: en
+2024-01-15 10:30:45 - transcription_service - INFO - 🎉 Transcription 1 completed successfully in 45.6 seconds total
+```
+### Deploy to Render.com
+1. **Push to GitHub**
+   ```bash
+   git init
+   git add .
+   git commit -m "Initial commit"
+   git remote add origin <your-github-repo-url>
+   git push -u origin main
+   ```
+2. **Deploy on Render**
+   - Go to [Render.com](https://render.com)
+   - Click "New +" → "Web Service"
+   - Connect your GitHub repository
+   - Render will automatically detect the `render.yaml` configuration
+   - Click "Deploy"
+3. **Configuration**
+   - The service will automatically use the free tier
+   - No environment variables needed (all configured automatically)
+   - Health checks are configured at `/health`
+## API Documentation
+### Base URL
+- Local: `http://localhost:8000`
+- Render: `https://your-service-name.onrender.com`
+### Endpoints
+#### 1. Upload Video for Transcription
+**POST** `/transcribe`
+Upload a video file and get a transcription ID.
+**Request:**
+- **Content-Type**: `multipart/form-data`
+- **file**: Video file (required) - Max 100MB
+- **language**: Language code (optional) - e.g., 'en', 'es', 'fr'
+**Response:**
+```json
+{
+  "id": 123,
+  "status": "pending",
+  "message": "Transcription started. Use the ID to check status.",
+  "created_at": "2024-01-15T10:30:00Z"
+}
+```
+**Example using curl:**
+```bash
+curl -X POST "http://localhost:8000/transcribe" \
+  -F "[email protected]" \
+  -F "language=en"
+```
+**Example using Python:**
+```python
+import requests
+with open('video.mp4', 'rb') as f:
+    response = requests.post(
+        'http://localhost:8000/transcribe',
+        files={'file': f},
+        data={'language': 'en'}  # optional
+    )
+result = response.json()
+transcription_id = result['id']
+```
+#### 2. Get Transcription Status/Results
+**GET** `/transcribe/{id}`
+Check transcription status and retrieve results.
+**Response:**
+```json
+{
+  "id": 123,
+  "status": "completed",
+  "text": "Hello, this is the transcribed text from your video...",
+  "language": "en",
+  "duration": 45.6,
+  "created_at": "2024-01-15T10:30:00Z",
+  "completed_at": "2024-01-15T10:32:15Z",
+  "error_message": null
+}
+```
+**Status Values:**
+- `pending`: Transcription queued
+- `processing`: Currently transcribing
+- `completed`: Transcription finished successfully
+- `failed`: Transcription failed (check error_message)
+**Example:**
+```bash
+curl "http://localhost:8000/transcribe/123"
+```
+#### 3. Health Check
+**GET** `/health`
+Check service health and get statistics.
+**Response:**
+```json
+{
+  "status": "healthy",
+  "timestamp": 5,
+  "active_transcriptions": 2
+}
+```
+### Error Handling
+All errors return a consistent format:
+```json
+{
+  "id": 0,
+  "error": "error_type",
+  "message": "Human readable error message"
+}
+```
+**Common Error Codes:**
+- `400`: Bad request (invalid file, unsupported format)
+- `413`: File too large (>100MB)
+- `404`: Transcription not found or expired
+- `429`: Rate limit exceeded (>10 requests/minute)
+- `500`: Internal server error
+## Supported Languages
+Whisper supports 99+ languages including:
+- English (en)
+- Spanish (es)
+- French (fr)
+- German (de)
+- Italian (it)
+- Portuguese (pt)
+- Russian (ru)
+- Japanese (ja)
+- Korean (ko)
+- Chinese (zh)
+- Arabic (ar)
+- Hindi (hi)
+Leave `language` empty for automatic detection.
+## Limitations
+### Free Tier Constraints
+- **File Size**: 100MB maximum per video
+- **Rate Limiting**: 10 requests per minute per IP
+- **Storage**: Results stored for 3.5 hours only
+- **Processing**: Sequential processing (one video at a time)
+- **Cold Starts**: First request may take 30-60 seconds
+### Technical Limitations
+- **Video Length**: Longer videos take more time to process
+- **Memory**: Large videos may fail on free tier (512MB RAM limit)
+- **CPU**: Processing speed limited by free tier CPU allocation
+## Troubleshooting
+### Common Issues
+1. **Service Restarts/Memory Issues**
+   ```
+   Process killed (signal 9) or frequent restarts
+   ```
+   **Solution:**
+   ```bash
+   # Use robust startup (automatically optimizes settings)
+   python start_robust.py
+   # Or manually use tiny model
+   WHISPER_MODEL=tiny MODEL_PRELOAD=true python main.py
+   ```
+   **See:** [RESTART_TROUBLESHOOTING.md](RESTART_TROUBLESHOOTING.md)
+2. **NumPy Compatibility Error**
+   ```
+   A module that was compiled using NumPy 1.x cannot be run in NumPy 2.2.6
+   ```
+   **Solution:**
+   ```bash
+   python fix_numpy.py
+   ```
+   Or manually:
+   ```bash
+   pip uninstall numpy
+   pip install 'numpy<2.0.0'
+   pip install --force-reinstall torch torchaudio openai-whisper
+   ```
+2. **"File too large" Error**
+   - Compress your video or use a shorter clip
+   - Maximum file size is 100MB
+3. **"Unsupported file format" Error**
+   - Convert to supported format: MP4, AVI, MOV, MKV, WMV, FLV, WebM, M4V
+4. **Slow Processing**
+   - First request loads the AI model (30-60 seconds)
+   - Subsequent requests are faster
+   - Longer videos take more time
+5. **"Transcription not found" Error**
+   - Transcriptions expire after 3.5 hours
+   - Check if the ID is correct
+6. **Rate Limit Exceeded**
+   - Wait 1 minute before making more requests
+   - Maximum 10 requests per minute per IP
+### Render.com Specific
+1. **Service Sleeping**
+   - Free tier services sleep after 15 minutes of inactivity
+   - First request after sleep takes 30-60 seconds
+2. **Build Failures**
+   - Check build logs in Render dashboard
+   - Ensure all dependencies are in requirements.txt
+3. **Memory Issues**
+   - Free tier has 512MB RAM limit
+   - Large videos may cause out-of-memory errors
+## Development
+### Project Structure
+```
+transcriber/
+├── main.py                 # FastAPI application
+├── transcription_service.py # Core transcription logic
+├── storage.py              # In-memory storage manager
+├── models.py               # Pydantic data models
+├── config.py               # Configuration settings
+├── requirements.txt        # Python dependencies
+├── Dockerfile              # Container configuration
+├── render.yaml             # Render deployment config
+└── README.md               # This file
+```
+### Adding Features
+1. **New Video Formats**: Add to `ALLOWED_EXTENSIONS` in `config.py`
+2. **Different Models**: Change `WHISPER_MODEL` in `config.py`
+3. **Longer Storage**: Modify `CLEANUP_INTERVAL_HOURS` in `config.py`
+4. **Rate Limits**: Adjust `RATE_LIMIT_REQUESTS` in `config.py`
+### Testing
+```bash
+# Install test dependencies
+pip install pytest httpx
+# Run tests (create test files as needed)
+pytest
+```
+## License
+MIT License - feel free to use for any purpose.
+## Support
+- 📖 **Documentation**: Visit `/docs` endpoint for interactive API docs
+- 🐛 **Issues**: Report bugs via GitHub issues
+- 💡 **Features**: Suggest improvements via GitHub discussions
 ---
+**Ready to transcribe? Upload your first video at `/docs` or use the API endpoints above!**

README_HF.md ADDED Viewed

	@@ -0,0 +1,154 @@

+---
+title: Video Transcription Service
+emoji: 🎬
+colorFrom: blue
+colorTo: purple
+sdk: gradio
+sdk_version: 4.44.0
+app_file: app.py
+pinned: false
+license: mit
+---
+# 🎬 Video Transcription Service
+A powerful video transcription service using OpenAI Whisper, deployed on Hugging Face Spaces with both web interface and API access.
+## ✨ Features
+- 🎥 **Multiple Video Formats**: MP4, AVI, MOV, MKV, WMV, FLV, WebM, M4V
+- 🗣️ **Free Speech-to-Text**: OpenAI Whisper (no API limits)
+- 🌐 **Language Support**: 99+ languages with auto-detection
+- 📱 **Dual Interface**: Web UI + REST API
+- ⚡ **Fast Processing**: Optimized for Hugging Face Spaces
+- 🧹 **Auto Cleanup**: Results stored for 3.5 hours
+## 🚀 Quick Start
+### Web Interface
+1. Upload your video file
+2. Select language (or use auto-detect)
+3. Click "Start Transcription"
+4. Use the transcription ID to check status
+### API Access
+**Upload Video:**
+```bash
+curl -X POST "https://your-space-name.hf.space/api/transcribe" \
+  -F "[email protected]" \
+  -F "language=en"
+```
+**Check Status:**
+```bash
+curl "https://your-space-name.hf.space/api/transcribe/123"
+```
+**Python Example:**
+```python
+import requests
+# Upload video
+with open('video.mp4', 'rb') as f:
+    response = requests.post(
+        'https://your-space-name.hf.space/api/transcribe',
+        files={'file': f},
+        data={'language': 'en'}
+    )
+result = response.json()
+transcription_id = result['id']
+# Check status
+import time
+while True:
+    status_response = requests.get(
+        f'https://your-space-name.hf.space/api/transcribe/{transcription_id}'
+    )
+    status = status_response.json()
+    if status['status'] == 'completed':
+        print("Transcription:", status['text'])
+        break
+    elif status['status'] == 'failed':
+        print("Error:", status['error_message'])
+        break
+    else:
+        print("Status:", status['status'])
+        time.sleep(10)
+```
+## 📋 API Endpoints
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/api/transcribe` | POST | Upload video for transcription |
+| `/api/transcribe/{id}` | GET | Get transcription status/results |
+| `/api/health` | GET | Service health check |
+## 🌐 Supported Languages
+Auto-detection or specify: English, Spanish, French, German, Italian, Portuguese, Russian, Japanese, Korean, Chinese, Arabic, Hindi, and 87+ more languages.
+## 📏 Limitations
+- **File Size**: 100MB maximum per video
+- **Processing**: Sequential (one video at a time)
+- **Storage**: Results expire after 3.5 hours
+- **Rate Limiting**: Built-in protection against abuse
+## 🔧 Technical Details
+- **Model**: OpenAI Whisper (base model for accuracy)
+- **Backend**: FastAPI + Gradio
+- **Processing**: Async with real-time status updates
+- **Storage**: In-memory with automatic cleanup
+- **Deployment**: Optimized for Hugging Face Spaces
+## 📊 Response Format
+**Upload Response:**
+```json
+{
+  "id": 123,
+  "status": "pending",
+  "message": "Transcription started",
+  "created_at": "2024-01-15T10:30:00Z"
+}
+```
+**Status Response:**
+```json
+{
+  "id": 123,
+  "status": "completed",
+  "text": "Hello, this is the transcribed text...",
+  "language": "en",
+  "duration": 45.6,
+  "created_at": "2024-01-15T10:30:00Z",
+  "completed_at": "2024-01-15T10:32:15Z"
+}
+```
+## 🛠️ Development
+This service combines:
+- **Gradio**: Beautiful web interface
+- **FastAPI**: Robust API endpoints
+- **OpenAI Whisper**: State-of-the-art transcription
+- **Async Processing**: Non-blocking operations
+## 📞 Support
+- 📖 **Documentation**: Available in the API tab
+- 🐛 **Issues**: Report via GitHub
+- 💡 **Features**: Suggest improvements
+## 📄 License
+MIT License - free for any use.
+---
+**Ready to transcribe? Upload your video or use the API endpoints above! 🎉**

RESTART_TROUBLESHOOTING.md ADDED Viewed

	@@ -0,0 +1,295 @@

+# Restart Troubleshooting Guide
+If your Video Transcription Service is getting restarted frequently, this guide will help you identify and fix the issue.
+## 🔍 **Common Restart Causes**
+### 1. **Memory Exhaustion (Most Common)**
+**Symptoms:**
+- Service restarts during model loading
+- Restarts when processing large videos
+- "Process killed (signal 9)" in logs
+**Solutions:**
+```bash
+# Use tiny model (uses less memory)
+WHISPER_MODEL=tiny python main.py
+# Or use the robust startup script
+python start_robust.py
+```
+### 2. **Request Timeouts**
+**Symptoms:**
+- Restarts during first transcription request
+- Long delays before restart
+- No error messages, just restart
+**Solutions:**
+```bash
+# Enable model preloading
+MODEL_PRELOAD=true python main.py
+# Use robust startup (preloads automatically)
+python start_robust.py
+```
+### 3. **Dependency Issues**
+**Symptoms:**
+- Restarts immediately after startup
+- Import errors in logs
+- NumPy compatibility errors
+**Solutions:**
+```bash
+# Fix NumPy compatibility
+python fix_numpy.py
+# Reinstall dependencies
+pip install -r requirements.txt
+```
+## 🛠️ **Quick Fixes**
+### **Option 1: Use Robust Startup (Recommended)**
+```bash
+python start_robust.py
+```
+This script automatically:
+- Detects your environment (local/cloud/Render)
+- Sets optimal configuration
+- Preloads the model
+- Uses memory-efficient settings
+### **Option 2: Manual Configuration**
+```bash
+# For free tier / limited memory
+WHISPER_MODEL=tiny MODEL_PRELOAD=true DEBUG=false python main.py
+# For local development
+WHISPER_MODEL=base MODEL_PRELOAD=true python main.py
+```
+### **Option 3: Environment Variables**
+Create a `.env` file:
+```env
+WHISPER_MODEL=tiny
+MODEL_PRELOAD=true
+DEBUG=false
+MAX_FILE_SIZE=52428800
+```
+## 📊 **Memory Optimization**
+### **Model Size Comparison**
+| Model | Memory Usage | Speed | Accuracy |
+|-------|-------------|-------|----------|
+| tiny  | ~39MB      | Fast  | Good     |
+| base  | ~74MB      | Medium| Better   |
+| small | ~244MB     | Slow  | Best     |
+**For free tier (512MB RAM limit): Use `tiny`**
+### **File Size Limits**
+```bash
+# Conservative (recommended for free tier)
+MAX_FILE_SIZE=50MB
+# Standard (for paid tiers)
+MAX_FILE_SIZE=100MB
+```
+## 🔧 **Render.com Specific Fixes**
+### **Update render.yaml**
+```yaml
+services:
+  - type: web
+    name: video-transcription-service
+    env: docker
+    plan: free
+    dockerfilePath: ./Dockerfile
+    envVars:
+      - key: WHISPER_MODEL
+        value: tiny
+      - key: MODEL_PRELOAD
+        value: true
+      - key: DEBUG
+        value: false
+    healthCheckPath: /health
+    autoDeploy: true
+```
+### **Dockerfile Optimization**
+The updated Dockerfile now includes:
+- Memory-efficient settings
+- Model preloading
+- Robust startup script
+## 📋 **Diagnostic Commands**
+### **Check Service Health**
+```bash
+curl http://localhost:8000/health
+```
+**Healthy Response:**
+```json
+{
+  "status": "healthy",
+  "model_status": "loaded",
+  "model_name": "tiny",
+  "active_transcriptions": 0
+}
+```
+### **Monitor Memory Usage**
+```bash
+# Local monitoring
+python -c "
+import psutil
+p = psutil.Process()
+print(f'Memory: {p.memory_info().rss / 1024**2:.1f}MB')
+"
+```
+### **Test Model Loading**
+```bash
+python -c "
+import whisper
+import time
+start = time.time()
+model = whisper.load_model('tiny')
+print(f'Loaded in {time.time()-start:.1f}s')
+"
+```
+## 🚨 **Emergency Fixes**
+### **If Service Won't Start**
+1. **Check dependencies:**
+   ```bash
+   python -c "import fastapi, whisper, torch; print('OK')"
+   ```
+2. **Fix NumPy issues:**
+   ```bash
+   python fix_numpy.py
+   ```
+3. **Use minimal configuration:**
+   ```bash
+   WHISPER_MODEL=tiny DEBUG=false python main.py
+   ```
+### **If Restarts During Requests**
+1. **Enable model preloading:**
+   ```bash
+   MODEL_PRELOAD=true python start_robust.py
+   ```
+2. **Reduce file size limit:**
+   ```bash
+   # Edit config.py
+   MAX_FILE_SIZE = 25 * 1024 * 1024  # 25MB
+   ```
+3. **Use tiny model:**
+   ```bash
+   WHISPER_MODEL=tiny python main.py
+   ```
+## 📈 **Performance Monitoring**
+### **Log Analysis**
+Look for these patterns in logs:
+**Memory Issues:**
+```
+⚠️ High memory usage: 450.1MB (limit: 512MB)
+```
+**Model Loading:**
+```
+✅ Whisper model preloaded successfully in 15.2 seconds
+```
+**Successful Transcription:**
+```
+🎉 Transcription 1 completed successfully in 45.6 seconds total
+```
+### **Health Check Monitoring**
+```bash
+# Continuous monitoring
+while true; do
+  curl -s http://localhost:8000/health | jq '.model_status'
+  sleep 30
+done
+```
+## 🎯 **Best Practices**
+### **For Free Tier Hosting**
+1. Use `WHISPER_MODEL=tiny`
+2. Enable `MODEL_PRELOAD=true`
+3. Set `DEBUG=false`
+4. Limit file sizes to 25-50MB
+5. Process one video at a time
+### **For Local Development**
+1. Use `WHISPER_MODEL=base` or `small`
+2. Enable `DEBUG=true` for detailed logs
+3. Use `LOG_TO_FILE=true` for persistent logs
+4. Monitor memory usage
+### **For Production**
+1. Use paid hosting with more memory
+2. Enable model preloading
+3. Set up proper monitoring
+4. Use load balancing for multiple instances
+## 🔄 **Restart Recovery**
+### **Automatic Recovery**
+The service includes automatic recovery features:
+- Graceful shutdown handling
+- Model preloading on startup
+- Memory usage monitoring
+- Optimal settings detection
+### **Manual Recovery**
+If the service keeps restarting:
+1. **Check logs for error patterns**
+2. **Reduce resource usage**
+3. **Use robust startup script**
+4. **Contact hosting support if needed**
+## 📞 **Getting Help**
+### **Log Collection**
+When reporting issues, include:
+```bash
+# System info
+python -c "import sys, platform; print(f'Python: {sys.version}'); print(f'Platform: {platform.platform()}')"
+# Memory info
+python -c "import psutil; m=psutil.virtual_memory(); print(f'Memory: {m.total/1024**3:.1f}GB total, {m.available/1024**3:.1f}GB available')"
+# Service health
+curl http://localhost:8000/health
+```
+### **Common Solutions Summary**
+| Problem | Solution |
+|---------|----------|
+| Memory exhaustion | Use `WHISPER_MODEL=tiny` |
+| Request timeouts | Enable `MODEL_PRELOAD=true` |
+| NumPy errors | Run `python fix_numpy.py` |
+| Frequent restarts | Use `python start_robust.py` |
+| Large file issues | Reduce `MAX_FILE_SIZE` |
+---
+**With these fixes, your service should run stably without restarts! 🎉**

app.py ADDED Viewed

	@@ -0,0 +1,343 @@

+#!/usr/bin/env python3
+"""
+Hugging Face Spaces app.py - Video Transcription Service
+Combines Gradio interface with FastAPI for full functionality
+"""
+import gradio as gr
+import asyncio
+import threading
+import time
+import os
+import logging
+from datetime import datetime
+from typing import Optional, Tuple
+import uvicorn
+from fastapi import FastAPI, File, UploadFile, HTTPException
+from fastapi.responses import JSONResponse
+import tempfile
+# Import our existing modules
+from config import settings
+from models import TranscriptionStatus, TranscriptionResponse, TranscriptionResult
+from storage import storage
+from transcription_service import transcription_service
+from logging_config import setup_logging, log_step, log_success, log_error
+# Setup logging for Hugging Face Spaces
+setup_logging(level=logging.INFO, log_to_file=False)
+logger = logging.getLogger(__name__)
+# Configure for Hugging Face Spaces
+os.environ.setdefault("WHISPER_MODEL", "base")  # HF Spaces can handle base model
+os.environ.setdefault("MODEL_PRELOAD", "true")
+os.environ.setdefault("DEBUG", "false")
+# FastAPI app for API functionality
+api_app = FastAPI(
+    title="Video Transcription API",
+    description="API endpoints for video transcription",
+    version="1.0.0"
+)
+class TranscriptionManager:
+    def __init__(self):
+        self.model_loaded = False
+        self.model_loading = False
+    async def ensure_model_loaded(self):
+        """Ensure Whisper model is loaded"""
+        if self.model_loaded:
+            return True
+        if self.model_loading:
+            while self.model_loading:
+                await asyncio.sleep(0.1)
+            return self.model_loaded
+        self.model_loading = True
+        try:
+            logger.info("🤖 Loading Whisper model for Hugging Face Spaces...")
+            success = await transcription_service.preload_model()
+            self.model_loaded = success
+            return success
+        finally:
+            self.model_loading = False
+# Global transcription manager
+transcription_manager = TranscriptionManager()
+# FastAPI endpoints (preserve existing API functionality)
+@api_app.post("/transcribe")
+async def api_transcribe(file: UploadFile = File(...), language: str = None):
+    """API endpoint for video transcription"""
+    try:
+        # Ensure model is loaded
+        if not await transcription_manager.ensure_model_loaded():
+            raise HTTPException(status_code=503, detail="Model not available")
+        # Validate file
+        if not file.filename:
+            raise HTTPException(status_code=400, detail="No file provided")
+        # Read file content
+        content = await file.read()
+        if len(content) > settings.MAX_FILE_SIZE:
+            raise HTTPException(status_code=413, detail="File too large")
+        # Create transcription
+        transcription_id = storage.create_transcription(language=language)
+        # Start transcription in background
+        asyncio.create_task(
+            transcription_service.transcribe_video(content, transcription_id, language)
+        )
+        return TranscriptionResponse(
+            id=transcription_id,
+            status=TranscriptionStatus.PENDING,
+            message="Transcription started",
+            created_at=storage.get_transcription(transcription_id).created_at
+        )
+    except HTTPException:
+        raise
+    except Exception as e:
+        logger.error(f"API transcription error: {e}")
+        raise HTTPException(status_code=500, detail=str(e))
+@api_app.get("/transcribe/{transcription_id}")
+async def api_get_transcription(transcription_id: int):
+    """API endpoint to get transcription status/results"""
+    result = storage.get_transcription(transcription_id)
+    if not result:
+        raise HTTPException(status_code=404, detail="Transcription not found")
+    return result
+@api_app.get("/health")
+async def api_health():
+    """API health check"""
+    return {
+        "status": "healthy",
+        "model_loaded": transcription_manager.model_loaded,
+        "active_transcriptions": len([
+            t for t in storage._storage.values()
+            if t.status in [TranscriptionStatus.PENDING, TranscriptionStatus.PROCESSING]
+        ]) if hasattr(storage, '_storage') else 0
+    }
+# Gradio interface functions (sync versions for Gradio compatibility)
+def gradio_transcribe(video_file, language):
+    """Gradio transcription function"""
+    if video_file is None:
+        return "❌ Please upload a video file", "", ""
+    try:
+        # Check if model is loaded (sync check)
+        if not transcription_manager.model_loaded:
+            return "❌ Model not loaded yet. Please wait and try again.", "", ""
+        # Read file
+        with open(video_file, 'rb') as f:
+            content = f.read()
+        if len(content) > settings.MAX_FILE_SIZE:
+            return f"❌ File too large. Maximum size: {settings.MAX_FILE_SIZE // (1024*1024)}MB", "", ""
+        # Create transcription
+        transcription_id = storage.create_transcription(language=language if language != "auto" else None)
+        # Start transcription in background
+        loop = asyncio.new_event_loop()
+        asyncio.set_event_loop(loop)
+        loop.run_in_executor(
+            None,
+            lambda: asyncio.run(transcription_service.transcribe_video(
+                content, transcription_id, language if language != "auto" else None
+            ))
+        )
+        return f"✅ Transcription started with ID: {transcription_id}", str(transcription_id), "⏳ Processing..."
+    except Exception as e:
+        logger.error(f"Gradio transcription error: {e}")
+        return f"❌ Error: {str(e)}", "", ""
+def gradio_check_status(transcription_id_str):
+    """Check transcription status for Gradio"""
+    if not transcription_id_str:
+        return "❌ Please provide a transcription ID"
+    try:
+        transcription_id = int(transcription_id_str)
+        result = storage.get_transcription(transcription_id)
+        if not result:
+            return "❌ Transcription not found or expired"
+        if result.status == TranscriptionStatus.COMPLETED:
+            return f"✅ Completed!\n\nLanguage: {result.language}\nDuration: {result.duration}s\n\nText:\n{result.text}"
+        elif result.status == TranscriptionStatus.FAILED:
+            return f"❌ Failed: {result.error_message}"
+        elif result.status == TranscriptionStatus.PROCESSING:
+            return "⏳ Still processing... Please wait and check again."
+        else:
+            return "⏳ Pending... Please wait and check again."
+    except ValueError:
+        return "❌ Invalid transcription ID (must be a number)"
+    except Exception as e:
+        return f"❌ Error: {str(e)}"
+# Create Gradio interface
+def create_gradio_interface():
+    """Create the Gradio interface"""
+    with gr.Blocks(
+        title="Video Transcription Service",
+        theme=gr.themes.Soft(),
+        css="""
+        .gradio-container {
+            max-width: 1000px !important;
+        }
+        """
+    ) as interface:
+        gr.Markdown("""
+        # 🎬 Video Transcription Service
+        Upload your video files and get accurate transcriptions using OpenAI Whisper.
+        **Features:**
+        - 🎥 Multiple video formats (MP4, AVI, MOV, etc.)
+        - 🌐 Automatic language detection or manual selection
+        - 🚀 Fast processing with OpenAI Whisper
+        - 📱 Both web interface and API access
+        """)
+        with gr.Tab("📤 Upload & Transcribe"):
+            with gr.Row():
+                with gr.Column():
+                    video_input = gr.File(
+                        label="Upload Video File",
+                        file_types=["video"],
+                        type="filepath"
+                    )
+                    language_input = gr.Dropdown(
+                        choices=["auto", "en", "es", "fr", "de", "it", "pt", "ru", "ja", "ko", "zh", "ar", "hi"],
+                        value="auto",
+                        label="Language (auto-detect or specify)"
+                    )
+                    transcribe_btn = gr.Button("🚀 Start Transcription", variant="primary")
+                with gr.Column():
+                    status_output = gr.Textbox(label="Status", lines=3)
+                    transcription_id_output = gr.Textbox(label="Transcription ID", visible=True)
+                    result_output = gr.Textbox(label="Progress", lines=2)
+        with gr.Tab("🔍 Check Status"):
+            with gr.Row():
+                with gr.Column():
+                    id_input = gr.Textbox(label="Transcription ID", placeholder="Enter transcription ID...")
+                    check_btn = gr.Button("📊 Check Status", variant="secondary")
+                with gr.Column():
+                    status_result = gr.Textbox(label="Result", lines=10)
+        with gr.Tab("🔧 API Documentation"):
+            gr.Markdown("""
+            ## 🌐 API Endpoints
+            You can also use this service programmatically via API calls:
+            ### Upload Video for Transcription
+            ```bash
+            curl -X POST "https://your-space-name.hf.space/api/transcribe" \\
+              -F "[email protected]" \\
+              -F "language=en"
+            ```
+            ### Check Transcription Status
+            ```bash
+            curl "https://your-space-name.hf.space/api/transcribe/123"
+            ```
+            ### Health Check
+            ```bash
+            curl "https://your-space-name.hf.space/api/health"
+            ```
+            ### Python Example
+            ```python
+            import requests
+            # Upload video
+            with open('video.mp4', 'rb') as f:
+                response = requests.post(
+                    'https://your-space-name.hf.space/api/transcribe',
+                    files={'file': f},
+                    data={'language': 'en'}
+                )
+            transcription_id = response.json()['id']
+            # Check status
+            result = requests.get(f'https://your-space-name.hf.space/api/transcribe/{transcription_id}')
+            print(result.json())
+            ```
+            """)
+        # Event handlers
+        transcribe_btn.click(
+            fn=gradio_transcribe,
+            inputs=[video_input, language_input],
+            outputs=[status_output, transcription_id_output, result_output]
+        )
+        check_btn.click(
+            fn=gradio_check_status,
+            inputs=[id_input],
+            outputs=[status_result]
+        )
+    return interface
+# Startup function
+async def startup():
+    """Initialize services"""
+    logger.info("🚀 Starting Video Transcription Service on Hugging Face Spaces")
+    # Start storage cleanup
+    await storage.start_cleanup_task()
+    # Preload model
+    log_step("Preloading Whisper model")
+    success = await transcription_manager.ensure_model_loaded()
+    if success:
+        log_success("Model preloaded successfully")
+    else:
+        log_error("Model preload failed")
+def run_fastapi():
+    """Run FastAPI in a separate thread"""
+    uvicorn.run(api_app, host="0.0.0.0", port=7860, log_level="info")
+# Main execution
+if __name__ == "__main__":
+    # Run startup
+    asyncio.run(startup())
+    # Start FastAPI in background thread for API access
+    api_thread = threading.Thread(target=run_fastapi, daemon=True)
+    api_thread.start()
+    # Create and launch Gradio interface
+    interface = create_gradio_interface()
+    # Launch with API access enabled
+    interface.launch(
+        server_name="0.0.0.0",
+        server_port=7860,
+        share=False,  # HF Spaces handles sharing
+        show_api=True,  # Enable API documentation
+        show_error=True
+    )

config.py ADDED Viewed

	@@ -0,0 +1,34 @@

+import os
+from typing import List
+class HuggingFaceSettings:
+    # File upload settings (HF Spaces can handle larger files)
+    MAX_FILE_SIZE = 200 * 1024 * 1024  # 200MB for HF Spaces
+    ALLOWED_EXTENSIONS = ['.mp4', '.avi', '.mov', '.mkv', '.wmv', '.flv', '.webm', '.m4v']
+    # Transcription settings (optimized for HF Spaces)
+    WHISPER_MODEL = os.getenv("WHISPER_MODEL", "base")  # HF Spaces can handle base model
+    CLEANUP_INTERVAL_HOURS = 3.5  # Clean up after 3.5 hours
+    # Performance settings for HF Spaces
+    MODEL_PRELOAD = True  # Always preload on HF Spaces
+    MAX_CONCURRENT_TRANSCRIPTIONS = 2  # HF Spaces can handle more
+    REQUEST_TIMEOUT_SECONDS = 600  # 10 minutes max per request
+    # Rate limiting (more generous on HF Spaces)
+    RATE_LIMIT_REQUESTS = 20  # requests per minute per IP
+    # Server settings
+    HOST = "0.0.0.0"
+    PORT = 7860  # Standard HF Spaces port
+    # Logging settings
+    DEBUG_MODE = os.getenv("DEBUG", "false").lower() == "true"
+    LOG_TO_FILE = False  # No file logging on HF Spaces
+    # Hugging Face Spaces specific
+    HF_SPACE_ID = os.getenv("SPACE_ID", "your-username/video-transcription")
+    HF_SPACE_URL = f"https://{HF_SPACE_ID.replace('/', '-')}.hf.space" if "SPACE_ID" in os.environ else "http://localhost:7860"
+# Use HF-optimized settings
+settings = HuggingFaceSettings()

deploy_to_hf.py ADDED Viewed

	@@ -0,0 +1,190 @@

+#!/usr/bin/env python3
+"""
+Deployment script for Hugging Face Spaces
+Prepares files and provides deployment instructions
+"""
+import os
+import shutil
+import logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+def prepare_hf_deployment():
+    """Prepare files for Hugging Face Spaces deployment"""
+    logger.info("🚀 Preparing Video Transcription Service for Hugging Face Spaces")
+    logger.info("=" * 60)
+    # Create deployment directory
+    deploy_dir = "hf_spaces_deploy"
+    if os.path.exists(deploy_dir):
+        shutil.rmtree(deploy_dir)
+    os.makedirs(deploy_dir)
+    # Files to copy/create for HF Spaces
+    files_to_copy = [
+        "app.py",           # Main Gradio app
+        "config.py",        # Configuration
+        "models.py",        # Data models
+        "storage.py",       # Storage management
+        "transcription_service.py",  # Core transcription logic
+        "logging_config.py", # Logging configuration
+        "restart_handler.py" # Restart prevention
+    ]
+    # Copy core files
+    for file in files_to_copy:
+        if os.path.exists(file):
+            shutil.copy2(file, deploy_dir)
+            logger.info(f"✅ Copied {file}")
+        else:
+            logger.warning(f"⚠️ File not found: {file}")
+    # Copy and rename HF-specific files
+    if os.path.exists("requirements_hf.txt"):
+        shutil.copy2("requirements_hf.txt", os.path.join(deploy_dir, "requirements.txt"))
+        logger.info("✅ Copied requirements_hf.txt -> requirements.txt")
+    if os.path.exists("README_HF.md"):
+        shutil.copy2("README_HF.md", os.path.join(deploy_dir, "README.md"))
+        logger.info("✅ Copied README_HF.md -> README.md")
+    if os.path.exists("config_hf.py"):
+        # Replace config.py with HF-optimized version
+        shutil.copy2("config_hf.py", os.path.join(deploy_dir, "config.py"))
+        logger.info("✅ Using HF-optimized config.py")
+    # Create .gitignore for HF Spaces
+    gitignore_content = """
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+*.log
+.env
+.venv
+env/
+venv/
+.DS_Store
+*.tmp
+*.temp
+flagged/
+"""
+    with open(os.path.join(deploy_dir, ".gitignore"), "w") as f:
+        f.write(gitignore_content.strip())
+    logger.info("✅ Created .gitignore")
+    logger.info("\n🎉 Deployment files prepared successfully!")
+    logger.info(f"📁 Files are ready in: {deploy_dir}/")
+    return deploy_dir
+def print_deployment_instructions(deploy_dir):
+    """Print step-by-step deployment instructions"""
+    instructions = f"""
+🚀 HUGGING FACE SPACES DEPLOYMENT INSTRUCTIONS
+{'=' * 50}
+1. 📁 PREPARE YOUR HUGGING FACE ACCOUNT
+   - Go to https://huggingface.co
+   - Sign up/login to your account
+   - Go to "Spaces" tab
+2. 🆕 CREATE NEW SPACE
+   - Click "Create new Space"
+   - Choose a name: e.g., "video-transcription"
+   - Select "Gradio" as SDK
+   - Choose "Public" or "Private"
+   - Click "Create Space"
+3. 📤 UPLOAD FILES
+   Option A - Web Interface:
+   - Upload all files from {deploy_dir}/ to your Space
+   - Make sure app.py is in the root directory
+   Option B - Git (Recommended):
+   ```bash
+   cd {deploy_dir}
+   git init
+   git add .
+   git commit -m "Initial commit"
+   git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
+   git push -u origin main
+   ```
+4. ⚙️ CONFIGURE SPACE SETTINGS
+   - Go to your Space settings
+   - Set "Hardware" to "CPU basic" (free) or "CPU upgrade" (better performance)
+   - Enable "Public" if you want API access from external applications
+5. 🚀 DEPLOY
+   - Your Space will automatically build and deploy
+   - Wait for the build to complete (5-10 minutes)
+   - Check logs for any errors
+6. ✅ TEST YOUR DEPLOYMENT
+   Web Interface:
+   - Visit: https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space
+   - Upload a test video file
+   - Verify transcription works
+   API Access:
+   ```bash
+   # Test health endpoint
+   curl "https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space/api/health"
+   # Test transcription
+   curl -X POST "https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space/api/transcribe" \\
+     -F "file=@test_video.mp4" \\
+     -F "language=en"
+   ```
+7. 📊 MONITOR PERFORMANCE
+   - Check Space logs for any issues
+   - Monitor memory usage
+   - Test with different video formats
+🎯 IMPORTANT NOTES:
+- First model load takes 2-3 minutes (downloads Whisper model)
+- Subsequent requests are much faster
+- API endpoints work exactly like your local FastAPI
+- Both web interface and API are available simultaneously
+🔧 TROUBLESHOOTING:
+- If build fails: Check requirements.txt and logs
+- If model loading fails: Try WHISPER_MODEL=tiny in Space settings
+- If memory issues: Upgrade to CPU upgrade hardware
+📞 NEED HELP?
+- Check Space logs in the "Logs" tab
+- Visit Hugging Face Spaces documentation
+- Test locally first: python app.py
+🎉 Your Video Transcription Service will be live at:
+   https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space
+"""
+    print(instructions)
+def main():
+    """Main deployment preparation function"""
+    try:
+        deploy_dir = prepare_hf_deployment()
+        print_deployment_instructions(deploy_dir)
+        logger.info("\n✅ Ready for Hugging Face Spaces deployment!")
+        logger.info(f"📁 Next step: Upload files from {deploy_dir}/ to your HF Space")
+    except Exception as e:
+        logger.error(f"❌ Deployment preparation failed: {e}")
+        return False
+    return True
+if __name__ == "__main__":
+    main()

example_client.py ADDED Viewed

	@@ -0,0 +1,166 @@

+#!/usr/bin/env python3
+"""
+Example client for the Video Transcription Service
+Usage: python example_client.py <video_file> [language]
+"""
+import requests
+import time
+import sys
+import os
+class TranscriptionClient:
+    def __init__(self, base_url="http://localhost:8000"):
+        self.base_url = base_url.rstrip('/')
+    def transcribe_video(self, video_path, language=None, poll_interval=10, max_wait_minutes=10):
+        """
+        Transcribe a video file and wait for results
+        Args:
+            video_path: Path to video file
+            language: Optional language code (e.g., 'en', 'es')
+            poll_interval: Seconds between status checks
+            max_wait_minutes: Maximum minutes to wait for completion
+        Returns:
+            dict: Transcription result or None if failed
+        """
+        if not os.path.exists(video_path):
+            print(f"Error: Video file '{video_path}' not found")
+            return None
+        file_size = os.path.getsize(video_path)
+        print(f"Uploading video: {video_path} ({file_size / (1024*1024):.1f} MB)")
+        # Upload video
+        try:
+            with open(video_path, 'rb') as f:
+                files = {'file': f}
+                data = {}
+                if language:
+                    data['language'] = language
+                print("Uploading...")
+                response = requests.post(f"{self.base_url}/transcribe", files=files, data=data)
+            if response.status_code != 200:
+                print(f"Upload failed: {response.status_code}")
+                print(response.text)
+                return None
+            result = response.json()
+            transcription_id = result['id']
+            print(f"Upload successful! Transcription ID: {transcription_id}")
+            print(f"Status: {result['status']}")
+        except Exception as e:
+            print(f"Upload error: {e}")
+            return None
+        # Poll for results
+        print(f"Waiting for transcription (checking every {poll_interval} seconds)...")
+        max_attempts = (max_wait_minutes * 60) // poll_interval
+        for attempt in range(max_attempts):
+            try:
+                response = requests.get(f"{self.base_url}/transcribe/{transcription_id}")
+                if response.status_code != 200:
+                    print(f"Status check failed: {response.status_code}")
+                    return None
+                result = response.json()
+                status = result['status']
+                if status == 'completed':
+                    print("✅ Transcription completed!")
+                    return result
+                elif status == 'failed':
+                    print(f"❌ Transcription failed: {result.get('error_message', 'Unknown error')}")
+                    return None
+                elif status in ['pending', 'processing']:
+                    print(f"⏳ Status: {status} (attempt {attempt + 1}/{max_attempts})")
+                    time.sleep(poll_interval)
+                else:
+                    print(f"❌ Unknown status: {status}")
+                    return None
+            except Exception as e:
+                print(f"Status check error: {e}")
+                return None
+        print(f"⏰ Transcription timed out after {max_wait_minutes} minutes")
+        return None
+    def get_transcription(self, transcription_id):
+        """Get transcription by ID"""
+        try:
+            response = requests.get(f"{self.base_url}/transcribe/{transcription_id}")
+            if response.status_code == 200:
+                return response.json()
+            else:
+                print(f"Error: {response.status_code}")
+                print(response.text)
+                return None
+        except Exception as e:
+            print(f"Error: {e}")
+            return None
+def main():
+    if len(sys.argv) < 2:
+        print("Usage: python example_client.py <video_file> [language] [api_url]")
+        print("Examples:")
+        print("  python example_client.py video.mp4")
+        print("  python example_client.py video.mp4 en")
+        print("  python example_client.py video.mp4 es https://your-service.onrender.com")
+        sys.exit(1)
+    video_file = sys.argv[1]
+    language = sys.argv[2] if len(sys.argv) > 2 and not sys.argv[2].startswith('http') else None
+    api_url = sys.argv[3] if len(sys.argv) > 3 else sys.argv[2] if len(sys.argv) > 2 and sys.argv[2].startswith('http') else "http://localhost:8000"
+    print("Video Transcription Client")
+    print("=" * 30)
+    print(f"API URL: {api_url}")
+    print(f"Video: {video_file}")
+    print(f"Language: {language or 'auto-detect'}")
+    print()
+    client = TranscriptionClient(api_url)
+    result = client.transcribe_video(video_file, language)
+    if result:
+        print("\n" + "=" * 50)
+        print("TRANSCRIPTION RESULT")
+        print("=" * 50)
+        print(f"ID: {result['id']}")
+        print(f"Language: {result.get('language', 'N/A')}")
+        print(f"Duration: {result.get('duration', 'N/A')} seconds")
+        print(f"Created: {result['created_at']}")
+        print(f"Completed: {result.get('completed_at', 'N/A')}")
+        print()
+        print("TEXT:")
+        print("-" * 20)
+        print(result['text'])
+        print()
+        # Save to file
+        output_file = f"{os.path.splitext(video_file)[0]}_transcription.txt"
+        with open(output_file, 'w', encoding='utf-8') as f:
+            f.write(f"Transcription of: {video_file}\n")
+            f.write(f"Language: {result.get('language', 'N/A')}\n")
+            f.write(f"Duration: {result.get('duration', 'N/A')} seconds\n")
+            f.write(f"Created: {result['created_at']}\n")
+            f.write(f"Completed: {result.get('completed_at', 'N/A')}\n")
+            f.write("\n" + "=" * 50 + "\n")
+            f.write(result['text'])
+        print(f"💾 Transcription saved to: {output_file}")
+    else:
+        print("❌ Transcription failed")
+        sys.exit(1)
+if __name__ == "__main__":
+    main()

fix_numpy.py ADDED Viewed

	@@ -0,0 +1,130 @@

+#!/usr/bin/env python3
+"""
+Fix NumPy compatibility issue for Video Transcription Service
+"""
+import subprocess
+import sys
+import os
+def run_command(command, description):
+    """Run a command and handle errors"""
+    print(f"🔧 {description}...")
+    try:
+        result = subprocess.run(command, shell=True, check=True, capture_output=True, text=True)
+        print(f"✅ {description} completed")
+        if result.stdout.strip():
+            print(f"   Output: {result.stdout.strip()}")
+        return True
+    except subprocess.CalledProcessError as e:
+        print(f"❌ {description} failed:")
+        print(f"   Command: {command}")
+        print(f"   Error: {e.stderr}")
+        return False
+def check_numpy_version():
+    """Check current NumPy version"""
+    try:
+        import numpy as np
+        version = np.__version__
+        print(f"📊 Current NumPy version: {version}")
+        # Check if version is 2.x
+        major_version = int(version.split('.')[0])
+        if major_version >= 2:
+            print("⚠️  NumPy 2.x detected - this causes compatibility issues with PyTorch/Whisper")
+            return False
+        else:
+            print("✅ NumPy version is compatible")
+            return True
+    except ImportError:
+        print("❌ NumPy not installed")
+        return False
+def fix_numpy_compatibility():
+    """Fix NumPy compatibility by downgrading to 1.x"""
+    commands = [
+        ("pip uninstall -y numpy", "Uninstalling current NumPy"),
+        ("pip install 'numpy<2.0.0'", "Installing compatible NumPy version"),
+        ("pip install --force-reinstall torch==2.1.0 torchaudio==2.1.0", "Reinstalling PyTorch with compatible NumPy"),
+        ("pip install --force-reinstall openai-whisper==20231117", "Reinstalling Whisper with compatible NumPy")
+    ]
+    for command, description in commands:
+        if not run_command(command, description):
+            return False
+    return True
+def verify_installation():
+    """Verify that everything works after the fix"""
+    print("\n🧪 Testing installation...")
+    try:
+        # Test NumPy
+        import numpy as np
+        print(f"✅ NumPy {np.__version__} imported successfully")
+        # Test PyTorch
+        import torch
+        print(f"✅ PyTorch {torch.__version__} imported successfully")
+        # Test Whisper
+        import whisper
+        print("✅ Whisper imported successfully")
+        # Test basic functionality
+        print("🔍 Testing Whisper model loading...")
+        try:
+            # This will download the tiny model if not present (much faster than base)
+            model = whisper.load_model("tiny")
+            print("✅ Whisper model loaded successfully")
+            return True
+        except Exception as e:
+            print(f"⚠️  Whisper model loading failed: {e}")
+            print("   This might be due to network issues - try running the service anyway")
+            return True
+    except Exception as e:
+        print(f"❌ Installation verification failed: {e}")
+        return False
+def main():
+    print("🔧 NumPy Compatibility Fix for Video Transcription Service")
+    print("=" * 60)
+    # Check current NumPy version
+    if check_numpy_version():
+        print("\n✅ NumPy version is already compatible!")
+        print("If you're still getting errors, try restarting your service.")
+        return
+    print("\n🔧 Fixing NumPy compatibility...")
+    # Fix NumPy compatibility
+    if not fix_numpy_compatibility():
+        print("\n❌ Failed to fix NumPy compatibility")
+        print("\n💡 Manual fix:")
+        print("1. pip uninstall numpy")
+        print("2. pip install 'numpy<2.0.0'")
+        print("3. pip install --force-reinstall torch torchaudio openai-whisper")
+        sys.exit(1)
+    # Verify installation
+    if not verify_installation():
+        print("\n⚠️  Installation verification had issues")
+        print("Try running the service - it might still work")
+    print("\n🎉 NumPy compatibility fix completed!")
+    print("=" * 40)
+    print("\n📋 Next steps:")
+    print("1. Restart your transcription service:")
+    print("   python main.py")
+    print("   OR")
+    print("   python start.py")
+    print("2. Test with a video file")
+    print("\n💡 If you still get errors, try:")
+    print("- Restart your terminal/command prompt")
+    print("- Deactivate and reactivate your virtual environment")
+if __name__ == "__main__":
+    main()

hf_api_client.py ADDED Viewed

	@@ -0,0 +1,255 @@

+#!/usr/bin/env python3
+"""
+API Client for Hugging Face Spaces Video Transcription Service
+Test both web interface and API functionality
+"""
+import requests
+import time
+import sys
+import os
+from datetime import datetime
+class HFTranscriptionClient:
+    def __init__(self, space_url):
+        """
+        Initialize client for HF Spaces transcription service
+        Args:
+            space_url: Your HF Space URL (e.g., "https://username-spacename.hf.space")
+        """
+        self.base_url = space_url.rstrip('/')
+        self.api_base = f"{self.base_url}/api"
+    def health_check(self):
+        """Check if the service is healthy"""
+        try:
+            response = requests.get(f"{self.api_base}/health", timeout=10)
+            if response.status_code == 200:
+                health = response.json()
+                print("✅ Service is healthy")
+                print(f"   Model loaded: {health.get('model_loaded', False)}")
+                print(f"   Active transcriptions: {health.get('active_transcriptions', 0)}")
+                return True
+            else:
+                print(f"❌ Health check failed: {response.status_code}")
+                return False
+        except requests.exceptions.RequestException as e:
+            print(f"❌ Cannot connect to service: {e}")
+            return False
+    def transcribe_video(self, video_path, language=None):
+        """
+        Upload video for transcription
+        Args:
+            video_path: Path to video file
+            language: Language code (e.g., 'en', 'es') or None for auto-detect
+        Returns:
+            dict: Response with transcription ID or error
+        """
+        if not os.path.exists(video_path):
+            return {"error": f"Video file not found: {video_path}"}
+        try:
+            print(f"📤 Uploading video: {video_path}")
+            with open(video_path, 'rb') as f:
+                files = {'file': f}
+                data = {}
+                if language:
+                    data['language'] = language
+                response = requests.post(
+                    f"{self.api_base}/transcribe",
+                    files=files,
+                    data=data,
+                    timeout=60
+                )
+            if response.status_code == 200:
+                result = response.json()
+                print(f"✅ Upload successful! Transcription ID: {result['id']}")
+                return result
+            else:
+                error_msg = f"Upload failed: {response.status_code}"
+                if response.text:
+                    error_msg += f" - {response.text}"
+                print(f"❌ {error_msg}")
+                return {"error": error_msg}
+        except requests.exceptions.RequestException as e:
+            error_msg = f"Upload error: {e}"
+            print(f"❌ {error_msg}")
+            return {"error": error_msg}
+    def get_transcription_status(self, transcription_id):
+        """
+        Get transcription status and results
+        Args:
+            transcription_id: ID returned from transcribe_video
+        Returns:
+            dict: Transcription status and results
+        """
+        try:
+            response = requests.get(
+                f"{self.api_base}/transcribe/{transcription_id}",
+                timeout=10
+            )
+            if response.status_code == 200:
+                return response.json()
+            elif response.status_code == 404:
+                return {"error": "Transcription not found or expired"}
+            else:
+                return {"error": f"Status check failed: {response.status_code}"}
+        except requests.exceptions.RequestException as e:
+            return {"error": f"Status check error: {e}"}
+    def wait_for_completion(self, transcription_id, max_wait_minutes=15, poll_interval=10):
+        """
+        Wait for transcription to complete
+        Args:
+            transcription_id: ID to monitor
+            max_wait_minutes: Maximum time to wait
+            poll_interval: Seconds between status checks
+        Returns:
+            dict: Final transcription result
+        """
+        print(f"⏳ Waiting for transcription {transcription_id} to complete...")
+        print(f"   Max wait time: {max_wait_minutes} minutes")
+        print(f"   Checking every {poll_interval} seconds")
+        start_time = time.time()
+        max_wait_seconds = max_wait_minutes * 60
+        while time.time() - start_time < max_wait_seconds:
+            result = self.get_transcription_status(transcription_id)
+            if "error" in result:
+                print(f"❌ Error checking status: {result['error']}")
+                return result
+            status = result.get('status', 'unknown')
+            print(f"   Status: {status}")
+            if status == 'completed':
+                print("🎉 Transcription completed!")
+                return result
+            elif status == 'failed':
+                error_msg = result.get('error_message', 'Unknown error')
+                print(f"❌ Transcription failed: {error_msg}")
+                return result
+            elif status in ['pending', 'processing']:
+                time.sleep(poll_interval)
+            else:
+                print(f"❌ Unknown status: {status}")
+                return result
+        print(f"⏰ Transcription timed out after {max_wait_minutes} minutes")
+        return {"error": "Timeout waiting for completion"}
+    def transcribe_and_wait(self, video_path, language=None, max_wait_minutes=15):
+        """
+        Upload video and wait for transcription to complete
+        Args:
+            video_path: Path to video file
+            language: Language code or None for auto-detect
+            max_wait_minutes: Maximum time to wait
+        Returns:
+            dict: Complete transcription result
+        """
+        # Upload video
+        upload_result = self.transcribe_video(video_path, language)
+        if "error" in upload_result:
+            return upload_result
+        transcription_id = upload_result['id']
+        # Wait for completion
+        return self.wait_for_completion(transcription_id, max_wait_minutes)
+def main():
+    """Main function for testing the HF Spaces API"""
+    if len(sys.argv) < 2:
+        print("Hugging Face Spaces Video Transcription API Client")
+        print("=" * 50)
+        print("Usage:")
+        print("  python hf_api_client.py <space_url> [video_file] [language]")
+        print()
+        print("Examples:")
+        print("  python hf_api_client.py https://username-spacename.hf.space")
+        print("  python hf_api_client.py https://username-spacename.hf.space video.mp4")
+        print("  python hf_api_client.py https://username-spacename.hf.space video.mp4 en")
+        print()
+        print("Commands:")
+        print("  health  - Check service health")
+        print("  test    - Run basic functionality test")
+        sys.exit(1)
+    space_url = sys.argv[1]
+    client = HFTranscriptionClient(space_url)
+    print(f"🌐 Connecting to: {space_url}")
+    print("=" * 50)
+    # Health check
+    if not client.health_check():
+        print("❌ Service is not available. Please check your Space URL and try again.")
+        sys.exit(1)
+    # If video file provided, transcribe it
+    if len(sys.argv) >= 3:
+        video_file = sys.argv[2]
+        language = sys.argv[3] if len(sys.argv) > 3 else None
+        print(f"\n🎬 Transcribing video: {video_file}")
+        if language:
+            print(f"🌐 Language: {language}")
+        else:
+            print("🌐 Language: auto-detect")
+        result = client.transcribe_and_wait(video_file, language)
+        if "error" in result:
+            print(f"❌ Transcription failed: {result['error']}")
+        else:
+            print("\n🎉 Transcription Results:")
+            print("=" * 30)
+            print(f"ID: {result.get('id', 'N/A')}")
+            print(f"Language: {result.get('language', 'N/A')}")
+            print(f"Duration: {result.get('duration', 'N/A')} seconds")
+            print(f"Status: {result.get('status', 'N/A')}")
+            print("\nTranscribed Text:")
+            print("-" * 20)
+            print(result.get('text', 'No text available'))
+            # Save to file
+            if result.get('text'):
+                output_file = f"{os.path.splitext(video_file)[0]}_transcription.txt"
+                with open(output_file, 'w', encoding='utf-8') as f:
+                    f.write(f"Transcription of: {video_file}\n")
+                    f.write(f"Language: {result.get('language', 'N/A')}\n")
+                    f.write(f"Duration: {result.get('duration', 'N/A')} seconds\n")
+                    f.write(f"Completed: {datetime.now().isoformat()}\n")
+                    f.write("\n" + "=" * 50 + "\n")
+                    f.write(result['text'])
+                print(f"\n💾 Transcription saved to: {output_file}")
+    else:
+        print("\n✅ Service is ready!")
+        print("🌐 Web interface:", space_url)
+        print("🔗 API base URL:", client.api_base)
+        print("\n📋 To transcribe a video:")
+        print(f"   python {sys.argv[0]} {space_url} your_video.mp4")
+if __name__ == "__main__":
+    main()

hf_spaces_deploy/.gitignore ADDED Viewed

	@@ -0,0 +1,14 @@

+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+*.log
+.env
+.venv
+env/
+venv/
+.DS_Store
+*.tmp
+*.temp
+flagged/

hf_spaces_deploy/README.md ADDED Viewed

	@@ -0,0 +1,154 @@

+---
+title: Video Transcription Service
+emoji: 🎬
+colorFrom: blue
+colorTo: purple
+sdk: gradio
+sdk_version: 4.44.0
+app_file: app.py
+pinned: false
+license: mit
+---
+# 🎬 Video Transcription Service
+A powerful video transcription service using OpenAI Whisper, deployed on Hugging Face Spaces with both web interface and API access.
+## ✨ Features
+- 🎥 **Multiple Video Formats**: MP4, AVI, MOV, MKV, WMV, FLV, WebM, M4V
+- 🗣️ **Free Speech-to-Text**: OpenAI Whisper (no API limits)
+- 🌐 **Language Support**: 99+ languages with auto-detection
+- 📱 **Dual Interface**: Web UI + REST API
+- ⚡ **Fast Processing**: Optimized for Hugging Face Spaces
+- 🧹 **Auto Cleanup**: Results stored for 3.5 hours
+## 🚀 Quick Start
+### Web Interface
+1. Upload your video file
+2. Select language (or use auto-detect)
+3. Click "Start Transcription"
+4. Use the transcription ID to check status
+### API Access
+**Upload Video:**
+```bash
+curl -X POST "https://your-space-name.hf.space/api/transcribe" \
+  -F "[email protected]" \
+  -F "language=en"
+```
+**Check Status:**
+```bash
+curl "https://your-space-name.hf.space/api/transcribe/123"
+```
+**Python Example:**
+```python
+import requests
+# Upload video
+with open('video.mp4', 'rb') as f:
+    response = requests.post(
+        'https://your-space-name.hf.space/api/transcribe',
+        files={'file': f},
+        data={'language': 'en'}
+    )
+result = response.json()
+transcription_id = result['id']
+# Check status
+import time
+while True:
+    status_response = requests.get(
+        f'https://your-space-name.hf.space/api/transcribe/{transcription_id}'
+    )
+    status = status_response.json()
+    if status['status'] == 'completed':
+        print("Transcription:", status['text'])
+        break
+    elif status['status'] == 'failed':
+        print("Error:", status['error_message'])
+        break
+    else:
+        print("Status:", status['status'])
+        time.sleep(10)
+```
+## 📋 API Endpoints
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/api/transcribe` | POST | Upload video for transcription |
+| `/api/transcribe/{id}` | GET | Get transcription status/results |
+| `/api/health` | GET | Service health check |
+## 🌐 Supported Languages
+Auto-detection or specify: English, Spanish, French, German, Italian, Portuguese, Russian, Japanese, Korean, Chinese, Arabic, Hindi, and 87+ more languages.
+## 📏 Limitations
+- **File Size**: 100MB maximum per video
+- **Processing**: Sequential (one video at a time)
+- **Storage**: Results expire after 3.5 hours
+- **Rate Limiting**: Built-in protection against abuse
+## 🔧 Technical Details
+- **Model**: OpenAI Whisper (base model for accuracy)
+- **Backend**: FastAPI + Gradio
+- **Processing**: Async with real-time status updates
+- **Storage**: In-memory with automatic cleanup
+- **Deployment**: Optimized for Hugging Face Spaces
+## 📊 Response Format
+**Upload Response:**
+```json
+{
+  "id": 123,
+  "status": "pending",
+  "message": "Transcription started",
+  "created_at": "2024-01-15T10:30:00Z"
+}
+```
+**Status Response:**
+```json
+{
+  "id": 123,
+  "status": "completed",
+  "text": "Hello, this is the transcribed text...",
+  "language": "en",
+  "duration": 45.6,
+  "created_at": "2024-01-15T10:30:00Z",
+  "completed_at": "2024-01-15T10:32:15Z"
+}
+```
+## 🛠️ Development
+This service combines:
+- **Gradio**: Beautiful web interface
+- **FastAPI**: Robust API endpoints
+- **OpenAI Whisper**: State-of-the-art transcription
+- **Async Processing**: Non-blocking operations
+## 📞 Support
+- 📖 **Documentation**: Available in the API tab
+- 🐛 **Issues**: Report via GitHub
+- 💡 **Features**: Suggest improvements
+## 📄 License
+MIT License - free for any use.
+---
+**Ready to transcribe? Upload your video or use the API endpoints above! 🎉**

hf_spaces_deploy/app.py ADDED Viewed

	@@ -0,0 +1,343 @@

+#!/usr/bin/env python3
+"""
+Hugging Face Spaces app.py - Video Transcription Service
+Combines Gradio interface with FastAPI for full functionality
+"""
+import gradio as gr
+import asyncio
+import threading
+import time
+import os
+import logging
+from datetime import datetime
+from typing import Optional, Tuple
+import uvicorn
+from fastapi import FastAPI, File, UploadFile, HTTPException
+from fastapi.responses import JSONResponse
+import tempfile
+# Import our existing modules
+from config import settings
+from models import TranscriptionStatus, TranscriptionResponse, TranscriptionResult
+from storage import storage
+from transcription_service import transcription_service
+from logging_config import setup_logging, log_step, log_success, log_error
+# Setup logging for Hugging Face Spaces
+setup_logging(level=logging.INFO, log_to_file=False)
+logger = logging.getLogger(__name__)
+# Configure for Hugging Face Spaces
+os.environ.setdefault("WHISPER_MODEL", "base")  # HF Spaces can handle base model
+os.environ.setdefault("MODEL_PRELOAD", "true")
+os.environ.setdefault("DEBUG", "false")
+# FastAPI app for API functionality
+api_app = FastAPI(
+    title="Video Transcription API",
+    description="API endpoints for video transcription",
+    version="1.0.0"
+)
+class TranscriptionManager:
+    def __init__(self):
+        self.model_loaded = False
+        self.model_loading = False
+    async def ensure_model_loaded(self):
+        """Ensure Whisper model is loaded"""
+        if self.model_loaded:
+            return True
+        if self.model_loading:
+            while self.model_loading:
+                await asyncio.sleep(0.1)
+            return self.model_loaded
+        self.model_loading = True
+        try:
+            logger.info("🤖 Loading Whisper model for Hugging Face Spaces...")
+            success = await transcription_service.preload_model()
+            self.model_loaded = success
+            return success
+        finally:
+            self.model_loading = False
+# Global transcription manager
+transcription_manager = TranscriptionManager()
+# FastAPI endpoints (preserve existing API functionality)
+@api_app.post("/transcribe")
+async def api_transcribe(file: UploadFile = File(...), language: str = None):
+    """API endpoint for video transcription"""
+    try:
+        # Ensure model is loaded
+        if not await transcription_manager.ensure_model_loaded():
+            raise HTTPException(status_code=503, detail="Model not available")
+        # Validate file
+        if not file.filename:
+            raise HTTPException(status_code=400, detail="No file provided")
+        # Read file content
+        content = await file.read()
+        if len(content) > settings.MAX_FILE_SIZE:
+            raise HTTPException(status_code=413, detail="File too large")
+        # Create transcription
+        transcription_id = storage.create_transcription(language=language)
+        # Start transcription in background
+        asyncio.create_task(
+            transcription_service.transcribe_video(content, transcription_id, language)
+        )
+        return TranscriptionResponse(
+            id=transcription_id,
+            status=TranscriptionStatus.PENDING,
+            message="Transcription started",
+            created_at=storage.get_transcription(transcription_id).created_at
+        )
+    except HTTPException:
+        raise
+    except Exception as e:
+        logger.error(f"API transcription error: {e}")
+        raise HTTPException(status_code=500, detail=str(e))
+@api_app.get("/transcribe/{transcription_id}")
+async def api_get_transcription(transcription_id: int):
+    """API endpoint to get transcription status/results"""
+    result = storage.get_transcription(transcription_id)
+    if not result:
+        raise HTTPException(status_code=404, detail="Transcription not found")
+    return result
+@api_app.get("/health")
+async def api_health():
+    """API health check"""
+    return {
+        "status": "healthy",
+        "model_loaded": transcription_manager.model_loaded,
+        "active_transcriptions": len([
+            t for t in storage._storage.values()
+            if t.status in [TranscriptionStatus.PENDING, TranscriptionStatus.PROCESSING]
+        ]) if hasattr(storage, '_storage') else 0
+    }
+# Gradio interface functions (sync versions for Gradio compatibility)
+def gradio_transcribe(video_file, language):
+    """Gradio transcription function"""
+    if video_file is None:
+        return "❌ Please upload a video file", "", ""
+    try:
+        # Check if model is loaded (sync check)
+        if not transcription_manager.model_loaded:
+            return "❌ Model not loaded yet. Please wait and try again.", "", ""
+        # Read file
+        with open(video_file, 'rb') as f:
+            content = f.read()
+        if len(content) > settings.MAX_FILE_SIZE:
+            return f"❌ File too large. Maximum size: {settings.MAX_FILE_SIZE // (1024*1024)}MB", "", ""
+        # Create transcription
+        transcription_id = storage.create_transcription(language=language if language != "auto" else None)
+        # Start transcription in background
+        loop = asyncio.new_event_loop()
+        asyncio.set_event_loop(loop)
+        loop.run_in_executor(
+            None,
+            lambda: asyncio.run(transcription_service.transcribe_video(
+                content, transcription_id, language if language != "auto" else None
+            ))
+        )
+        return f"✅ Transcription started with ID: {transcription_id}", str(transcription_id), "⏳ Processing..."
+    except Exception as e:
+        logger.error(f"Gradio transcription error: {e}")
+        return f"❌ Error: {str(e)}", "", ""
+def gradio_check_status(transcription_id_str):
+    """Check transcription status for Gradio"""
+    if not transcription_id_str:
+        return "❌ Please provide a transcription ID"
+    try:
+        transcription_id = int(transcription_id_str)
+        result = storage.get_transcription(transcription_id)
+        if not result:
+            return "❌ Transcription not found or expired"
+        if result.status == TranscriptionStatus.COMPLETED:
+            return f"✅ Completed!\n\nLanguage: {result.language}\nDuration: {result.duration}s\n\nText:\n{result.text}"
+        elif result.status == TranscriptionStatus.FAILED:
+            return f"❌ Failed: {result.error_message}"
+        elif result.status == TranscriptionStatus.PROCESSING:
+            return "⏳ Still processing... Please wait and check again."
+        else:
+            return "⏳ Pending... Please wait and check again."
+    except ValueError:
+        return "❌ Invalid transcription ID (must be a number)"
+    except Exception as e:
+        return f"❌ Error: {str(e)}"
+# Create Gradio interface
+def create_gradio_interface():
+    """Create the Gradio interface"""
+    with gr.Blocks(
+        title="Video Transcription Service",
+        theme=gr.themes.Soft(),
+        css="""
+        .gradio-container {
+            max-width: 1000px !important;
+        }
+        """
+    ) as interface:
+        gr.Markdown("""
+        # 🎬 Video Transcription Service
+        Upload your video files and get accurate transcriptions using OpenAI Whisper.
+        **Features:**
+        - 🎥 Multiple video formats (MP4, AVI, MOV, etc.)
+        - 🌐 Automatic language detection or manual selection
+        - 🚀 Fast processing with OpenAI Whisper
+        - 📱 Both web interface and API access
+        """)
+        with gr.Tab("📤 Upload & Transcribe"):
+            with gr.Row():
+                with gr.Column():
+                    video_input = gr.File(
+                        label="Upload Video File",
+                        file_types=["video"],
+                        type="filepath"
+                    )
+                    language_input = gr.Dropdown(
+                        choices=["auto", "en", "es", "fr", "de", "it", "pt", "ru", "ja", "ko", "zh", "ar", "hi"],
+                        value="auto",
+                        label="Language (auto-detect or specify)"
+                    )
+                    transcribe_btn = gr.Button("🚀 Start Transcription", variant="primary")
+                with gr.Column():
+                    status_output = gr.Textbox(label="Status", lines=3)
+                    transcription_id_output = gr.Textbox(label="Transcription ID", visible=True)
+                    result_output = gr.Textbox(label="Progress", lines=2)
+        with gr.Tab("🔍 Check Status"):
+            with gr.Row():
+                with gr.Column():
+                    id_input = gr.Textbox(label="Transcription ID", placeholder="Enter transcription ID...")
+                    check_btn = gr.Button("📊 Check Status", variant="secondary")
+                with gr.Column():
+                    status_result = gr.Textbox(label="Result", lines=10)
+        with gr.Tab("🔧 API Documentation"):
+            gr.Markdown("""
+            ## 🌐 API Endpoints
+            You can also use this service programmatically via API calls:
+            ### Upload Video for Transcription
+            ```bash
+            curl -X POST "https://your-space-name.hf.space/api/transcribe" \\
+              -F "[email protected]" \\
+              -F "language=en"
+            ```
+            ### Check Transcription Status
+            ```bash
+            curl "https://your-space-name.hf.space/api/transcribe/123"
+            ```
+            ### Health Check
+            ```bash
+            curl "https://your-space-name.hf.space/api/health"
+            ```
+            ### Python Example
+            ```python
+            import requests
+            # Upload video
+            with open('video.mp4', 'rb') as f:
+                response = requests.post(
+                    'https://your-space-name.hf.space/api/transcribe',
+                    files={'file': f},
+                    data={'language': 'en'}
+                )
+            transcription_id = response.json()['id']
+            # Check status
+            result = requests.get(f'https://your-space-name.hf.space/api/transcribe/{transcription_id}')
+            print(result.json())
+            ```
+            """)
+        # Event handlers
+        transcribe_btn.click(
+            fn=gradio_transcribe,
+            inputs=[video_input, language_input],
+            outputs=[status_output, transcription_id_output, result_output]
+        )
+        check_btn.click(
+            fn=gradio_check_status,
+            inputs=[id_input],
+            outputs=[status_result]
+        )
+    return interface
+# Startup function
+async def startup():
+    """Initialize services"""
+    logger.info("🚀 Starting Video Transcription Service on Hugging Face Spaces")
+    # Start storage cleanup
+    await storage.start_cleanup_task()
+    # Preload model
+    log_step("Preloading Whisper model")
+    success = await transcription_manager.ensure_model_loaded()
+    if success:
+        log_success("Model preloaded successfully")
+    else:
+        log_error("Model preload failed")
+def run_fastapi():
+    """Run FastAPI in a separate thread"""
+    uvicorn.run(api_app, host="0.0.0.0", port=7860, log_level="info")
+# Main execution
+if __name__ == "__main__":
+    # Run startup
+    asyncio.run(startup())
+    # Start FastAPI in background thread for API access
+    api_thread = threading.Thread(target=run_fastapi, daemon=True)
+    api_thread.start()
+    # Create and launch Gradio interface
+    interface = create_gradio_interface()
+    # Launch with API access enabled
+    interface.launch(
+        server_name="0.0.0.0",
+        server_port=7860,
+        share=False,  # HF Spaces handles sharing
+        show_api=True,  # Enable API documentation
+        show_error=True
+    )

hf_spaces_deploy/config.py ADDED Viewed

	@@ -0,0 +1,34 @@

+import os
+from typing import List
+class HuggingFaceSettings:
+    # File upload settings (HF Spaces can handle larger files)
+    MAX_FILE_SIZE = 200 * 1024 * 1024  # 200MB for HF Spaces
+    ALLOWED_EXTENSIONS = ['.mp4', '.avi', '.mov', '.mkv', '.wmv', '.flv', '.webm', '.m4v']
+    # Transcription settings (optimized for HF Spaces)
+    WHISPER_MODEL = os.getenv("WHISPER_MODEL", "base")  # HF Spaces can handle base model
+    CLEANUP_INTERVAL_HOURS = 3.5  # Clean up after 3.5 hours
+    # Performance settings for HF Spaces
+    MODEL_PRELOAD = True  # Always preload on HF Spaces
+    MAX_CONCURRENT_TRANSCRIPTIONS = 2  # HF Spaces can handle more
+    REQUEST_TIMEOUT_SECONDS = 600  # 10 minutes max per request
+    # Rate limiting (more generous on HF Spaces)
+    RATE_LIMIT_REQUESTS = 20  # requests per minute per IP
+    # Server settings
+    HOST = "0.0.0.0"
+    PORT = 7860  # Standard HF Spaces port
+    # Logging settings
+    DEBUG_MODE = os.getenv("DEBUG", "false").lower() == "true"
+    LOG_TO_FILE = False  # No file logging on HF Spaces
+    # Hugging Face Spaces specific
+    HF_SPACE_ID = os.getenv("SPACE_ID", "your-username/video-transcription")
+    HF_SPACE_URL = f"https://{HF_SPACE_ID.replace('/', '-')}.hf.space" if "SPACE_ID" in os.environ else "http://localhost:7860"
+# Use HF-optimized settings
+settings = HuggingFaceSettings()

hf_spaces_deploy/logging_config.py ADDED Viewed

	@@ -0,0 +1,136 @@

+"""
+Logging configuration for Video Transcription Service
+"""
+import logging
+import sys
+from datetime import datetime
+def setup_logging(level=logging.INFO, log_to_file=False):
+    """
+    Setup comprehensive logging for the application
+    Args:
+        level: Logging level (DEBUG, INFO, WARNING, ERROR)
+        log_to_file: Whether to also log to a file
+    """
+    # Create formatter with emojis and detailed info
+    formatter = logging.Formatter(
+        '%(asctime)s - %(name)s - %(levelname)s - %(message)s',
+        datefmt='%Y-%m-%d %H:%M:%S'
+    )
+    # Setup console handler
+    console_handler = logging.StreamHandler(sys.stdout)
+    console_handler.setFormatter(formatter)
+    console_handler.setLevel(level)
+    handlers = [console_handler]
+    # Setup file handler if requested
+    if log_to_file:
+        log_filename = f"transcription_service_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"
+        file_handler = logging.FileHandler(log_filename)
+        file_handler.setFormatter(formatter)
+        file_handler.setLevel(level)
+        handlers.append(file_handler)
+    # Configure root logger
+    logging.basicConfig(
+        level=level,
+        handlers=handlers,
+        force=True  # Override any existing configuration
+    )
+    # Set specific logger levels
+    loggers = [
+        'main',
+        'transcription_service',
+        'storage',
+        'uvicorn.access',
+        'uvicorn.error'
+    ]
+    for logger_name in loggers:
+        logger = logging.getLogger(logger_name)
+        logger.setLevel(level)
+    # Reduce noise from some third-party libraries
+    logging.getLogger('httpx').setLevel(logging.WARNING)
+    logging.getLogger('httpcore').setLevel(logging.WARNING)
+    return logging.getLogger(__name__)
+def get_progress_logger():
+    """Get a logger specifically for progress tracking"""
+    logger = logging.getLogger('progress')
+    logger.setLevel(logging.INFO)
+    return logger
+# Progress tracking functions
+def log_step(step_name: str, transcription_id: int = None):
+    """Log a processing step"""
+    logger = get_progress_logger()
+    if transcription_id:
+        logger.info(f"🔄 [{transcription_id}] {step_name}")
+    else:
+        logger.info(f"🔄 {step_name}")
+def log_success(message: str, transcription_id: int = None):
+    """Log a success message"""
+    logger = get_progress_logger()
+    if transcription_id:
+        logger.info(f"✅ [{transcription_id}] {message}")
+    else:
+        logger.info(f"✅ {message}")
+def log_error(message: str, transcription_id: int = None):
+    """Log an error message"""
+    logger = get_progress_logger()
+    if transcription_id:
+        logger.error(f"❌ [{transcription_id}] {message}")
+    else:
+        logger.error(f"❌ {message}")
+def log_warning(message: str, transcription_id: int = None):
+    """Log a warning message"""
+    logger = get_progress_logger()
+    if transcription_id:
+        logger.warning(f"⚠️ [{transcription_id}] {message}")
+    else:
+        logger.warning(f"⚠️ {message}")
+def log_info(message: str, transcription_id: int = None):
+    """Log an info message"""
+    logger = get_progress_logger()
+    if transcription_id:
+        logger.info(f"ℹ️ [{transcription_id}] {message}")
+    else:
+        logger.info(f"ℹ️ {message}")
+def log_progress_summary(transcription_id: int, total_time: float, status: str):
+    """Log a summary of transcription progress"""
+    logger = get_progress_logger()
+    logger.info(f"📊 [{transcription_id}] SUMMARY:")
+    logger.info(f"   Status: {status}")
+    logger.info(f"   Total Time: {total_time:.2f} seconds")
+    logger.info(f"   Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
+# Example usage and testing
+if __name__ == "__main__":
+    # Test the logging configuration
+    setup_logging(level=logging.INFO)
+    logger = logging.getLogger(__name__)
+    logger.info("🧪 Testing logging configuration...")
+    # Test progress logging
+    log_step("Starting test transcription", 123)
+    log_info("Processing video file", 123)
+    log_success("Audio extraction completed", 123)
+    log_warning("Large file detected", 123)
+    log_error("Test error message", 123)
+    log_progress_summary(123, 45.6, "completed")
+    logger.info("✅ Logging test completed")

hf_spaces_deploy/models.py ADDED Viewed

	@@ -0,0 +1,34 @@

+from pydantic import BaseModel
+from typing import Optional
+from enum import Enum
+from datetime import datetime
+class TranscriptionStatus(str, Enum):
+    PENDING = "pending"
+    PROCESSING = "processing"
+    COMPLETED = "completed"
+    FAILED = "failed"
+class TranscriptionRequest(BaseModel):
+    language: Optional[str] = None  # Auto-detect if None
+class TranscriptionResponse(BaseModel):
+    id: int
+    status: TranscriptionStatus
+    message: str
+    created_at: datetime
+class TranscriptionResult(BaseModel):
+    id: int
+    status: TranscriptionStatus
+    text: Optional[str] = None
+    language: Optional[str] = None
+    duration: Optional[float] = None
+    created_at: datetime
+    completed_at: Optional[datetime] = None
+    error_message: Optional[str] = None
+class ErrorResponse(BaseModel):
+    id: int = 0
+    error: str
+    message: str

hf_spaces_deploy/requirements.txt ADDED Viewed

	@@ -0,0 +1,14 @@

+gradio==4.44.0
+fastapi==0.104.1
+uvicorn[standard]==0.24.0
+python-multipart==0.0.6
+openai-whisper==20231117
+torch==2.1.0
+torchaudio==2.1.0
+ffmpeg-python==0.2.0
+pydantic==2.5.0
+slowapi==0.1.9
+aiofiles==23.2.1
+httpx==0.25.2
+numpy<2.0.0
+psutil==5.9.6

hf_spaces_deploy/restart_handler.py ADDED Viewed

	@@ -0,0 +1,165 @@

+#!/usr/bin/env python3
+"""
+Restart handler for Video Transcription Service
+Helps prevent restarts due to memory/timeout issues
+"""
+import os
+import signal
+import sys
+import time
+import logging
+import psutil
+from datetime import datetime
+logger = logging.getLogger(__name__)
+class RestartHandler:
+    def __init__(self):
+        self.start_time = time.time()
+        self.restart_count = 0
+        self.memory_warnings = 0
+    def setup_signal_handlers(self):
+        """Setup signal handlers for graceful shutdown"""
+        signal.signal(signal.SIGTERM, self._signal_handler)
+        signal.signal(signal.SIGINT, self._signal_handler)
+    def _signal_handler(self, signum, frame):
+        """Handle shutdown signals gracefully"""
+        logger.info(f"🛑 Received signal {signum}, shutting down gracefully...")
+        # Log service statistics
+        uptime = time.time() - self.start_time
+        logger.info(f"📊 Service uptime: {uptime:.1f} seconds")
+        logger.info(f"🔄 Restart count: {self.restart_count}")
+        logger.info(f"⚠️ Memory warnings: {self.memory_warnings}")
+        sys.exit(0)
+    def check_memory_usage(self):
+        """Check memory usage and warn if high"""
+        try:
+            process = psutil.Process()
+            memory_info = process.memory_info()
+            memory_mb = memory_info.rss / (1024 * 1024)
+            # Warn if using more than 400MB (80% of 512MB limit)
+            if memory_mb > 400:
+                self.memory_warnings += 1
+                logger.warning(f"⚠️ High memory usage: {memory_mb:.1f}MB (limit: 512MB)")
+                logger.warning("💡 Consider using 'tiny' model or smaller files")
+                return True
+            elif memory_mb > 300:
+                logger.info(f"📊 Memory usage: {memory_mb:.1f}MB")
+            return False
+        except Exception as e:
+            logger.error(f"❌ Error checking memory: {e}")
+            return False
+    def log_system_info(self):
+        """Log system information for debugging"""
+        try:
+            logger.info("🖥️ System Information:")
+            logger.info(f"   Python: {sys.version.split()[0]}")
+            logger.info(f"   Platform: {sys.platform}")
+            if hasattr(psutil, 'virtual_memory'):
+                memory = psutil.virtual_memory()
+                logger.info(f"   Total Memory: {memory.total / (1024**3):.1f}GB")
+                logger.info(f"   Available Memory: {memory.available / (1024**3):.1f}GB")
+            if hasattr(psutil, 'cpu_count'):
+                logger.info(f"   CPU Cores: {psutil.cpu_count()}")
+        except Exception as e:
+            logger.warning(f"⚠️ Could not get system info: {e}")
+    def create_restart_prevention_tips(self):
+        """Create tips to prevent restarts"""
+        tips = [
+            "🔧 Restart Prevention Tips:",
+            "1. Use WHISPER_MODEL=tiny for faster loading and less memory",
+            "2. Keep video files under 50MB for free tier",
+            "3. Process one video at a time",
+            "4. Enable model preloading: MODEL_PRELOAD=true",
+            "5. Monitor memory usage in logs",
+            "6. Use DEBUG=false in production to reduce log overhead"
+        ]
+        for tip in tips:
+            logger.info(tip)
+# Global restart handler instance
+restart_handler = RestartHandler()
+def setup_restart_prevention():
+    """Setup restart prevention measures"""
+    restart_handler.setup_signal_handlers()
+    restart_handler.log_system_info()
+    restart_handler.create_restart_prevention_tips()
+def check_service_health():
+    """Check service health and log warnings"""
+    return restart_handler.check_memory_usage()
+# Environment variable helpers
+def get_optimal_settings():
+    """Get optimal settings for the current environment"""
+    settings = {}
+    # Detect if running on free tier (limited memory)
+    try:
+        memory = psutil.virtual_memory()
+        total_gb = memory.total / (1024**3)
+        if total_gb < 1:  # Less than 1GB = likely free tier
+            logger.info("🔍 Detected limited memory environment")
+            settings.update({
+                "WHISPER_MODEL": "tiny",
+                "MAX_FILE_SIZE": 50 * 1024 * 1024,  # 50MB
+                "MODEL_PRELOAD": "true",
+                "DEBUG": "false"
+            })
+        else:
+            logger.info("🔍 Detected standard memory environment")
+            settings.update({
+                "WHISPER_MODEL": "base",
+                "MAX_FILE_SIZE": 100 * 1024 * 1024,  # 100MB
+                "MODEL_PRELOAD": "true"
+            })
+    except Exception:
+        # Fallback to conservative settings
+        settings.update({
+            "WHISPER_MODEL": "tiny",
+            "MAX_FILE_SIZE": 50 * 1024 * 1024,
+            "MODEL_PRELOAD": "true"
+        })
+    return settings
+def apply_optimal_settings():
+    """Apply optimal settings if not already set"""
+    optimal = get_optimal_settings()
+    applied = []
+    for key, value in optimal.items():
+        if not os.getenv(key):
+            os.environ[key] = str(value)
+            applied.append(f"{key}={value}")
+    if applied:
+        logger.info("⚙️ Applied optimal settings:")
+        for setting in applied:
+            logger.info(f"   {setting}")
+if __name__ == "__main__":
+    # Test the restart handler
+    logging.basicConfig(level=logging.INFO)
+    setup_restart_prevention()
+    apply_optimal_settings()
+    logger.info("✅ Restart handler test completed")

hf_spaces_deploy/storage.py ADDED Viewed

	@@ -0,0 +1,158 @@

+import asyncio
+from datetime import datetime, timedelta, timezone
+from typing import Dict, Optional
+from models import TranscriptionResult, TranscriptionStatus
+from config import settings
+import logging
+# Configure logging for this module
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+)
+logger = logging.getLogger(__name__)
+class InMemoryStorage:
+    def __init__(self):
+        self._storage: Dict[int, TranscriptionResult] = {}
+        self._next_id = 1
+        self._cleanup_task = None
+    async def start_cleanup_task(self):
+        """Start the background cleanup task"""
+        if self._cleanup_task is None:
+            logger.info("🧹 Starting automatic cleanup task")
+            logger.info(f"⏰ Cleanup interval: {settings.CLEANUP_INTERVAL_HOURS} hours")
+            self._cleanup_task = asyncio.create_task(self._cleanup_loop())
+        else:
+            logger.info("🧹 Cleanup task already running")
+    async def stop_cleanup_task(self):
+        """Stop the background cleanup task"""
+        if self._cleanup_task:
+            logger.info("🛑 Stopping cleanup task")
+            self._cleanup_task.cancel()
+            try:
+                await self._cleanup_task
+            except asyncio.CancelledError:
+                pass
+            self._cleanup_task = None
+            logger.info("✅ Cleanup task stopped")
+        else:
+            logger.info("🧹 No cleanup task to stop")
+    def create_transcription(self, language: Optional[str] = None) -> int:
+        """Create a new transcription entry and return its ID"""
+        transcription_id = self._next_id
+        self._next_id += 1
+        logger.info(f"📝 Creating new transcription entry with ID: {transcription_id}")
+        logger.info(f"🌐 Language: {language or 'auto-detect'}")
+        result = TranscriptionResult(
+            id=transcription_id,
+            status=TranscriptionStatus.PENDING,
+            language=language,
+            created_at=datetime.now(timezone.utc)
+        )
+        self._storage[transcription_id] = result
+        logger.info(f"✅ Transcription {transcription_id} created successfully")
+        logger.info(f"📊 Total active transcriptions: {len(self._storage)}")
+        return transcription_id
+    def get_transcription(self, transcription_id: int) -> Optional[TranscriptionResult]:
+        """Get transcription by ID"""
+        logger.info(f"🔍 Looking up transcription ID: {transcription_id}")
+        result = self._storage.get(transcription_id)
+        if result:
+            logger.info(f"✅ Found transcription {transcription_id} with status: {result.status}")
+        else:
+            logger.warning(f"❌ Transcription {transcription_id} not found")
+        return result
+    def update_transcription(self, transcription_id: int, **kwargs) -> bool:
+        """Update transcription fields"""
+        if transcription_id not in self._storage:
+            logger.warning(f"❌ Cannot update transcription {transcription_id} - not found")
+            return False
+        result = self._storage[transcription_id]
+        old_status = result.status if hasattr(result, 'status') else 'unknown'
+        for key, value in kwargs.items():
+            if hasattr(result, key):
+                setattr(result, key, value)
+        new_status = result.status if hasattr(result, 'status') else 'unknown'
+        logger.info(f"📝 Updated transcription {transcription_id}")
+        if 'status' in kwargs:
+            logger.info(f"🔄 Status changed: {old_status} → {new_status}")
+        # Log specific updates
+        for key, value in kwargs.items():
+            if key == 'text' and value:
+                text_preview = value[:50] + "..." if len(value) > 50 else value
+                logger.info(f"📄 Text updated: {text_preview}")
+            elif key == 'error_message' and value:
+                logger.error(f"❌ Error recorded: {value}")
+            elif key not in ['status', 'text', 'error_message']:
+                logger.info(f"📊 {key}: {value}")
+        return True
+    def delete_transcription(self, transcription_id: int) -> bool:
+        """Delete transcription by ID"""
+        if transcription_id in self._storage:
+            result = self._storage[transcription_id]
+            del self._storage[transcription_id]
+            logger.info(f"🗑️ Deleted transcription {transcription_id} (status: {result.status})")
+            logger.info(f"📊 Remaining transcriptions: {len(self._storage)}")
+            return True
+        else:
+            logger.warning(f"❌ Cannot delete transcription {transcription_id} - not found")
+            return False
+    async def _cleanup_loop(self):
+        """Background task to clean up old transcriptions"""
+        logger.info("🧹 Cleanup loop started")
+        while True:
+            try:
+                logger.info("😴 Cleanup sleeping for 1 hour...")
+                await asyncio.sleep(3600)  # Check every hour
+                logger.info("⏰ Running scheduled cleanup...")
+                await self._cleanup_old_transcriptions()
+            except asyncio.CancelledError:
+                logger.info("🛑 Cleanup loop cancelled")
+                break
+            except Exception as e:
+                logger.error(f"❌ Error in cleanup loop: {e}")
+    async def _cleanup_old_transcriptions(self):
+        """Remove transcriptions older than the configured time"""
+        logger.info("🧹 Starting cleanup of old transcriptions...")
+        cutoff_time = datetime.now(timezone.utc) - timedelta(hours=settings.CLEANUP_INTERVAL_HOURS)
+        logger.info(f"⏰ Cutoff time: {cutoff_time} (older than {settings.CLEANUP_INTERVAL_HOURS} hours)")
+        to_delete = []
+        for transcription_id, result in self._storage.items():
+            age_hours = (datetime.now(timezone.utc) - result.created_at).total_seconds() / 3600
+            if result.created_at < cutoff_time:
+                logger.info(f"🗑️ Marking transcription {transcription_id} for deletion (age: {age_hours:.1f} hours)")
+                to_delete.append(transcription_id)
+        if not to_delete:
+            logger.info("✅ No old transcriptions to clean up")
+            return
+        logger.info(f"🧹 Deleting {len(to_delete)} old transcriptions...")
+        for transcription_id in to_delete:
+            self.delete_transcription(transcription_id)
+        logger.info(f"✅ Cleanup completed - removed {len(to_delete)} transcriptions")
+        logger.info(f"📊 Active transcriptions remaining: {len(self._storage)}")
+# Global storage instance
+storage = InMemoryStorage()

hf_spaces_deploy/transcription_service.py ADDED Viewed

	@@ -0,0 +1,304 @@

+import whisper
+import ffmpeg
+import tempfile
+import os
+import asyncio
+import logging
+import time
+from typing import Optional
+from datetime import datetime, timezone
+from storage import storage
+from models import TranscriptionStatus
+from config import settings
+# Configure logging for this module
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+)
+logger = logging.getLogger(__name__)
+class TranscriptionService:
+    def __init__(self):
+        self._model = None
+        self._model_loading = False
+        self._model_load_error = None
+    async def preload_model(self):
+        """Preload Whisper model during startup to avoid request timeouts"""
+        if self._model is not None:
+            logger.info("🤖 Whisper model already loaded")
+            return True
+        if self._model_load_error:
+            logger.error(f"❌ Previous model load failed: {self._model_load_error}")
+            return False
+        try:
+            logger.info(f"🚀 Preloading Whisper model: {settings.WHISPER_MODEL}")
+            logger.info("📥 This may take 30-60 seconds for first-time download...")
+            logger.info("⚡ Preloading during startup to avoid request timeouts...")
+            start_time = time.time()
+            # Run in thread pool to avoid blocking startup
+            loop = asyncio.get_event_loop()
+            self._model = await loop.run_in_executor(
+                None,
+                whisper.load_model,
+                settings.WHISPER_MODEL
+            )
+            load_time = time.time() - start_time
+            logger.info(f"✅ Whisper model preloaded successfully in {load_time:.2f} seconds")
+            logger.info("🎯 Service ready for transcription requests!")
+            return True
+        except Exception as e:
+            error_msg = f"Failed to preload Whisper model: {str(e)}"
+            logger.error(f"❌ {error_msg}")
+            self._model_load_error = error_msg
+            return False
+    async def _load_model(self):
+        """Load Whisper model asynchronously (fallback if not preloaded)"""
+        if self._model is not None:
+            logger.info("🤖 Whisper model already loaded")
+            return
+        if self._model_load_error:
+            logger.error(f"❌ Model load error: {self._model_load_error}")
+            raise Exception(self._model_load_error)
+        if self._model_loading:
+            logger.info("⏳ Whisper model is currently loading, waiting...")
+            # Wait for model to load
+            while self._model_loading:
+                await asyncio.sleep(0.1)
+            if self._model is None:
+                raise Exception("Model loading failed")
+            logger.info("✅ Whisper model loading completed (waited)")
+            return
+        # If we get here, model wasn't preloaded - try to load it now
+        logger.warning("⚠️ Model not preloaded, loading during request (may cause timeout)")
+        self._model_loading = True
+        try:
+            logger.info(f"🤖 Loading Whisper model: {settings.WHISPER_MODEL}")
+            start_time = time.time()
+            # Run in thread pool to avoid blocking
+            loop = asyncio.get_event_loop()
+            self._model = await loop.run_in_executor(
+                None,
+                whisper.load_model,
+                settings.WHISPER_MODEL
+            )
+            load_time = time.time() - start_time
+            logger.info(f"✅ Whisper model loaded successfully in {load_time:.2f} seconds")
+        except Exception as e:
+            error_msg = f"Failed to load Whisper model: {str(e)}"
+            logger.error(f"❌ {error_msg}")
+            self._model_load_error = error_msg
+            raise Exception(error_msg)
+        finally:
+            self._model_loading = False
+    async def transcribe_video(self, video_content: bytes, transcription_id: int, language: Optional[str] = None):
+        """Transcribe video content asynchronously"""
+        start_time = time.time()
+        logger.info(f"🎬 Starting video transcription for ID: {transcription_id}")
+        logger.info(f"📊 Video size: {len(video_content) / (1024*1024):.2f}MB")
+        logger.info(f"🌐 Language: {language or 'auto-detect'}")
+        # Check memory before starting
+        from restart_handler import check_service_health
+        if check_service_health():
+            logger.warning(f"⚠️ High memory usage detected before transcription {transcription_id}")
+        try:
+            # Update status to processing
+            logger.info(f"📝 Updating status to PROCESSING for ID: {transcription_id}")
+            storage.update_transcription(
+                transcription_id,
+                status=TranscriptionStatus.PROCESSING
+            )
+            # Load model if needed
+            logger.info(f"🤖 Loading Whisper model for transcription {transcription_id}")
+            await self._load_model()
+            # Extract audio from video
+            logger.info(f"🎵 Extracting audio from video for transcription {transcription_id}")
+            audio_start = time.time()
+            audio_path = await self._extract_audio(video_content)
+            audio_time = time.time() - audio_start
+            logger.info(f"✅ Audio extraction completed in {audio_time:.2f} seconds")
+            try:
+                # Transcribe audio
+                logger.info(f"🗣️ Starting audio transcription for ID {transcription_id}")
+                transcribe_start = time.time()
+                result = await self._transcribe_audio(audio_path, language)
+                transcribe_time = time.time() - transcribe_start
+                # Log transcription results
+                text_length = len(result["text"]) if result["text"] else 0
+                logger.info(f"✅ Transcription completed in {transcribe_time:.2f} seconds")
+                logger.info(f"📝 Transcribed text length: {text_length} characters")
+                logger.info(f"🌐 Detected language: {result.get('language', 'unknown')}")
+                logger.info(f"⏱️ Audio duration: {result.get('duration', 'unknown')} seconds")
+                # Update storage with results
+                logger.info(f"💾 Saving transcription results for ID {transcription_id}")
+                storage.update_transcription(
+                    transcription_id,
+                    status=TranscriptionStatus.COMPLETED,
+                    text=result["text"],
+                    language=result["language"],
+                    duration=result.get("duration"),
+                    completed_at=datetime.now(timezone.utc)
+                )
+                total_time = time.time() - start_time
+                logger.info(f"🎉 Transcription {transcription_id} completed successfully in {total_time:.2f} seconds total")
+            finally:
+                # Clean up audio file
+                if os.path.exists(audio_path):
+                    logger.info(f"🧹 Cleaning up temporary audio file")
+                    os.unlink(audio_path)
+        except Exception as e:
+            total_time = time.time() - start_time
+            logger.error(f"❌ Transcription {transcription_id} failed after {total_time:.2f} seconds: {str(e)}")
+            logger.error(f"🔍 Error type: {type(e).__name__}")
+            storage.update_transcription(
+                transcription_id,
+                status=TranscriptionStatus.FAILED,
+                error_message=str(e),
+                completed_at=datetime.now(timezone.utc)
+            )
+    async def _extract_audio(self, video_content: bytes) -> str:
+        """Extract audio from video content"""
+        logger.info("📁 Creating temporary video file...")
+        # Create temporary files
+        with tempfile.NamedTemporaryFile(delete=False, suffix='.tmp') as video_file:
+            video_file.write(video_content)
+            video_path = video_file.name
+        audio_path = tempfile.mktemp(suffix='.wav')
+        logger.info(f"📁 Temporary files created - Video: {video_path}, Audio: {audio_path}")
+        try:
+            # Extract audio using ffmpeg
+            logger.info("🎵 Running FFmpeg to extract audio...")
+            loop = asyncio.get_event_loop()
+            await loop.run_in_executor(
+                None,
+                self._extract_audio_sync,
+                video_path,
+                audio_path
+            )
+            # Check if audio file was created successfully
+            if os.path.exists(audio_path):
+                audio_size = os.path.getsize(audio_path)
+                logger.info(f"✅ Audio extraction successful - Size: {audio_size / (1024*1024):.2f}MB")
+            else:
+                logger.error("❌ Audio file was not created")
+                raise Exception("Audio extraction failed - no output file")
+            return audio_path
+        finally:
+            # Clean up video file
+            if os.path.exists(video_path):
+                logger.info("🧹 Cleaning up temporary video file")
+                os.unlink(video_path)
+    def _extract_audio_sync(self, video_path: str, audio_path: str):
+        """Synchronous audio extraction"""
+        try:
+            logger.info("🔧 Configuring FFmpeg for audio extraction...")
+            logger.info("   - Codec: PCM 16-bit")
+            logger.info("   - Channels: 1 (mono)")
+            logger.info("   - Sample rate: 16kHz")
+            (
+                ffmpeg
+                .input(video_path)
+                .output(audio_path, acodec='pcm_s16le', ac=1, ar='16000')
+                .overwrite_output()
+                .run(quiet=True)
+            )
+            logger.info("✅ FFmpeg audio extraction completed")
+        except Exception as e:
+            logger.error(f"❌ FFmpeg audio extraction failed: {str(e)}")
+            raise
+    async def _transcribe_audio(self, audio_path: str, language: Optional[str] = None) -> dict:
+        """Transcribe audio file"""
+        logger.info(f"🗣️ Starting Whisper transcription...")
+        logger.info(f"🎵 Audio file: {audio_path}")
+        logger.info(f"🌐 Language: {language or 'auto-detect'}")
+        loop = asyncio.get_event_loop()
+        # Run transcription in thread pool
+        logger.info("⚡ Running transcription in background thread...")
+        result = await loop.run_in_executor(
+            None,
+            self._transcribe_audio_sync,
+            audio_path,
+            language
+        )
+        logger.info("✅ Whisper transcription completed")
+        return result
+    def _transcribe_audio_sync(self, audio_path: str, language: Optional[str] = None) -> dict:
+        """Synchronous audio transcription"""
+        try:
+            logger.info("🤖 Preparing Whisper transcription options...")
+            options = {}
+            if language:
+                options['language'] = language
+                logger.info(f"🌐 Language specified: {language}")
+            else:
+                logger.info("🌐 Language: auto-detect")
+            logger.info("🎯 Starting Whisper model inference...")
+            start_time = time.time()
+            result = self._model.transcribe(audio_path, **options)
+            inference_time = time.time() - start_time
+            # Log detailed results
+            text = result["text"].strip()
+            detected_language = result.get("language", "unknown")
+            duration = result.get("duration", 0)
+            logger.info(f"✅ Whisper inference completed in {inference_time:.2f} seconds")
+            logger.info(f"📝 Text length: {len(text)} characters")
+            logger.info(f"🌐 Detected language: {detected_language}")
+            logger.info(f"⏱️ Audio duration: {duration:.2f} seconds")
+            if len(text) > 100:
+                logger.info(f"📄 Text preview: {text[:100]}...")
+            else:
+                logger.info(f"📄 Full text: {text}")
+            return {
+                "text": text,
+                "language": detected_language,
+                "duration": duration
+            }
+        except Exception as e:
+            logger.error(f"❌ Whisper transcription failed: {str(e)}")
+            logger.error(f"🔍 Error type: {type(e).__name__}")
+            raise
+# Global service instance
+transcription_service = TranscriptionService()

log_monitor.py ADDED Viewed

	@@ -0,0 +1,195 @@

+#!/usr/bin/env python3
+"""
+Real-time log monitor for Video Transcription Service
+"""
+import requests
+import time
+import sys
+import json
+from datetime import datetime
+class TranscriptionMonitor:
+    def __init__(self, base_url="http://localhost:8000"):
+        self.base_url = base_url.rstrip('/')
+        self.active_transcriptions = {}
+    def monitor_transcription(self, transcription_id, poll_interval=5):
+        """Monitor a specific transcription with real-time updates"""
+        print(f"🔍 Monitoring transcription ID: {transcription_id}")
+        print(f"⏱️ Poll interval: {poll_interval} seconds")
+        print("=" * 50)
+        start_time = time.time()
+        last_status = None
+        while True:
+            try:
+                response = requests.get(f"{self.base_url}/transcribe/{transcription_id}")
+                if response.status_code == 404:
+                    print(f"❌ Transcription {transcription_id} not found or expired")
+                    break
+                elif response.status_code != 200:
+                    print(f"❌ Error checking status: {response.status_code}")
+                    break
+                result = response.json()
+                status = result['status']
+                elapsed = time.time() - start_time
+                # Only print updates when status changes or every 30 seconds
+                if status != last_status or elapsed % 30 < poll_interval:
+                    timestamp = datetime.now().strftime("%H:%M:%S")
+                    print(f"[{timestamp}] 📊 Status: {status.upper()} (elapsed: {elapsed:.1f}s)")
+                    if status == 'completed':
+                        print("🎉 Transcription completed!")
+                        print(f"🌐 Language: {result.get('language', 'N/A')}")
+                        print(f"⏱️ Duration: {result.get('duration', 'N/A')} seconds")
+                        text = result.get('text', '')
+                        if text:
+                            preview = text[:100] + "..." if len(text) > 100 else text
+                            print(f"📝 Text preview: {preview}")
+                        break
+                    elif status == 'failed':
+                        print(f"❌ Transcription failed: {result.get('error_message', 'Unknown error')}")
+                        break
+                last_status = status
+                time.sleep(poll_interval)
+            except KeyboardInterrupt:
+                print("\n🛑 Monitoring stopped by user")
+                break
+            except Exception as e:
+                print(f"❌ Error: {e}")
+                time.sleep(poll_interval)
+    def list_active_transcriptions(self):
+        """List all active transcriptions by checking health endpoint"""
+        try:
+            response = requests.get(f"{self.base_url}/health")
+            if response.status_code == 200:
+                health = response.json()
+                active = health.get('active_transcriptions', 0)
+                print(f"📊 Active transcriptions: {active}")
+                return active
+            else:
+                print(f"❌ Cannot get health status: {response.status_code}")
+                return 0
+        except Exception as e:
+            print(f"❌ Error checking health: {e}")
+            return 0
+    def test_service(self):
+        """Test if the service is running"""
+        try:
+            response = requests.get(f"{self.base_url}/health", timeout=5)
+            if response.status_code == 200:
+                health = response.json()
+                print("✅ Service is running")
+                print(f"📊 Status: {health.get('status', 'unknown')}")
+                print(f"📊 Active transcriptions: {health.get('active_transcriptions', 0)}")
+                return True
+            else:
+                print(f"❌ Service returned status: {response.status_code}")
+                return False
+        except requests.exceptions.ConnectionError:
+            print(f"❌ Cannot connect to service at {self.base_url}")
+            print("   Make sure the service is running with: python main.py")
+            return False
+        except Exception as e:
+            print(f"❌ Error testing service: {e}")
+            return False
+    def upload_and_monitor(self, video_file, language=None):
+        """Upload a video and monitor its transcription"""
+        if not self.test_service():
+            return
+        print(f"📤 Uploading video: {video_file}")
+        try:
+            with open(video_file, 'rb') as f:
+                files = {'file': f}
+                data = {}
+                if language:
+                    data['language'] = language
+                response = requests.post(f"{self.base_url}/transcribe", files=files, data=data)
+            if response.status_code == 200:
+                result = response.json()
+                transcription_id = result['id']
+                print(f"✅ Upload successful! ID: {transcription_id}")
+                print()
+                self.monitor_transcription(transcription_id)
+            else:
+                print(f"❌ Upload failed: {response.status_code}")
+                print(response.text)
+        except FileNotFoundError:
+            print(f"❌ Video file not found: {video_file}")
+        except Exception as e:
+            print(f"❌ Upload error: {e}")
+def main():
+    if len(sys.argv) < 2:
+        print("Video Transcription Service - Log Monitor")
+        print("=" * 40)
+        print("Usage:")
+        print("  python log_monitor.py test                    # Test service")
+        print("  python log_monitor.py monitor <id>            # Monitor transcription")
+        print("  python log_monitor.py upload <video_file>     # Upload and monitor")
+        print("  python log_monitor.py upload <video_file> <lang>  # Upload with language")
+        print()
+        print("Examples:")
+        print("  python log_monitor.py test")
+        print("  python log_monitor.py monitor 123")
+        print("  python log_monitor.py upload video.mp4")
+        print("  python log_monitor.py upload video.mp4 en")
+        sys.exit(1)
+    # Get API URL from environment or use default
+    api_url = sys.argv[-1] if sys.argv[-1].startswith('http') else "http://localhost:8000"
+    if api_url != "http://localhost:8000":
+        sys.argv = sys.argv[:-1]  # Remove URL from args
+    monitor = TranscriptionMonitor(api_url)
+    command = sys.argv[1].lower()
+    if command == "test":
+        monitor.test_service()
+        monitor.list_active_transcriptions()
+    elif command == "monitor":
+        if len(sys.argv) < 3:
+            print("❌ Please provide transcription ID")
+            print("Usage: python log_monitor.py monitor <id>")
+            sys.exit(1)
+        try:
+            transcription_id = int(sys.argv[2])
+            monitor.monitor_transcription(transcription_id)
+        except ValueError:
+            print("❌ Invalid transcription ID (must be a number)")
+            sys.exit(1)
+    elif command == "upload":
+        if len(sys.argv) < 3:
+            print("❌ Please provide video file")
+            print("Usage: python log_monitor.py upload <video_file> [language]")
+            sys.exit(1)
+        video_file = sys.argv[2]
+        language = sys.argv[3] if len(sys.argv) > 3 else None
+        monitor.upload_and_monitor(video_file, language)
+    else:
+        print(f"❌ Unknown command: {command}")
+        print("Available commands: test, monitor, upload")
+        sys.exit(1)
+if __name__ == "__main__":
+    main()

logging_config.py ADDED Viewed

	@@ -0,0 +1,136 @@

+"""
+Logging configuration for Video Transcription Service
+"""
+import logging
+import sys
+from datetime import datetime
+def setup_logging(level=logging.INFO, log_to_file=False):
+    """
+    Setup comprehensive logging for the application
+    Args:
+        level: Logging level (DEBUG, INFO, WARNING, ERROR)
+        log_to_file: Whether to also log to a file
+    """
+    # Create formatter with emojis and detailed info
+    formatter = logging.Formatter(
+        '%(asctime)s - %(name)s - %(levelname)s - %(message)s',
+        datefmt='%Y-%m-%d %H:%M:%S'
+    )
+    # Setup console handler
+    console_handler = logging.StreamHandler(sys.stdout)
+    console_handler.setFormatter(formatter)
+    console_handler.setLevel(level)
+    handlers = [console_handler]
+    # Setup file handler if requested
+    if log_to_file:
+        log_filename = f"transcription_service_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"
+        file_handler = logging.FileHandler(log_filename)
+        file_handler.setFormatter(formatter)
+        file_handler.setLevel(level)
+        handlers.append(file_handler)
+    # Configure root logger
+    logging.basicConfig(
+        level=level,
+        handlers=handlers,
+        force=True  # Override any existing configuration
+    )
+    # Set specific logger levels
+    loggers = [
+        'main',
+        'transcription_service',
+        'storage',
+        'uvicorn.access',
+        'uvicorn.error'
+    ]
+    for logger_name in loggers:
+        logger = logging.getLogger(logger_name)
+        logger.setLevel(level)
+    # Reduce noise from some third-party libraries
+    logging.getLogger('httpx').setLevel(logging.WARNING)
+    logging.getLogger('httpcore').setLevel(logging.WARNING)
+    return logging.getLogger(__name__)
+def get_progress_logger():
+    """Get a logger specifically for progress tracking"""
+    logger = logging.getLogger('progress')
+    logger.setLevel(logging.INFO)
+    return logger
+# Progress tracking functions
+def log_step(step_name: str, transcription_id: int = None):
+    """Log a processing step"""
+    logger = get_progress_logger()
+    if transcription_id:
+        logger.info(f"🔄 [{transcription_id}] {step_name}")
+    else:
+        logger.info(f"🔄 {step_name}")
+def log_success(message: str, transcription_id: int = None):
+    """Log a success message"""
+    logger = get_progress_logger()
+    if transcription_id:
+        logger.info(f"✅ [{transcription_id}] {message}")
+    else:
+        logger.info(f"✅ {message}")
+def log_error(message: str, transcription_id: int = None):
+    """Log an error message"""
+    logger = get_progress_logger()
+    if transcription_id:
+        logger.error(f"❌ [{transcription_id}] {message}")
+    else:
+        logger.error(f"❌ {message}")
+def log_warning(message: str, transcription_id: int = None):
+    """Log a warning message"""
+    logger = get_progress_logger()
+    if transcription_id:
+        logger.warning(f"⚠️ [{transcription_id}] {message}")
+    else:
+        logger.warning(f"⚠️ {message}")
+def log_info(message: str, transcription_id: int = None):
+    """Log an info message"""
+    logger = get_progress_logger()
+    if transcription_id:
+        logger.info(f"ℹ️ [{transcription_id}] {message}")
+    else:
+        logger.info(f"ℹ️ {message}")
+def log_progress_summary(transcription_id: int, total_time: float, status: str):
+    """Log a summary of transcription progress"""
+    logger = get_progress_logger()
+    logger.info(f"📊 [{transcription_id}] SUMMARY:")
+    logger.info(f"   Status: {status}")
+    logger.info(f"   Total Time: {total_time:.2f} seconds")
+    logger.info(f"   Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
+# Example usage and testing
+if __name__ == "__main__":
+    # Test the logging configuration
+    setup_logging(level=logging.INFO)
+    logger = logging.getLogger(__name__)
+    logger.info("🧪 Testing logging configuration...")
+    # Test progress logging
+    log_step("Starting test transcription", 123)
+    log_info("Processing video file", 123)
+    log_success("Audio extraction completed", 123)
+    log_warning("Large file detected", 123)
+    log_error("Test error message", 123)
+    log_progress_summary(123, 45.6, "completed")
+    logger.info("✅ Logging test completed")

main.py ADDED Viewed

	@@ -0,0 +1,295 @@

+from fastapi import FastAPI, File, UploadFile, HTTPException, Request, Depends
+from fastapi.responses import JSONResponse
+from fastapi.middleware.cors import CORSMiddleware
+import asyncio
+import logging
+import os
+from pathlib import Path
+from slowapi import Limiter, _rate_limit_exceeded_handler
+from slowapi.util import get_remote_address
+from slowapi.errors import RateLimitExceeded
+from config import settings
+from models import (
+    TranscriptionRequest, TranscriptionResponse, TranscriptionResult,
+    ErrorResponse, TranscriptionStatus
+)
+from storage import storage
+from transcription_service import transcription_service
+# Configure logging and restart prevention
+from logging_config import setup_logging, log_step, log_success, log_error, log_info, log_progress_summary
+from restart_handler import setup_restart_prevention, apply_optimal_settings, check_service_health
+# Apply optimal settings for the environment
+apply_optimal_settings()
+# Setup logging (can be controlled via environment variable)
+log_level = logging.DEBUG if os.getenv("DEBUG", "false").lower() == "true" else logging.INFO
+setup_logging(level=log_level, log_to_file=os.getenv("LOG_TO_FILE", "false").lower() == "true")
+logger = logging.getLogger(__name__)
+# Setup restart prevention
+setup_restart_prevention()
+# Initialize rate limiter
+limiter = Limiter(key_func=get_remote_address)
+# Create FastAPI app
+app = FastAPI(
+    title="Video Transcription Service",
+    description="A free video transcription service using OpenAI Whisper",
+    version="1.0.0",
+    docs_url="/docs",
+    redoc_url="/redoc"
+)
+# Add rate limiting
+app.state.limiter = limiter
+app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
+# Add CORS middleware
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+@app.on_event("startup")
+async def startup_event():
+    """Initialize services on startup"""
+    logger.info("🚀 Starting Video Transcription Service")
+    logger.info("=" * 50)
+    logger.info("📋 Service Configuration:")
+    logger.info(f"   🤖 Whisper Model: {settings.WHISPER_MODEL}")
+    logger.info(f"   📏 Max File Size: {settings.MAX_FILE_SIZE // (1024*1024)}MB")
+    logger.info(f"   🕒 Cleanup Interval: {settings.CLEANUP_INTERVAL_HOURS} hours")
+    logger.info(f"   🚦 Rate Limit: {settings.RATE_LIMIT_REQUESTS} requests/minute")
+    logger.info(f"   🌐 Host: {settings.HOST}:{settings.PORT}")
+    logger.info(f"   📁 Supported Formats: {', '.join(settings.ALLOWED_EXTENSIONS)}")
+    logger.info(f"   ⚡ Model Preload: {settings.MODEL_PRELOAD}")
+    logger.info("=" * 50)
+    log_step("Initializing storage cleanup task")
+    await storage.start_cleanup_task()
+    # Preload Whisper model to avoid request timeouts
+    if settings.MODEL_PRELOAD:
+        log_step("Preloading Whisper model (prevents request timeouts)")
+        model_loaded = await transcription_service.preload_model()
+        if model_loaded:
+            log_success("Whisper model preloaded successfully")
+        else:
+            logger.warning("⚠️ Model preload failed - will try to load during requests")
+    else:
+        logger.info("⚠️ Model preload disabled - will load during first request")
+    log_success("Service startup completed")
+@app.on_event("shutdown")
+async def shutdown_event():
+    """Cleanup on shutdown"""
+    logger.info("🛑 Shutting down Video Transcription Service")
+    log_step("Stopping cleanup task")
+    await storage.stop_cleanup_task()
+    log_success("Service shutdown completed")
+def validate_file(file: UploadFile) -> None:
+    """Validate uploaded file"""
+    logger.info(f"📁 Validating file: {file.filename}")
+    if not file.filename:
+        logger.error("❌ No filename provided")
+        raise HTTPException(status_code=400, detail="No file provided")
+    # Check file extension
+    file_ext = Path(file.filename).suffix.lower()
+    logger.info(f"🔍 File extension: {file_ext}")
+    if file_ext not in settings.ALLOWED_EXTENSIONS:
+        logger.error(f"❌ Unsupported file format: {file_ext}")
+        raise HTTPException(
+            status_code=400,
+            detail=f"Unsupported file format. Allowed: {', '.join(settings.ALLOWED_EXTENSIONS)}"
+        )
+    logger.info(f"✅ File format validation passed: {file_ext}")
+async def validate_file_size(file: UploadFile) -> bytes:
+    """Validate file size and return content"""
+    logger.info(f"📊 Reading file content for size validation...")
+    content = await file.read()
+    file_size_mb = len(content) / (1024 * 1024)
+    max_size_mb = settings.MAX_FILE_SIZE // (1024 * 1024)
+    logger.info(f"📏 File size: {file_size_mb:.2f}MB (max: {max_size_mb}MB)")
+    if len(content) > settings.MAX_FILE_SIZE:
+        logger.error(f"❌ File too large: {file_size_mb:.2f}MB > {max_size_mb}MB")
+        raise HTTPException(
+            status_code=413,
+            detail=f"File too large. Maximum size: {max_size_mb}MB"
+        )
+    if len(content) == 0:
+        logger.error("❌ Empty file detected")
+        raise HTTPException(status_code=400, detail="Empty file")
+    logger.info(f"✅ File size validation passed: {file_size_mb:.2f}MB")
+    return content
+@app.get("/")
+async def root():
+    """Health check endpoint"""
+    return {
+        "service": "Video Transcription Service",
+        "status": "running",
+        "version": "1.0.0",
+        "docs": "/docs"
+    }
+@app.post("/transcribe", response_model=TranscriptionResponse)
+@limiter.limit(f"{settings.RATE_LIMIT_REQUESTS}/minute")
+async def transcribe_video(
+    request: Request,
+    file: UploadFile = File(...),
+    language: str = None
+):
+    """
+    Upload a video file for transcription
+    - **file**: Video file (MP4, AVI, MOV, etc.) - Max 100MB
+    - **language**: Optional language code (e.g., 'en', 'es', 'fr') - Auto-detect if not provided
+    Returns transcription ID for status checking
+    """
+    try:
+        logger.info(f"🚀 Starting transcription request for file: {file.filename}")
+        logger.info(f"🌐 Language specified: {language or 'auto-detect'}")
+        # Validate file
+        validate_file(file)
+        # Read and validate file content
+        content = await validate_file_size(file)
+        # Create transcription entry
+        logger.info("📝 Creating transcription entry in storage...")
+        transcription_id = storage.create_transcription(language=language)
+        logger.info(f"🆔 Transcription ID created: {transcription_id}")
+        # Start transcription in background
+        logger.info(f"⚡ Starting background transcription task for ID: {transcription_id}")
+        asyncio.create_task(
+            transcription_service.transcribe_video(content, transcription_id, language)
+        )
+        logger.info(f"✅ Transcription request accepted successfully - ID: {transcription_id}")
+        return TranscriptionResponse(
+            id=transcription_id,
+            status=TranscriptionStatus.PENDING,
+            message="Transcription started. Use the ID to check status.",
+            created_at=storage.get_transcription(transcription_id).created_at
+        )
+    except HTTPException:
+        raise
+    except Exception as e:
+        logger.error(f"Error in transcribe endpoint: {str(e)}")
+        return JSONResponse(
+            status_code=500,
+            content=ErrorResponse(
+                id=0,
+                error="internal_error",
+                message="An internal error occurred"
+            ).dict()
+        )
+@app.get("/transcribe/{transcription_id}", response_model=TranscriptionResult)
+async def get_transcription(transcription_id: int):
+    """
+    Get transcription status and results
+    - **transcription_id**: ID returned from the transcribe endpoint
+    Returns transcription status and text (if completed)
+    """
+    try:
+        logger.info(f"🔍 Looking up transcription ID: {transcription_id}")
+        result = storage.get_transcription(transcription_id)
+        if not result:
+            logger.warning(f"❌ Transcription not found: {transcription_id}")
+            return JSONResponse(
+                status_code=404,
+                content=ErrorResponse(
+                    id=0,
+                    error="not_found",
+                    message="Transcription not found or expired"
+                ).dict()
+            )
+        logger.info(f"📊 Transcription status for ID {transcription_id}: {result.status}")
+        if result.status == TranscriptionStatus.COMPLETED:
+            text_preview = result.text[:100] + "..." if result.text and len(result.text) > 100 else result.text
+            logger.info(f"✅ Transcription completed - Preview: {text_preview}")
+        elif result.status == TranscriptionStatus.FAILED:
+            logger.error(f"❌ Transcription failed for ID {transcription_id}: {result.error_message}")
+        return result
+    except Exception as e:
+        logger.error(f"Error in get_transcription endpoint: {str(e)}")
+        return JSONResponse(
+            status_code=500,
+            content=ErrorResponse(
+                id=0,
+                error="internal_error",
+                message="An internal error occurred"
+            ).dict()
+        )
+@app.get("/health")
+async def health_check():
+    """Detailed health check"""
+    # Check model status
+    model_status = "not_loaded"
+    if transcription_service._model is not None:
+        model_status = "loaded"
+    elif transcription_service._model_loading:
+        model_status = "loading"
+    elif transcription_service._model_load_error:
+        model_status = "error"
+    active_transcriptions = 0
+    total_transcriptions = 0
+    if hasattr(storage, '_storage'):
+        total_transcriptions = len(storage._storage)
+        active_transcriptions = len([
+            t for t in storage._storage.values()
+            if t.status in [TranscriptionStatus.PENDING, TranscriptionStatus.PROCESSING]
+        ])
+    return {
+        "status": "healthy" if model_status in ["loaded", "loading"] else "degraded",
+        "model_status": model_status,
+        "model_name": settings.WHISPER_MODEL,
+        "model_error": transcription_service._model_load_error,
+        "total_transcriptions": total_transcriptions,
+        "active_transcriptions": active_transcriptions,
+        "max_file_size_mb": settings.MAX_FILE_SIZE // (1024*1024),
+        "supported_formats": settings.ALLOWED_EXTENSIONS,
+        "uptime_check": datetime.now().isoformat()
+    }
+if __name__ == "__main__":
+    import uvicorn
+    uvicorn.run(
+        "main:app",
+        host=settings.HOST,
+        port=settings.PORT,
+        reload=False
+    )

models.py ADDED Viewed

	@@ -0,0 +1,34 @@

+from pydantic import BaseModel
+from typing import Optional
+from enum import Enum
+from datetime import datetime
+class TranscriptionStatus(str, Enum):
+    PENDING = "pending"
+    PROCESSING = "processing"
+    COMPLETED = "completed"
+    FAILED = "failed"
+class TranscriptionRequest(BaseModel):
+    language: Optional[str] = None  # Auto-detect if None
+class TranscriptionResponse(BaseModel):
+    id: int
+    status: TranscriptionStatus
+    message: str
+    created_at: datetime
+class TranscriptionResult(BaseModel):
+    id: int
+    status: TranscriptionStatus
+    text: Optional[str] = None
+    language: Optional[str] = None
+    duration: Optional[float] = None
+    created_at: datetime
+    completed_at: Optional[datetime] = None
+    error_message: Optional[str] = None
+class ErrorResponse(BaseModel):
+    id: int = 0
+    error: str
+    message: str

requirements.txt ADDED Viewed

	@@ -0,0 +1,14 @@

+gradio==4.44.0
+fastapi==0.104.1
+uvicorn[standard]==0.24.0
+python-multipart==0.0.6
+openai-whisper==20231117
+torch==2.1.0
+torchaudio==2.1.0
+ffmpeg-python==0.2.0
+pydantic==2.5.0
+slowapi==0.1.9
+aiofiles==23.2.1
+httpx==0.25.2
+numpy<2.0.0
+psutil==5.9.6

restart_handler.py ADDED Viewed

	@@ -0,0 +1,165 @@

+#!/usr/bin/env python3
+"""
+Restart handler for Video Transcription Service
+Helps prevent restarts due to memory/timeout issues
+"""
+import os
+import signal
+import sys
+import time
+import logging
+import psutil
+from datetime import datetime
+logger = logging.getLogger(__name__)
+class RestartHandler:
+    def __init__(self):
+        self.start_time = time.time()
+        self.restart_count = 0
+        self.memory_warnings = 0
+    def setup_signal_handlers(self):
+        """Setup signal handlers for graceful shutdown"""
+        signal.signal(signal.SIGTERM, self._signal_handler)
+        signal.signal(signal.SIGINT, self._signal_handler)
+    def _signal_handler(self, signum, frame):
+        """Handle shutdown signals gracefully"""
+        logger.info(f"🛑 Received signal {signum}, shutting down gracefully...")
+        # Log service statistics
+        uptime = time.time() - self.start_time
+        logger.info(f"📊 Service uptime: {uptime:.1f} seconds")
+        logger.info(f"🔄 Restart count: {self.restart_count}")
+        logger.info(f"⚠️ Memory warnings: {self.memory_warnings}")
+        sys.exit(0)
+    def check_memory_usage(self):
+        """Check memory usage and warn if high"""
+        try:
+            process = psutil.Process()
+            memory_info = process.memory_info()
+            memory_mb = memory_info.rss / (1024 * 1024)
+            # Warn if using more than 400MB (80% of 512MB limit)
+            if memory_mb > 400:
+                self.memory_warnings += 1
+                logger.warning(f"⚠️ High memory usage: {memory_mb:.1f}MB (limit: 512MB)")
+                logger.warning("💡 Consider using 'tiny' model or smaller files")
+                return True
+            elif memory_mb > 300:
+                logger.info(f"📊 Memory usage: {memory_mb:.1f}MB")
+            return False
+        except Exception as e:
+            logger.error(f"❌ Error checking memory: {e}")
+            return False
+    def log_system_info(self):
+        """Log system information for debugging"""
+        try:
+            logger.info("🖥️ System Information:")
+            logger.info(f"   Python: {sys.version.split()[0]}")
+            logger.info(f"   Platform: {sys.platform}")
+            if hasattr(psutil, 'virtual_memory'):
+                memory = psutil.virtual_memory()
+                logger.info(f"   Total Memory: {memory.total / (1024**3):.1f}GB")
+                logger.info(f"   Available Memory: {memory.available / (1024**3):.1f}GB")
+            if hasattr(psutil, 'cpu_count'):
+                logger.info(f"   CPU Cores: {psutil.cpu_count()}")
+        except Exception as e:
+            logger.warning(f"⚠️ Could not get system info: {e}")
+    def create_restart_prevention_tips(self):
+        """Create tips to prevent restarts"""
+        tips = [
+            "🔧 Restart Prevention Tips:",
+            "1. Use WHISPER_MODEL=tiny for faster loading and less memory",
+            "2. Keep video files under 50MB for free tier",
+            "3. Process one video at a time",
+            "4. Enable model preloading: MODEL_PRELOAD=true",
+            "5. Monitor memory usage in logs",
+            "6. Use DEBUG=false in production to reduce log overhead"
+        ]
+        for tip in tips:
+            logger.info(tip)
+# Global restart handler instance
+restart_handler = RestartHandler()
+def setup_restart_prevention():
+    """Setup restart prevention measures"""
+    restart_handler.setup_signal_handlers()
+    restart_handler.log_system_info()
+    restart_handler.create_restart_prevention_tips()
+def check_service_health():
+    """Check service health and log warnings"""
+    return restart_handler.check_memory_usage()
+# Environment variable helpers
+def get_optimal_settings():
+    """Get optimal settings for the current environment"""
+    settings = {}
+    # Detect if running on free tier (limited memory)
+    try:
+        memory = psutil.virtual_memory()
+        total_gb = memory.total / (1024**3)
+        if total_gb < 1:  # Less than 1GB = likely free tier
+            logger.info("🔍 Detected limited memory environment")
+            settings.update({
+                "WHISPER_MODEL": "tiny",
+                "MAX_FILE_SIZE": 50 * 1024 * 1024,  # 50MB
+                "MODEL_PRELOAD": "true",
+                "DEBUG": "false"
+            })
+        else:
+            logger.info("🔍 Detected standard memory environment")
+            settings.update({
+                "WHISPER_MODEL": "base",
+                "MAX_FILE_SIZE": 100 * 1024 * 1024,  # 100MB
+                "MODEL_PRELOAD": "true"
+            })
+    except Exception:
+        # Fallback to conservative settings
+        settings.update({
+            "WHISPER_MODEL": "tiny",
+            "MAX_FILE_SIZE": 50 * 1024 * 1024,
+            "MODEL_PRELOAD": "true"
+        })
+    return settings
+def apply_optimal_settings():
+    """Apply optimal settings if not already set"""
+    optimal = get_optimal_settings()
+    applied = []
+    for key, value in optimal.items():
+        if not os.getenv(key):
+            os.environ[key] = str(value)
+            applied.append(f"{key}={value}")
+    if applied:
+        logger.info("⚙️ Applied optimal settings:")
+        for setting in applied:
+            logger.info(f"   {setting}")
+if __name__ == "__main__":
+    # Test the restart handler
+    logging.basicConfig(level=logging.INFO)
+    setup_restart_prevention()
+    apply_optimal_settings()
+    logger.info("✅ Restart handler test completed")

setup.py ADDED Viewed

	@@ -0,0 +1,148 @@

+#!/usr/bin/env python3
+"""
+Setup script for Video Transcription Service
+"""
+import subprocess
+import sys
+import os
+import platform
+def run_command(command, description):
+    """Run a command and handle errors"""
+    print(f"📦 {description}...")
+    try:
+        result = subprocess.run(command, shell=True, check=True, capture_output=True, text=True)
+        print(f"✅ {description} completed")
+        return True
+    except subprocess.CalledProcessError as e:
+        print(f"❌ {description} failed:")
+        print(f"   Command: {command}")
+        print(f"   Error: {e.stderr}")
+        return False
+def check_python_version():
+    """Check if Python version is compatible"""
+    version = sys.version_info
+    if version.major < 3 or (version.major == 3 and version.minor < 8):
+        print(f"❌ Python 3.8+ required, found {version.major}.{version.minor}")
+        return False
+    print(f"✅ Python {version.major}.{version.minor}.{version.micro} is compatible")
+    return True
+def install_python_dependencies():
+    """Install Python dependencies"""
+    commands = [
+        ("pip install --upgrade pip", "Upgrading pip"),
+        ("pip install 'numpy<2.0.0'", "Installing compatible NumPy version"),
+        ("pip install -r requirements.txt", "Installing Python packages")
+    ]
+    for command, description in commands:
+        if not run_command(command, description):
+            return False
+    return True
+def check_ffmpeg():
+    """Check if FFmpeg is installed"""
+    try:
+        subprocess.run(['ffmpeg', '-version'], capture_output=True, check=True)
+        print("✅ FFmpeg is installed")
+        return True
+    except (subprocess.CalledProcessError, FileNotFoundError):
+        print("❌ FFmpeg not found")
+        return False
+def install_ffmpeg_instructions():
+    """Show FFmpeg installation instructions"""
+    system = platform.system().lower()
+    print("\n📋 FFmpeg Installation Instructions:")
+    print("=" * 40)
+    if system == "windows":
+        print("Windows:")
+        print("1. Download FFmpeg from: https://ffmpeg.org/download.html")
+        print("2. Extract to C:\\ffmpeg")
+        print("3. Add C:\\ffmpeg\\bin to your PATH environment variable")
+        print("4. Restart your terminal/command prompt")
+    elif system == "darwin":  # macOS
+        print("macOS:")
+        print("1. Install Homebrew if not already installed:")
+        print("   /bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\"")
+        print("2. Install FFmpeg:")
+        print("   brew install ffmpeg")
+    else:  # Linux
+        print("Linux (Ubuntu/Debian):")
+        print("   sudo apt-get update && sudo apt-get install ffmpeg")
+        print("\nLinux (CentOS/RHEL):")
+        print("   sudo yum install ffmpeg")
+        print("\nLinux (Arch):")
+        print("   sudo pacman -S ffmpeg")
+def create_virtual_environment():
+    """Create and activate virtual environment"""
+    if os.path.exists('venv'):
+        print("✅ Virtual environment already exists")
+        return True
+    if not run_command(f"{sys.executable} -m venv venv", "Creating virtual environment"):
+        return False
+    print("\n📝 To activate the virtual environment:")
+    if platform.system().lower() == "windows":
+        print("   venv\\Scripts\\activate")
+    else:
+        print("   source venv/bin/activate")
+    return True
+def main():
+    print("🚀 Video Transcription Service Setup")
+    print("=" * 40)
+    # Check Python version
+    if not check_python_version():
+        sys.exit(1)
+    # Create virtual environment
+    print("\n1. Setting up virtual environment...")
+    if not create_virtual_environment():
+        print("❌ Failed to create virtual environment")
+        sys.exit(1)
+    # Install Python dependencies
+    print("\n2. Installing Python dependencies...")
+    if not install_python_dependencies():
+        print("❌ Failed to install Python dependencies")
+        print("\n💡 Try running these commands manually:")
+        print("   pip install --upgrade pip")
+        print("   pip install -r requirements.txt")
+        sys.exit(1)
+    # Check FFmpeg
+    print("\n3. Checking FFmpeg...")
+    if not check_ffmpeg():
+        install_ffmpeg_instructions()
+        print("\n⚠️  Please install FFmpeg and run this setup again")
+        sys.exit(1)
+    # Success
+    print("\n🎉 Setup completed successfully!")
+    print("=" * 40)
+    print("\n📋 Next steps:")
+    print("1. Activate virtual environment (if not already active)")
+    if platform.system().lower() == "windows":
+        print("   venv\\Scripts\\activate")
+    else:
+        print("   source venv/bin/activate")
+    print("2. Start the service:")
+    print("   python start.py")
+    print("   OR")
+    print("   python main.py")
+    print("3. Open your browser to:")
+    print("   http://localhost:8000/docs")
+    print("\n📖 For deployment instructions, see DEPLOYMENT.md")
+if __name__ == "__main__":
+    main()

start.py ADDED Viewed

	@@ -0,0 +1,113 @@

+#!/usr/bin/env python3
+"""
+Development startup script for Video Transcription Service
+"""
+import subprocess
+import sys
+import os
+import time
+import requests
+def check_dependencies():
+    """Check if required dependencies are installed"""
+    print("Checking dependencies...")
+    # Check Python packages
+    try:
+        import fastapi
+        import whisper
+        import ffmpeg
+        print("✅ Python packages installed")
+    except ImportError as e:
+        print(f"❌ Missing Python package: {e}")
+        print("Run: pip install -r requirements.txt")
+        return False
+    # Check FFmpeg
+    try:
+        subprocess.run(['ffmpeg', '-version'], capture_output=True, check=True)
+        print("✅ FFmpeg installed")
+    except (subprocess.CalledProcessError, FileNotFoundError):
+        print("❌ FFmpeg not found")
+        print("Install FFmpeg:")
+        print("  Windows: Download from https://ffmpeg.org/download.html")
+        print("  macOS: brew install ffmpeg")
+        print("  Linux: sudo apt-get install ffmpeg")
+        return False
+    return True
+def start_server():
+    """Start the development server"""
+    print("\nStarting Video Transcription Service...")
+    print("=" * 50)
+    try:
+        # Start the server
+        process = subprocess.Popen([
+            sys.executable, '-m', 'uvicorn',
+            'main:app',
+            '--host', '0.0.0.0',
+            '--port', '8000',
+            '--reload'
+        ])
+        # Wait for server to start
+        print("Waiting for server to start...")
+        for i in range(30):  # Wait up to 30 seconds
+            try:
+                response = requests.get('http://localhost:8000/health', timeout=1)
+                if response.status_code == 200:
+                    break
+            except:
+                pass
+            time.sleep(1)
+            print(f"  Attempt {i+1}/30...")
+        else:
+            print("❌ Server failed to start within 30 seconds")
+            process.terminate()
+            return False
+        print("\n🚀 Server started successfully!")
+        print("=" * 50)
+        print("📍 Service URL: http://localhost:8000")
+        print("📖 API Docs: http://localhost:8000/docs")
+        print("🔍 Health Check: http://localhost:8000/health")
+        print("=" * 50)
+        print("\nPress Ctrl+C to stop the server")
+        # Wait for user to stop
+        try:
+            process.wait()
+        except KeyboardInterrupt:
+            print("\n\nStopping server...")
+            process.terminate()
+            process.wait()
+            print("✅ Server stopped")
+        return True
+    except Exception as e:
+        print(f"❌ Failed to start server: {e}")
+        return False
+def main():
+    print("Video Transcription Service - Development Startup")
+    print("=" * 50)
+    # Check if we're in the right directory
+    if not os.path.exists('main.py'):
+        print("❌ main.py not found. Make sure you're in the project directory.")
+        sys.exit(1)
+    # Check dependencies
+    if not check_dependencies():
+        sys.exit(1)
+    # Start server
+    if not start_server():
+        sys.exit(1)
+if __name__ == "__main__":
+    main()

start_robust.py ADDED Viewed

	@@ -0,0 +1,155 @@

+#!/usr/bin/env python3
+"""
+Robust startup script for Video Transcription Service
+Handles restarts and optimizes for free tier hosting
+"""
+import os
+import sys
+import time
+import subprocess
+import logging
+from datetime import datetime
+# Configure logging
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+)
+logger = logging.getLogger(__name__)
+def detect_environment():
+    """Detect if running on Render.com or locally"""
+    if os.getenv("RENDER"):
+        return "render"
+    elif os.getenv("PORT"):
+        return "cloud"
+    else:
+        return "local"
+def get_optimal_env_vars():
+    """Get optimal environment variables for the detected environment"""
+    env = detect_environment()
+    base_vars = {
+        "PYTHONUNBUFFERED": "1",
+        "MODEL_PRELOAD": "true"
+    }
+    if env == "render":
+        logger.info("🌐 Detected Render.com environment")
+        base_vars.update({
+            "WHISPER_MODEL": "tiny",  # Faster loading, less memory
+            "DEBUG": "false",         # Reduce log overhead
+            "LOG_TO_FILE": "false"    # No file logging on Render
+        })
+    elif env == "cloud":
+        logger.info("☁️ Detected cloud environment")
+        base_vars.update({
+            "WHISPER_MODEL": "tiny",
+            "DEBUG": "false"
+        })
+    else:
+        logger.info("💻 Detected local environment")
+        base_vars.update({
+            "WHISPER_MODEL": os.getenv("WHISPER_MODEL", "base"),
+            "DEBUG": os.getenv("DEBUG", "true")
+        })
+    return base_vars
+def preload_model():
+    """Preload the Whisper model to avoid request timeouts"""
+    try:
+        logger.info("🤖 Preloading Whisper model...")
+        # Import and load model
+        import whisper
+        model_name = os.getenv("WHISPER_MODEL", "tiny")
+        start_time = time.time()
+        model = whisper.load_model(model_name)
+        load_time = time.time() - start_time
+        logger.info(f"✅ Model '{model_name}' preloaded in {load_time:.2f} seconds")
+        return True
+    except Exception as e:
+        logger.error(f"❌ Model preload failed: {e}")
+        return False
+def start_service():
+    """Start the FastAPI service with optimal settings"""
+    env_vars = get_optimal_env_vars()
+    # Set environment variables
+    for key, value in env_vars.items():
+        if not os.getenv(key):
+            os.environ[key] = value
+            logger.info(f"⚙️ Set {key}={value}")
+    # Log configuration
+    logger.info("📋 Service Configuration:")
+    logger.info(f"   🤖 Whisper Model: {os.getenv('WHISPER_MODEL', 'base')}")
+    logger.info(f"   🔧 Debug Mode: {os.getenv('DEBUG', 'false')}")
+    logger.info(f"   📥 Model Preload: {os.getenv('MODEL_PRELOAD', 'true')}")
+    logger.info(f"   🌐 Port: {os.getenv('PORT', '8000')}")
+    # Preload model if enabled
+    if os.getenv("MODEL_PRELOAD", "true").lower() == "true":
+        if not preload_model():
+            logger.warning("⚠️ Continuing without model preload...")
+    # Start the service
+    try:
+        logger.info("🚀 Starting FastAPI service...")
+        # Use uvicorn directly
+        import uvicorn
+        from main import app
+        port = int(os.getenv("PORT", 8000))
+        host = os.getenv("HOST", "0.0.0.0")
+        uvicorn.run(
+            app,
+            host=host,
+            port=port,
+            log_level="info",
+            access_log=True,
+            timeout_keep_alive=30,
+            timeout_graceful_shutdown=30
+        )
+    except KeyboardInterrupt:
+        logger.info("🛑 Service stopped by user")
+    except Exception as e:
+        logger.error(f"❌ Service failed: {e}")
+        sys.exit(1)
+def check_dependencies():
+    """Check if all dependencies are installed"""
+    try:
+        import fastapi
+        import whisper
+        import torch
+        logger.info("✅ Core dependencies available")
+        return True
+    except ImportError as e:
+        logger.error(f"❌ Missing dependency: {e}")
+        logger.error("Run: pip install -r requirements.txt")
+        return False
+def main():
+    logger.info("🚀 Video Transcription Service - Robust Startup")
+    logger.info("=" * 50)
+    # Check dependencies
+    if not check_dependencies():
+        sys.exit(1)
+    # Start service
+    start_service()
+if __name__ == "__main__":
+    main()

storage.py ADDED Viewed

	@@ -0,0 +1,158 @@

+import asyncio
+from datetime import datetime, timedelta, timezone
+from typing import Dict, Optional
+from models import TranscriptionResult, TranscriptionStatus
+from config import settings
+import logging
+# Configure logging for this module
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+)
+logger = logging.getLogger(__name__)
+class InMemoryStorage:
+    def __init__(self):
+        self._storage: Dict[int, TranscriptionResult] = {}
+        self._next_id = 1
+        self._cleanup_task = None
+    async def start_cleanup_task(self):
+        """Start the background cleanup task"""
+        if self._cleanup_task is None:
+            logger.info("🧹 Starting automatic cleanup task")
+            logger.info(f"⏰ Cleanup interval: {settings.CLEANUP_INTERVAL_HOURS} hours")
+            self._cleanup_task = asyncio.create_task(self._cleanup_loop())
+        else:
+            logger.info("🧹 Cleanup task already running")
+    async def stop_cleanup_task(self):
+        """Stop the background cleanup task"""
+        if self._cleanup_task:
+            logger.info("🛑 Stopping cleanup task")
+            self._cleanup_task.cancel()
+            try:
+                await self._cleanup_task
+            except asyncio.CancelledError:
+                pass
+            self._cleanup_task = None
+            logger.info("✅ Cleanup task stopped")
+        else:
+            logger.info("🧹 No cleanup task to stop")
+    def create_transcription(self, language: Optional[str] = None) -> int:
+        """Create a new transcription entry and return its ID"""
+        transcription_id = self._next_id
+        self._next_id += 1
+        logger.info(f"📝 Creating new transcription entry with ID: {transcription_id}")
+        logger.info(f"🌐 Language: {language or 'auto-detect'}")
+        result = TranscriptionResult(
+            id=transcription_id,
+            status=TranscriptionStatus.PENDING,
+            language=language,
+            created_at=datetime.now(timezone.utc)
+        )
+        self._storage[transcription_id] = result
+        logger.info(f"✅ Transcription {transcription_id} created successfully")
+        logger.info(f"📊 Total active transcriptions: {len(self._storage)}")
+        return transcription_id
+    def get_transcription(self, transcription_id: int) -> Optional[TranscriptionResult]:
+        """Get transcription by ID"""
+        logger.info(f"🔍 Looking up transcription ID: {transcription_id}")
+        result = self._storage.get(transcription_id)
+        if result:
+            logger.info(f"✅ Found transcription {transcription_id} with status: {result.status}")
+        else:
+            logger.warning(f"❌ Transcription {transcription_id} not found")
+        return result
+    def update_transcription(self, transcription_id: int, **kwargs) -> bool:
+        """Update transcription fields"""
+        if transcription_id not in self._storage:
+            logger.warning(f"❌ Cannot update transcription {transcription_id} - not found")
+            return False
+        result = self._storage[transcription_id]
+        old_status = result.status if hasattr(result, 'status') else 'unknown'
+        for key, value in kwargs.items():
+            if hasattr(result, key):
+                setattr(result, key, value)
+        new_status = result.status if hasattr(result, 'status') else 'unknown'
+        logger.info(f"📝 Updated transcription {transcription_id}")
+        if 'status' in kwargs:
+            logger.info(f"🔄 Status changed: {old_status} → {new_status}")
+        # Log specific updates
+        for key, value in kwargs.items():
+            if key == 'text' and value:
+                text_preview = value[:50] + "..." if len(value) > 50 else value
+                logger.info(f"📄 Text updated: {text_preview}")
+            elif key == 'error_message' and value:
+                logger.error(f"❌ Error recorded: {value}")
+            elif key not in ['status', 'text', 'error_message']:
+                logger.info(f"📊 {key}: {value}")
+        return True
+    def delete_transcription(self, transcription_id: int) -> bool:
+        """Delete transcription by ID"""
+        if transcription_id in self._storage:
+            result = self._storage[transcription_id]
+            del self._storage[transcription_id]
+            logger.info(f"🗑️ Deleted transcription {transcription_id} (status: {result.status})")
+            logger.info(f"📊 Remaining transcriptions: {len(self._storage)}")
+            return True
+        else:
+            logger.warning(f"❌ Cannot delete transcription {transcription_id} - not found")
+            return False
+    async def _cleanup_loop(self):
+        """Background task to clean up old transcriptions"""
+        logger.info("🧹 Cleanup loop started")
+        while True:
+            try:
+                logger.info("😴 Cleanup sleeping for 1 hour...")
+                await asyncio.sleep(3600)  # Check every hour
+                logger.info("⏰ Running scheduled cleanup...")
+                await self._cleanup_old_transcriptions()
+            except asyncio.CancelledError:
+                logger.info("🛑 Cleanup loop cancelled")
+                break
+            except Exception as e:
+                logger.error(f"❌ Error in cleanup loop: {e}")
+    async def _cleanup_old_transcriptions(self):
+        """Remove transcriptions older than the configured time"""
+        logger.info("🧹 Starting cleanup of old transcriptions...")
+        cutoff_time = datetime.now(timezone.utc) - timedelta(hours=settings.CLEANUP_INTERVAL_HOURS)
+        logger.info(f"⏰ Cutoff time: {cutoff_time} (older than {settings.CLEANUP_INTERVAL_HOURS} hours)")
+        to_delete = []
+        for transcription_id, result in self._storage.items():
+            age_hours = (datetime.now(timezone.utc) - result.created_at).total_seconds() / 3600
+            if result.created_at < cutoff_time:
+                logger.info(f"🗑️ Marking transcription {transcription_id} for deletion (age: {age_hours:.1f} hours)")
+                to_delete.append(transcription_id)
+        if not to_delete:
+            logger.info("✅ No old transcriptions to clean up")
+            return
+        logger.info(f"🧹 Deleting {len(to_delete)} old transcriptions...")
+        for transcription_id in to_delete:
+            self.delete_transcription(transcription_id)
+        logger.info(f"✅ Cleanup completed - removed {len(to_delete)} transcriptions")
+        logger.info(f"📊 Active transcriptions remaining: {len(self._storage)}")
+# Global storage instance
+storage = InMemoryStorage()

test_api.py ADDED Viewed

	@@ -0,0 +1,130 @@

+#!/usr/bin/env python3
+"""
+Simple test script for the Video Transcription Service
+"""
+import requests
+import time
+import sys
+import os
+def test_transcription_service(base_url="http://localhost:8000", video_file=None):
+    """Test the transcription service with a video file"""
+    print(f"Testing Video Transcription Service at {base_url}")
+    print("=" * 50)
+    # Test 1: Health check
+    print("1. Testing health check...")
+    try:
+        response = requests.get(f"{base_url}/health")
+        if response.status_code == 200:
+            print("✅ Health check passed")
+            print(f"   Response: {response.json()}")
+        else:
+            print(f"❌ Health check failed: {response.status_code}")
+            return False
+    except Exception as e:
+        print(f"❌ Health check error: {e}")
+        return False
+    # Test 2: Root endpoint
+    print("\n2. Testing root endpoint...")
+    try:
+        response = requests.get(f"{base_url}/")
+        if response.status_code == 200:
+            print("✅ Root endpoint passed")
+            print(f"   Response: {response.json()}")
+        else:
+            print(f"❌ Root endpoint failed: {response.status_code}")
+    except Exception as e:
+        print(f"❌ Root endpoint error: {e}")
+    # Test 3: File upload (if video file provided)
+    if video_file and os.path.exists(video_file):
+        print(f"\n3. Testing video upload with {video_file}...")
+        try:
+            with open(video_file, 'rb') as f:
+                files = {'file': f}
+                data = {'language': 'en'}
+                response = requests.post(f"{base_url}/transcribe", files=files, data=data)
+            if response.status_code == 200:
+                result = response.json()
+                transcription_id = result['id']
+                print("✅ Video upload successful")
+                print(f"   Transcription ID: {transcription_id}")
+                print(f"   Status: {result['status']}")
+                # Test 4: Check transcription status
+                print(f"\n4. Checking transcription status...")
+                max_attempts = 30  # Wait up to 5 minutes
+                for attempt in range(max_attempts):
+                    try:
+                        response = requests.get(f"{base_url}/transcribe/{transcription_id}")
+                        if response.status_code == 200:
+                            result = response.json()
+                            status = result['status']
+                            print(f"   Attempt {attempt + 1}: Status = {status}")
+                            if status == 'completed':
+                                print("✅ Transcription completed!")
+                                print(f"   Text: {result['text'][:100]}...")
+                                print(f"   Language: {result.get('language', 'N/A')}")
+                                print(f"   Duration: {result.get('duration', 'N/A')} seconds")
+                                break
+                            elif status == 'failed':
+                                print(f"❌ Transcription failed: {result.get('error_message', 'Unknown error')}")
+                                break
+                            elif status in ['pending', 'processing']:
+                                time.sleep(10)  # Wait 10 seconds before next check
+                            else:
+                                print(f"❌ Unknown status: {status}")
+                                break
+                        else:
+                            print(f"❌ Status check failed: {response.status_code}")
+                            break
+                    except Exception as e:
+                        print(f"❌ Status check error: {e}")
+                        break
+                else:
+                    print("⏰ Transcription timed out (5 minutes)")
+            else:
+                print(f"❌ Video upload failed: {response.status_code}")
+                print(f"   Response: {response.text}")
+        except Exception as e:
+            print(f"❌ Video upload error: {e}")
+    else:
+        print(f"\n3. Skipping video upload test (no video file provided)")
+        print(f"   To test with a video file, run: python test_api.py <video_file>")
+    # Test 5: Invalid transcription ID
+    print(f"\n5. Testing invalid transcription ID...")
+    try:
+        response = requests.get(f"{base_url}/transcribe/99999")
+        if response.status_code == 404:
+            print("✅ Invalid ID handling works correctly")
+        else:
+            print(f"❌ Invalid ID test failed: {response.status_code}")
+    except Exception as e:
+        print(f"❌ Invalid ID test error: {e}")
+    print("\n" + "=" * 50)
+    print("Test completed!")
+    return True
+if __name__ == "__main__":
+    # Get base URL from environment or use default
+    base_url = os.getenv("API_URL", "http://localhost:8000")
+    # Get video file from command line argument
+    video_file = sys.argv[1] if len(sys.argv) > 1 else None
+    if video_file and not os.path.exists(video_file):
+        print(f"Error: Video file '{video_file}' not found")
+        sys.exit(1)
+    success = test_transcription_service(base_url, video_file)
+    sys.exit(0 if success else 1)

transcription_service.py ADDED Viewed

	@@ -0,0 +1,304 @@

+import whisper
+import ffmpeg
+import tempfile
+import os
+import asyncio
+import logging
+import time
+from typing import Optional
+from datetime import datetime, timezone
+from storage import storage
+from models import TranscriptionStatus
+from config import settings
+# Configure logging for this module
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+)
+logger = logging.getLogger(__name__)
+class TranscriptionService:
+    def __init__(self):
+        self._model = None
+        self._model_loading = False
+        self._model_load_error = None
+    async def preload_model(self):
+        """Preload Whisper model during startup to avoid request timeouts"""
+        if self._model is not None:
+            logger.info("🤖 Whisper model already loaded")
+            return True
+        if self._model_load_error:
+            logger.error(f"❌ Previous model load failed: {self._model_load_error}")
+            return False
+        try:
+            logger.info(f"🚀 Preloading Whisper model: {settings.WHISPER_MODEL}")
+            logger.info("📥 This may take 30-60 seconds for first-time download...")
+            logger.info("⚡ Preloading during startup to avoid request timeouts...")
+            start_time = time.time()
+            # Run in thread pool to avoid blocking startup
+            loop = asyncio.get_event_loop()
+            self._model = await loop.run_in_executor(
+                None,
+                whisper.load_model,
+                settings.WHISPER_MODEL
+            )
+            load_time = time.time() - start_time
+            logger.info(f"✅ Whisper model preloaded successfully in {load_time:.2f} seconds")
+            logger.info("🎯 Service ready for transcription requests!")
+            return True
+        except Exception as e:
+            error_msg = f"Failed to preload Whisper model: {str(e)}"
+            logger.error(f"❌ {error_msg}")
+            self._model_load_error = error_msg
+            return False
+    async def _load_model(self):
+        """Load Whisper model asynchronously (fallback if not preloaded)"""
+        if self._model is not None:
+            logger.info("🤖 Whisper model already loaded")
+            return
+        if self._model_load_error:
+            logger.error(f"❌ Model load error: {self._model_load_error}")
+            raise Exception(self._model_load_error)
+        if self._model_loading:
+            logger.info("⏳ Whisper model is currently loading, waiting...")
+            # Wait for model to load
+            while self._model_loading:
+                await asyncio.sleep(0.1)
+            if self._model is None:
+                raise Exception("Model loading failed")
+            logger.info("✅ Whisper model loading completed (waited)")
+            return
+        # If we get here, model wasn't preloaded - try to load it now
+        logger.warning("⚠️ Model not preloaded, loading during request (may cause timeout)")
+        self._model_loading = True
+        try:
+            logger.info(f"🤖 Loading Whisper model: {settings.WHISPER_MODEL}")
+            start_time = time.time()
+            # Run in thread pool to avoid blocking
+            loop = asyncio.get_event_loop()
+            self._model = await loop.run_in_executor(
+                None,
+                whisper.load_model,
+                settings.WHISPER_MODEL
+            )
+            load_time = time.time() - start_time
+            logger.info(f"✅ Whisper model loaded successfully in {load_time:.2f} seconds")
+        except Exception as e:
+            error_msg = f"Failed to load Whisper model: {str(e)}"
+            logger.error(f"❌ {error_msg}")
+            self._model_load_error = error_msg
+            raise Exception(error_msg)
+        finally:
+            self._model_loading = False
+    async def transcribe_video(self, video_content: bytes, transcription_id: int, language: Optional[str] = None):
+        """Transcribe video content asynchronously"""
+        start_time = time.time()
+        logger.info(f"🎬 Starting video transcription for ID: {transcription_id}")
+        logger.info(f"📊 Video size: {len(video_content) / (1024*1024):.2f}MB")
+        logger.info(f"🌐 Language: {language or 'auto-detect'}")
+        # Check memory before starting
+        from restart_handler import check_service_health
+        if check_service_health():
+            logger.warning(f"⚠️ High memory usage detected before transcription {transcription_id}")
+        try:
+            # Update status to processing
+            logger.info(f"📝 Updating status to PROCESSING for ID: {transcription_id}")
+            storage.update_transcription(
+                transcription_id,
+                status=TranscriptionStatus.PROCESSING
+            )
+            # Load model if needed
+            logger.info(f"🤖 Loading Whisper model for transcription {transcription_id}")
+            await self._load_model()
+            # Extract audio from video
+            logger.info(f"🎵 Extracting audio from video for transcription {transcription_id}")
+            audio_start = time.time()
+            audio_path = await self._extract_audio(video_content)
+            audio_time = time.time() - audio_start
+            logger.info(f"✅ Audio extraction completed in {audio_time:.2f} seconds")
+            try:
+                # Transcribe audio
+                logger.info(f"🗣️ Starting audio transcription for ID {transcription_id}")
+                transcribe_start = time.time()
+                result = await self._transcribe_audio(audio_path, language)
+                transcribe_time = time.time() - transcribe_start
+                # Log transcription results
+                text_length = len(result["text"]) if result["text"] else 0
+                logger.info(f"✅ Transcription completed in {transcribe_time:.2f} seconds")
+                logger.info(f"📝 Transcribed text length: {text_length} characters")
+                logger.info(f"🌐 Detected language: {result.get('language', 'unknown')}")
+                logger.info(f"⏱️ Audio duration: {result.get('duration', 'unknown')} seconds")
+                # Update storage with results
+                logger.info(f"💾 Saving transcription results for ID {transcription_id}")
+                storage.update_transcription(
+                    transcription_id,
+                    status=TranscriptionStatus.COMPLETED,
+                    text=result["text"],
+                    language=result["language"],
+                    duration=result.get("duration"),
+                    completed_at=datetime.now(timezone.utc)
+                )
+                total_time = time.time() - start_time
+                logger.info(f"🎉 Transcription {transcription_id} completed successfully in {total_time:.2f} seconds total")
+            finally:
+                # Clean up audio file
+                if os.path.exists(audio_path):
+                    logger.info(f"🧹 Cleaning up temporary audio file")
+                    os.unlink(audio_path)
+        except Exception as e:
+            total_time = time.time() - start_time
+            logger.error(f"❌ Transcription {transcription_id} failed after {total_time:.2f} seconds: {str(e)}")
+            logger.error(f"🔍 Error type: {type(e).__name__}")
+            storage.update_transcription(
+                transcription_id,
+                status=TranscriptionStatus.FAILED,
+                error_message=str(e),
+                completed_at=datetime.now(timezone.utc)
+            )
+    async def _extract_audio(self, video_content: bytes) -> str:
+        """Extract audio from video content"""
+        logger.info("📁 Creating temporary video file...")
+        # Create temporary files
+        with tempfile.NamedTemporaryFile(delete=False, suffix='.tmp') as video_file:
+            video_file.write(video_content)
+            video_path = video_file.name
+        audio_path = tempfile.mktemp(suffix='.wav')
+        logger.info(f"📁 Temporary files created - Video: {video_path}, Audio: {audio_path}")
+        try:
+            # Extract audio using ffmpeg
+            logger.info("🎵 Running FFmpeg to extract audio...")
+            loop = asyncio.get_event_loop()
+            await loop.run_in_executor(
+                None,
+                self._extract_audio_sync,
+                video_path,
+                audio_path
+            )
+            # Check if audio file was created successfully
+            if os.path.exists(audio_path):
+                audio_size = os.path.getsize(audio_path)
+                logger.info(f"✅ Audio extraction successful - Size: {audio_size / (1024*1024):.2f}MB")
+            else:
+                logger.error("❌ Audio file was not created")
+                raise Exception("Audio extraction failed - no output file")
+            return audio_path
+        finally:
+            # Clean up video file
+            if os.path.exists(video_path):
+                logger.info("🧹 Cleaning up temporary video file")
+                os.unlink(video_path)
+    def _extract_audio_sync(self, video_path: str, audio_path: str):
+        """Synchronous audio extraction"""
+        try:
+            logger.info("🔧 Configuring FFmpeg for audio extraction...")
+            logger.info("   - Codec: PCM 16-bit")
+            logger.info("   - Channels: 1 (mono)")
+            logger.info("   - Sample rate: 16kHz")
+            (
+                ffmpeg
+                .input(video_path)
+                .output(audio_path, acodec='pcm_s16le', ac=1, ar='16000')
+                .overwrite_output()
+                .run(quiet=True)
+            )
+            logger.info("✅ FFmpeg audio extraction completed")
+        except Exception as e:
+            logger.error(f"❌ FFmpeg audio extraction failed: {str(e)}")
+            raise
+    async def _transcribe_audio(self, audio_path: str, language: Optional[str] = None) -> dict:
+        """Transcribe audio file"""
+        logger.info(f"🗣️ Starting Whisper transcription...")
+        logger.info(f"🎵 Audio file: {audio_path}")
+        logger.info(f"🌐 Language: {language or 'auto-detect'}")
+        loop = asyncio.get_event_loop()
+        # Run transcription in thread pool
+        logger.info("⚡ Running transcription in background thread...")
+        result = await loop.run_in_executor(
+            None,
+            self._transcribe_audio_sync,
+            audio_path,
+            language
+        )
+        logger.info("✅ Whisper transcription completed")
+        return result
+    def _transcribe_audio_sync(self, audio_path: str, language: Optional[str] = None) -> dict:
+        """Synchronous audio transcription"""
+        try:
+            logger.info("🤖 Preparing Whisper transcription options...")
+            options = {}
+            if language:
+                options['language'] = language
+                logger.info(f"🌐 Language specified: {language}")
+            else:
+                logger.info("🌐 Language: auto-detect")
+            logger.info("🎯 Starting Whisper model inference...")
+            start_time = time.time()
+            result = self._model.transcribe(audio_path, **options)
+            inference_time = time.time() - start_time
+            # Log detailed results
+            text = result["text"].strip()
+            detected_language = result.get("language", "unknown")
+            duration = result.get("duration", 0)
+            logger.info(f"✅ Whisper inference completed in {inference_time:.2f} seconds")
+            logger.info(f"📝 Text length: {len(text)} characters")
+            logger.info(f"🌐 Detected language: {detected_language}")
+            logger.info(f"⏱️ Audio duration: {duration:.2f} seconds")
+            if len(text) > 100:
+                logger.info(f"📄 Text preview: {text[:100]}...")
+            else:
+                logger.info(f"📄 Full text: {text}")
+            return {
+                "text": text,
+                "language": detected_language,
+                "duration": duration
+            }
+        except Exception as e:
+            logger.error(f"❌ Whisper transcription failed: {str(e)}")
+            logger.error(f"🔍 Error type: {type(e).__name__}")
+            raise
+# Global service instance
+transcription_service = TranscriptionService()