# Doctra Hugging Face Spaces Deployment Guide

## 🚀 Quick Deployment

### Option 1: Direct Upload to Hugging Face Spaces

1. **Create a new Space**:
   - Go to [Hugging Face Spaces](https://huggingface.co/spaces)
   - Click "Create new Space"
   - Choose "Gradio" as the SDK
   - Set the title to "Doctra - Document Parser"

2. **Upload files**:
   - Upload all files from this `hf_space` folder to your Space
   - Make sure `app.py` is in the root directory

3. **Configure environment**:
   - Go to Settings → Secrets
   - Add `VLM_API_KEY` if you want to use VLM features
   - Set the value to your API key (OpenAI, Anthropic, Google, etc.)

### Option 2: Git Repository Deployment

1. **Create a Git repository**:
   ```bash
   git init
   git add .
   git commit -m "Initial Doctra HF Space deployment"
   git remote add origin <your-repo-url>   # replace with your repository URL
   git push -u origin main
   ```

2. **Connect to Hugging Face Spaces**:
   - Create a new Space
   - Choose "Git repository" as the source
   - Enter your repository URL
   - Set the app file to `app.py`

### Option 3: Docker Deployment

1. **Build the Docker image**:
   ```bash
   docker build -t doctra-hf-space .
   ```

2. **Run the container**:
   ```bash
   docker run -p 7860:7860 doctra-hf-space
   ```

## 🔧 Configuration

### Environment Variables

Set these in your Hugging Face Space settings:

- `VLM_API_KEY`: Your API key for VLM providers
- `GRADIO_SERVER_NAME`: Server hostname (default: `0.0.0.0`)
- `GRADIO_SERVER_PORT`: Server port (default: `7860`)
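For local testing outside of Spaces, the same variables can be exported before launching the app. A minimal sketch, assuming the entry point is the `app.py` in this folder and that the app reads `VLM_API_KEY` from the environment as described above:

```bash
# Local smoke test using the same environment variables as the Space.
export VLM_API_KEY="your-api-key"     # optional; only needed for VLM features
export GRADIO_SERVER_NAME="0.0.0.0"   # defaults mirror the values listed above
export GRADIO_SERVER_PORT=7860

pip install -r requirements.txt       # dependencies bundled with this folder
python app.py                         # launch the Gradio app
```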
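If you deploy with Option 3 instead, the same variables can be passed into the container at run time. A small sketch, assuming the image built earlier and that the app picks up `VLM_API_KEY` from the environment:

```bash
# Forward the configuration into the container built in Option 3.
docker run -p 7860:7860 \
  -e VLM_API_KEY="your-api-key" \
  -e GRADIO_SERVER_PORT=7860 \
  doctra-hf-space
```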
### Hardware Requirements

- **CPU**: Minimum 2 cores recommended
- **RAM**: Minimum 4GB, 8GB+ recommended
- **Storage**: 10GB+ for models and dependencies
- **GPU**: Optional but recommended for faster processing

## 📊 Performance Optimization

### For Hugging Face Spaces

1. **Use CPU-optimized models** when GPU is not available
2. **Reduce DPI settings** for faster processing
3. **Process smaller documents** to avoid memory issues
4. **Enable caching** for repeated operations

### For Local Deployment

1. **Use GPU acceleration** when available
2. **Increase memory limits** for large documents
3. **Use SSD storage** for better I/O performance
4. **Configure proper logging** for debugging

## 🐛 Troubleshooting

### Common Issues

1. **Import Errors**:
   - Check that all dependencies are in `requirements.txt`
   - Verify Python version compatibility

2. **Memory Issues**:
   - Reduce DPI settings
   - Process smaller documents
   - Increase available memory

3. **API Key Issues**:
   - Verify the API key is correctly set
   - Check provider-specific requirements
   - Test API connectivity

4. **File Upload Issues**:
   - Check file size limits
   - Verify file format support
   - Ensure proper permissions

### Debug Mode

To enable debug mode, set:

```bash
export GRADIO_DEBUG=1
```

## 📈 Monitoring

### Health Checks

- Monitor CPU and memory usage
- Check disk space availability
- Verify API key validity
- Test the document processing pipeline

### Logs

- Application logs: Check Gradio output
- Error logs: Monitor for exceptions
- Performance logs: Track processing times
- User logs: Monitor usage patterns

## 🔄 Updates

### Updating the Application

1. **Code updates**: Push changes to your repository
2. **Dependency updates**: Update `requirements.txt`
3. **Model updates**: Download new model versions
4. **Configuration updates**: Modify environment variables

### Version Control

- Use semantic versioning
- Tag releases appropriately
- Maintain a changelog
- Test before deployment

## 🛡️ Security

### Best Practices

1. **API Keys**: Store securely, never commit to code
2. **File Uploads**: Validate file types and sizes
3. **Rate Limiting**: Implement to prevent abuse
4. **Input Validation**: Sanitize all user inputs

### Privacy

- No data is stored permanently
- Files are processed in temporary directories
- API calls are made securely
- User data is not logged

## 📞 Support

For issues and questions:

1. **GitHub Issues**: Report bugs and feature requests
2. **Documentation**: Check the main README.md
3. **Community**: Join discussions on Hugging Face
4. **Email**: Contact the development team

## 🎯 Next Steps

After successful deployment:

1. **Test all features** with sample documents
2. **Configure monitoring** and alerting
3. **Set up backups** for important data
4. **Plan for scaling** based on usage
5. **Gather user feedback** for improvements
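For the first two items, a quick shell check can confirm the Space is reachable before deeper feature testing. A minimal sketch, assuming a public Space served at the standard `*.hf.space` address (replace the owner and Space name placeholders with your own):

```bash
# Post-deployment smoke test: verify the Gradio front end responds with HTTP 200.
SPACE_URL="https://<owner>-<space-name>.hf.space"

if curl --fail --silent --show-error --output /dev/null "$SPACE_URL"; then
  echo "Space is up: $SPACE_URL"
else
  echo "Space did not respond; check the build logs in the Space settings." >&2
fi
```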