MogensR committed
Commit 8987747 · Parent: 0b5fadf

Update README.md

Files changed (1)
  1. README.md +87 -288
README.md CHANGED
@@ -8,294 +8,93 @@ sdk_version: "4.44.1"
  app_file: app.py
  pinned: false
  license: mit
- python_version: "3.9"
- ---
-
- # Video Background Replacement
-
- Professional-quality video background replacement using SAM2 + MatAnyone for precise segmentation and cinema-grade compositing.
-
- ## Features
-
- ### Core Capabilities
- - **High-Quality Segmentation**: SAM2-powered person detection with sub-pixel accuracy
- - **Advanced Matting**: MatAnyone integration for professional edge refinement
- - **Dual Processing Modes**:
-   - **Single-Stage**: Direct background replacement (fast)
-   - **Two-Stage**: Green screen intermediate for broadcast-quality results
- - **Multiple Background Options**: Custom uploads, professional presets, procedural generation
- - **GPU Accelerated**: CUDA optimization for NVIDIA GPUs with CPU fallback
-
- ### Technical Highlights
- - Keyframe-based processing with temporal consistency
- - Automatic memory management and error recovery
- - Real-time progress tracking with ETA estimation
- - Audio preservation throughout processing
- - Robust codec fallback system
-
- ## Quick Start
-
- ### Option 1: Docker (Recommended)
-
- ```bash
- # Clone repository
- git clone <your-repo-url>
- cd video-background-replacement
-
- # Build and run with GPU support
- docker build -t video-bg-replacement .
- docker run --gpus all -p 7860:7860 video-bg-replacement
- ```
-
- ### Option 2: Local Installation
-
- ```bash
- # Clone repository
- git clone <your-repo-url>
- cd video-background-replacement
-
- # Create virtual environment
- python -m venv venv
- source venv/bin/activate # On Windows: venv\Scripts\activate
-
- # Install dependencies
- pip install -r requirements.txt
-
- # Run application
- python app.py
- ```
-
- ## System Requirements
-
- ### Minimum Requirements
- - Python 3.10+
- - 8GB RAM
- - 4GB storage space
- - FFmpeg installed
-
- ### Recommended for Best Performance
- - NVIDIA GPU with 6GB+ VRAM
- - 16GB+ RAM
- - CUDA 12.1+ support
- - Fast SSD storage
-
- ### Supported Platforms
- - Linux (Ubuntu 20.04+, tested)
- - Windows 10/11 with WSL2
- - macOS (CPU-only, limited testing)
-
- ## Usage Guide
-
- ### Basic Workflow
-
- 1. **Launch Application**
-    ```bash
-    python app.py
-    ```
-    Access web interface at `http://localhost:7860`
-
- 2. **Load Models** (first-time setup)
-    - Click "Load Models" button
-    - Wait for SAM2 and MatAnyone to download and initialize
-    - Status will show "Models loaded and validated"
-
- 3. **Process Video**
-    - Upload your video file (MP4, AVI, MOV supported)
-    - Choose background method:
-      - **Professional Presets**: Studio-quality backgrounds
-      - **Custom Upload**: Your own background image
-    - Select processing options:
-      - **Two-Stage Mode**: Better quality, slower processing
-      - **Quality Preset**: Fast/Balanced/High
-    - Click "Process Video"
-
- ### Processing Modes
-
- #### Single-Stage Mode (Default)
- - Direct background replacement
- - Faster processing (2-5x speed)
- - Good quality for most use cases
- - Recommended for: Social media, quick edits, testing
-
- #### Two-Stage Mode (Premium)
- - Green screen intermediate step
- - Cinema-quality edge compositing
- - Advanced chroma key algorithms
- - Recommended for: Professional content, broadcast, film
-
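A minimal sketch of choosing between the two modes through `process_video_fixed`, the entry point shown in the API Reference further down; the input paths are placeholders and the keyword names assume the signature documented there.

```python
# Sketch only: mode selection via the documented entry point.
# File names are placeholders; keyword names assume the API Reference below.
from app import process_video_fixed, load_models_with_validation

status = load_models_with_validation()  # one-time model setup

# Single-stage: fastest path, direct compositing.
quick_path, quick_msg = process_video_fixed(
    video_path="interview.mp4",          # placeholder input
    background_choice="office_modern",
    custom_background_path=None,
    use_two_stage=False,
)

# Two-stage: green-screen intermediate plus chroma keying for cleaner edges.
final_path, final_msg = process_video_fixed(
    video_path="interview.mp4",
    background_choice="studio_blue",
    custom_background_path=None,
    use_two_stage=True,
    chroma_preset="standard",
)
```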
- ### Background Options
-
- #### Professional Presets
- - `office_modern`: Clean contemporary office
- - `studio_blue`: Broadcast-quality blue background
- - `studio_green`: Professional green screen replacement
- - `minimalist`: Clean white gradient
- - `warm_gradient`: Warm sunset atmosphere
- - `tech_dark`: Modern tech/gaming setup
-
- #### Custom Backgrounds
- - Upload any image (JPG, PNG supported)
- - Automatically resized to match video resolution
- - Best results with high-resolution images (1920x1080+)
-
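A minimal sketch of what "automatically resized" can look like in practice, assuming OpenCV; the `fit_background` helper and file name are hypothetical illustrations, not the app's own code.

```python
# Hypothetical illustration of background resizing; not the app's implementation.
import cv2

def fit_background(background_path: str, frame_width: int, frame_height: int):
    """Scale a background image to cover the video frame, then center-crop."""
    bg = cv2.imread(background_path)
    if bg is None:
        raise FileNotFoundError(background_path)
    h, w = bg.shape[:2]
    scale = max(frame_width / w, frame_height / h)  # cover, don't letterbox
    resized = cv2.resize(bg, (int(w * scale + 0.5), int(h * scale + 0.5)),
                         interpolation=cv2.INTER_AREA)
    x = (resized.shape[1] - frame_width) // 2
    y = (resized.shape[0] - frame_height) // 2
    return resized[y:y + frame_height, x:x + frame_width]

# Example: match a 1920x1080 source video.
background = fit_background("my_background.jpg", 1920, 1080)
```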
- ## Configuration
-
- ### Environment Variables
-
- ```bash
- # Model settings
- export MODEL_CACHE_DIR="/path/to/model/cache"
- export FORCE_CPU="false"
- export DISABLE_MATANYONE="false"
-
- # Processing settings
- export KEYFRAME_INTERVAL="5"
- export FRAME_SKIP="1"
- export QUALITY_PRESET="balanced"
-
- # Video encoding
- export OUTPUT_CODEC="mp4v"
- export CRF="18"
- ```
-
- ### Quality Presets
-
- | Preset | Speed | Quality | Use Case |
- |--------|-------|---------|----------|
- | `fast` | 3x faster | Good | Social media, previews |
- | `balanced` | Normal | High | General use |
- | `high` | 2x slower | Excellent | Professional content |
-
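The presets in the table map to the `QUALITY_PRESET` variable from the Environment Variables block above. A minimal sketch of reading and validating it, assuming only the three values listed; the helper name is hypothetical.

```python
# Sketch: read the QUALITY_PRESET environment variable documented above.
# The validation helper is hypothetical, not part of the app.
import os

VALID_PRESETS = {"fast", "balanced", "high"}

def resolve_quality_preset(default: str = "balanced") -> str:
    preset = os.environ.get("QUALITY_PRESET", default).lower()
    if preset not in VALID_PRESETS:
        raise ValueError(f"QUALITY_PRESET must be one of {sorted(VALID_PRESETS)}, got {preset!r}")
    return preset

print(resolve_quality_preset())  # "balanced" unless overridden in the shell
```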
- ## API Reference
-
- ### Core Functions
-
- ```python
- from app import process_video_fixed, load_models_with_validation
-
- # Load models
- status = load_models_with_validation()
-
- # Process video
- result_path, message = process_video_fixed(
-     video_path="input.mp4",
-     background_choice="studio_blue",
-     custom_background_path=None,
-     use_two_stage=False,
-     chroma_preset="standard"
- )
- ```
-
- ### Two-Stage Processing
-
- ```python
- from two_stage_processor import TwoStageProcessor
-
- processor = TwoStageProcessor(sam2_predictor, matanyone_model)
-
- # Full pipeline
- result_path, message = processor.process_full_pipeline(
-     video_path="input.mp4",
-     background=background_image,
-     final_output="output.mp4",
-     chroma_settings={"tolerance": 40, "edge_softness": 2}
- )
- ```
-
- ## Troubleshooting
-
- ### Common Issues
-
- **CUDA Out of Memory**
- ```bash
- # Reduce processing quality
- export QUALITY_PRESET="fast"
- export KEYFRAME_INTERVAL="8"
- ```
-
- **Models Not Loading**
- ```bash
- # Clear cache and retry
- rm -rf /tmp/model_cache
- python app.py
- ```
-
- **Video Processing Fails**
- - Check video format (MP4 recommended)
- - Ensure video is not corrupted
- - Try shorter clips first (under 30 seconds)
-
- **Audio Missing**
- - FFmpeg must be installed and in PATH
- - Check input video has audio track
- - Try different output format
-
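A minimal sketch for the "input video has audio track" check above, assuming `ffprobe` (shipped with FFmpeg) is on PATH; the helper is hypothetical, not part of the app.

```python
# Sketch: verify an input file actually carries an audio stream.
# Assumes ffprobe (bundled with FFmpeg) is installed and on PATH.
import json
import subprocess

def has_audio_track(video_path: str) -> bool:
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a",
         "-show_entries", "stream=codec_type", "-of", "json", video_path],
        capture_output=True, text=True, check=True,
    )
    return bool(json.loads(result.stdout).get("streams"))

if not has_audio_track("input.mp4"):
    print("No audio stream found - the output will be silent.")
```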
- ### Performance Optimization
-
- **For Large Videos**
- - Use "fast" quality preset
- - Increase `KEYFRAME_INTERVAL` to 8-10
- - Process in shorter segments
-
- **For High Resolution**
- - Ensure sufficient VRAM (6GB+ recommended)
- - Use two-stage mode for best quality
- - Consider downscaling input video
-
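A minimal sketch of both optimizations, segmenting long clips and downscaling high-resolution input, assuming `ffmpeg` is on PATH; the helpers and output names are hypothetical, not part of the app.

```python
# Sketch: split a long clip into shorter segments and cap the resolution.
# Assumes ffmpeg is installed and on PATH; names are placeholders.
import subprocess

def split_into_segments(video_path: str, seconds: int = 30) -> None:
    """Cut the input into N-second chunks without re-encoding."""
    subprocess.run(
        ["ffmpeg", "-i", video_path, "-c", "copy", "-map", "0",
         "-segment_time", str(seconds), "-f", "segment",
         "-reset_timestamps", "1", "segment_%03d.mp4"],
        check=True,
    )

def downscale_to_1080p(video_path: str, output_path: str = "input_1080p.mp4") -> None:
    """Re-encode with the height capped at 1080 px, preserving aspect ratio."""
    subprocess.run(
        ["ffmpeg", "-i", video_path, "-vf", "scale=-2:1080",
         "-c:a", "copy", output_path],
        check=True,
    )
```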
- ## Development
-
- ### Project Structure
- ```
- ├── app.py                  # Main application entry point
- ├── utilities.py            # Core CV functions (segmentation, compositing)
- ├── two_stage_processor.py  # Green screen pipeline
- ├── ui_components.py        # Gradio interface
- ├── requirements.txt        # Python dependencies
- ├── Dockerfile              # Container configuration
- └── README.md               # This file
- ```
-
- ### Contributing
-
- 1. Fork the repository
- 2. Create feature branch (`git checkout -b feature/amazing-feature`)
- 3. Commit changes (`git commit -m 'Add amazing feature'`)
- 4. Push to branch (`git push origin feature/amazing-feature`)
- 5. Open Pull Request
-
- ### Testing
-
- ```bash
- # Run with test video
- python app.py --test-mode
-
- # Process sample
- python -c "
- from app import process_video_fixed, load_models_with_validation
- load_models_with_validation()
- result = process_video_fixed('test_video.mp4', 'office_modern', None)
- print(f'Result: {result}')
- "
- ```
-
- ## License
-
- MIT License - see LICENSE file for details.
-
- ## Acknowledgments
-
- - **SAM2**: Meta's Segment Anything 2 for segmentation
- - **MatAnyone**: High-quality image matting
- - **Gradio**: Web interface framework
- - **OpenCV**: Computer vision processing
- - **FFmpeg**: Video encoding/decoding
-
- ## Support
-
- - **Issues**: Report bugs via GitHub Issues
- - **Discussions**: Feature requests and questions
- - **Documentation**: Check troubleshooting section first
-
  ---
  *For deployment on Hugging Face Spaces, see the space configuration in the app header.*
 
  app_file: app.py
  pinned: false
  license: mit
+ python_version: "3.10"
  ---

+ # ============================================================================
+ # PYTORCH CUDA WHEELS (CUDA 12.1)
+ # ============================================================================
+ --extra-index-url https://download.pytorch.org/whl/cu121
+
+ # ============================================================================
+ # CORE PYTHON DEPENDENCIES (Python 3.10)
+ # ============================================================================
+ numpy==1.26.4
+ Pillow>=10.0.1,<11.0
+ setuptools>=65.7.0,<69.0
+ wheel>=0.40.0,<1.0
+ typing-extensions>=4.12.2,<5.0
+
+ # ============================================================================
+ # WEB FRAMEWORK & UI
+ # ============================================================================
+ gradio==4.44.1
+ gradio_client==1.3.0
+
+ # ============================================================================
+ # DEEP LEARNING & AI MODELS (Python 3.10 + CUDA 12.1)
+ # ============================================================================
+ torch==2.1.0
+ torchvision==0.16.0
+ torchaudio==2.1.0
+
+ # Hugging Face ecosystem - MatAnyOne compatible versions
+ transformers==4.43.3
+ huggingface_hub==0.24.5
+ accelerate>=0.20.3,<1.0
+ safetensors==0.4.3
+
+ # Model utilities - MatAnyOne requirements
+ einops==0.8.0
+ timm>=0.9.16
+
+ # ============================================================================
+ # COMPUTER VISION & VIDEO PROCESSING
+ # ============================================================================
+ opencv-python-headless==4.10.0.84
+
+ # Video processing
+ moviepy>=1.0.3,<2.0
+ imageio==2.34
+ imageio-ffmpeg>=0.4.8,<1.0
+ ffmpeg-python>=0.2.0,<1.0
+
+ # ============================================================================
+ # SCIENTIFIC COMPUTING
+ # ============================================================================
+ scipy==1.13.1
+ tqdm>=4.66.1,<5.0
+
+ # ============================================================================
+ # CONFIGURATION & UTILITIES
+ # ============================================================================
+ hydra-core==1.3.2
+ omegaconf==2.3.0
+ diskcache>=5.6.3,<6.0
+ psutil>=5.9.0,<6.0
+
+ # ============================================================================
+ # MATANYONE DEPENDENCIES (Python 3.10 compatible)
+ # ============================================================================
+ easydict==1.10
+ gdown>=4.7.1
+ hickle>=5.0
+ cchardet>=2.1.7
+ gitpython>=3.1
+ netifaces>=0.11.0
+ pycocotools>=2.0.7
+ tensorboard>=2.11
+ protobuf<4 # Uncommented to prevent potential conflicts
+
+ # ============================================================================
+ # GIT DEPENDENCIES (Model Repositories)
+ # ============================================================================
+ # SAM2 (Python 3.10+)
+ git+https://github.com/facebookresearch/segment-anything-2.git@2b90b9f5ceec907a1c18123530e92e794ad901a4
+
+ # MatAnyOne
+ git+https://github.com/pq-yang/MatAnyOne.git@2234ce5cdc487749515518bd035b5e18bccea3da
+
+ # Thin Plate Spline (MatAnyOne dependency)
+ git+https://github.com/cheind/py-thin-plate-spline@f6995795397118b7d0ac01aecd3f39ffbfad9dee
  *For deployment on Hugging Face Spaces, see the space configuration in the app header.*
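A quick sanity check after installing the pins added above, assuming the CUDA 12.1 wheels resolved from the extra index; it only prints what PyTorch reports and falls back gracefully when no GPU is visible.

```python
# Sketch: confirm the pinned torch build and GPU visibility after install.
import torch

print(f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA build: {torch.version.cuda}, device: {torch.cuda.get_device_name(0)}")
else:
    print("No GPU detected - the app will run on CPU (slower).")
```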