Update README.md
README.md
CHANGED
|
@@ -1,14 +1,73 @@
|
| 1 |
---
|
| 2 |
-
|
| 3 |
-
|
| 4 |
---
|
| 5 |
-
Combined and quantized models for WanVideo, originating from here:
|
| 6 |
|
| 7 |
-
|
| 8 |
|
| 9 |
-
|
| 10 |
|
| 11 |
---
|
| 12 |
-
clip_vision_h: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/clip_vision
|
| 13 |
|
| 14 |
-
|
| 1 |
+
### WanVideo Model Suite
|
| 2 |
+
**Combined & Quantized Models for ComfyUI Workflows**
|
| 3 |
+
*Derived from `Wan-AI/Wan2.1-VACE-14B`*
|
| 4 |
+
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
## Overview
|
| 8 |
+
This repository provides combined and quantized models for [**WanVideo**](https://github.com/kijai/ComfyUI-WanVideoWrapper), a high-fidelity video generation framework. The models are quantized to reduce resource use while retaining visual quality, and are intended for use in ComfyUI via:
|
| 9 |
+
- **[WanVideo Wrapper](https://github.com/kijai/ComfyUI-WanVideoWrapper)** (Third-party extension)
|
| 10 |
+
- Native **WanVideo nodes** in ComfyUI
|
| 11 |
+
|
| 12 |
+
---
|
| 13 |
+
|
| 14 |
+
## Key Components
|
| 15 |
+
|
| 16 |
+
### 1. **Core Diffusion Models**
|
| 17 |
+
| File | Precision | Description |
|
| 18 |
+
|------|------|-------------|
|
| 19 |
+
| `wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors` | Quantized (FP8) | Base image-to-video generation model (14B parameters, 720p). |
|
| 20 |
+
| `fantasytalking_fp16.safetensors` | FP16 | Specialized model for expressive dialogue animation. |
|
| 21 |
+
|
| 22 |
+
### 2. **Text & Vision Encoders**
|
| 23 |
+
| File | Type | Role |
|
| 24 |
+
|------|------|------|
|
| 25 |
+
| `umt5-xxl-enc-bf16.safetensors` | Text Encoder (UMT5-XXL) | BF16 precision for multilingual text understanding. |
|
| 26 |
+
| `clip_vision_h.safetensors` | Vision Encoder | Processes visual inputs for conditional generation. |
|
| 27 |
+
|
| 28 |
---
|
| 29 |
+
|
| 30 |
+
## ComfyUI Setup Guide
|
| 31 |
+
Place files in these directories within your ComfyUI installation:
|
| 32 |
+
```text
|
| 33 |
+
models/
|
| 34 |
+
├── diffusion_models/
|
| 35 |
+
│   ├── wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors
|
| 36 |
+
│   └── fantasytalking_fp16.safetensors
|
| 37 |
+
├── clip_vision/
|
| 38 |
+
│   └── clip_vision_h.safetensors
|
| 39 |
+
└── text_encoders/
|
| 40 |
+
    └── umt5-xxl-enc-bf16.safetensors
|
| 41 |
+
```
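To confirm the files landed where the tree above expects them, a short check like the following can help. It is a minimal sketch, not part of WanVideo or ComfyUI: the function name `missing_model_files` and the `EXPECTED_FILES` mapping are hypothetical, and the layout is copied from the tree above.

```python
from pathlib import Path

# Expected layout, copied from the directory tree above.
# The mapping name and structure are illustrative assumptions.
EXPECTED_FILES = {
    "diffusion_models": [
        "wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors",
        "fantasytalking_fp16.safetensors",
    ],
    "clip_vision": ["clip_vision_h.safetensors"],
    "text_encoders": ["umt5-xxl-enc-bf16.safetensors"],
}


def missing_model_files(comfyui_root):
    """Return the expected model paths that are absent under <comfyui_root>/models."""
    models_dir = Path(comfyui_root) / "models"
    missing = []
    for subdir, filenames in EXPECTED_FILES.items():
        for name in filenames:
            path = models_dir / subdir / name
            if not path.is_file():
                missing.append(path)
    return missing
```

Calling `missing_model_files("/path/to/ComfyUI")` returns an empty list when all four files are in place; otherwise it lists the paths that still need to be downloaded or moved.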
|
| 42 |
+
|
| 43 |
---
|
| 44 |
|
| 45 |
+
## Dependencies & Resources
|
| 46 |
+
1. **Vision Encoder Resources**
|
| 47 |
+
- Download `clip_vision_h.safetensors` from:
|
| 48 |
+
[Comfy-Org/Wan_2.1_ComfyUI_repackaged](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/clip_vision)
|
| 49 |
+
|
| 50 |
+
2. **FantasyTalking Model**
|
| 51 |
+
- Source code & usage: [GitHub Repository](https://github.com/Fantasy-AMAP/fantasy-talking)
|
| 52 |
+
|
| 53 |
+
3. **Base Model**
|
| 54 |
+
- Full precision version: [Wan-AI/Wan2.1-VACE-14B](https://huggingface.co/Wan-AI/Wan2.1-VACE-14B)
|
| 55 |
+
|
| 56 |
+
---
|
| 57 |
|
| 58 |
+
## Usage Notes
|
| 59 |
+
- **Quantization Benefits**: FP8 weights take roughly half the memory of FP16 weights (1 byte per parameter instead of 2), which lowers the VRAM needed for 720p generation on consumer GPUs.
|
| 60 |
+
- **Workflow Compatibility**: Combine with `Text-to-Video`, `Image-to-Video`, or `FantasyTalking` nodes in ComfyUI.
|
| 61 |
+
- **Multilingual Prompts**: The UMT5-XXL encoder supports multilingual prompts (e.g., English, Chinese).
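The quantization note above can be checked with back-of-the-envelope arithmetic. The sketch below assumes a nominal parameter count of exactly 14e9 (taken from the "14B" in the file name) and counts weights only; activations, the text encoder, the vision encoder, and any framework overhead are not included, so real VRAM use will be higher.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# Assumption: "14B" is treated as exactly 14e9 parameters.
PARAMS_14B = 14e9

fp16_gib = weight_memory_gib(PARAMS_14B, 2)  # FP16: 2 bytes per parameter
fp8_gib = weight_memory_gib(PARAMS_14B, 1)   # FP8: 1 byte per parameter

print(f"FP16 weights: {fp16_gib:.1f} GiB")
print(f"FP8 weights:  {fp8_gib:.1f} GiB")
print(f"FP8 / FP16:   {fp8_gib / fp16_gib:.0%}")
```

Under these assumptions the diffusion model's weights drop from roughly 26 GiB to roughly 13 GiB, which is where the "about half" figure comes from.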
|
| 62 |
|
| 63 |
---
|
| 64 |
|
| 65 |
+
## License
|
| 66 |
+
*Inherited from parent models ([Check Wan-AI License](https://huggingface.co/Wan-AI/Wan2.1-VACE-14B)). Non-commercial/research use recommended pending verification.*
|
| 67 |
+
|
| 68 |
+
---
|
| 69 |
+
|
| 70 |
+
**Pro Tip**: For optimal results, pair with WanVideo's temporal consistency modules to reduce frame flickering in long sequences.
|
| 71 |
+
|
| 72 |
+
---
|
| 73 |
+
*Model Card curated by the ComfyUI community. Maintained for reproducibility and ease of deployment.*
|