ALGOTECH/WanVideo_comfy

ALGOTECH committed on Jul 9, 2025

Commit c64d752 (verified) · 1 Parent(s): b78b09e

Update README.md

Files changed (1)

README.md +66 -7

@@ -1,14 +1,73 @@
 ---
-base_model:
-- Wan-AI/Wan2.1-VACE-14B
 ---
-Combined and quantized models for WanVideo, originating from here:
-https://huggingface.co/Wan-AI/
-Can be used with: https://github.com/kijai/ComfyUI-WanVideoWrapper and ComfyUI native WanVideo nodes.
 ---
-clip_vision_h: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/clip_vision
-FantasyTalking: https://github.com/Fantasy-AMAP/fantasy-talking

+### 🚀 WanVideo Model Suite
+**Combined & Quantized Models for ComfyUI Workflows**
+*Derived from `Wan-AI/Wan2.1-VACE-14B`*
+---
+## 📋 Overview
+This repository provides optimized models for [**WanVideo**](https://github.com/kijai/ComfyUI-WanVideoWrapper)—a high-fidelity video generation framework. Models are quantized to balance performance and resource efficiency while retaining visual quality. Designed for seamless integration with ComfyUI via:
+- **[WanVideo Wrapper](https://github.com/kijai/ComfyUI-WanVideoWrapper)** (Third-party extension)
+- Native **WanVideo nodes** in ComfyUI
+---
+## 🔧 Key Components
+### 1. **Core Diffusion Models**
+| File | Size | Description |
+|------|------|-------------|
+| `wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors` | Quantized (FP8) | Base video generation model (14B params, 720p). |
+| `fantasytalking_fp16.safetensors` | FP16 | Specialized model for expressive dialogue animation. |
+### 2. **Text & Vision Encoders**
+| File | Type | Role |
+|------|------|------|
+| `umt5-xxl-enc-bf16.safetensors` | Text Encoder (UMT5-XXL) | BF16 precision for multilingual text understanding. |
+| `clip_vision_h.safetensors` | Vision Encoder | Processes visual inputs for conditional generation. |
 ---
+## 📁 ComfyUI Setup Guide
+Place files in these directories within your ComfyUI installation:
+```bash
+models/
+├── diffusion_models/
+│   ├── wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors
+│   └── fantasytalking_fp16.safetensors
+├── clip_vision/
+│   └── clip_vision_h.safetensors
+└── text_encoders/
+    └── umt5-xxl-enc-bf16.safetensors
+```
 ---
+## 🔗 Dependencies & Resources
+1. **Vision Encoder Resources**
+   - Download `clip_vision_h.safetensors` from:
+     [Comfy-Org/Wan_2.1_ComfyUI_repackaged](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/clip_vision)
+2. **FantasyTalking Model**
+   - Source code & usage: [GitHub Repository](https://github.com/Fantasy-AMAP/fantasy-talking)
+3. **Base Model**
+   - Full precision version: [Wan-AI/Wan2.1-VACE-14B](https://huggingface.co/Wan-AI/Wan2.1-VACE-14B)
+---
+## 💡 Usage Notes
+- **Quantization Benefits**: FP8 reduces VRAM usage by ~50% vs FP16, enabling 720p generation on consumer GPUs.
+- **Workflow Compatibility**: Combine with `Text-to-Video`, `Image-to-Video`, or `FantasyTalking` nodes in ComfyUI.
+- **Multi-Modal Inputs**: UMT5-XXL encoder supports multilingual prompts (e.g., English, Chinese).
 ---
+## ⚖️ License
+*Inherited from parent models ([Check Wan-AI License](https://huggingface.co/Wan-AI/Wan2.1-VACE-14B)). Non-commercial/research use recommended pending verification.*
+---
+**✨ Pro Tip**: For optimal results, pair with WanVideo’s temporal consistency modules to reduce frame flickering in long sequences.
+---
+*Model Card curated by the ComfyUI community. Maintained for reproducibility and ease of deployment.*
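
Note on the setup guide in this diff: the directory layout it describes can be checked with a short script. The following is a minimal sketch, not part of the commit. The file names and subfolders are copied from the README's tree, while `PLACEMENT`, `plan_placement`, `missing_files`, and the example paths are hypothetical names introduced here for illustration. It assumes the ComfyUI root contains a `models/` folder, as the README states.

```python
from pathlib import Path

# File name -> subfolder of models/, copied from the README's setup guide.
PLACEMENT = {
    "wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors": "diffusion_models",
    "fantasytalking_fp16.safetensors": "diffusion_models",
    "clip_vision_h.safetensors": "clip_vision",
    "umt5-xxl-enc-bf16.safetensors": "text_encoders",
}


def plan_placement(comfy_root, download_dir):
    """Return (source, destination) path pairs; nothing is moved or created."""
    comfy_root, download_dir = Path(comfy_root), Path(download_dir)
    return [
        (download_dir / name, comfy_root / "models" / subdir / name)
        for name, subdir in PLACEMENT.items()
    ]


def missing_files(comfy_root):
    """List the expected model files that are not yet in place."""
    comfy_root = Path(comfy_root)
    return [
        name
        for name, subdir in PLACEMENT.items()
        if not (comfy_root / "models" / subdir / name).is_file()
    ]


if __name__ == "__main__":
    # Hypothetical paths; replace with your own installation and download folder.
    for src, dst in plan_placement("ComfyUI", "downloads"):
        print(f"{src} -> {dst}")
    print("Missing:", missing_files("ComfyUI"))
```

The script only reports where each file should go and which ones are absent. Copying or moving the files is left to the user, since the README does not say how they are obtained beyond the download links.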