frankjiang committed
Commit 559b091 · 1 Parent(s): 466dffa

Release model weights.

Files changed (3)
  1. README.md +96 -3
  2. README_zh.md +96 -0
  3. fantasytalking_model.ckpt +3 -0
README.md CHANGED
@@ -1,3 +1,96 @@
- ---
- license: apache-2.0
- ---
+ [中文阅读](./README_zh.md)
+ # FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
+
+ [![Home Page](https://img.shields.io/badge/Project-<Website>-blue.svg)](https://fantasy-amap.github.io/fantasy-talking/)
+ [![arXiv](https://img.shields.io/badge/Arxiv-2504.04842-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2504.04842)
+ [![hf_paper](https://img.shields.io/badge/🤗-Paper%20In%20HF-red.svg)](https://huggingface.co/papers/2504.04842)
+
+ ## 🔥 Latest News!!
+ * April 28, 2025: We released the inference code and model weights for audio conditions.
+
+
+ <!-- ![Fig.1](https://github.com/Fantasy-AMAP/fantasy-talking/blob/main/assert/fig0_1_0.png) -->
+
+
+ ## Quickstart
+ ### 🛠️Installation
+
+ Clone the repo:
+
+ ```
+ git clone https://github.com/Fantasy-AMAP/fantasy-talking.git
+ cd fantasy-talking
+ ```
+
+ Install dependencies:
+ ```
+ # Ensure torch >= 2.0.0
+ pip install -r requirements.txt
+ # Optional: install flash_attn to accelerate attention computation
+ pip install flash_attn
+ ```
+
+ ### 🧱Model Download
+ | Models | Download Link | Notes |
+ | --------------|-------------------------------------------------------------------------------|-------------------------------|
+ | Wan2.1-I2V-14B-720P | 🤗 [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P) 🤖 [ModelScope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-I2V-14B-720P) | Base model |
+ | Wav2Vec | 🤗 [Huggingface](https://huggingface.co/facebook/wav2vec2-base-960h) 🤖 [ModelScope](https://modelscope.cn/models/AI-ModelScope/wav2vec2-base-960h) | Audio encoder |
+ | FantasyTalking model | 🤗 [Huggingface](https://huggingface.co/acvlab/FantasyTalking/) 🤖 [ModelScope](https://www.modelscope.cn/models/amap_cvlab/FantasyTalking/) | Our audio condition weights |
+
+ Download models using huggingface-cli:
+ ``` sh
+ pip install "huggingface_hub[cli]"
+ huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P --local-dir ./models/Wan2.1-I2V-14B-720P
+ huggingface-cli download facebook/wav2vec2-base-960h --local-dir ./models/wav2vec2-base-960h
+ huggingface-cli download acvlab/FantasyTalking fantasytalking_model.ckpt --local-dir ./models
+ ```
+
+ Download models using modelscope-cli:
+ ``` sh
+ pip install modelscope
+ modelscope download Wan-AI/Wan2.1-I2V-14B-720P --local_dir ./models/Wan2.1-I2V-14B-720P
+ modelscope download AI-ModelScope/wav2vec2-base-960h --local_dir ./models/wav2vec2-base-960h
+ modelscope download amap_cvlab/FantasyTalking fantasytalking_model.ckpt --local_dir ./models
+ ```
+
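+ The table above lists Wav2Vec as the audio encoder. As an optional sanity check that the downloaded `wav2vec2-base-960h` weights load and produce frame-level audio features, you can run the short sketch below; it is only an illustration (assuming `librosa` is installed), not the repository's own feature-extraction code.
+
+ ```python
+ # Illustrative sanity check of the Wav2Vec audio encoder; not the repo's own pipeline code.
+ import librosa
+ import torch
+ from transformers import Wav2Vec2Model, Wav2Vec2Processor
+
+ processor = Wav2Vec2Processor.from_pretrained("./models/wav2vec2-base-960h")
+ encoder = Wav2Vec2Model.from_pretrained("./models/wav2vec2-base-960h")
+
+ # Wav2Vec expects 16 kHz mono audio.
+ audio, _ = librosa.load("./assets/audios/woman.wav", sr=16000)
+ inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
+
+ with torch.no_grad():
+     features = encoder(**inputs).last_hidden_state  # shape: (1, num_audio_frames, 768)
+ print(features.shape)
+ ```
+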
+ ### 🔑 Inference
+ ``` sh
+ python infer.py --image_path ./assets/images/woman.png --audio_path ./assets/audios/woman.wav
+ ```
+ You can control the character's behavior through the prompt. The recommended range for both the prompt and audio CFG scales is 3 to 7.
+ ``` sh
+ python infer.py --image_path ./assets/images/woman.png --audio_path ./assets/audios/woman.wav --prompt "The person is speaking enthusiastically, with their hands continuously waving." --prompt_cfg_scale 5.0 --audio_cfg_scale 5.0
+ ```
+
+ The table below reports inference speed and VRAM usage on a single A100 (512x512, 81 frames); the sketch after the table shows where these two settings plug in.
+
+ |`torch_dtype`|`num_persistent_param_in_dit`|Speed|Required VRAM|
+ |-|-|-|-|
+ |torch.bfloat16|None (unlimited)|15.5s/it|40 GB|
+ |torch.bfloat16|7*10**9 (7B)|32.8s/it|20 GB|
+ |torch.bfloat16|0|42.6s/it|5 GB|
+
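+ Both columns of the table map onto settings of the DiffSynth-Studio video pipeline that this project builds on. The snippet below is a minimal sketch, assuming DiffSynth-Studio's `WanVideoPipeline` API, of where the two settings are configured; the model path is a placeholder, and the authoritative setup is whatever `infer.py` does.
+
+ ```python
+ # Minimal sketch assuming DiffSynth-Studio's WanVideoPipeline API; defer to infer.py for the real setup.
+ import torch
+ from diffsynth import ModelManager, WanVideoPipeline
+
+ model_manager = ModelManager(device="cpu")
+ model_manager.load_models(
+     ["./models/Wan2.1-I2V-14B-720P"],  # placeholder: point this at the checkpoint files you actually downloaded
+     torch_dtype=torch.bfloat16,        # the `torch_dtype` column above
+ )
+ pipe = WanVideoPipeline.from_model_manager(model_manager, torch_dtype=torch.bfloat16, device="cuda")
+
+ # `num_persistent_param_in_dit` = how many DiT parameters stay resident in VRAM:
+ # None keeps them all (fastest, ~40 GB), 7*10**9 needs ~20 GB, 0 offloads nearly everything (~5 GB).
+ pipe.enable_vram_management(num_persistent_param_in_dit=7 * 10**9)
+ ```
+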
+ ### Gradio Demo
+ We host an online demo on Hugging Face.
+ To run the Gradio demo locally:
+ ``` sh
+ pip install gradio spaces
+ python app.py
+ ```
+
+ ## 🧩 Community Works
+ We ❤️ contributions from the open-source community! If your work has improved FantasyTalking, please let us know.
+ ## 🔗Citation
+ If you find this repository useful, please consider giving it a star ⭐ and citing it:
+ ```
+ @article{wang2025fantasytalking,
+ title={FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis},
+ author={Wang, Mengchao and Wang, Qiang and Jiang, Fan and Fan, Yaqi and Zhang, Yunpeng and Qi, Yonggang and Zhao, Kun and Xu, Mu},
+ journal={arXiv preprint arXiv:2504.04842},
+ year={2025}
+ }
+ ```
+
+ ## Acknowledgments
+ Thanks to [Wan2.1](https://github.com/Wan-Video/Wan2.1), [HunyuanVideo](https://github.com/Tencent/HunyuanVideo), and [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) for open-sourcing their models and code, which provided valuable references and support for this project. Their contributions to the open-source community are truly appreciated.
+
README_zh.md ADDED
@@ -0,0 +1,96 @@
+ [English](./README.md)
+ # FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
+
+ [![Home Page](https://img.shields.io/badge/Project-<Website>-blue.svg)](https://fantasy-amap.github.io/fantasy-talking/)
+ [![arXiv](https://img.shields.io/badge/Arxiv-2504.04842-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2504.04842)
+ [![hf_paper](https://img.shields.io/badge/🤗-Paper%20In%20HF-red.svg)](https://huggingface.co/papers/2504.04842)
+
+ ## 🔥 Latest News!!
+ * 2025年4月28日: 开源了音频条件下的推理代码和模型权重。
+
+
+ <!-- ![Fig.1](https://github.com/Fantasy-AMAP/fantasy-talking/blob/main/assert/fig0_1_0.png) -->
+
+
+ ## 快速开始
+ ### 🛠️安装和依赖
+
+ 首先克隆git仓库:
+
+ ```
+ git clone https://github.com/Fantasy-AMAP/fantasy-talking.git
+ cd fantasy-talking
+ ```
+
+ 安装依赖:
+ ```
+ pip install -r requirements.txt
+ # 可选安装 flash_attn 以加速注意力计算
+ pip install flash_attn
+ ```
+
+ ### 🧱模型下载
+ | 模型 | 下载链接 | 备注 |
+ | --------------|-------------------------------------------------------------------------------|-------------------------------|
+ | Wan2.1-I2V-14B-720P | 🤗 [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P) 🤖 [ModelScope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-I2V-14B-720P) | 基础模型 |
+ | Wav2Vec | 🤗 [Huggingface](https://huggingface.co/facebook/wav2vec2-base-960h) 🤖 [ModelScope](https://modelscope.cn/models/AI-ModelScope/wav2vec2-base-960h) | 音频编码器 |
+ | FantasyTalking model | 🤗 [Huggingface](https://huggingface.co/acvlab/FantasyTalking/) 🤖 [ModelScope](https://www.modelscope.cn/models/amap_cvlab/FantasyTalking/) | 我们的音频条件权重 |
+
+ 使用huggingface-cli下载模型:
+ ``` sh
+ pip install "huggingface_hub[cli]"
+ huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P --local-dir ./models/Wan2.1-I2V-14B-720P
+ huggingface-cli download facebook/wav2vec2-base-960h --local-dir ./models/wav2vec2-base-960h
+ huggingface-cli download acvlab/FantasyTalking fantasytalking_model.ckpt --local-dir ./models
+ ```
+
+ 使用modelscope-cli下载模型:
+ ``` sh
+ pip install modelscope
+ modelscope download Wan-AI/Wan2.1-I2V-14B-720P --local_dir ./models/Wan2.1-I2V-14B-720P
+ modelscope download AI-ModelScope/wav2vec2-base-960h --local_dir ./models/wav2vec2-base-960h
+ modelscope download amap_cvlab/FantasyTalking fantasytalking_model.ckpt --local_dir ./models
+ ```
+
+ ### 🔑 推理
+ ``` sh
+ python infer.py --image_path ./assets/images/woman.png --audio_path ./assets/audios/woman.wav
+ ```
+ 您可以通过提示词控制角色的行为。提示词和音频 CFG scale 的推荐范围是 3 到 7。
+ ``` sh
+ python infer.py --image_path ./assets/images/woman.png --audio_path ./assets/audios/woman.wav --prompt "The person is speaking enthusiastically, with their hands continuously waving." --prompt_cfg_scale 5.0 --audio_cfg_scale 5.0
+ ```
+
+ 下表给出了详细的性能数据。测试在单张 A100 上进行（512x512，81 帧）。
+
+ |`torch_dtype`|`num_persistent_param_in_dit`|Speed|Required VRAM|
+ |-|-|-|-|
+ |torch.bfloat16|None (unlimited)|15.5s/it|40 GB|
+ |torch.bfloat16|7*10**9 (7B)|32.8s/it|20 GB|
+ |torch.bfloat16|0|42.6s/it|5 GB|
+
+ ### Gradio 示例
+ 我们在 Hugging Face 上提供了在线演示。
+
+ 如需在本地运行 Gradio 演示，可以执行：
+ ``` sh
+ pip install gradio spaces
+ python app.py
+ ```
+
+ ## 🧩 社区工作
+ 我们 ❤️ 来自开源社区的贡献！如果你的工作改进了 FantasyTalking，请告诉我们。
+
+ ## 🔗Citation
+ If you find this repository useful, please consider giving it a star ⭐ and citing it:
+ ```
+ @article{wang2025fantasytalking,
+ title={FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis},
+ author={Wang, Mengchao and Wang, Qiang and Jiang, Fan and Fan, Yaqi and Zhang, Yunpeng and Qi, Yonggang and Zhao, Kun and Xu, Mu},
+ journal={arXiv preprint arXiv:2504.04842},
+ year={2025}
+ }
+ ```
+
+ ## 致谢
+ 感谢[Wan2.1](https://github.com/Wan-Video/Wan2.1)、[HunyuanVideo](https://github.com/Tencent/HunyuanVideo)和[DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio)开源他们的模型和代码,为该项目提供了宝贵的参考和支持。他们对开源社区的贡献真正值得赞赏。
+
fantasytalking_model.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0e75eb54d2f6e5606a4c009785dd588a6e30d0f07bdd09bf433d624f148a1b6b
+ size 3361779185
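+
+ The checkpoint is stored as a Git LFS pointer: the file committed here only records the SHA-256 (`oid`) and byte size of the real weights, which are fetched from LFS storage. As an optional integrity check, the sketch below recomputes the hash of a downloaded copy and compares it with the pointer; the local path is an assumption.
+
+ ```python
+ # Optional integrity check: recompute the checkpoint's SHA-256 and compare it with the LFS pointer above.
+ import hashlib
+
+ EXPECTED = "0e75eb54d2f6e5606a4c009785dd588a6e30d0f07bdd09bf433d624f148a1b6b"
+ path = "./models/fantasytalking_model.ckpt"  # assumption: wherever the weights were downloaded
+
+ h = hashlib.sha256()
+ with open(path, "rb") as f:
+     for chunk in iter(lambda: f.read(1 << 20), b""):  # stream in 1 MiB chunks (the file is ~3.4 GB)
+         h.update(chunk)
+
+ print("OK" if h.hexdigest() == EXPECTED else "checksum mismatch")
+ ```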