Add link to paper, code, and project page
#32, opened by nielsr (HF Staff)

README.md CHANGED

    | @@ -1,16 +1,17 @@ | |
| 1 | 
             
            ---
         | 
| 2 | 
            -
            license: apache-2.0
         | 
| 3 | 
             
            language:
         | 
| 4 | 
             
            - en
         | 
| 5 | 
             
            - zh
         | 
|  | |
|  | |
| 6 | 
             
            pipeline_tag: text-to-video
         | 
| 7 | 
             
            tags:
         | 
| 8 | 
             
            - video generation
         | 
| 9 | 
            -
            library_name: diffusers
         | 
| 10 | 
             
            inference:
         | 
| 11 | 
             
              parameters:
         | 
| 12 | 
             
                num_inference_steps: 10
         | 
| 13 | 
             
            ---
         | 
|  | |
| 14 | 
             
            # Wan2.1
         | 
| 15 |  | 
| 16 | 
             
            <p align="center">
         | 
| @@ -18,7 +19,7 @@ inference: | |
| 18 | 
             
            <p>
         | 
| 19 |  | 
| 20 | 
             
            <p align="center">
         | 
| 21 | 
            -
                💜 <a href=""><b>Wan</b></a>    |    🖥️ <a href="https://github.com/Wan-Video/Wan2.1">GitHub</a>     |   🤗 <a href="https://huggingface.co/Wan-AI/">Hugging Face</a>   |   🤖 <a href="https://modelscope.cn/organization/Wan-AI">ModelScope</a>   |    📑 <a href="">Paper (Coming soon)</a>    |    📑 <a href="https:// | 
| 22 | 
             
            <br>
         | 
| 23 |  | 
| 24 | 
             
            -----
         | 
| @@ -68,13 +69,13 @@ This repository features our T2V-14B model, which establishes a new SOTA perform | |
| 68 |  | 
| 69 | 
             
            #### Installation
         | 
| 70 | 
             
            Clone the repo:
         | 
| 71 | 
            -
            ```
         | 
| 72 | 
             
            git clone https://github.com/Wan-Video/Wan2.1.git
         | 
| 73 | 
             
            cd Wan2.1
         | 
| 74 | 
             
            ```
         | 
| 75 |  | 
| 76 | 
             
            Install dependencies:
         | 
| 77 | 
            -
            ```
         | 
| 78 | 
             
            # Ensure torch >= 2.4.0
         | 
| 79 | 
             
            pip install -r requirements.txt
         | 
| 80 | 
             
            ```
         | 
| @@ -142,13 +143,13 @@ To facilitate implementation, we will start with a basic version of the inferenc | |
| 142 |  | 
| 143 | 
             
            - Single-GPU inference
         | 
| 144 |  | 
| 145 | 
            -
            ```
         | 
| 146 | 
             
            python generate.py  --task t2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-T2V-14B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
         | 
| 147 | 
             
            ```
         | 
| 148 |  | 
| 149 | 
             
            If you encounter OOM (Out-of-Memory) issues, you can use the `--offload_model True` and `--t5_cpu` options to reduce GPU memory usage. For example, on an RTX 4090 GPU:
         | 
| 150 |  | 
| 151 | 
            -
            ```
         | 
| 152 | 
             
            python generate.py  --task t2v-1.3B --size 832*480 --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model True --t5_cpu --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
         | 
| 153 | 
             
            ```
         | 
| 154 |  | 
| @@ -157,7 +158,7 @@ python generate.py  --task t2v-1.3B --size 832*480 --ckpt_dir ./Wan2.1-T2V-1.3B | |
| 157 |  | 
| 158 | 
             
            - Multi-GPU inference using FSDP + xDiT USP
         | 
| 159 |  | 
| 160 | 
            -
            ```
         | 
| 161 | 
             
            pip install "xfuser>=0.4.1"
         | 
| 162 | 
             
            torchrun --nproc_per_node=8 generate.py --task t2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-T2V-14B --dit_fsdp --t5_fsdp --ulysses_size 8 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
         | 
| 163 | 
             
            ```
         | 
| @@ -172,7 +173,7 @@ Extending the prompts can effectively enrich the details in the generated videos | |
| 172 | 
             
              - Configure the environment variable `DASH_API_KEY` to specify the Dashscope API key. For users of Alibaba Cloud's international site, you also need to set the environment variable `DASH_API_URL` to 'https://dashscope-intl.aliyuncs.com/api/v1'. For more detailed instructions, please refer to the [dashscope document](https://www.alibabacloud.com/help/en/model-studio/developer-reference/use-qwen-by-calling-api?spm=a2c63.p38356.0.i1).
         | 
| 173 | 
             
              - Use the `qwen-plus` model for text-to-video tasks and `qwen-vl-max` for image-to-video tasks.
         | 
| 174 | 
             
              - You can modify the model used for extension with the parameter `--prompt_extend_model`. For example:
         | 
| 175 | 
            -
            ```
         | 
| 176 | 
             
            DASH_API_KEY=your_key python generate.py  --task t2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-T2V-14B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage" --use_prompt_extend --prompt_extend_method 'dashscope' --prompt_extend_target_lang 'ch'
         | 
| 177 | 
             
            ```
         | 
| 178 |  | 
| @@ -184,13 +185,13 @@ DASH_API_KEY=your_key python generate.py  --task t2v-14B --size 1280*720 --ckpt_ | |
| 184 | 
             
              - Larger models generally provide better extension results but require more GPU memory.
         | 
| 185 | 
             
              - You can modify the model used for extension with the parameter `--prompt_extend_model` , allowing you to specify either a local model path or a Hugging Face model. For example:
         | 
| 186 |  | 
| 187 | 
            -
            ```
         | 
| 188 | 
             
            python generate.py  --task t2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-T2V-14B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage" --use_prompt_extend --prompt_extend_method 'local_qwen' --prompt_extend_target_lang 'ch'
         | 
| 189 | 
             
            ```
         | 
| 190 |  | 
| 191 | 
             
            ##### (3) Running local gradio
         | 
| 192 |  | 
| 193 | 
            -
            ```
         | 
| 194 | 
             
            cd gradio
         | 
| 195 | 
             
            # if one uses dashscope’s API for prompt extension
         | 
| 196 | 
             
            DASH_API_KEY=your_key python t2v_14B_singleGPU.py --prompt_extend_method 'dashscope' --ckpt_dir ./Wan2.1-T2V-14B
         | 
|  | |
| 1 | 
             
            ---
         | 
|  | |
| 2 | 
             
            language:
         | 
| 3 | 
             
            - en
         | 
| 4 | 
             
            - zh
         | 
| 5 | 
            +
            library_name: diffusers
         | 
| 6 | 
            +
            license: apache-2.0
         | 
| 7 | 
             
            pipeline_tag: text-to-video
         | 
| 8 | 
             
            tags:
         | 
| 9 | 
             
            - video generation
         | 
|  | |
| 10 | 
             
            inference:
         | 
| 11 | 
             
              parameters:
         | 
| 12 | 
             
                num_inference_steps: 10
         | 
| 13 | 
             
            ---
         | 
| 14 | 
            +
             | 
| 15 | 
             
            # Wan2.1
         | 
| 16 |  | 
| 17 | 
             
            <p align="center">
         | 
|  | |
| 19 | 
             
            <p>
         | 
| 20 |  | 
| 21 | 
             
            <p align="center">
         | 
| 22 | 
            +
                💜 <a href="https://wan.video"><b>Wan</b></a>    |    🖥️ <a href="https://github.com/Wan-Video/Wan2.1">GitHub</a>     |   🤗 <a href="https://huggingface.co/Wan-AI/">Hugging Face</a>   |   🤖 <a href="https://modelscope.cn/organization/Wan-AI">ModelScope</a>   |    📑 <a href="https://huggingface.co/papers/2503.20314">Paper (Coming soon)</a>    |    📑 <a href="https://wan.video/welcome?spm=a2ty_o02.30011076.0.0.6c9ee41eCcluqg">Blog</a>    |   💬 <a href="https://gw.alicdn.com/imgextra/i2/O1CN01tqjWFi1ByuyehkTSB_!!6000000000015-0-tps-611-1279.jpg">WeChat Group</a>   |    📖 <a href="https://discord.gg/AKNgpMK4Yj">Discord</a>  
         | 
| 23 | 
             
            <br>
         | 
| 24 |  | 
| 25 | 
             
            -----
         | 
|  | |
| 69 |  | 
| 70 | 
             
            #### Installation
         | 
| 71 | 
             
            Clone the repo:
         | 
| 72 | 
            +
            ```sh
         | 
| 73 | 
             
            git clone https://github.com/Wan-Video/Wan2.1.git
         | 
| 74 | 
             
            cd Wan2.1
         | 
| 75 | 
             
            ```
         | 
| 76 |  | 
| 77 | 
             
            Install dependencies:
         | 
| 78 | 
            +
            ```sh
         | 
| 79 | 
             
            # Ensure torch >= 2.4.0
         | 
| 80 | 
             
            pip install -r requirements.txt
         | 
| 81 | 
             
            ```
         | 
|  | |
| 143 |  | 
| 144 | 
             
            - Single-GPU inference
         | 
| 145 |  | 
| 146 | 
            +
            ```sh
         | 
| 147 | 
             
            python generate.py  --task t2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-T2V-14B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
         | 
| 148 | 
             
            ```
         | 
| 149 |  | 
| 150 | 
             
            If you encounter OOM (Out-of-Memory) issues, you can use the `--offload_model True` and `--t5_cpu` options to reduce GPU memory usage. For example, on an RTX 4090 GPU:
         | 
| 151 |  | 
| 152 | 
            +
            ```sh
         | 
| 153 | 
             
            python generate.py  --task t2v-1.3B --size 832*480 --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model True --t5_cpu --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
         | 
| 154 | 
             
            ```
         | 
| 155 |  | 
|  | |
| 158 |  | 
| 159 | 
             
            - Multi-GPU inference using FSDP + xDiT USP
         | 
| 160 |  | 
| 161 | 
            +
            ```sh
         | 
| 162 | 
             
            pip install "xfuser>=0.4.1"
         | 
| 163 | 
             
            torchrun --nproc_per_node=8 generate.py --task t2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-T2V-14B --dit_fsdp --t5_fsdp --ulysses_size 8 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
         | 
| 164 | 
             
            ```
         | 
|  | |
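Note on the multi-GPU hunk above: the README command hard-codes `8` for both `--nproc_per_node` and `--ulysses_size`. The following is a minimal sketch of parameterizing that, assuming the two values are meant to stay equal (the README does not state this explicitly). `NUM_GPUS` and `LAUNCH_CMD` are hypothetical names, not part of the repository. The command is only assembled and printed, not executed, and the `--prompt` argument is left to be appended.

```sh
# Assumption: NUM_GPUS is a hypothetical variable; the README hard-codes 8.
NUM_GPUS=8

# Assemble the launch command as a string so it can be inspected first.
# The same value is used for the process count and the Ulysses size,
# mirroring the README example.
LAUNCH_CMD="torchrun --nproc_per_node=${NUM_GPUS} generate.py --task t2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-T2V-14B --dit_fsdp --t5_fsdp --ulysses_size ${NUM_GPUS}"

# Print the command instead of running it.
echo "$LAUNCH_CMD"
```

Once the printed command looks right, append `--prompt "..."` and run it from the repository root.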
| 173 | 
             
              - Configure the environment variable `DASH_API_KEY` to specify the Dashscope API key. For users of Alibaba Cloud's international site, you also need to set the environment variable `DASH_API_URL` to 'https://dashscope-intl.aliyuncs.com/api/v1'. For more detailed instructions, please refer to the [dashscope document](https://www.alibabacloud.com/help/en/model-studio/developer-reference/use-qwen-by-calling-api?spm=a2c63.p38356.0.i1).
         | 
| 174 | 
             
              - Use the `qwen-plus` model for text-to-video tasks and `qwen-vl-max` for image-to-video tasks.
         | 
| 175 | 
             
              - You can modify the model used for extension with the parameter `--prompt_extend_model`. For example:
         | 
| 176 | 
            +
            ```sh
         | 
| 177 | 
             
            DASH_API_KEY=your_key python generate.py  --task t2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-T2V-14B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage" --use_prompt_extend --prompt_extend_method 'dashscope' --prompt_extend_target_lang 'ch'
         | 
| 178 | 
             
            ```
         | 
| 179 |  | 
|  | |
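Note on the Dashscope hunk above: the bullets describe two environment variables. The following is a minimal sketch of setting them in a shell session and checking them before launching `generate.py`. `check_dashscope_env` is a hypothetical helper introduced here, not part of the Wan2.1 repository, and `your_key` is a placeholder as in the README. The `DASH_API_URL` export applies only to users of Alibaba Cloud's international site.

```sh
# Hypothetical preflight helper (not part of the Wan2.1 repo):
# fails if DASH_API_KEY is missing, otherwise reports the endpoint in use.
check_dashscope_env() {
  if [ -z "${DASH_API_KEY:-}" ]; then
    echo "DASH_API_KEY is not set" >&2
    return 1
  fi
  echo "DASH_API_URL=${DASH_API_URL:-(not set)}"
}

# Placeholder key, as in the README example.
export DASH_API_KEY=your_key
# Only needed for Alibaba Cloud's international site.
export DASH_API_URL='https://dashscope-intl.aliyuncs.com/api/v1'

check_dashscope_env
```

With both variables exported, the `DASH_API_KEY=your_key` prefix in the README command can be dropped and `python generate.py ... --use_prompt_extend --prompt_extend_method 'dashscope'` run as shown in the hunk.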
| 185 | 
             
              - Larger models generally provide better extension results but require more GPU memory.
         | 
| 186 | 
             
              - You can modify the model used for extension with the parameter `--prompt_extend_model` , allowing you to specify either a local model path or a Hugging Face model. For example:
         | 
| 187 |  | 
| 188 | 
            +
            ```sh
         | 
| 189 | 
             
            python generate.py  --task t2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-T2V-14B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage" --use_prompt_extend --prompt_extend_method 'local_qwen' --prompt_extend_target_lang 'ch'
         | 
| 190 | 
             
            ```
         | 
| 191 |  | 
| 192 | 
             
            ##### (3) Running local gradio
         | 
| 193 |  | 
| 194 | 
            +
            ```sh
         | 
| 195 | 
             
            cd gradio
         | 
| 196 | 
             
            # if one uses dashscope’s API for prompt extension
         | 
| 197 | 
             
            DASH_API_KEY=your_key python t2v_14B_singleGPU.py --prompt_extend_method 'dashscope' --ckpt_dir ./Wan2.1-T2V-14B
         | 
