Commit 3e29fea by nielsr (HF Staff) · verified · 1 parent: 0d4146c

Improve model card


This PR improves the model card by adding a link to the paper, specifying the correct library name, and adding more details about the model from the paper abstract. It also restructures the examples section to be more user-friendly.

Files changed (1)
  1. README.md +130 -84
README.md CHANGED
@@ -1,14 +1,37 @@
1
  ---
2
- license: gpl-3.0
 
3
  datasets:
4
  - nkp37/OpenVid-1M
5
  - TempoFunk/webvid-10M
6
- base_model:
7
- - VideoCrafter/VideoCrafter2
8
  pipeline_tag: text-to-video
 
9
  ---
 
10
  # Advanced text-to-video Diffusion Models
11
 
12
 
13
  ⚡️ This repository provides training recipes for the AMD efficient text-to-video models, which are designed for high performance and efficiency. The training process includes two key steps:
14
 
@@ -18,88 +41,111 @@ pipeline_tag: text-to-video
18
 
19
  This implementation is released to promote further research and innovation in the field of efficient text-to-video generation, optimized for AMD Instinct accelerators.
20
 
21
- You can download the code from our [GitHub Repo](https://github.com/AMD-AIG-AIMA/AMD-Hummingbird-T2V).
22
-
23
- <img src="GIFs/vbench.png" alt="Vbench performance" title="Vbench performance" class="vbench-img">
24
-
25
-
26
- **8-Steps Results**
27
- <style>
28
- table {
29
- width: auto;
30
- border-collapse: collapse;
31
- }
32
- th, td {
33
- border: 1px solid #ddd;
34
- text-align: center;
35
- padding: 0px;
36
- vertical-align: middle;
37
- width: 256px; /* fixed width for each column */
38
- }
39
- tr.text-row {
40
- height: 30px; /* text row height */
41
- }
42
- tr.image-row {
43
- height: 160px; /* image row height */
44
- }
45
- /* default size of images in the table */
46
- img {
47
- width: 256px;
48
- height: 160px;
49
- object-fit: cover;
50
- }
51
- /* applies only to vbench.png */
52
- .vbench-img {
53
- width: 785px !important;
54
- height: 698px !important;
55
- object-fit: contain; /* show the whole image without cropping */
56
- }
57
- </style>
58
-
59
-
60
- <table>
61
- <tr class="text-row">
62
- <th>A cute happy Corgi playing in park, sunset, pixel.</th>
63
- <th>A cute happy Corgi playing in park, sunset, animated style.</th>
64
- <th>A cute raccoon playing guitar in the beach.</th>
65
- <th>A cute raccoon playing guitar in the forest.</th>
66
- </tr>
67
- <tr class="image-row">
68
- <td><img src="GIFs/A_cute_happy_Corgi_playing_in_park,_sunset,_pixel_.gif"></td>
69
- <td><img src="GIFs/A cute happy Corgi playing in park, sunset, animated style.gif"></td>
70
- <td><img src="GIFs/A cute raccoon playing guitar in the beach.gif"></td>
71
- <td><img src="GIFs/A cute raccoon playing guitar in the forest.gif"></td>
72
- </tr>
73
- <tr class="text-row">
74
- <th>A quiet beach at dawn and the waves gently lapping.</th>
75
- <th>A cute teddy bear, dressed in a red silk outfit, stands in a vibrant street, Chinese New Year.</th>
76
- <th>A sandcastle being eroded by the incoming tide.</th>
77
- <th>An astronaut flying in space, in cyberpunk style.</th>
78
- </tr>
79
- <tr class="image-row">
80
- <td><img src="GIFs/A_quiet_beach_at_dawn_and_the_waves_gently_lapping.gif"></td>
81
- <td><img src="GIFs/A cute teddy bear, dressed in a red silk outfit, stands in a vibrant street, chinese new year..gif"></td>
82
- <td><img src="GIFs/A sandcastle being eroded by the incoming tide.gif"></td>
83
- <td><img src="GIFs/An astronaut flying in space, in cyberpunk style.gif"></td>
84
- </tr>
85
- <tr class="text-row">
86
- <th>A cat DJ at a party.</th>
87
- <th>A 3D model of a 1800s victorian house.</th>
88
- <th>A drone flying over a snowy forest.</th>
89
- <th>A ghost ship navigating through a sea under a moon.</th>
90
- </tr>
91
- <tr class="image-row">
92
- <td><img src="GIFs/A_cat_DJ_at_a_party.gif"></td>
93
- <td><img src="GIFs/A 3D model of a 1800s victorian house..gif"></td>
94
- <td><img src="GIFs/a_drone_flying_over_a_snowy_forest.gif"></td>
95
- <td><img src="GIFs/A_ghost_ship_navigating_through_a_sea_under_a_moon.gif"></td>
96
- </tr>
97
- </table>
98
-
99
-
100
-
101
-
102
 
103
 
104
  # License
105
  Copyright (c) 2024 Advanced Micro Devices, Inc. All Rights Reserved.
 
1
  ---
2
+ base_model:
3
+ - VideoCrafter/VideoCrafter2
4
  datasets:
5
  - nkp37/OpenVid-1M
6
  - TempoFunk/webvid-10M
7
+ license: gpl-3.0
 
8
  pipeline_tag: text-to-video
9
+ library_name: diffusers
10
  ---
11
+
12
  # Advanced text-to-video Diffusion Models
13
 
14
+ This repository contains the model from the paper [AMD-Hummingbird: Towards an Efficient Text-to-Video Model](https://huggingface.co/papers/2503.18559). Hummingbird is a lightweight text-to-video (T2V) framework that prunes existing models (such as VideoCrafter2) and enhances visual quality through visual feedback learning. It aims to improve the efficiency of T2V generation, making it more suitable for deployment on resource-limited devices while preserving high-quality video generation.
15
+
16
+ ## Table of Contents
17
+ - [Advanced text-to-video Diffusion Models](#advanced-text-to-video-diffusion-models)
18
+ - [Key Features](#key-features)
19
+ - [8-Steps Results](#8-steps-results)
20
+ - [Checkpoint](#checkpoint)
21
+ - [Installation](#installation)
22
+ - [conda](#conda)
23
+ - [docker](#docker)
24
+ - [Data Processing](#data-processing)
25
+ - [VQA](#vqa)
26
+ - [Remove Dolly Zoom Videos](#remove-dolly-zoom-videos)
27
+ - [Training](#training)
28
+ - [Model Distillation](#model-distillation)
29
+ - [Acceleration Training](#acceleration-training)
30
+ - [Inference](#inference)
31
+ - [License](#license)
32
+
33
+
34
+ ## Key Features
35
 
36
  ⚡️ This repository provides training recipes for the AMD efficient text-to-video models, which are designed for high performance and efficiency. The training process includes two key steps:
37
 
 
41
 
42
  This implementation is released to promote further research and innovation in the field of efficient text-to-video generation, optimized for AMD Instinct accelerators.
43
 
44
 
45
+ ![Vbench performance](GIFs/vbench.png)
46
+
47
+
48
+
49
+
50
+ ## 8-Steps Results
51
+
52
+ | Prompt | Generated Video | Prompt | Generated Video |
53
+ |--------------------------------------------|--------------------------|----------------------------------------------|------------------------|
54
+ | A cute happy Corgi playing in park, sunset, pixel. | ![GIF](GIFs/A_cute_happy_Corgi_playing_in_park,_sunset,_pixel_.gif) | A cute happy Corgi playing in park, sunset, animated style. | ![GIF](GIFs/A_cute_happy_Corgi_playing_in_park,_sunset,_animated_style.gif) |
55
+ | A quiet beach at dawn and the waves gently lapping. | ![GIF](GIFs/A_quiet_beach_at_dawn_and_the_waves_gently_lapping.gif) | A cute teddy bear, dressed in a red silk outfit, stands in a vibrant street, chinese new year. | ![GIF](GIFs/A_cute_teddy_bear,_dressed_in_a_red_silk_outfit,_stands_in_a_vibrant_street,_chinese_new_year..gif) |
56
+ | A cat DJ at a party. | ![GIF](GIFs/A_cat_DJ_at_a_party.gif) | A 3D model of a 1800s victorian house. | ![GIF](GIFs/A_3D_model_of_a_1800s_victorian_house..gif) |
57
+ | A cute raccoon playing guitar in the beach. | ![GIF](GIFs/A_cute_raccoon_playing_guitar_in_the_beach.gif) | A cute raccoon playing guitar in the forest. | ![GIF](GIFs/A_cute_raccoon_playing_guitar_in_the_forest.gif)|
58
+ | A sandcastle being eroded by the incoming tide. | ![GIF](GIFs/A_sandcastle_being_eroded_by_the_incoming_tide.gif) | An astronaut flying in space, in cyberpunk style. | ![GIF](GIFs/An_astronaut_flying_in_space,_in_cyberpunk_style.gif) |
59
+ | A drone flying over a snowy forest. | ![GIF](GIFs/a_drone_flying_over_a_snowy_forest.gif) | A ghost ship navigating through a sea under a moon. | ![GIF](GIFs/A_ghost_ship_navigating_through_a_sea_under_a_moon.gif) |
60
+
61
+
62
+ # Checkpoint
63
+ Our pretrained checkpoint can be downloaded from [Hugging Face](https://huggingface.co/amd/AMD-Hummingbird-T2V/tree/main).
64
+
65
+ # Installation
66
+ We train both the 0.9B and 0.7B T2V models on MI250 and evaluate them on MI250, MI300, RTX7900xt, and Radeon™ 880M (Ryzen™ AI 9 365, Ubuntu 6.8.0-51-generic).
67
+
68
+ ## conda
69
+ ```
70
+ conda create -n AMD_Hummingbird python=3.10
71
+ conda activate AMD_Hummingbird
72
+ pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/rocm6.1
73
+ pip install -r requirements.txt
74
+ ```
75
+ For ROCm flash-attn, install it from the [ROCm/flash-attention repository](https://github.com/ROCm/flash-attention):
76
+ ```
77
+ git clone https://github.com/ROCm/flash-attention.git
78
+ cd flash-attention
79
+ python setup.py install
80
+ ```
81
+ It will take about 1.5 hours to install.
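After the conda steps above, it can help to confirm that the ROCm build of PyTorch is the one actually installed before starting a long flash-attn build. The following is a minimal sketch, not part of the repository; the helper name `describe_torch_backend` is hypothetical, and it only relies on the public `torch.version.hip` and `torch.cuda.is_available()` attributes.

```python
import importlib.util


def describe_torch_backend():
    """Return a small summary of the installed PyTorch build, if any.

    Hypothetical helper for illustration; not part of the Hummingbird code.
    """
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch  # imported lazily so the check also runs without PyTorch

    return {
        "installed": True,
        "torch_version": torch.__version__,
        # On ROCm builds this is a version string; on other builds it is None.
        "hip_version": getattr(torch.version, "hip", None),
        # ROCm devices are exposed through the torch.cuda API.
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_backend().items():
        print(f"{key}: {value}")
```

If `hip_version` is `None`, the wheel was most likely not installed from the ROCm index URL used above.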
82
+
83
+ ## docker
84
+ First, you should use `docker pull` to download the image.
85
+ ```
86
+ docker pull rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4
87
+ ```
88
+ Second, you can use `docker run` to run the image, for example:
89
+ ```
90
+ docker run \
91
+ -v "$(pwd):/workspace" \
92
+ --device=/dev/kfd \
93
+ --device=/dev/dri \
94
+ -it \
95
+ --network=host \
96
+ --name hummingbird \
97
+ rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4
98
+ ```
99
+ Once you are inside the container, use `pip` to install the remaining dependencies:
100
+ ```
101
+ pip install -r requirements.txt
102
+ ```
103
+
104
+ # Data Processing
105
+
106
+ ## VQA
107
+ ```
108
+ cd data_pre_process/DOVER
109
+ sh run.sh
110
+ ```
111
+ This produces a table of quality scores for all videos. Sort the table by score and remove the low-scoring videos.
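The sort-and-remove step can be sketched as follows. This is only an illustration under stated assumptions: the layout of the DOVER score table is not described here, so the `path` and `score` column names, the `fraction` cutoff, and the function name `keep_top_fraction` are all hypothetical and should be adapted to the actual output of `run.sh`.

```python
import csv


def load_scores(csv_path):
    """Read a score table into a list of dicts (one per video)."""
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))


def keep_top_fraction(rows, score_key="score", fraction=0.8):
    """Sort rows by score, highest first, and keep the top `fraction`.

    `score_key` and `fraction` are assumed values, not taken from the repo.
    """
    ranked = sorted(rows, key=lambda r: float(r[score_key]), reverse=True)
    n_keep = int(len(ranked) * fraction)
    return ranked[:n_keep]


if __name__ == "__main__":
    # In-memory stand-in for a score table; real data would come from load_scores().
    demo = [
        {"path": "a.mp4", "score": "0.91"},
        {"path": "b.mp4", "score": "0.23"},
        {"path": "c.mp4", "score": "0.55"},
        {"path": "d.mp4", "score": "0.74"},
    ]
    for row in keep_top_fraction(demo, fraction=0.5):
        print(row["path"], row["score"])
```

Keeping a fraction rather than a fixed score cutoff is a design choice of this sketch; a fixed threshold works just as well if you know the score range.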
112
+ ## Remove Dolly Zoom Videos
113
+ ```
114
+ cd data_pre_process/VBench
115
+ sh run.sh
116
+ ```
117
+ Use the motion smoothness scores in the resulting CSV file to remove low-scoring videos.
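One way to apply the motion smoothness scores is a simple threshold split. The sketch below makes several assumptions that are not confirmed by the repository: the CSV column names `video` and `motion_smoothness`, the threshold value, and the function name `split_by_threshold` are all placeholders.

```python
import csv
import io


def split_by_threshold(csv_text, score_column="motion_smoothness", threshold=0.95):
    """Partition videos into (keep, drop) lists by a score threshold.

    Column names and the default threshold are assumptions for illustration.
    """
    keep, drop = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        target = keep if float(row[score_column]) >= threshold else drop
        target.append(row["video"])
    return keep, drop


if __name__ == "__main__":
    # Tiny in-memory CSV standing in for the real score file.
    sample = "video,motion_smoothness\nclip_a.mp4,0.99\nclip_b.mp4,0.90\n"
    keep, drop = split_by_threshold(sample)
    print("keep:", keep)
    print("drop:", drop)
```

Returning both lists makes it easy to review what would be removed before deleting anything.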
118
+ # Training
119
+
120
+ ## Model Distillation
121
+
122
+ ```
123
+ sh configs/training_512_t2v_v1.0/run_distill.sh
124
+ ```
125
+
126
+
127
+ ## Acceleration Training
128
+
129
+ ```
130
+ cd acceleration/t2v-turbo
131
+
132
+ # for 0.7B model
133
+ sh train_07B.sh
134
+
135
+ # for 0.9B model
136
+ sh train_09B.sh
137
+ ```
138
+
139
+
140
+ # Inference
141
+
142
+ ```
143
+ # for 0.7B model
144
+ python inference_command_config_07B.py
145
+
146
+ # for 0.9B model
147
+ python inference_command_config_09B.py
148
+ ```
149
 
150
  # License
151
  Copyright (c) 2024 Advanced Micro Devices, Inc. All Rights Reserved.