---
license: agpl-3.0
datasets:
- nkp37/OpenVid-1M
- TempoFunk/webvid-10M
---
⚡️ In this work, we present **AMD Hummingbird-I2V**, a compact and efficient **diffusion-based** image-to-video (I2V) model designed for high-quality video synthesis under limited 
computational budgets. Hummingbird-I2V adopts a lightweight **U-Net** architecture with **0.9B parameters** and a novel two-stage training strategy guided by 
**reward-based feedback**, which yields substantial improvements in inference speed, model efficiency, and visual quality. To raise output resolution with minimal 
overhead, we add a **super-resolution** module at the end of the pipeline. We also leverage **ReNeg**, an AMD-proposed reward-guided framework that learns 
negative embeddings via gradient descent, to further boost visual quality. As a result, Hummingbird-I2V can generate high-quality 4K video in just **11 seconds** with 16 
inference steps on an AMD Radeon™ RX 7900 XTX GPU. Quantitative results on the VBench-I2V benchmark show that Hummingbird-I2V achieves state-of-the-art performance among 
U-Net-based diffusion models and competitive results compared to significantly larger DiT-based models. We provide a detailed analysis of the model architecture, training 
methodology, and benchmark performance.

<img src="src/key_takeway.png" alt="key_takeway" title="key_takeway" class="key_takeway">

<img src="src/i2v_training_pipeline.png" alt="i2v_training_pipeline" title="i2v_training_pipeline" class="i2v_training_pipeline">

<style>
  table {
    width: auto;
    border-collapse: collapse;
  }
  th, td {
    border: 1px solid #ddd;
    text-align: center;
    padding: 0px;
    vertical-align: middle;
    width: 256px; /* fixed width for every column */
  }
  tr.text-row {
    height: 30px; /* height of text rows */
  }
  tr.image-row {
    height: 160px; /* height of image rows */
  }
  /* default size for images inside tables */
  img {
    width: 256px;
    height: 160px;
    object-fit: cover;
  }
  /* applies only to vbench.png */
  .vbench-img {
    width: 785px !important;
    height: 698px !important;
    object-fit: contain; /* show the whole image without cropping */
  }
</style>








| Model               | I2V Subj | I2V Bkg | Cam Mot | Subj Cons | Bkg Cons | Mot Smo | Dyn Deg | Aes Qual | Img Qual | Total Score |
|---------------------|----------|---------|---------|-----------|-----------|----------|----------|-----------|-----------|--------------|
| CogVideoSFT         | 97.67%   | 98.76%  | 84.93%  | 95.47%    | 98.30%    | 98.35%   | 36.51%   | 59.76%    | 67.64%    | 87.98%       |
| CogVideoX-I2V-5B    | 98.87%   | 99.08%  | 76.25%  | 96.99%    | 99.02%    | 98.85%   | 21.79%   | 60.76%    | 69.53%    | 88.21%       |
| Step-Video-TI2V     | 97.44%   | 98.45%  | 48.15%  | 95.62%    | 96.92%    | 99.08%   | 48.78%   | 61.74%    | 70.17%    | 87.98%       |
| HunYuan             | -        | -       | -       | -         | 93.85%    | 99.39%   | -        | -         | -         | -            |
| Wan-2.1-14B         | -        | -       | -       | -         | 98.46%    | 96.07%   | -        | -         | -         | -            |
| Animate-Anything    | 98.76%   | 98.58%  | 13.08%  | 98.90%    | 98.19%    | 98.61%   | 2.68%    | 67.12%    | 72.09%    | 86.48%       |
| SEINE-512           | 97.15%   | 96.94%  | 20.97%  | 95.28%    | 97.12%    | 97.12%   | 27.07%   | 64.55%    | 71.39%    | 85.52%       |
| I2VGen-XL           | 96.48%   | 96.83%  | 18.46%  | 95.45%    | 96.42%    | 98.03%   | 24.08%   | 64.82%    | 69.14%    | 85.28%       |
| ConsistI2V          | 95.82%   | 95.95%  | 33.92%  | 95.27%    | 94.38%    | 97.38%   | 18.62%   | 59.00%    | 66.92%    | 84.91%       |
| DynamiCrafter-512   | 97.05%   | 97.56%  | 20.92%  | 94.74%    | 98.29%    | 97.83%   | 40.57%   | 58.71%    | 62.28%    | 85.25%       |
| Hummingbird-I2V     | 96.30%   | 96.39%  | 12.69%  | 97.10%    | 98.60%    | 98.24%   | 62.60%   | 64.45%    | 69.27%    | 87.05%       |
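
To make the table easier to compare, here is a minimal Python sketch that ranks the listed models on a chosen metric. The values are copied from the Dyn Deg and Total Score columns above. The dictionary layout, the metric keys (`dyn_deg`, `total`), and the `rank_by` helper are illustrative assumptions and are not part of the Hummingbird-I2V code or the VBench toolkit. HunYuan and Wan-2.1-14B are left out because the table reports no Total Score for them.

```python
# Scores (percent) copied from the VBench-I2V table above.
# Hypothetical layout for illustration; rows with missing entries are omitted.
SCORES = {
    "CogVideoSFT": {"dyn_deg": 36.51, "total": 87.98},
    "CogVideoX-I2V-5B": {"dyn_deg": 21.79, "total": 88.21},
    "Step-Video-TI2V": {"dyn_deg": 48.78, "total": 87.98},
    "Animate-Anything": {"dyn_deg": 2.68, "total": 86.48},
    "SEINE-512": {"dyn_deg": 27.07, "total": 85.52},
    "I2VGen-XL": {"dyn_deg": 24.08, "total": 85.28},
    "ConsistI2V": {"dyn_deg": 18.62, "total": 84.91},
    "DynamiCrafter-512": {"dyn_deg": 40.57, "total": 85.25},
    "Hummingbird-I2V": {"dyn_deg": 62.60, "total": 87.05},
}


def rank_by(metric):
    """Return model names sorted from highest to lowest score on `metric`."""
    return sorted(SCORES, key=lambda name: SCORES[name][metric], reverse=True)


if __name__ == "__main__":
    for metric in ("dyn_deg", "total"):
        print(metric)
        for name in rank_by(metric):
            print(f"  {name:<20} {SCORES[name][metric]:.2f}%")
```

Among the nine models with complete rows, this ranking places Hummingbird-I2V first on Dyn Deg and fourth on Total Score.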