nielsr HF Staff committed on
Commit b947b19 · verified · 1 Parent(s): 41561b3

Enhance model card for TLB-VFI: add metadata, links, and usage


This PR significantly enhances the model card for the `ucfzl/TLBVFI` model by:
- Adding `pipeline_tag: image-to-video`, ensuring the model is discoverable under relevant tasks on the Hugging Face Hub.
- Specifying `library_name: diffusers`, which is the appropriate library for this diffusion model, enabling future integration and clearer usage guidance.
- Including `license: mit` for clarity on usage terms.
- Incorporating the full paper abstract to provide a comprehensive overview of the model.
- Adding direct links to the official project page and the GitHub repository for further details and code access.
- Embedding key images from the GitHub repository to visually explain the model and its results.
- Providing practical command-line usage examples for immediate inference, based on the GitHub repository's instructions.
- Including the BibTeX citation for proper academic attribution.

These changes will greatly improve the discoverability, usability, and documentation of the model on the Hugging Face Hub.

Files changed (1)
  1. README.md +79 -1
README.md CHANGED
@@ -1 +1,79 @@
- Paper link: https://huggingface.co/papers/2507.04984
+ ---
+ pipeline_tag: image-to-video
+ library_name: diffusers
+ license: mit
+ ---
+
+ # TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation
+
+ This repository contains the official model weights for **TLB-VFI**, an efficient video-based diffusion model presented in the paper [TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation](https://huggingface.co/papers/2507.04984).
+
+ - 🌐 **Project Page**: [https://zonglinl.github.io/tlbvfi_page](https://zonglinl.github.io/tlbvfi_page)
+ - 💻 **Code**: [https://github.com/ZonglinL/TLB-VFI](https://github.com/ZonglinL/TLB-VFI)
+
+ <div align="center">
+ <img src="https://github.com/ZonglinL/TLB-VFI/raw/main/images/visual1.png" width=95%>
+ </div>
+
+ ## Abstract
+
+ Video Frame Interpolation (VFI) aims to predict the intermediate frame $I_n$ (we use $n$ to denote time in videos to avoid notation overload with the timestep $t$ in diffusion models) based on two consecutive neighboring frames $I_0$ and $I_1$. Recent approaches apply diffusion models (both image-based and video-based) to this task and achieve strong performance. However, image-based diffusion models are unable to extract temporal information and are relatively inefficient compared to non-diffusion methods. Video-based diffusion models can extract temporal information, but they are too large in terms of training scale, model size, and inference time. To mitigate these issues, we propose Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation (TLB-VFI), an efficient video-based diffusion model. By extracting rich temporal information from video inputs through our proposed 3D-wavelet gating and temporal-aware autoencoder, our method achieves a 20% improvement in FID on the most challenging datasets over the recent SOTA among image-based diffusion models. Meanwhile, owing to this rich temporal information, our method achieves strong performance with 3x fewer parameters. This parameter reduction yields a 2.3x speedup. By incorporating optical flow guidance, our method requires 9000x less training data and has over 20x fewer parameters than video-based diffusion models.
+
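As a brief aid for readers, the Brownian bridge idea can be sketched as follows. Unlike standard diffusion, whose forward process ends in pure noise, a Brownian bridge pins both endpoints, so sampling moves from one known state toward another. The parameterization below follows the generic Brownian bridge diffusion formulation and is our illustrative notation, not necessarily the paper's exact definition:

```latex
% Bridge between a start state x_0 and a target y:
% the mean interpolates the endpoints and the variance vanishes
% at t = 0 and t = T, pinning both ends of the trajectory.
q(x_t \mid x_0, y) = \mathcal{N}\big((1 - m_t)\,x_0 + m_t\,y,\ \delta_t \mathbf{I}\big),
\qquad m_t = \frac{t}{T}, \qquad \delta_t = 2\,s\,m_t(1 - m_t)
```

where $s$ scales the bridge variance. As the model's name indicates, TLB-VFI runs this process in the latent space of its temporal-aware autoencoder rather than in pixel space.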
+ ## Overview
+
+ TLB-VFI leverages temporal information extraction in the pixel space (3D wavelet) and latent space (3D convolution and attention) to improve the temporal consistency of the model.
+
+ <div align="center">
+ <img src="https://github.com/ZonglinL/TLB-VFI/raw/main/images/overview.jpg" width=95%>
+ </div>
+
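To make the 3D-wavelet idea concrete, here is a minimal NumPy sketch of a one-level 3D Haar decomposition over a (frames, height, width) volume, followed by a simple per-subband gate. This is purely illustrative: TLB-VFI's 3D-wavelet gating is learned and defined in the repository's code; `haar_1d`, `haar_3d`, `wavelet_gate`, and the gain values are our own names and assumptions.

```python
import numpy as np

def haar_1d(x, axis):
    """One-level orthonormal Haar split along `axis` (length must be even).
    Returns (low, high) bands, each half the size along `axis`."""
    x = np.moveaxis(x, axis, 0)
    lo = (x[0::2] + x[1::2]) / np.sqrt(2)  # local averages (low frequency)
    hi = (x[0::2] - x[1::2]) / np.sqrt(2)  # local differences (high frequency)
    return np.moveaxis(lo, 0, axis), np.moveaxis(hi, 0, axis)

def haar_3d(volume):
    """One-level 3D Haar transform of a (T, H, W) volume.
    Returns 8 subbands keyed by 'L'/'H' per axis; e.g. 'HLL' is
    high temporal frequency, low spatial frequency."""
    bands = {"": volume}
    for axis in range(3):
        split = {}
        for key, band in bands.items():
            lo, hi = haar_1d(band, axis)
            split[key + "L"], split[key + "H"] = lo, hi
        bands = split
    return bands

def wavelet_gate(volume, gains):
    """Scale each subband by a gain (hand-picked here; learned in TLB-VFI)."""
    return {k: gains.get(k, 1.0) * v for k, v in haar_3d(volume).items()}
```

For a two-frame input volume, the `H**` subbands isolate temporal differences (motion), which is exactly the kind of information an image-based diffusion model cannot see.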
+ ## Quantitative Results
+
+ Our method achieves the best LPIPS/FloLPIPS/FID among recent state-of-the-art methods.
+
+ <div align="center">
+ <img src="https://github.com/ZonglinL/TLB-VFI/raw/main/images/quant.png" width=95%>
+ </div>
+
+ ## Qualitative Results
+
+ Our method achieves the best visual quality among recent state-of-the-art methods. For more visualizations, please refer to our [project page](https://zonglinl.github.io/tlbvfi_page).
+
+ <div align="center">
+ <img src="https://github.com/ZonglinL/TLB-VFI/raw/main/images/visual3.png" width=95%>
+ </div>
+
+ ## Usage
+
+ For detailed instructions on setup, training, and evaluation, please refer to the [official GitHub repository](https://github.com/ZonglinL/TLB-VFI).
+
+ ### Inference Example
+
+ You can run inference with the scripts provided in the GitHub repository. Make sure you have downloaded the trained model weights first.
+
+ To interpolate 7 frames between `frame0` and `frame1`:
+
+ ```bash
+ python interpolate.py --resume_model path_to_model_weights --frame0 path_to_the_previous_frame --frame1 path_to_the_next_frame
+ ```
+
+ To interpolate a single frame in between:
+
+ ```bash
+ python interpolate_one.py --resume_model path_to_model_weights --frame0 path_to_the_previous_frame --frame1 path_to_the_next_frame
+ ```
+
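If you need to interpolate many frame pairs, the commands above can be driven from Python. A minimal sketch: the script names and the `--resume_model`/`--frame0`/`--frame1` flags come from the README, while `build_interpolate_cmd`, `interpolate_folder`, and the `.png` naming convention are our own assumptions.

```python
import subprocess
from pathlib import Path

def build_interpolate_cmd(weights, frame0, frame1, seven_frames=True):
    """Assemble the CLI call: interpolate.py produces 7 in-between
    frames, interpolate_one.py a single middle frame."""
    script = "interpolate.py" if seven_frames else "interpolate_one.py"
    return [
        "python", script,
        "--resume_model", str(weights),
        "--frame0", str(frame0),
        "--frame1", str(frame1),
    ]

def interpolate_folder(weights, frame_dir, run=subprocess.run):
    """Interpolate one middle frame between each consecutive pair of
    PNG frames in `frame_dir` (sorted by filename)."""
    frames = sorted(Path(frame_dir).glob("*.png"))
    for f0, f1 in zip(frames, frames[1:]):
        run(build_interpolate_cmd(weights, f0, f1, seven_frames=False),
            check=True)
```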
+ ## Citation
+
+ If you find this repository helpful for your research, please cite the paper:
+
+ ```bibtex
+ @article{lyu2025tlbvfitemporalawarelatentbrownian,
+   title={TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation},
+   author={Zonglin Lyu and Chen Chen},
+   year={2025},
+   eprint={2507.04984},
+   archivePrefix={arXiv},
+   primaryClass={cs.CV},
+ }
+ ```