nielsr HF Staff committed on
Commit b947b19 · verified · 1 Parent(s): 41561b3

Enhance model card for TLB-VFI: add metadata, links, and usage


This PR significantly enhances the model card for the `ucfzl/TLBVFI` model by:
- Adding `pipeline_tag: image-to-video`, ensuring the model is discoverable under relevant tasks on the Hugging Face Hub.
- Specifying `library_name: diffusers`, which is the appropriate library for this diffusion model, enabling future integration and clearer usage guidance.
- Including `license: mit` for clarity on usage terms.
- Incorporating the full paper abstract to provide a comprehensive overview of the model.
- Adding direct links to the official project page and the GitHub repository for further details and code access.
- Embedding key images from the GitHub repository to visually explain the model and its results.
- Providing practical command-line usage examples for immediate inference, based on the GitHub repository's instructions.
- Including the BibTeX citation for proper academic attribution.

These changes will greatly improve the discoverability, usability, and documentation of the model on the Hugging Face Hub.

Files changed (1)
  1. README.md +79 -1
README.md CHANGED
@@ -1 +1,79 @@
- Paper link: https://huggingface.co/papers/2507.04984
+ ---
+ pipeline_tag: image-to-video
+ library_name: diffusers
+ license: mit
+ ---
+
+ # TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation
+
+ This repository contains the official model weights for **TLB-VFI**, an efficient video-based diffusion model presented in the paper [TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation](https://huggingface.co/papers/2507.04984).
+
+ - 🌐 **Project Page**: [https://zonglinl.github.io/tlbvfi_page](https://zonglinl.github.io/tlbvfi_page)
+ - 💻 **Code**: [https://github.com/ZonglinL/TLB-VFI](https://github.com/ZonglinL/TLB-VFI)
+
+ <div align="center">
+ <img src="https://github.com/ZonglinL/TLB-VFI/raw/main/images/visual1.png" width=95%>
+ </div>
+
+ ## Abstract
+
+ Video Frame Interpolation (VFI) aims to predict the intermediate frame $I_n$ (we use $n$ to denote time in videos to avoid notation overload with the timestep $t$ in diffusion models) based on two consecutive neighboring frames $I_0$ and $I_1$. Recent approaches apply diffusion models (both image-based and video-based) to this task and achieve strong performance. However, image-based diffusion models are unable to extract temporal information and are relatively inefficient compared to non-diffusion methods. Video-based diffusion models can extract temporal information, but they are too large in terms of training scale, model size, and inference time. To mitigate these issues, we propose Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation (TLB-VFI), an efficient video-based diffusion model. By extracting rich temporal information from video inputs through our proposed 3D-wavelet gating and temporal-aware autoencoder, our method achieves a 20% improvement in FID on the most challenging datasets over the recent SOTA among image-based diffusion models. Meanwhile, owing to this rich temporal information, our method achieves strong performance with 3x fewer parameters. This parameter reduction yields a 2.3x speedup. By incorporating optical flow guidance, our method requires 9000x less training data and has over 20x fewer parameters than video-based diffusion models.
+
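As a brief aid for readers, the Brownian bridge idea can be sketched as follows. Unlike standard diffusion, whose forward process ends in pure noise, a Brownian bridge pins both endpoints, so sampling moves from one known state toward another. The parameterization below follows the generic Brownian bridge diffusion formulation and is our illustrative notation, not necessarily the paper's exact definition:

```latex
% Bridge between a start state x_0 and a target y:
% the mean interpolates the endpoints and the variance vanishes
% at t = 0 and t = T, pinning both ends of the trajectory.
q(x_t \mid x_0, y) = \mathcal{N}\big((1 - m_t)\,x_0 + m_t\,y,\ \delta_t \mathbf{I}\big),
\qquad m_t = \frac{t}{T}, \qquad \delta_t = 2\,s\,m_t(1 - m_t)
```

where $s$ scales the bridge variance. As the model's name indicates, TLB-VFI runs this process in the latent space of its temporal-aware autoencoder rather than in pixel space.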
+ ## Overview
+
+ TLB-VFI leverages temporal information extraction in the pixel space (3D wavelet) and latent space (3D convolution and attention) to improve the temporal consistency of the model.
+
+ <div align="center">
+ <img src="https://github.com/ZonglinL/TLB-VFI/raw/main/images/overview.jpg" width=95%>
+ </div>
+
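To make the 3D-wavelet idea concrete, here is a minimal NumPy sketch of a one-level 3D Haar decomposition over a (frames, height, width) volume, followed by a simple per-subband gate. This is purely illustrative: TLB-VFI's 3D-wavelet gating is learned and defined in the repository's code; `haar_1d`, `haar_3d`, `wavelet_gate`, and the gain values are our own names and assumptions.

```python
import numpy as np

def haar_1d(x, axis):
    """One-level orthonormal Haar split along `axis` (length must be even).
    Returns (low, high) bands, each half the size along `axis`."""
    x = np.moveaxis(x, axis, 0)
    lo = (x[0::2] + x[1::2]) / np.sqrt(2)  # local averages (low frequency)
    hi = (x[0::2] - x[1::2]) / np.sqrt(2)  # local differences (high frequency)
    return np.moveaxis(lo, 0, axis), np.moveaxis(hi, 0, axis)

def haar_3d(volume):
    """One-level 3D Haar transform of a (T, H, W) volume.
    Returns 8 subbands keyed by 'L'/'H' per axis; e.g. 'HLL' is
    high temporal frequency, low spatial frequency."""
    bands = {"": volume}
    for axis in range(3):
        split = {}
        for key, band in bands.items():
            lo, hi = haar_1d(band, axis)
            split[key + "L"], split[key + "H"] = lo, hi
        bands = split
    return bands

def wavelet_gate(volume, gains):
    """Scale each subband by a gain (hand-picked here; learned in TLB-VFI)."""
    return {k: gains.get(k, 1.0) * v for k, v in haar_3d(volume).items()}
```

For a two-frame input volume, the `H**` subbands isolate temporal differences (motion), which is exactly the kind of information an image-based diffusion model cannot see.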
+ ## Quantitative Results
+
+ Our method achieves the best LPIPS/FloLPIPS/FID among recent state-of-the-art methods.
+
+ <div align="center">
+ <img src="https://github.com/ZonglinL/TLB-VFI/raw/main/images/quant.png" width=95%>
+ </div>
+
+ ## Qualitative Results
+
+ Our method achieves the best visual quality among recent state-of-the-art methods. For more visualizations, please refer to our [project page](https://zonglinl.github.io/tlbvfi_page).
+
+ <div align="center">
+ <img src="https://github.com/ZonglinL/TLB-VFI/raw/main/images/visual3.png" width=95%>
+ </div>
+
+ ## Usage
+
+ For detailed instructions on setup, training, and evaluation, please refer to the [official GitHub repository](https://github.com/ZonglinL/TLB-VFI).
+
+ ### Inference Example
+
+ You can run inference with the scripts provided in the GitHub repository. Make sure you have downloaded the trained model weights first.
+
+ To interpolate 7 frames between `frame0` and `frame1`:
+
+ ```bash
+ python interpolate.py --resume_model path_to_model_weights --frame0 path_to_the_previous_frame --frame1 path_to_the_next_frame
+ ```
+
+ To interpolate a single frame in between:
+
+ ```bash
+ python interpolate_one.py --resume_model path_to_model_weights --frame0 path_to_the_previous_frame --frame1 path_to_the_next_frame
+ ```
+
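If you need to interpolate many frame pairs, the commands above can be driven from Python. A minimal sketch: the script names and the `--resume_model`/`--frame0`/`--frame1` flags come from the README, while `build_interpolate_cmd`, `interpolate_folder`, and the `.png` naming convention are our own assumptions.

```python
import subprocess
from pathlib import Path

def build_interpolate_cmd(weights, frame0, frame1, seven_frames=True):
    """Assemble the CLI call: interpolate.py produces 7 in-between
    frames, interpolate_one.py a single middle frame."""
    script = "interpolate.py" if seven_frames else "interpolate_one.py"
    return [
        "python", script,
        "--resume_model", str(weights),
        "--frame0", str(frame0),
        "--frame1", str(frame1),
    ]

def interpolate_folder(weights, frame_dir, run=subprocess.run):
    """Interpolate one middle frame between each consecutive pair of
    PNG frames in `frame_dir` (sorted by filename)."""
    frames = sorted(Path(frame_dir).glob("*.png"))
    for f0, f1 in zip(frames, frames[1:]):
        run(build_interpolate_cmd(weights, f0, f1, seven_frames=False),
            check=True)
```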
+ ## Citation
+
+ If you find this repository helpful for your research, please cite the paper:
+
+ ```bibtex
+ @article{lyu2025tlbvfitemporalawarelatentbrownian,
+   title={TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation},
+   author={Zonglin Lyu and Chen Chen},
+   year={2025},
+   eprint={2507.04984},
+   archivePrefix={arXiv},
+   primaryClass={cs.CV},
+ }
+ ```