TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation
This repository contains the official model weights for TLB-VFI, an efficient video-based diffusion model presented in the paper TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation.
- Project Page: https://zonglinl.github.io/tlbvfi_page
- Code: https://github.com/ZonglinL/TLB-VFI

Abstract
Video Frame Interpolation (VFI) aims to predict the intermediate frame $I_n$ (we use n to denote time in videos to avoid notation overload with the timestep $t$ in diffusion models) from two consecutive neighboring frames $I_0$ and $I_1$. Recent approaches apply diffusion models (both image-based and video-based) to this task and achieve strong performance. However, image-based diffusion models are unable to extract temporal information and are relatively inefficient compared to non-diffusion methods. Video-based diffusion models can extract temporal information, but they are too large in terms of training scale, model size, and inference time. To mitigate these issues, we propose Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation (TLB-VFI), an efficient video-based diffusion model. By extracting rich temporal information from video inputs through our proposed 3D-wavelet gating and temporal-aware autoencoder, our method achieves a 20% improvement in FID on the most challenging datasets over the recent state-of-the-art (SOTA) image-based diffusion models. Meanwhile, thanks to this rich temporal information, our method achieves strong performance with 3x fewer parameters, and this parameter reduction yields a 2.3x speedup. By incorporating optical flow guidance, our method requires 9000x less training data and uses over 20x fewer parameters than video-based diffusion models.
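The "Brownian bridge" in the name refers to a diffusion process pinned at two endpoints instead of running from data to pure noise. The sketch below samples a generic Brownian bridge marginal between two latents `x0` and `y`; the schedule `m_t = t / T`, the variance `s * m_t * (1 - m_t)`, and the function name `brownian_bridge_sample` are illustrative assumptions, not TLB-VFI's exact formulation (see the paper and code for that).

```python
import numpy as np


def brownian_bridge_sample(x0, y, t, T, s=1.0, rng=None):
    """Draw x_t from a Brownian bridge pinned at x0 (t = 0) and y (t = T).

    Generic form, assumed for illustration only:
        m_t = t / T
        x_t = (1 - m_t) * x0 + m_t * y + sqrt(s * m_t * (1 - m_t)) * eps
    The noise term vanishes at both endpoints, so the path is pinned there.
    """
    rng = np.random.default_rng() if rng is None else rng
    x0 = np.asarray(x0, dtype=float)
    y = np.asarray(y, dtype=float)
    m = t / T
    mean = (1.0 - m) * x0 + m * y          # linear interpolation of the endpoints
    std = np.sqrt(s * m * (1.0 - m))       # largest at m = 0.5, zero at the ends
    return mean + std * rng.standard_normal(x0.shape)


if __name__ == "__main__":
    a, b = np.zeros(3), np.ones(3)
    print(brownian_bridge_sample(a, b, t=5, T=10, s=0.0))  # noise disabled: midpoint of a and b
```

Because the variance is zero at both ends, a reverse process only has to travel between two known latents, which is the intuition behind using a bridge rather than a noise-to-data diffusion.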
Overview
TLB-VFI extracts temporal information in both pixel space (3D wavelet) and latent space (3D convolution and attention) to improve the temporal consistency of its predictions.
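To illustrate what a pixel-space 3D wavelet exposes, here is a minimal sketch of a single-level 3D Haar decomposition of a grayscale clip shaped (T, H, W). The Haar basis, the function names `haar_1d` / `haar_3d`, and the subband keys are assumptions for illustration; this does not implement TLB-VFI's 3D-wavelet gating, only the kind of decomposition such a branch could build on.

```python
import numpy as np


def haar_1d(x, axis):
    """One-level orthonormal Haar split along `axis` (axis length must be even)."""
    n = x.shape[axis]
    even = np.take(x, np.arange(0, n, 2), axis=axis)
    odd = np.take(x, np.arange(1, n, 2), axis=axis)
    low = (even + odd) / np.sqrt(2.0)    # local average
    high = (even - odd) / np.sqrt(2.0)   # local difference
    return low, high


def haar_3d(video):
    """Split a (T, H, W) clip into 8 subbands.

    Keys have one letter per axis in (time, height, width) order, so 'HLL'
    is the temporal high-pass band that responds to frame-to-frame change.
    """
    bands = {"": np.asarray(video, dtype=float)}
    for axis in range(3):
        next_bands = {}
        for key, arr in bands.items():
            low, high = haar_1d(arr, axis)
            next_bands[key + "L"] = low
            next_bands[key + "H"] = high
        bands = next_bands
    return bands


if __name__ == "__main__":
    clip = np.stack([np.zeros((4, 4)), np.ones((4, 4))])  # two frames: black, then white
    bands = haar_3d(clip)
    print(sorted(bands))           # eight subband keys
    print(bands["HLL"][0, 0, 0])   # nonzero: the frames differ over time
```

Because the transform is orthonormal, no information is lost; the temporal high-pass bands simply isolate where the clip changes between frames.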

Quantitative Results
Our method achieves state-of-the-art performance in LPIPS, FloLPIPS, and FID among recent state-of-the-art (SOTA) methods.

Qualitative Results
Our method achieves the best visual quality among recent SOTA methods. For more visualizations, please refer to our project page.

Usage
For detailed instructions on setup, training, and evaluation, please refer to the official GitHub repository.
Inference Example
You can run inference with the scripts provided in the GitHub repository. Please make sure you have downloaded the trained model weights first.
To interpolate 7 frames between frame0 and frame1:
python interpolate.py --resume_model path_to_model_weights --frame0 path_to_the_previous_frame --frame1 path_to_the_next_frame
To interpolate a single frame between frame0 and frame1:
python interpolate_one.py --resume_model path_to_model_weights --frame0 path_to_the_previous_frame --frame1 path_to_the_next_frame
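To process a longer sequence, one option is to call the script once per consecutive frame pair. The helper below only builds those command lines using the flags shown above (`--resume_model`, `--frame0`, `--frame1`); the helper name `build_commands` and the frame paths are hypothetical, and since the scripts' output location is not documented here, check the repository before running many calls in a row (outputs may overwrite each other).

```python
import shlex
from typing import List, Sequence


def build_commands(frames: Sequence[str], weights: str,
                   script: str = "interpolate_one.py") -> List[List[str]]:
    """Build one interpolation command per consecutive (frame0, frame1) pair."""
    commands = []
    for prev, nxt in zip(frames[:-1], frames[1:]):
        commands.append([
            "python", script,
            "--resume_model", weights,
            "--frame0", prev,
            "--frame1", nxt,
        ])
    return commands


if __name__ == "__main__":
    frames = ["frames/000.png", "frames/001.png", "frames/002.png"]  # hypothetical paths
    for cmd in build_commands(frames, "path_to_model_weights"):
        print(shlex.join(cmd))
        # To execute from inside the TLB-VFI repo: subprocess.run(cmd, check=True)
```

Passing `script="interpolate.py"` builds the 7-frame variant instead; building argument lists (rather than shell strings) avoids quoting problems with paths that contain spaces.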
Citation
If you find this repository helpful for your research, please cite the paper:
@article{lyu2025tlbvfitemporalawarelatentbrownian,
title={TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation},
author={Zonglin Lyu and Chen Chen},
year={2025},
eprint={2507.04984},
archivePrefix={arXiv},
primaryClass={cs.CV},
}