TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation

This repository contains the official model weights for TLB-VFI, an efficient video-based diffusion model presented in the paper TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation.

🌐 Project Page: https://zonglinl.github.io/tlbvfi_page
💻 Code: https://github.com/ZonglinL/TLB-VFI

Abstract

Video Frame Interpolation (VFI) aims to predict the intermediate frame $I_n$ (we use n to denote time in videos to avoid notation overload with the timestep $t$ in diffusion models) based on two consecutive neighboring frames $I_0$ and $I_1$. Recent approaches apply diffusion models (both image-based and video-based) to this task and achieve strong performance. However, image-based diffusion models are unable to extract temporal information and are relatively inefficient compared to non-diffusion methods. Video-based diffusion models can extract temporal information, but they are too large in terms of training scale, model size, and inference time. To mitigate these issues, we propose Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation (TLB-VFI), an efficient video-based diffusion model. By extracting rich temporal information from video inputs through our proposed 3D-wavelet gating and temporal-aware autoencoder, our method achieves a 20% improvement in FID on the most challenging datasets over recent state-of-the-art image-based diffusion models. Meanwhile, owing to this rich temporal information, our method achieves strong performance with 3x fewer parameters, and this parameter reduction yields a 2.3x speedup. By incorporating optical flow guidance, our method requires 9000x less training data and has over 20x fewer parameters than video-based diffusion models.
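As a rough illustration of the "Brownian bridge" in the title, the sketch below samples the forward marginal of a Brownian bridge diffusion process in the style of BBDM (Li et al., CVPR 2023): $x_t = (1 - m_t)\,x_0 + m_t\,y + \sqrt{\delta_t}\,\epsilon$ with $m_t = t/T$ and $\delta_t = 2s(m_t - m_t^2)$. The schedule, the scale `s`, and the array shapes are assumptions for illustration. `x0` stands in for the latent of the target and `y` for the latent of the conditioning input; how TLB-VFI actually constructs these latents is specified in the paper and code, not here.

```python
import numpy as np

def bridge_schedule(t: int, T: int, s: float = 1.0) -> tuple:
    """Return (m_t, delta_t) for an assumed BBDM-style schedule."""
    m_t = t / T                           # interpolation weight, goes 0 -> 1
    delta_t = 2.0 * s * (m_t - m_t ** 2)  # variance, zero at both endpoints
    return m_t, delta_t

def bridge_forward(x0: np.ndarray, y: np.ndarray, t: int, T: int,
                   rng: np.random.Generator, s: float = 1.0) -> np.ndarray:
    """Sample x_t ~ N((1 - m_t) * x0 + m_t * y, delta_t * I)."""
    m_t, delta_t = bridge_schedule(t, T, s)
    noise = rng.standard_normal(x0.shape)
    return (1.0 - m_t) * x0 + m_t * y + np.sqrt(delta_t) * noise

# Hypothetical latents: x0 ~ target, y ~ conditioning input (shapes are arbitrary).
rng = np.random.default_rng(0)
x0 = np.zeros((4, 8, 8))
y = np.ones((4, 8, 8))
x_mid = bridge_forward(x0, y, t=500, T=1000, rng=rng)  # mean 0.5, variance 0.5 * s
```

Because $\delta_0 = \delta_T = 0$, the process starts exactly at the target latent and ends exactly at the conditioning latent, so sampling can run a learned reverse process from $y$ back toward $x_0$ instead of starting from pure noise.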

Overview

TLB-VFI leverages temporal information extraction in the pixel space (3D wavelet) and latent space (3D convolution and attention) to improve the temporal consistency of the model.
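The 3D wavelet in pixel space splits a clip jointly along time, height, and width. As a minimal sketch, assuming a single-level orthonormal Haar wavelet on a grayscale (T, H, W) clip with even dimensions, the decomposition below yields 8 subbands; bands whose first letter is `H` carry temporal high-frequency content (frame-to-frame changes), the kind of signal a gating module could draw on. This is not the repository's implementation, and the gating that consumes these subbands in TLB-VFI is described in the paper.

```python
import numpy as np

def haar_1d(x: np.ndarray, axis: int) -> tuple:
    """Single-level orthonormal Haar split along `axis` (length must be even)."""
    n = x.shape[axis]
    if n % 2:
        raise ValueError(f"axis {axis} has odd length {n}")
    even = np.take(x, np.arange(0, n, 2), axis=axis)
    odd = np.take(x, np.arange(1, n, 2), axis=axis)
    low = (even + odd) / np.sqrt(2.0)   # local average
    high = (even - odd) / np.sqrt(2.0)  # local difference
    return low, high

def haar_3d(video: np.ndarray) -> dict:
    """One-level 3D Haar transform of a (T, H, W) clip into 8 subbands.

    Keys have one letter per axis in (time, height, width) order:
    'L' = low-pass, 'H' = high-pass. 'LLL' is the coarse approximation;
    keys starting with 'H' hold temporal detail.
    """
    bands = {"": video.astype(np.float64)}
    for axis in range(3):
        split = {}
        for key, arr in bands.items():
            low, high = haar_1d(arr, axis)
            split[key + "L"] = low
            split[key + "H"] = high
        bands = split
    return bands

clip = np.random.default_rng(0).random((4, 8, 8))  # hypothetical 4-frame clip
bands = haar_3d(clip)
temporal_energy = sum(float(np.sum(b ** 2)) for k, b in bands.items() if k.startswith("H"))
```

Since the transform is orthonormal, total energy is preserved across the 8 bands, and a static clip produces zeros in every temporal high-pass band, which makes `temporal_energy` a crude motion indicator.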

Quantitative Results

Our method achieves state-of-the-art LPIPS, FloLPIPS, and FID scores among recent state-of-the-art methods.
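FID compares Gaussian fits of deep features from real and generated images, $\mathrm{FID} = \lVert \mu_r - \mu_g \rVert^2 + \mathrm{Tr}\big(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\big)$, where lower is better. The sketch below computes this distance from two precomputed feature arrays using NumPy only. It is not the evaluation code used for the paper: standard FID uses Inception-v3 pool features, which are assumed here to be extracted beforehand, and LPIPS/FloLPIPS (also lower is better) require pretrained networks and are not shown.

```python
import numpy as np

def _sqrtm_psd(a: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # clear tiny negative eigenvalues from round-off
    return (v * np.sqrt(w)) @ v.T

def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """FID-style distance between two (N, D) feature sets.

    Uses Tr((S_a S_b)^{1/2}) = Tr((S_a^{1/2} S_b S_a^{1/2})^{1/2}),
    which keeps every matrix symmetric so eigh can be used.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    root_a = _sqrtm_psd(cov_a)
    inner = root_a @ cov_b @ root_a
    inner = (inner + inner.T) / 2.0  # symmetrize against round-off
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b)
                 - 2.0 * np.trace(_sqrtm_psd(inner)))
```

Shifting one feature set by a constant vector leaves the covariance unchanged, so the distance reduces to the squared norm of the shift.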

Qualitative Results

Our method achieves the best visual quality among recent state-of-the-art methods. For more visualizations, please refer to our project page.

Usage

For detailed instructions on setup, training, and evaluation, please refer to the official GitHub repository.

Inference Example

You can run inference with the scripts provided in the GitHub repository. Make sure you have downloaded the trained model weights first.

To interpolate 7 frames between frame0 and frame1:

python interpolate.py --resume_model path_to_model_weights --frame0 path_to_the_previous_frame --frame1 path_to_the_next_frame
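One plausible way to obtain 7 in-between frames from a model that predicts a midpoint is recursive bisection: 3 levels give 2^3 - 1 = 7 frames at n = 1/8, 2/8, ..., 7/8. This is an assumption about how the command above works, not something confirmed from the repository; `predict_midpoint` is a hypothetical stand-in for the model.

```python
from typing import Callable, List, TypeVar

Frame = TypeVar("Frame")

def recursive_interpolate(frame0: Frame, frame1: Frame,
                          predict_midpoint: Callable[[Frame, Frame], Frame],
                          depth: int) -> List[Frame]:
    """Return the 2**depth - 1 intermediate frames between frame0 and frame1, in temporal order."""
    if depth == 0:
        return []
    mid = predict_midpoint(frame0, frame1)  # hypothetical model call at n = 0.5
    left = recursive_interpolate(frame0, mid, predict_midpoint, depth - 1)
    right = recursive_interpolate(mid, frame1, predict_midpoint, depth - 1)
    return left + [mid] + right

# Toy check: scalar "frames" and a linear "model" land exactly on n = k/8.
frames = recursive_interpolate(0.0, 1.0, lambda a, b: (a + b) / 2, depth=3)
print(frames)  # [0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875]
```

A drawback of this design is that errors in early midpoints feed into every later level, whereas a model conditioned directly on n would predict each frame independently.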

To interpolate a single frame between frame0 and frame1:

python interpolate_one.py --resume_model path_to_model_weights --frame0 path_to_the_previous_frame --frame1 path_to_the_next_frame
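To process a whole sequence, the single-frame command can be run once per consecutive pair of frames. The helper below is a hypothetical wrapper that uses only the three flags documented above (`--resume_model`, `--frame0`, `--frame1`) and defaults to a dry run that just returns the commands. Where `interpolate_one.py` saves its result is not documented here, so the sketch does not handle outputs; check the repository before batching, since repeated runs may write to the same file.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

def build_command(model: str, frame0: str, frame1: str,
                  script: str = "interpolate_one.py") -> List[str]:
    """Assemble the documented command line for one frame pair."""
    return [sys.executable, script,
            "--resume_model", model,
            "--frame0", frame0,
            "--frame1", frame1]

def run_consecutive_pairs(model: str, frame_dir: str, pattern: str = "*.png",
                          dry_run: bool = True) -> List[List[str]]:
    """Build (and optionally run) one command per consecutive pair of frames.

    Frames are paired in sorted filename order, so names should sort
    temporally (e.g. zero-padded indices such as 00001.png).
    """
    frames = sorted(str(p) for p in Path(frame_dir).glob(pattern))
    commands = [build_command(model, a, b) for a, b in zip(frames, frames[1:])]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands
```

For example, `run_consecutive_pairs("path_to_model_weights", "frames/")` lists the commands for inspection; pass `dry_run=False` to execute them from the repository root.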

Citation

If you find this repository helpful for your research, please cite the paper:

@article{lyu2025tlbvfitemporalawarelatentbrownian,
      title={TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation}, 
      author={Zonglin Lyu and Chen Chen},
      year={2025},
      eprint={2507.04984},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
}