Pyramidal Spectrum

Frequency-based Hierarchically Vector Quantized VAE for Videos

Official Implementation (WACV 2026)

This repository provides the official implementation of the paper:

Pyramidal Spectrum: Frequency-based Hierarchically Vector Quantized VAE for Videos
Accepted at WACV 2026

We introduce a new vector-quantized video autoencoder, trained on 4K-resolution video data, that uses a hierarchical, frequency-based vector quantization scheme.
The model builds a pyramidal spectral representation of the input to produce high-fidelity video reconstructions with an efficient latent structure.
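The README does not describe how the spectral pyramid is built. As a rough illustration of the general idea only, the sketch below splits every frame of a (T, H, W) video into radial spatial-frequency bands with NumPy FFTs. It is not the paper's method: the function name spectral_pyramid, the hard (ideal) radial masks, and the halving cutoffs are hypothetical choices, and a true pyramid would usually also downsample the coarser bands.

```python
import numpy as np


def spectral_pyramid(video: np.ndarray, num_levels: int = 3) -> list[np.ndarray]:
    """Split a (T, H, W) video into `num_levels` spatial frequency bands, coarse to fine.

    Hypothetical illustration: each frame goes through a 2D FFT and is partitioned by
    hard radial masks whose cutoffs halve at every coarser level. Because the masks
    partition the spectrum, the bands sum back to the original video.
    """
    _, h, w = video.shape
    spectrum = np.fft.fft2(video, axes=(-2, -1))

    # Radial frequency (cycles per pixel) of every FFT bin; 0 is the DC component.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy**2 + fx**2)

    # Band edges in ascending order, e.g. [0, 0.125, 0.25, inf] for 3 levels.
    cutoffs = [0.25 / 2**k for k in range(num_levels - 1)][::-1]
    edges = [0.0, *cutoffs, np.inf]

    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (radius >= lo) & (radius < hi)  # (H, W) mask broadcasts over time
        band = np.fft.ifft2(spectrum * mask, axes=(-2, -1)).real
        bands.append(band)
    return bands


# Toy usage on random noise (not real video data).
rng = np.random.default_rng(0)
video = rng.standard_normal((4, 32, 32))
bands = spectral_pyramid(video, num_levels=3)
print(len(bands), bands[0].shape)  # 3 (4, 32, 32)
print(np.allclose(sum(bands), video))  # True
```

Hard masks keep the example short and make the bands add back up to the input exactly; smoother masks (for example, differences of Gaussian low-pass filters, which still telescope to an exact sum) reduce ringing within each band.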


📦 Installation

This implementation requires Diffusers installed from the MMVQVae branch of a custom Diffusers fork:

pip install git+https://github.com/Onkarsus13/diffusers@MMVQVae

🚀 Features

  • Novel hierarchical frequency-domain quantization
  • Trained on 4K-resolution video datasets
  • Multi-level pyramidal spectral decomposition
  • Highly efficient latent video representation
  • High-quality reconstructions suitable for generative pipelines
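The quantization step can likewise be sketched under assumptions. Below, each frequency level is quantized against its own codebook by nearest-neighbour (L2) lookup, the standard VQ-VAE assignment rule. The one-codebook-per-level layout, the token counts, the codebook size, and the helper names vector_quantize and quantize_levels are all hypothetical; the paper's hierarchical scheme may instead couple levels (for example, through residual quantization).

```python
import numpy as np


def vector_quantize(latents: np.ndarray, codebook: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Map each row of `latents` (N, D) to its nearest row of `codebook` (K, D) by L2 distance.

    Returns the code indices, shape (N,), and the quantized vectors, shape (N, D).
    """
    # Squared distances ||x||^2 - 2 x.c + ||c||^2 for every (latent, code) pair: shape (N, K).
    d2 = (
        (latents**2).sum(axis=1, keepdims=True)
        - 2.0 * latents @ codebook.T
        + (codebook**2).sum(axis=1)[None, :]
    )
    indices = d2.argmin(axis=1)
    return indices, codebook[indices]


def quantize_levels(level_latents: list[np.ndarray], codebooks: list[np.ndarray]):
    """Quantize every frequency level with its own codebook (hypothetical layout)."""
    return [vector_quantize(z, cb) for z, cb in zip(level_latents, codebooks)]


# Toy usage with made-up sizes: coarse levels hold fewer latent vectors than fine ones.
rng = np.random.default_rng(0)
dim, codebook_size = 8, 256
token_counts = (16, 64, 256)
level_latents = [rng.standard_normal((n, dim)) for n in token_counts]
codebooks = [rng.standard_normal((codebook_size, dim)) for _ in token_counts]

results = quantize_levels(level_latents, codebooks)
bits_per_code = int(np.log2(codebook_size))  # 8 bits index one of 256 entries
total_bits = sum(len(idx) for idx, _ in results) * bits_per_code
print([idx.shape for idx, _ in results], total_bits)  # [(16,), (64,), (256,)] 2688
```

With 256-entry codebooks each index costs 8 bits, so this toy setup stores 336 indices, or 2688 bits in total. The arithmetic only shows why discrete codes make for a compact latent; it says nothing about the released model's actual compression rate.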

📄 Citation

@inproceedings{pyramidal_spectrum_wacv2026,
  title     = {Pyramidal Spectrum: Frequency-based Hierarchically Vector Quantized VAE for Videos},
  author    = {Tushar Prakash and Onkar Susladkar and Inderjit Dhillon and Sparsh Mittal},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year      = {2026}
}