Flow Poke Transformer (FPT)

Paper and Abstract

The Flow Poke Transformer (FPT) was presented in the paper "What If: Understanding Motion Through Sparse Interactions".

FPT is a novel framework for directly predicting the distribution of local motion, conditioned on sparse interactions termed "pokes". Unlike traditional methods that typically only enable dense sampling of a single realization of scene dynamics, FPT provides an interpretable, directly accessible representation of multi-modal scene motion, its dependency on physical interactions, and the inherent uncertainties of scene dynamics. The model has been evaluated on several downstream tasks, demonstrating competitive performance in dense face motion generation, articulated object motion estimation, and moving part segmentation from pokes.

Project Page and Code

Project Page: https://compvis.github.io/flow-poke-transformer/
GitHub Repository: https://github.com/CompVis/flow-poke-transformer

FPT predicts distributions of potential motion for sparse points. Left: the paw pushing the hand down will force the hand downwards, resulting in a unimodal distribution. Right: the hand moving down results in two modes, the paw following along or staying put.
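The difference between the two cases in the caption can be made concrete with a toy example. The sketch below is only an illustration: it assumes the motion of a single point is described by a mixture of isotropic 2D Gaussians over pixel displacements, with positive `dy` pointing downwards. The mode parameters are invented for the example, are not model outputs, and the representation is not taken from the FPT code.

```python
import math

# Hypothetical modes, each given as (weight, mean_dx, mean_dy, std) in pixels.
# Left case: the point is forced downwards, so there is a single mode.
unimodal = [(1.0, 0.0, 10.0, 1.0)]
# Right case: the point either follows along (moves down) or stays put.
bimodal = [(0.5, 0.0, 10.0, 1.0), (0.5, 0.0, 0.0, 1.0)]


def mixture_density(modes, dx, dy):
    """Density of an isotropic 2D Gaussian mixture at displacement (dx, dy)."""
    total = 0.0
    for weight, mean_dx, mean_dy, std in modes:
        sq_dist = (dx - mean_dx) ** 2 + (dy - mean_dy) ** 2
        norm = 2.0 * math.pi * std**2
        total += weight * math.exp(-sq_dist / (2.0 * std**2)) / norm
    return total


def mixture_mean(modes):
    """Expected displacement under the mixture."""
    mean_dx = sum(weight * m_dx for weight, m_dx, _, _ in modes)
    mean_dy = sum(weight * m_dy for weight, _, m_dy, _ in modes)
    return mean_dx, mean_dy


mean = mixture_mean(bimodal)
print("bimodal mean displacement:", mean)
print("density at the mean:", mixture_density(bimodal, *mean))
print("density at the 'follows along' mode:", mixture_density(bimodal, 0.0, 10.0))
```

In the bimodal case the mean displacement lies halfway between the two modes, where the density is close to zero. A single point estimate would therefore describe a motion that is very unlikely to happen, which is why access to the full distribution matters.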

Usage

The easiest way to try FPT is via our interactive demo:

python -m scripts.demo.app --compile True --warmup_compiled_paths True

Compilation is optional but recommended for a better user experience. A checkpoint will be downloaded from Hugging Face by default if not explicitly specified via the CLI.

For programmatic usage, the simplest way to use FPT is via torch.hub:

import torch

# The repository name must match the GitHub repository (note the hyphens).
model = torch.hub.load("CompVis/flow-poke-transformer", "fpt_base")

If you wish to integrate FPT into your own codebase, you can copy model.py and dinov2.py from the GitHub repository. The model can then be instantiated as follows:

import torch
from flow_poke.model import FlowPokeTransformer_Base

model = FlowPokeTransformer_Base()
# The weights are not bundled with the code and must be downloaded separately.
state_dict = torch.load("fpt_base.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.requires_grad_(False)  # inference only: no gradients are needed
model.eval()  # put layers such as dropout into evaluation mode
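The last two calls freeze the parameters and switch the module to evaluation mode. As a minimal sketch of how to verify this, the snippet below applies the same two calls to a small stand-in network and checks the result. The helpers `freeze_for_inference` and `is_frozen` and the stand-in network are hypothetical and only serve the illustration; they are not part of the FPT code.

```python
import torch
from torch import nn


def freeze_for_inference(module: nn.Module) -> nn.Module:
    """Disable gradients for all parameters and switch to evaluation mode."""
    module.requires_grad_(False)
    module.eval()
    return module


def is_frozen(module: nn.Module) -> bool:
    """True if the module is in eval mode and no parameter requires gradients."""
    return (not module.training) and all(
        not p.requires_grad for p in module.parameters()
    )


# Hypothetical stand-in so the snippet runs without downloading any weights.
stand_in = nn.Sequential(nn.Linear(4, 8), nn.Dropout(0.5), nn.Linear(8, 2))
freeze_for_inference(stand_in)

# inference_mode() additionally avoids autograd bookkeeping for the forward pass.
with torch.inference_mode():
    out = stand_in(torch.zeros(1, 4))

print(is_frozen(stand_in), tuple(out.shape))
```

The same check, `is_frozen(model)`, can be run on a loaded FPT model before using it for inference.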

The FlowPokeTransformer class contains all necessary methods for various applications. For high-level usage, refer to the FlowPokeTransformer.predict_*() methods. For low-level usage, the module's forward() can be used.
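Since the individual predict_*() methods are not listed here, one way to discover them is plain Python introspection. The sketch below uses a hypothetical stand-in class so that it runs without the model; the method names on the stand-in are invented and do not correspond to the actual FPT methods.

```python
def list_predict_methods(obj):
    """Return the sorted names of callable attributes starting with 'predict_'."""
    return sorted(
        name
        for name in dir(obj)
        if name.startswith("predict_") and callable(getattr(obj, name))
    )


class StandInModel:
    """Hypothetical stand-in; the method names below are made up."""

    def predict_example_a(self):
        return None

    def predict_example_b(self):
        return None

    def forward(self):
        return None


print(list_predict_methods(StandInModel()))
```

With a loaded model, `list_predict_methods(model)` lists the available high-level entry points, and `help(getattr(model, name))` shows the signature and docstring of each one.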

Citation

If you find our model or code useful, please cite our paper:

@inproceedings{baumann2025whatif,
    title={What If: Understanding Motion Through Sparse Interactions}, 
    author={Stefan Andreas Baumann and Nick Stracke and Timy Phan and Bj{\"o}rn Ommer},
    booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    year={2025}
}
