Flow Poke Transformer (FPT)
Paper and Abstract
The Flow Poke Transformer (FPT) was presented in the paper "What If: Understanding Motion Through Sparse Interactions".
FPT is a novel framework for directly predicting the distribution of local motion, conditioned on sparse interactions termed "pokes". Unlike traditional methods that typically only enable dense sampling of a single realization of scene dynamics, FPT provides an interpretable, directly accessible representation of multi-modal scene motion, its dependency on physical interactions, and the inherent uncertainties of scene dynamics. The model has been evaluated on several downstream tasks, demonstrating competitive performance in dense face motion generation, articulated object motion estimation, and moving part segmentation from pokes.
Project Page and Code
- Project Page: https://compvis.github.io/flow-poke-transformer/
- GitHub Repository: https://github.com/CompVis/flow-poke-transformer
FPT predicts distributions of potential motion for sparse points. Left: the paw pushing the hand down will force the hand downwards, resulting in a unimodal distribution. Right: the hand moving down results in two modes, the paw following along or staying put.
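To make the two-mode case in the caption concrete, here is a toy sketch of a bimodal distribution over 2D motion. It is not FPT's actual output format: the isotropic Gaussian mixture, the mode positions, the weights, and the helper names `gaussian_2d` and `mixture_density` are all assumptions chosen for illustration. It shows why a single averaged prediction is a poor summary when the paw can either stay put or follow the hand.

```python
import math


def gaussian_2d(point, mean, sigma):
    """Density of an isotropic 2D Gaussian at `point`."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    return norm * math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))


def mixture_density(point, components):
    """Density of a mixture given as (weight, mean, sigma) tuples."""
    return sum(w * gaussian_2d(point, mean, sigma) for w, mean, sigma in components)


# Hypothetical motion vectors (dx, dy) in pixels, with y pointing down.
STAY = (0.0, 0.0)     # the paw stays put
FOLLOW = (0.0, 10.0)  # the paw follows the hand downwards

# Assumed toy distribution: both outcomes equally likely.
paw_motion = [(0.5, STAY, 1.0), (0.5, FOLLOW, 1.0)]

# The average of the two modes is itself an unlikely motion.
average = (0.0, 5.0)

for label, flow in [("stay", STAY), ("follow", FOLLOW), ("average", average)]:
    print(f"{label:8s} density = {mixture_density(flow, paw_motion):.2e}")
```

Both modes receive the same high density, while the averaged motion falls in the low-density gap between them. A model that regresses a single motion vector would tend toward that gap, which is the failure a distributional prediction avoids.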
Usage
The easiest way to try FPT is via our interactive demo:
python -m scripts.demo.app --compile True --warmup_compiled_paths True
Compilation is optional but recommended for a better user experience. A checkpoint will be downloaded from Hugging Face by default if not explicitly specified via the CLI.
For programmatic usage, the simplest way to use FPT is via torch.hub:
import torch
model = torch.hub.load("CompVis/flow-poke-transformer", "fpt_base")  # "owner/repo" as in the GitHub URL above
If you wish to integrate FPT into your own codebase, you can copy model.py and dinov2.py from the GitHub repository. The model can then be instantiated as follows:
import torch
from flow_poke.model import FlowPokeTransformer_Base
model: FlowPokeTransformer_Base = FlowPokeTransformer_Base()
state_dict = torch.load("fpt_base.pt", map_location="cpu")  # Download the weights separately first; map_location lets this run without a GPU
model.load_state_dict(state_dict)
model.requires_grad_(False)
model.eval()
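The loading steps above can be wrapped in a small helper. The sketch below is one way to do it, assuming the checkpoint file holds a plain state dict (as the snippet above implies). The name `load_frozen` is hypothetical, and a tiny `nn.Linear` stands in for the real model so the example runs without downloading any weights.

```python
import tempfile
from pathlib import Path

import torch
from torch import nn


def load_frozen(model: nn.Module, checkpoint_path: str) -> nn.Module:
    """Load a state dict into `model`, then freeze it for inference."""
    # map_location="cpu" lets a checkpoint saved on a GPU load on a CPU-only machine.
    # weights_only=True restricts unpickling to tensors and plain containers.
    state_dict = torch.load(checkpoint_path, map_location="cpu", weights_only=True)
    # strict=True (the default) raises if any key is missing or unexpected.
    model.load_state_dict(state_dict, strict=True)
    model.requires_grad_(False)
    return model.eval()


# Stand-in demonstration: round-trip a tiny module through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "toy.pt")
    source = nn.Linear(4, 2)
    torch.save(source.state_dict(), path)
    restored = load_frozen(nn.Linear(4, 2), path)

print(restored.training, any(p.requires_grad for p in restored.parameters()))
```

With the real model, the call would be `load_frozen(FlowPokeTransformer_Base(), "fpt_base.pt")`, followed by moving the model to the desired device.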
The FlowPokeTransformer class contains all necessary methods for various applications. For high-level usage, refer to the FlowPokeTransformer.predict_*() methods. For low-level usage, the module's forward() can be used.
Citation
If you find our model or code useful, please cite our paper:
@inproceedings{baumann2025whatif,
title={What If: Understanding Motion Through Sparse Interactions},
author={Stefan Andreas Baumann and Nick Stracke and Timy Phan and Bj{\"o}rn Ommer},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
year={2025}
}