Orient Anything V2: Unifying Orientation and Rotation Understanding
Abstract
Orient Anything V2 enhances 3D orientation understanding through scalable 3D asset synthesis, symmetry-aware periodic distribution fitting, and multi-frame relative rotation prediction, achieving state-of-the-art performance across multiple benchmarks.
This work presents Orient Anything V2, an enhanced foundation model for unified understanding of object 3D orientation and rotation from single or paired images. Building upon Orient Anything V1, which defines orientation via a single unique front face, V2 extends this capability to handle objects with diverse rotational symmetries and directly estimate relative rotations. These improvements are enabled by four key innovations: 1) Scalable 3D assets synthesized by generative models, ensuring broad category coverage and balanced data distribution; 2) An efficient, model-in-the-loop annotation system that robustly identifies 0 to N valid front faces for each object; 3) A symmetry-aware, periodic distribution fitting objective that captures all plausible front-facing orientations, effectively modeling object rotational symmetry; 4) A multi-frame architecture that directly predicts relative object rotations. Extensive experiments show that Orient Anything V2 achieves state-of-the-art zero-shot performance on orientation estimation, 6DoF pose estimation, and object symmetry recognition across 11 widely used benchmarks. The model demonstrates strong generalization, significantly broadening the applicability of orientation estimation in diverse downstream tasks.
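The symmetry-aware objective in point 3 can be illustrated with a small sketch. The sketch makes these assumptions, none of which are the paper's specification:

- Azimuth is discretized into 360 one-degree bins.
- Each of the N valid front faces contributes one wrapped Gaussian peak, spaced 360/N degrees apart.
- An object with 0 front faces gets a uniform target.

The helper names (`periodic_target`, `soft_cross_entropy`, `symmetric_azimuth_error`) and the `sigma_deg` parameter are hypothetical.

```python
import numpy as np

NUM_BINS = 360  # hypothetical: one bin per degree of azimuth


def periodic_target(theta0_deg: float, n_fronts: int, sigma_deg: float = 5.0,
                    num_bins: int = NUM_BINS) -> np.ndarray:
    """Normalized target distribution over azimuth bins (illustrative sketch).

    n_fronts == 0: no distinguishable front, so the target is uniform (assumption).
    n_fronts >= 1: one wrapped Gaussian peak every 360 / n_fronts degrees,
    starting at theta0_deg.
    """
    bins = np.arange(num_bins) * (360.0 / num_bins)
    if n_fronts == 0:
        return np.full(num_bins, 1.0 / num_bins)
    target = np.zeros(num_bins)
    period = 360.0 / n_fronts
    for k in range(n_fronts):
        center = (theta0_deg + k * period) % 360.0
        # Circular distance so a peak near 0 deg also covers bins near 360 deg.
        d = np.abs(bins - center)
        d = np.minimum(d, 360.0 - d)
        target += np.exp(-0.5 * (d / sigma_deg) ** 2)
    return target / target.sum()


def soft_cross_entropy(logits: np.ndarray, target: np.ndarray) -> float:
    """Cross-entropy between predicted logits and a soft target distribution."""
    shifted = logits - logits.max()  # subtract max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-(target * log_probs).sum())


def symmetric_azimuth_error(pred_deg: float, gt_deg: float, n_fronts: int) -> float:
    """Smallest azimuth error once N-fold rotational symmetry is accounted for."""
    if n_fronts == 0:
        return 0.0  # every azimuth is equally valid (assumption)
    period = 360.0 / n_fronts
    d = (pred_deg - gt_deg) % period
    return min(d, period - d)


if __name__ == "__main__":
    t = periodic_target(30.0, n_fronts=2)  # e.g. an object with two valid fronts
    print(int(np.argmax(t)), round(float(t[30]), 4), round(float(t[210]), 4))
    print(symmetric_azimuth_error(215.0, 30.0, n_fronts=2))
```

Because the target puts equal mass on every symmetric front, a model is not penalized for choosing any one of them. The error metric is likewise measured modulo the symmetry period. In the example, a prediction of 215 degrees for a 2-fold object with a front at 30 degrees has an error of 5 degrees, not 185 degrees.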
Community
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, are similar to this paper:
- PoseGAM: Robust Unseen Object Pose Estimation via Geometry-Aware Multi-View Reasoning (2025)
- OPFormer: Object Pose Estimation leveraging foundation model with geometric encoding (2025)
- Reloc-VGGT: Visual Re-localization with Geometry Grounded Transformer (2025)
- AlignPose: Generalizable 6D Pose Estimation via Multi-view Feature-metric Alignment (2025)
- PatchAlign3D: Local Feature Alignment for Dense 3D Shape understanding (2026)
- Selfi: Self Improving Reconstruction Engine via 3D Geometric Feature Alignment (2025)
- CoordAR: One-Reference 6D Pose Estimation of Novel Objects via Autoregressive Coordinate Map Generation (2025)