Dens3R: A Foundation Model for 3D Geometry Prediction
Abstract
Dens3R is a 3D foundation model that jointly predicts multiple geometric quantities using a two-stage training framework, enhancing consistency and performance in dense 3D reconstruction tasks.
Recent advances in dense 3D reconstruction have led to significant progress, yet achieving accurate unified geometric prediction remains a major challenge. Most existing methods are limited to predicting a single geometric quantity from input images. However, geometric quantities such as depth, surface normals, and point maps are inherently correlated, and estimating them in isolation often fails to ensure consistency, limiting both accuracy and practical applicability. This motivates us to explore a unified framework that explicitly models the structural coupling among different geometric properties to enable joint regression. In this paper, we present Dens3R, a 3D foundation model designed for joint geometric dense prediction and adaptable to a wide range of downstream tasks. Dens3R adopts a two-stage training framework to progressively build a point map representation that is both generalizable and intrinsically invariant. Specifically, we design a lightweight shared encoder-decoder backbone and introduce position-interpolated rotary positional encoding to maintain expressive power while enhancing robustness to high-resolution inputs. By integrating image-pair matching features with intrinsic invariance modeling, Dens3R accurately regresses multiple geometric quantities, such as surface normals and depth, achieving consistent geometry perception from single-view to multi-view inputs. Additionally, we propose a post-processing pipeline that supports geometrically consistent multi-view inference. Extensive experiments demonstrate the superior performance of Dens3R across various dense 3D prediction tasks and highlight its potential for broader applications.
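The abstract credits robustness to high-resolution inputs to position-interpolated rotary positional encoding. A minimal 1-D NumPy sketch of that idea is below: standard RoPE rotates feature pairs by position-dependent angles, and when the input sequence exceeds the trained length, positions are linearly rescaled back into the trained range instead of extrapolating. This is an illustrative assumption about the mechanism (the paper's encoding operates on 2-D image tokens, and the function names here are invented for the sketch), not the released implementation.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0):
    # Per-pair rotation frequencies, as in standard RoPE.
    freqs = base ** (-np.arange(0, dim, 2) / dim)
    return np.outer(positions, freqs)  # shape: (seq, dim // 2)

def apply_rope(x, positions, train_len=None):
    """Apply rotary embedding to x of shape (seq, dim).

    If the sequence is longer than train_len, positions are linearly
    interpolated (scaled down) so all angles stay in the trained range.
    """
    seq, dim = x.shape
    pos = np.asarray(positions, dtype=np.float64)
    if train_len is not None and seq > train_len:
        pos = pos * (train_len / seq)  # position interpolation step
    ang = rope_angles(pos, dim)
    cos, sin = np.cos(ang), np.sin(ang)
    # Rotate each (even, odd) feature pair by its angle.
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x, dtype=np.float64)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

For example, running a length-2 sequence through a model trained on length 1 rescales positions [0, 1] to [0, 0.5], so the rotation angles never exceed those seen in training; the relative-angle structure between tokens is preserved up to this uniform scale.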
Community
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- No Pose at All: Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views (2025)
- VEIGAR: View-consistent Explicit Inpainting and Geometry Alignment for 3D object Removal (2025)
- Advances in Feed-Forward 3D Reconstruction and View Synthesis: A Survey (2025)
- Stereo-GS: Multi-View Stereo Vision Model for Generalizable 3D Gaussian Splatting Reconstruction (2025)
- MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details (2025)
- $\pi^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025)
- Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation (2025)