nielsr (HF Staff) committed ec475ae (verified) · 1 parent: eb0088c

Improve model card for 3D-MOOD: Add metadata, usage, and rich description


This PR significantly enhances the model card for **3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection** by:

- Adding the `pipeline_tag: image-to-3d` to the metadata, ensuring the model is discoverable under the "image-to-3d" pipeline on the Hugging Face Hub.
- Adding the `license: apache-2.0` to the metadata for clarity regarding usage rights.
- Removing the non-standard `library_name: 3D-MOOD` from the metadata, as it does not correspond to a Hugging Face-supported library with automated code snippets; the model's framework, Vis4D, is instead mentioned in the card body. The resulting front matter is reproduced right after this list.
- Providing a detailed content section including the paper title, abstract, direct links to the Hugging Face paper page, the project page, and the GitHub repository.
- Integrating a clear "Demo" section with a code snippet and visualization directly from the GitHub README to showcase sample usage.
- Including relevant images from the GitHub repository to make the model card more informative and visually appealing.
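
For reference, the README front matter after these changes, as it appears in the diff below:

```yaml
datasets:
- RoyYang0714/3D-MOOD
license: apache-2.0
pipeline_tag: image-to-3d
```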

Please review and merge this PR.

Files changed (1):
  1. README.md (+54 −2)

README.md CHANGED
````diff
@@ -1,7 +1,59 @@
 ---
-library_name: 3D-MOOD
 datasets:
 - RoyYang0714/3D-MOOD
+license: apache-2.0
+pipeline_tag: image-to-3d
 ---
 
-- Library: https://github.com/cvg/3D-MOOD
+<div align="center">
+
+# 3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection
+
+<a href="https://huggingface.co/papers/2507.23567"><img src='https://img.shields.io/badge/arXiv-Paper-red?logo=arxiv&logoColor=white' alt='arXiv'></a>
+<a href='https://royyang0714.github.io/3D-MOOD'><img src='https://img.shields.io/badge/Project%20Page-Website-green?logo=googlechrome&logoColor=white' alt='Project Page'></a>
+<a href='https://huggingface.co/spaces/RoyYang0714/3D-MOOD'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Live%20Demo-blue'></a> \
+<a href='https://huggingface.co/RoyYang0714/3D-MOOD'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue'></a>
+<a href='https://huggingface.co/datasets/RoyYang0714/3D-MOOD'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Datasets-blue'></a>
+
+</div>
+
+<div>
+<img src="https://github.com/cvg/3D-MOOD/raw/main/assets/overview.png" width="100%" alt="Banner 2" align="center">
+</div>
+
+This repository contains the official models and code for the paper [3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection](https://huggingface.co/papers/2507.23567).
+
+Project Page: https://royyang0714.github.io/3D-MOOD
+Code: https://github.com/cvg/3D-MOOD
+
+## Abstract
+
+Monocular 3D object detection is valuable for various applications such as robotics and AR/VR. Existing methods are confined to closed-set settings, where the training and testing sets consist of the same scenes and/or object categories. However, real-world applications often introduce new environments and novel object categories, posing a challenge to these methods. In this paper, we address monocular 3D object detection in an open-set setting and introduce the first end-to-end 3D Monocular Open-set Object Detector (3D-MOOD). We propose to lift the open-set 2D detection into 3D space through our designed 3D bounding box head, enabling end-to-end joint training for both 2D and 3D tasks to yield better overall performance. We condition the object queries with geometry prior and overcome the generalization for 3D estimation across diverse scenes. To further improve performance, we design the canonical image space for more efficient cross-dataset training. We evaluate 3D-MOOD on both closed-set settings (Omni3D) and open-set settings (Omni3D to Argoverse 2, ScanNet), and achieve new state-of-the-art results.
+
+## Demo
+
+We provide the [`demo.py`](https://github.com/cvg/3D-MOOD/blob/main/scripts/demo.py) script to test whether the installation is complete:
+
+```bash
+python scripts/demo.py
+```
+
+It will save the prediction shown below to `assets/demo/output.png`.
+
+<div align="center">
+<img src="https://github.com/cvg/3D-MOOD/raw/main/assets/demo/output.png" alt="Demo Output">
+</div>
+
+You can also try the live demo on [Hugging Face Spaces](https://huggingface.co/spaces/RoyYang0714/3D-MOOD)!
+
+## Citation
+
+If you find our work useful in your research, please consider citing our publication:
+```bibtex
+@article{yang20253d,
+  title={3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection},
+  author={Yang, Yung-Hsu and Piccinelli, Luigi and Segu, Mattia and Li, Siyuan and Huang, Rui and Fu, Yuqian and Pollefeys, Marc and Blum, Hermann and Bauer, Zuria},
+  journal={arXiv preprint arXiv:2507.23567},
+  year={2025}
+}
+```
````
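
For anyone reviewing this change who wants to try the demo snippet locally, here is a minimal sketch. The clone URL and script path come from the card content above; the dependency-installation step is an assumption, so follow the repository's own install instructions.

```bash
# Clone the repository referenced in the model card.
git clone https://github.com/cvg/3D-MOOD
cd 3D-MOOD

# Assumption: install the project's dependencies per its README before running.

# Run the demo script shown in the card; per the card, the prediction
# visualization is written to assets/demo/output.png.
python scripts/demo.py
```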