 license: cc-by-nc-4.0
 datasets:
 - saifkhichi96/spinetrack
+base_model:
+- Tau-J/RTMPose
+tags:
+- 2d-human-pose-estimation
+- computer-vision
+- keypoint-detection
+- spinepose
+- spinetrack
+language:
+- en
+---
+# 🩻 Model Card for **SpinePose** Family
+**SpinePose** is a family of 2D human pose estimation models trained to estimate a **37-keypoint skeleton**, extending standard human body models to include the **spine**, **pelvis**, and **feet** regions in detail.
+It offers high anatomical precision for **biomechanical analysis**, **ergonomic assessment**, and **clinical pose tracking**, while maintaining compatibility with COCO-style keypoint definitions.
+---
+## 📘 Model Details
+### Description
+- **Developed by:** [Muhammad Saif Ullah Khan](https://saifkhichi.com/)
+- **Affiliation:** Technical University of Kaiserslautern & [DFKI](https://av.dfki.de/)
+- **Funding:** DFKI GmbH
+- **Model Type:** Top-down 2D keypoint estimator
+- **License:** [CC-BY-NC-4.0](https://creativecommons.org/licenses/by-nc/4.0/)
+- **Frameworks:** PyTorch, ONNX Runtime
+- **Input Resolution:** 256×192 (`s`, `m`, `l` variants) or 384×288 (`x` variant)
+### Sources
+- **Repository:** [github.com/dfki-av/spinepose](https://github.com/dfki-av/spinepose)
+- **Paper:** [CVPR Workshops 2025 (CVSPORTS)](https://openaccess.thecvf.com/content/CVPR2025W/CVSPORTS/html/Khan_Towards_Unconstrained_2D_Pose_Estimation_of_the_Human_Spine_CVPRW_2025_paper.html)
+- **Demo:** [saifkhichi.com/research/spinepose](https://www.saifkhichi.com/research/spinepose/)
+---
+## Intended Uses
+### Direct Use
+- Human body and spine joint localization from RGB images or videos
+- Real-time motion analysis for research, animation, or sports applications
+- Augmentation of general-purpose pose estimators for anatomically rich tasks
+### Downstream Use
+- Integration with clinical posture tracking systems
+- 3D pose lifting or musculoskeletal modeling (via SpineTrack synthetic subset)
+- Fine-tuning on domain-specific datasets (industrial, rehabilitation, yoga)
+### Out-of-Scope Use
+- Any medical diagnosis or treatment application without human oversight
+- Full-body 3D reconstruction (requires separate lifting model)
+- Unverified use in safety-critical systems
+---
+## Bias, Risks, and Limitations
+- The models were trained primarily on controlled and synthetic datasets and may underperform on occluded or extreme poses.
+- The training data has limited diversity in body types and cultural attire.
+- Biases are inherited from the COCO and Body8 datasets used to pretrain the teacher models.
+### Recommendations
+Evaluate the model on your specific domain and retrain or augment using domain-specific samples to mitigate dataset bias.
+---
+## Getting Started
+### Installation
+```bash
+pip install spinepose
+```
+On Linux/Windows with CUDA available, install the GPU version:
+```bash
+pip install "spinepose[gpu]"
+```
+### CLI Usage
+```bash
+spinepose -i /path/to/image_or_video -o /path/to/output
+```
+This automatically downloads the required ONNX checkpoint if it is not found locally.
+Run `spinepose -h` for detailed usage options.
+### Python API
+```python
+import cv2
+from spinepose import SpinePoseEstimator
+# Initialize estimator (downloads ONNX model if not found locally)
+estimator = SpinePoseEstimator(device='cuda')
+# Perform inference on a single image
+image = cv2.imread('path/to/image.jpg')
+keypoints, scores = estimator.predict(image)
+visualized = estimator.visualize(image, keypoints, scores)
+cv2.imwrite('output.jpg', visualized)
+```
+For higher-level use:
+```python
+from spinepose.inference import infer_image, infer_video
+# Single image inference
+infer_image('path/to/image.jpg', 'output.jpg')
+# Video inference with optional temporal smoothing
+infer_video('path/to/video.mp4', 'output_video.mp4', use_smoothing=True)
+```
+## Evaluation
+To reproduce results, prepare the following directory layout:
+```plaintext
+<PROJECT_DIR>/
+├─ data/
+│  ├─ spinetrack/
+│  ├─ coco/
+│  └─ halpe/
+└─ checkpoints/
+   ├─ spinepose-s_32xb256-10e_spinetrack-256x192.pth
+   ├─ spinepose-m_32xb256-10e_spinetrack-256x192.pth
+   ├─ spinepose-l_32xb256-10e_spinetrack-256x192.pth
+   └─ spinepose-x_32xb128-10e_spinetrack-384x288.pth
+```
+Each PyTorch checkpoint contains both `teacher` and `student` weights, with only the `student` used during inference. Exported ONNX checkpoints only contain the `student`.
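As a rough sketch of how the `student` weights could be separated from a combined checkpoint, the helper below filters a flat state dict by key prefix. The `student.` prefix and the example key names are assumptions for illustration only; inspect the keys of the actual `.pth` file (for example after `torch.load(path, map_location="cpu")`) before relying on them.

```python
def extract_student(state_dict, prefix="student."):
    """Return only the entries under `prefix`, with the prefix stripped.

    NOTE: the "student." prefix is an assumption; check the real checkpoint keys.
    """
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Toy state dict standing in for a loaded checkpoint (hypothetical key names).
combined = {
    "teacher.backbone.weight": [0.1, 0.2],
    "student.backbone.weight": [0.3, 0.4],
    "student.head.bias": [0.5],
}
student_only = extract_student(combined)
print(sorted(student_only))
```

If the real checkpoint nests its weights differently (for instance under a top-level `state_dict` entry), the same filtering idea applies once that inner dictionary is selected.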
+### Metrics
+We report **Average Precision (AP)** and **Average Recall (AR)** under varying Object Keypoint Similarity (OKS) thresholds, consistent with COCO conventions but extended to the 37-keypoint SpineTrack format.
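For readers unfamiliar with OKS, the sketch below computes it for a single prediction using the standard COCO definition: each labeled keypoint contributes `exp(-d² / (2 · s² · k²))`, where `d` is the pixel distance to the ground truth, `s²` is the object area, and `k = 2σ`. COCO-style AP then averages precision over OKS thresholds from 0.50 to 0.95 in steps of 0.05. The per-keypoint `sigmas` used here are placeholder values, not the constants used for the 37-keypoint SpineTrack evaluation, which this card does not list.

```python
import math


def oks(pred, gt, visible, area, sigmas):
    """Object Keypoint Similarity between one prediction and one ground truth.

    pred, gt: lists of (x, y) pixel coordinates, one per keypoint.
    visible:  visibility flags; only keypoints with flag > 0 are scored.
    area:     ground-truth object area in pixels (the squared scale s^2).
    sigmas:   per-keypoint falloff constants (placeholders, not SpineTrack's).
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints are ignored
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k2 = (2.0 * sigma) ** 2  # COCO uses k_i = 2 * sigma_i
        total += math.exp(-d2 / (2.0 * area * k2))
        count += 1
    return total / count if count else 0.0


# Three keypoints: one exact, one 5 px off, one unlabeled (made-up values).
gt = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]
pred = [(10.0, 10.0), (23.0, 24.0), (0.0, 0.0)]
visible = [2, 1, 0]
sigmas = [0.05, 0.05, 0.05]
score = oks(pred, gt, visible, area=2500.0, sigmas=sigmas)
print(round(score, 3))
```

A prediction counts as a true positive at a given threshold when its OKS with the matched ground truth meets or exceeds that threshold.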
+### Results
+<table border="1" cellspacing="0" cellpadding="6" style="border-collapse:collapse; text-align:center; font-family:Arial; font-size:13px;">
+  <thead style="background-color:#f0f0f0; font-weight:bold;">
+    <tr>
+      <th>Method</th>
+      <th>Train Data</th>
+      <th>Kpts</th>
+      <th colspan="2">COCO</th>
+      <th colspan="2">Halpe26</th>
+      <th colspan="2">Body</th>
+      <th colspan="2">Feet</th>
+      <th colspan="2">Spine</th>
+      <th colspan="2">Overall</th>
+      <th>Params (M)</th>
+      <th>FLOPs (G)</th>
+    </tr>
+    <tr>
+      <th></th><th></th><th></th>
+      <th>AP</th><th>AR</th>
+      <th>AP</th><th>AR</th>
+      <th>AP</th><th>AR</th>
+      <th>AP</th><th>AR</th>
+      <th>AP</th><th>AR</th>
+      <th>AP</th><th>AR</th>
+      <th></th><th></th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr><td>SimCC-MBV2</td><td>COCO</td><td>17</td><td>62.0</td><td>67.8</td><td>33.2</td><td>43.9</td><td>72.1</td><td>75.6</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.1</td><td>0.1</td><td>2.29</td><td>0.31</td></tr>
+    <tr><td>RTMPose-t</td><td>Body8</td><td>26</td><td>65.9</td><td>71.3</td><td>68.0</td><td>73.2</td><td>76.9</td><td>80.0</td><td>74.1</td><td>79.7</td><td>0.0</td><td>0.0</td><td>15.8</td><td>17.9</td><td>3.51</td><td>0.37</td></tr>
+    <tr><td>RTMPose-s</td><td>Body8</td><td>26</td><td>69.7</td><td>74.7</td><td>72.0</td><td>76.7</td><td>80.9</td><td>83.6</td><td>78.9</td><td>83.5</td><td>0.0</td><td>0.0</td><td>17.2</td><td>19.4</td><td>5.70</td><td>0.70</td></tr>
+    <tr style="background-color:#e6e6e6; font-weight:bold;"><td>SpinePose-s</td><td>SpineTrack</td><td>37</td><td>68.2</td><td>73.1</td><td>70.6</td><td>75.2</td><td>79.1</td><td>82.1</td><td>77.5</td><td>82.9</td><td>89.6</td><td>90.7</td><td>84.2</td><td>86.2</td><td>5.98</td><td>0.72</td></tr>
+    <tr><td colspan="17" style="background-color:#d0d0d0; height:3px;"></td></tr>
+    <tr><td>SimCC-ViPNAS</td><td>COCO</td><td>17</td><td>69.5</td><td>75.5</td><td>36.9</td><td>49.7</td><td>79.6</td><td>83.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.2</td><td>0.2</td><td>8.65</td><td>0.80</td></tr>
+    <tr><td>RTMPose-m</td><td>Body8</td><td>26</td><td>75.1</td><td>80.0</td><td>76.7</td><td>81.3</td><td>85.5</td><td>87.9</td><td>84.1</td><td>88.2</td><td>0.0</td><td>0.0</td><td>19.4</td><td>21.4</td><td>13.93</td><td>1.95</td></tr>
+    <tr style="background-color:#e6e6e6; font-weight:bold;"><td>SpinePose-m</td><td>SpineTrack</td><td>37</td><td>73.0</td><td>77.5</td><td>75.0</td><td>79.2</td><td>84.0</td><td>86.4</td><td>83.5</td><td>87.4</td><td>91.4</td><td>92.5</td><td>88.0</td><td>89.5</td><td>14.34</td><td>1.98</td></tr>
+    <tr><td colspan="17" style="background-color:#d0d0d0; height:3px;"></td></tr>
+    <tr><td>RTMPose-l</td><td>Body8</td><td>26</td><td>76.9</td><td>81.5</td><td>78.4</td><td>82.9</td><td>86.8</td><td>89.2</td><td>86.9</td><td>90.0</td><td>0.0</td><td>0.0</td><td>20.0</td><td>22.0</td><td>28.11</td><td>4.19</td></tr>
+    <tr><td>RTMW-m</td><td>Cocktail14</td><td>133</td><td>73.8</td><td>78.7</td><td>63.8</td><td>68.5</td><td>84.3</td><td>86.7</td><td>83.0</td><td>87.2</td><td>0.0</td><td>0.0</td><td>6.2</td><td>7.6</td><td>32.26</td><td>4.31</td></tr>
+    <tr><td>SimCC-ResNet50</td><td>COCO</td><td>17</td><td>72.1</td><td>78.2</td><td>38.7</td><td>51.6</td><td>81.8</td><td>85.2</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.2</td><td>0.2</td><td>36.75</td><td>5.50</td></tr>
+    <tr style="background-color:#e6e6e6; font-weight:bold;"><td>SpinePose-l</td><td>SpineTrack</td><td>37</td><td>75.2</td><td>79.5</td><td>77.0</td><td>81.1</td><td>85.4</td><td>87.7</td><td>85.5</td><td>89.2</td><td>91.0</td><td>92.2</td><td>88.4</td><td>90.0</td><td>28.66</td><td>4.22</td></tr>
+    <tr><td colspan="17" style="background-color:#d0d0d0; height:3px;"></td></tr>
+    <tr><td>SimCC-ResNet50*</td><td>COCO</td><td>17</td><td>73.4</td><td>79.0</td><td>39.8</td><td>52.4</td><td>83.2</td><td>86.2</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.3</td><td>0.3</td><td>43.29</td><td>12.42</td></tr>
+    <tr><td>RTMPose-x*</td><td>Body8</td><td>26</td><td>78.8</td><td>83.4</td><td>80.0</td><td>84.4</td><td>88.6</td><td>90.6</td><td>88.4</td><td>91.4</td><td>0.0</td><td>0.0</td><td>21.0</td><td>22.9</td><td>50.00</td><td>17.29</td></tr>
+    <tr><td>RTMW-l*</td><td>Cocktail14</td><td>133</td><td>75.6</td><td>80.4</td><td>65.4</td><td>70.1</td><td>86.0</td><td>88.3</td><td>85.6</td><td>89.2</td><td>0.0</td><td>0.0</td><td>8.1</td><td>8.1</td><td>57.20</td><td>7.91</td></tr>
+    <tr><td>RTMW-l*</td><td>Cocktail14</td><td>133</td><td>77.2</td><td>82.3</td><td>66.6</td><td>71.8</td><td>87.3</td><td>89.9</td><td>88.3</td><td>91.3</td><td>0.0</td><td>0.0</td><td>8.6</td><td>8.6</td><td>57.35</td><td>17.69</td></tr>
+    <tr style="background-color:#e6e6e6; font-weight:bold;"><td>SpinePose-x*</td><td>SpineTrack</td><td>37</td><td>75.9</td><td>80.1</td><td>77.6</td><td>81.8</td><td>86.3</td><td>88.5</td><td>86.3</td><td>89.7</td><td>89.3</td><td>91.0</td><td>88.9</td><td>89.9</td><td>50.69</td><td>17.37</td></tr>
+  </tbody>
+</table>
+## SpineTrack Dataset
+The **SpineTrack** dataset comprises both real and synthetic data:
+- **SpineTrack-Real**: Annotated natural images with nine detailed spinal landmarks in addition to COCO joints.
+- **SpineTrack-Unreal**: Synthetic subset rendered in Unreal Engine with biomechanically aligned OpenSim annotations.
+To download:
+```bash
+git lfs install
+git clone https://huggingface.co/datasets/saifkhichi96/spinetrack
+```
+Alternatively, use `wget` to download the dataset directly:
+```bash
+wget https://huggingface.co/datasets/saifkhichi96/spinetrack/resolve/main/annotations.zip
+wget https://huggingface.co/datasets/saifkhichi96/spinetrack/resolve/main/images.zip
+```
+Either method downloads two zip archives: `annotations.zip` (24.8 MB) and `images.zip` (19.4 GB). Unzip them to obtain the following structure:
+```plaintext
+spinetrack
+├── annotations/
+│   ├── person_keypoints_train-real-coco.json
+│   ├── person_keypoints_train-real-yoga.json
+│   ├── person_keypoints_train-unreal.json
+│   └── person_keypoints_val2017.json
+└── images/
+    ├── train-real-coco/
+    ├── train-real-yoga/
+    ├── train-unreal/
+    └── val2017/
+```
+All annotations follow the COCO format, directly compatible with MMPose, Detectron2, or similar frameworks.
+The synthetic subset was primarily employed within the **active learning pipeline** used to bootstrap and refine annotations for real-world images.
+All released **SpinePose** models were trained exclusively on the **real** portion of the dataset.
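Because the annotations follow the COCO format, each annotation stores its keypoints as a flat list of `x, y, v` values, where `v` is the visibility flag. The sketch below shows one way to split that list into triplets and count labeled keypoints using only the standard library. The in-memory JSON is a made-up three-keypoint stand-in; a real SpineTrack annotation would be expected to hold 37 triplets, and the image and annotation values shown are illustrative only.

```python
import json


def keypoint_triplets(annotation):
    """Split a COCO annotation's flat `keypoints` list into (x, y, v) triplets."""
    flat = annotation["keypoints"]
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def count_labeled(annotation):
    """Number of keypoints whose visibility flag v is greater than zero."""
    return sum(1 for _, _, v in keypoint_triplets(annotation) if v > 0)


# Tiny in-memory stand-in for an annotation file (values are made up).
# For a real file: `with open(path) as f: data = json.load(f)`.
data = json.loads("""
{
  "images": [{"id": 1, "file_name": "example.jpg", "width": 640, "height": 480}],
  "annotations": [
    {"id": 7, "image_id": 1, "category_id": 1, "num_keypoints": 2,
     "keypoints": [100, 120, 2, 110, 150, 1, 0, 0, 0]}
  ]
}
""")
for ann in data["annotations"]:
    print(ann["image_id"], count_labeled(ann), keypoint_triplets(ann))
```

The same two helpers can be applied to any of the `person_keypoints_*.json` files after loading them with `json.load`.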
+> [!WARNING]
+> A small number of annotations in the synthetic subset are corrupted.
+> We recommend avoiding their use until the updated labels are released in the next dataset version.
+## Citation
+If you use SpinePose or SpineTrack in your research, please cite:
+**BibTeX:**
+```bibtex
+@InProceedings{Khan_2025_CVPR,
+    author    = {Khan, Muhammad Saif Ullah and Krau{\ss}, Stephan and Stricker, Didier},
+    title     = {Towards Unconstrained 2D Pose Estimation of the Human Spine},
+    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
+    month     = {June},
+    year      = {2025},
+    pages     = {6171-6180}
+}
+```
+**APA:**
+_Khan, M. S. U., Krauß, S., & Stricker, D. (2025). Towards Unconstrained 2D Pose Estimation of the Human Spine. In Proceedings of the Computer Vision and Pattern Recognition Conference (pp. 6172-6181)._
+## Model Card Contact
+[Muhammad Saif Ullah Khan](https://saifkhichi.com/)