RTMPose3D
Real-time multi-person 3D whole-body pose estimation with 133 keypoints per person.
Model Description
RTMPose3D is a real-time 3D pose estimation model that detects people and estimates 133 keypoints per person:
- 17 body keypoints (COCO format)
- 6 foot keypoints
- 68 facial landmarks
- 42 hand keypoints (21 per hand)
The model outputs both 2D pixel coordinates and 3D spatial coordinates for each keypoint.
Model Variants
This repository contains checkpoints for:
| Model | Parameters | Speed | Accuracy (MPJPE) | Checkpoint File |
|---|---|---|---|---|
| RTMDet-M (Detector) | ~50M | Fast | - | rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth |
| RTMW3D-L (Large) | ~65M | Real-time | 0.045 | rtmw3d-l_cock14-0d4ad840_20240422.pth |
| RTMW3D-X (Extra Large) | ~98M | Slower | 0.057 | rtmw3d-x_8xb64_cocktail14-384x288-b0a0eab7_20240626.pth |
Installation
```bash
pip install git+https://github.com/mutedeparture/rtmpose3d.git
```
Or clone and install locally:
```bash
git clone https://github.com/mutedeparture/rtmpose3d.git
cd rtmpose3d
pip install -r requirements.txt
pip install -e .
```
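To confirm the install, a quick import check can be run (a minimal sanity test; it assumes the package exposes the classes used in the Quick Start section below):

```python
# Minimal post-install sanity check: the imports below should succeed
# without errors if the package and its dependencies are installed.
from rtmpose3d import RTMPose3D, RTMPose3DInference

print('rtmpose3d import OK')
```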
Quick Start
Using the HuggingFace Transformers-style API
```python
import cv2
from rtmpose3d import RTMPose3D

# Initialize model (auto-downloads checkpoints from this repo)
model = RTMPose3D.from_pretrained('rbarac/rtmpose3d', device='cuda:0')

# Run inference
image = cv2.imread('person.jpg')
results = model(image, return_tensors='np')

# Access results
keypoints_3d = results['keypoints_3d']  # [N, 133, 3] - 3D coords in meters
keypoints_2d = results['keypoints_2d']  # [N, 133, 2] - pixel coords
scores = results['scores']              # [N, 133] - confidence [0, 1]
```
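Continuing from the snippet above, the 2D keypoints and bounding boxes can be drawn with OpenCV for a quick visual check (a minimal sketch based on the output format documented below; the 0.3 score threshold is an arbitrary choice, not part of the API):

```python
vis = image.copy()
for person_kpts, person_scores, bbox in zip(keypoints_2d, scores, results['bboxes']):
    x1, y1, x2, y2 = bbox.astype(int)
    cv2.rectangle(vis, (x1, y1), (x2, y2), (0, 255, 0), 2)       # person bounding box
    for (x, y), s in zip(person_kpts, person_scores):
        if s > 0.3:                                               # draw confident keypoints only
            cv2.circle(vis, (int(x), int(y)), 2, (0, 0, 255), -1)
cv2.imwrite('person_pose.jpg', vis)
```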
Using the Simple Inference API
```python
from rtmpose3d import RTMPose3DInference

# Initialize with model size
model = RTMPose3DInference(model_size='l', device='cuda:0')  # or 'x' for extra large

# Run inference
results = model(image)
print(results['keypoints_3d'].shape)  # [N, 133, 3]
```
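The same call works frame by frame on video; a minimal sketch, assuming the model accepts OpenCV BGR frames exactly like still images:

```python
import cv2
from rtmpose3d import RTMPose3DInference

model = RTMPose3DInference(model_size='l', device='cuda:0')

cap = cv2.VideoCapture('video.mp4')
while True:
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame)                # same output format as for still images
    print(results['keypoints_3d'].shape)  # [N, 133, 3] for this frame
cap.release()
```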
Single Person Detection
Detect only the most prominent person in the image:
```python
# Works with both APIs
results = model(image, single_person=True)  # Returns only N=1
```
Output Format
```python
{
    'keypoints_3d': np.ndarray,  # [N, 133, 3] - (X, Y, Z) in meters
    'keypoints_2d': np.ndarray,  # [N, 133, 2] - (x, y) pixel coordinates
    'scores': np.ndarray,        # [N, 133] - confidence scores [0, 1]
    'bboxes': np.ndarray         # [N, 4] - bounding boxes [x1, y1, x2, y2]
}
```
Where N is the number of detected persons.
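Since every array shares the leading person dimension N, per-person processing is plain indexing along axis 0; a short sketch (the 0.5 mean-score threshold is an arbitrary example, not part of the API):

```python
for i in range(len(results['bboxes'])):
    kpts3d = results['keypoints_3d'][i]   # [133, 3] metric coordinates for person i
    kpts2d = results['keypoints_2d'][i]   # [133, 2] pixel coordinates for person i
    conf = results['scores'][i]           # [133] per-keypoint confidence
    if conf.mean() < 0.5:                 # skip low-confidence detections
        continue
    print(f'person {i}: mean keypoint confidence {conf.mean():.2f}')
```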
Coordinate Systems
2D Keypoints - Pixel coordinates:
- X: horizontal position [0, image_width]
- Y: vertical position [0, image_height]
3D Keypoints - Camera-relative coordinates in meters (Z-up convention):
- X: horizontal (negative=left, positive=right)
- Y: depth (negative=closer, positive=farther)
- Z: vertical (negative=down, positive=up)
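Because the 3D output is metric and Z-up, distances and axis conversions are one-liners. A minimal sketch, assuming `results` comes from either API above (wrist indices 9 and 10 follow the COCO body ordering referenced in the next section):

```python
import numpy as np

person = results['keypoints_3d'][0]            # [133, 3], (X, Y, Z) in meters, Z-up

# Distance between left wrist (index 9) and right wrist (index 10), in meters
wrist_span = np.linalg.norm(person[9] - person[10])
print(f'wrist-to-wrist distance: {wrist_span:.2f} m')

# Convert to a Y-up, right-handed convention (X right, Y up, Z toward the camera)
person_yup = np.stack([person[:, 0],    # X stays horizontal
                       person[:, 2],    # new Y = old Z (up)
                      -person[:, 1]],   # new Z = -old Y (depth flipped toward camera)
                      axis=-1)
```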
Keypoint Indices
| Index Range | Body Part | Count | Description |
|---|---|---|---|
| 0-16 | Body | 17 | Nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles |
| 17-22 | Feet | 6 | Foot keypoints |
| 23-90 | Face | 68 | Facial landmarks |
| 91-111 | Left Hand | 21 | Left hand keypoints |
| 112-132 | Right Hand | 21 | Right hand keypoints |
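These ranges make it straightforward to slice the 133-keypoint arrays into named parts; a small helper sketch built directly from the table above:

```python
# Index ranges from the table above (end-exclusive, for Python slicing)
KEYPOINT_GROUPS = {
    'body':       slice(0, 17),
    'feet':       slice(17, 23),
    'face':       slice(23, 91),
    'left_hand':  slice(91, 112),
    'right_hand': slice(112, 133),
}

def split_keypoints(kpts):
    """Split a [133, D] keypoint array into per-part arrays."""
    return {name: kpts[s] for name, s in KEYPOINT_GROUPS.items()}

parts = split_keypoints(results['keypoints_3d'][0])
print(parts['left_hand'].shape)  # (21, 3)
```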
Training Data
The models were trained on the Cocktail14 dataset, which combines 14 public 3D pose datasets:
- Human3.6M
- COCO-WholeBody
- UBody
- And 11 more datasets
Performance
Evaluated on standard 3D pose benchmarks:
- RTMW3D-L: 0.045 MPJPE, real-time inference (~30 FPS on RTX 3090)
- RTMW3D-X: 0.057 MPJPE, slower but higher accuracy
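Throughput on a given GPU can be checked with a simple timing loop; a rough sketch that times end-to-end inference (detector plus pose head) on a single image after a short warm-up:

```python
import time

for _ in range(5):          # warm-up so model loading and CUDA init do not skew timing
    model(image)

n_iters = 50
start = time.perf_counter()
for _ in range(n_iters):
    model(image)
elapsed = time.perf_counter() - start
print(f'{n_iters / elapsed:.1f} FPS (single image, batch size 1)')
```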
Requirements
- Python >= 3.8
- PyTorch >= 2.0.0
- CUDA-capable GPU (4GB+ VRAM recommended)
- mmcv >= 2.0.0
- MMPose >= 1.0.0
- MMDetection >= 3.0.0
Citation
```bibtex
@misc{rtmpose3d2025,
  title={RTMPose3D: Real-Time Multi-Person 3D Pose Estimation},
  author={Arac, Bahadir},
  year={2025},
  publisher={GitHub},
  url={https://github.com/mutedeparture/rtmpose3d}
}
```
License
Apache 2.0
Acknowledgments
Built on MMPose by OpenMMLab. Models trained by the MMPose team on the Cocktail14 dataset.
Links
- GitHub Repository: mutedeparture/rtmpose3d
- Documentation: See README in the repository
- MMPose: open-mmlab/mmpose