RTMPose3D
Real-time multi-person 3D whole-body pose estimation with 133 keypoints per person.
Model Description
RTMPose3D is a real-time 3D pose estimation model that detects people and estimates 133 keypoints per person:
- 17 body keypoints (COCO format)
- 6 foot keypoints
- 68 facial landmarks
- 42 hand keypoints (21 per hand)
The model outputs both 2D pixel coordinates and 3D spatial coordinates for each keypoint.
Model Variants
This repository contains checkpoints for:
| Model | Parameters | Speed | Accuracy (MPJPE) | Checkpoint File |
|---|---|---|---|---|
| RTMDet-M (Detector) | ~50M | Fast | - | rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth |
| RTMW3D-L (Large) | ~65M | Real-time | 0.045 | rtmw3d-l_cock14-0d4ad840_20240422.pth |
| RTMW3D-X (Extra Large) | ~98M | Slower | 0.057 | rtmw3d-x_8xb64_cocktail14-384x288-b0a0eab7_20240626.pth |
Installation
```bash
pip install git+https://github.com/mutedeparture/rtmpose3d.git
```
Or clone and install locally:
```bash
git clone https://github.com/mutedeparture/rtmpose3d.git
cd rtmpose3d
pip install -r requirements.txt
pip install -e .
```
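To confirm the install, a quick import check can be run (a minimal sanity test; it assumes the package exposes the classes used in the Quick Start section below):

```python
# Minimal post-install sanity check: the imports below should succeed
# without errors if the package and its dependencies are installed.
from rtmpose3d import RTMPose3D, RTMPose3DInference

print('rtmpose3d import OK')
```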
Quick Start
Using the HuggingFace Transformers-style API
```python
import cv2
from rtmpose3d import RTMPose3D

# Initialize model (auto-downloads checkpoints from this repo)
model = RTMPose3D.from_pretrained('rbarac/rtmpose3d', device='cuda:0')

# Run inference
image = cv2.imread('person.jpg')
results = model(image, return_tensors='np')

# Access results
keypoints_3d = results['keypoints_3d']  # [N, 133, 3] - 3D coords in meters
keypoints_2d = results['keypoints_2d']  # [N, 133, 2] - pixel coords
scores = results['scores']              # [N, 133] - confidence [0, 1]
```
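Continuing from the snippet above, the 2D keypoints and bounding boxes can be drawn with OpenCV for a quick visual check (a minimal sketch based on the output format documented below; the 0.3 score threshold is an arbitrary choice, not part of the API):

```python
vis = image.copy()
for person_kpts, person_scores, bbox in zip(keypoints_2d, scores, results['bboxes']):
    x1, y1, x2, y2 = bbox.astype(int)
    cv2.rectangle(vis, (x1, y1), (x2, y2), (0, 255, 0), 2)       # person bounding box
    for (x, y), s in zip(person_kpts, person_scores):
        if s > 0.3:                                               # draw confident keypoints only
            cv2.circle(vis, (int(x), int(y)), 2, (0, 0, 255), -1)
cv2.imwrite('person_pose.jpg', vis)
```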
Using the Simple Inference API
```python
from rtmpose3d import RTMPose3DInference

# Initialize with model size
model = RTMPose3DInference(model_size='l', device='cuda:0')  # or 'x' for extra large

# Run inference
results = model(image)
print(results['keypoints_3d'].shape)  # [N, 133, 3]
```
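The same call works frame by frame on video; a minimal sketch, assuming the model accepts OpenCV BGR frames exactly like still images:

```python
import cv2
from rtmpose3d import RTMPose3DInference

model = RTMPose3DInference(model_size='l', device='cuda:0')

cap = cv2.VideoCapture('video.mp4')
while True:
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame)                # same output format as for still images
    print(results['keypoints_3d'].shape)  # [N, 133, 3] for this frame
cap.release()
```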
Single Person Detection
Detect only the most prominent person in the image:
```python
# Works with both APIs
results = model(image, single_person=True)  # Returns only N=1
```
Output Format
```python
{
    'keypoints_3d': np.ndarray,  # [N, 133, 3] - (X, Y, Z) in meters
    'keypoints_2d': np.ndarray,  # [N, 133, 2] - (x, y) pixel coordinates
    'scores': np.ndarray,        # [N, 133] - confidence scores [0, 1]
    'bboxes': np.ndarray         # [N, 4] - bounding boxes [x1, y1, x2, y2]
}
```
Where N is the number of detected persons.
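Since every array shares the leading person dimension N, per-person processing is plain indexing along axis 0; a short sketch (the 0.5 mean-score threshold is an arbitrary example, not part of the API):

```python
for i in range(len(results['bboxes'])):
    kpts3d = results['keypoints_3d'][i]   # [133, 3] metric coordinates for person i
    kpts2d = results['keypoints_2d'][i]   # [133, 2] pixel coordinates for person i
    conf = results['scores'][i]           # [133] per-keypoint confidence
    if conf.mean() < 0.5:                 # skip low-confidence detections
        continue
    print(f'person {i}: mean keypoint confidence {conf.mean():.2f}')
```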
Coordinate Systems
2D Keypoints - Pixel coordinates:
- X: horizontal position [0, image_width]
- Y: vertical position [0, image_height]
3D Keypoints - Camera-relative coordinates in meters (Z-up convention):
- X: horizontal (negative=left, positive=right)
- Y: depth (negative=closer, positive=farther)
- Z: vertical (negative=down, positive=up)
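Because the 3D output is metric and Z-up, distances and axis conversions are one-liners. A minimal sketch, assuming `results` comes from either API above (wrist indices 9 and 10 follow the COCO body ordering referenced in the next section):

```python
import numpy as np

person = results['keypoints_3d'][0]            # [133, 3], (X, Y, Z) in meters, Z-up

# Distance between left wrist (index 9) and right wrist (index 10), in meters
wrist_span = np.linalg.norm(person[9] - person[10])
print(f'wrist-to-wrist distance: {wrist_span:.2f} m')

# Convert to a Y-up, right-handed convention (X right, Y up, Z toward the camera)
person_yup = np.stack([person[:, 0],    # X stays horizontal
                       person[:, 2],    # new Y = old Z (up)
                      -person[:, 1]],   # new Z = -old Y (depth flipped toward camera)
                      axis=-1)
```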
Keypoint Indices
| Index Range | Body Part | Count | Description |
|---|---|---|---|
| 0-16 | Body | 17 | Nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles |
| 17-22 | Feet | 6 | Foot keypoints |
| 23-90 | Face | 68 | Facial landmarks |
| 91-111 | Left Hand | 21 | Left hand keypoints |
| 112-132 | Right Hand | 21 | Right hand keypoints |
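These ranges make it straightforward to slice the 133-keypoint arrays into named parts; a small helper sketch built directly from the table above:

```python
# Index ranges from the table above (end-exclusive, for Python slicing)
KEYPOINT_GROUPS = {
    'body':       slice(0, 17),
    'feet':       slice(17, 23),
    'face':       slice(23, 91),
    'left_hand':  slice(91, 112),
    'right_hand': slice(112, 133),
}

def split_keypoints(kpts):
    """Split a [133, D] keypoint array into per-part arrays."""
    return {name: kpts[s] for name, s in KEYPOINT_GROUPS.items()}

parts = split_keypoints(results['keypoints_3d'][0])
print(parts['left_hand'].shape)  # (21, 3)
```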
Training Data
The models were trained on the Cocktail14 dataset, which combines 14 public 3D pose datasets:
- Human3.6M
- COCO-WholeBody
- UBody
- And 11 more datasets
Performance
Evaluated on standard 3D pose benchmarks:
- RTMW3D-L: 0.045 MPJPE, real-time inference (~30 FPS on RTX 3090)
- RTMW3D-X: 0.057 MPJPE, slower but higher accuracy
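Throughput on a given GPU can be checked with a simple timing loop; a rough sketch that times end-to-end inference (detector plus pose head) on a single image after a short warm-up:

```python
import time

for _ in range(5):          # warm-up so model loading and CUDA init do not skew timing
    model(image)

n_iters = 50
start = time.perf_counter()
for _ in range(n_iters):
    model(image)
elapsed = time.perf_counter() - start
print(f'{n_iters / elapsed:.1f} FPS (single image, batch size 1)')
```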
Requirements
- Python >= 3.8
- PyTorch >= 2.0.0
- CUDA-capable GPU (4GB+ VRAM recommended)
- mmcv >= 2.0.0
- MMPose >= 1.0.0
- MMDetection >= 3.0.0
Citation
```bibtex
@misc{rtmpose3d2025,
  title={RTMPose3D: Real-Time Multi-Person 3D Pose Estimation},
  author={Arac, Bahadir},
  year={2025},
  publisher={GitHub},
  url={https://github.com/mutedeparture/rtmpose3d}
}
```
License
Apache 2.0
Acknowledgments
Built on MMPose by OpenMMLab. Models trained by the MMPose team on the Cocktail14 dataset.
Links
- GitHub Repository: mutedeparture/rtmpose3d
- Documentation: See README in the repository
- MMPose: open-mmlab/mmpose