---
license: agpl-3.0
tags:
- yolo
- yolo12
- segmentation
- object-detection
- person-detection
- instance-segmentation
- pytorch
- ultralytics
- computer-vision
datasets:
- coco
---
# YOLO12-seg Person Segmentation Model
A YOLO12-large (YOLO12l) instance segmentation model trained specifically for detecting and
segmenting people with high precision.
## Model Description
This model is a fine-tuned YOLO12-seg model optimized exclusively for person segmentation. It uses
the large (L) scale configuration of YOLO12, with 28.76M parameters, 510 layers, and depth and width
multipliers of 1.0.
### Key Features
- **Single-Class Focus**: Specialized in detecting only people
- **Detailed Segmentation**: Produces a per-instance segmentation mask for each detected person
- **High Throughput**: Optimized for processing hundreds of images per minute
- **Quality-Optimized**: Trained specifically for accurate boundary delineation
- **GPU-Optimized**: The Large (L) model is designed for GPU deployment, not edge devices or
mobile phones
### Available Models
This repository contains two model versions:
- `yolo12l-person-seg.pt`: The original model trained for 100 epochs.
- `yolo12l-person-seg-extended.pt`: The improved model after extended training for 300 epochs
(recommended).
## Training
The model was trained on a filtered version of the COCO dataset containing only images with people:
- **Training Images**: 64,114 images containing people
- **Validation Images**: 2,693 images containing people
- **Training Details**:
  - Initially trained for 100 epochs, then extended training continued for a total of 300 epochs.
  - Input resolution: 640×640
  - Class-focused optimization with `single_cls=True` and `classes=0`
  - Optimized for segmentation with `overlap_mask=True` and `mask_ratio=4`
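
The settings above correspond to arguments of the Ultralytics `train` method. The following is an assumed reconstruction, not the author's actual training script: the starting checkpoint (`yolo12l-seg.yaml`) and the dataset file (`coco-person.yaml`) are hypothetical placeholders, and every argument not listed in this section is left at its Ultralytics default.

```python
"""Sketch of a training call using the settings listed in the Training section.

Assumption: this is NOT the author's script. The default `weights` and `data`
paths below are hypothetical placeholders.
"""

# Hyperparameters taken from the Training section; all other Ultralytics
# arguments are left at their defaults.
TRAIN_ARGS = {
    "epochs": 300,         # total epochs for the extended model (100 for the original)
    "imgsz": 640,          # input resolution 640x640
    "single_cls": True,    # treat every label as a single class
    "classes": [0],        # keep only COCO class 0 ("person")
    "overlap_mask": True,  # merge instance masks into one image mask during training
    "mask_ratio": 4,       # mask downsample ratio
}


def train(weights: str = "yolo12l-seg.yaml", data: str = "coco-person.yaml"):
    """Launch training; both default paths are hypothetical placeholders."""
    # Imported lazily so TRAIN_ARGS can be inspected without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(data=data, **TRAIN_ARGS)
```

Calling `train()` requires `ultralytics` and a dataset YAML that points at the filtered person subset; `TRAIN_ARGS` can be inspected without either.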
## Performance
The model achieves the following metrics on the person subset of COCO `val2017` (the 2,693
validation images containing people):

| Metric         | Value |
| -------------- | ----- |
| Box mAP50-95   | 0.642 |
| Box mAP50      | 0.851 |
| Mask mAP50-95  | 0.537 |
| Mask mAP50     | 0.837 |
| Box Precision  | 0.840 |
| Box Recall     | 0.759 |
| Mask Precision | 0.843 |
| Mask Recall    | 0.748 |

Note: These metrics reflect the performance of the extended 300-epoch model
(`yolo12l-person-seg-extended.pt`).
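
Much of the gap between the mAP50 and mAP50-95 rows reflects how the metrics are defined: mAP50 counts a prediction as correct when its IoU with the ground truth is at least 0.50, while mAP50-95 averages over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05, so loosely fitting masks are penalized at the stricter thresholds. The minimal NumPy sketch below computes mask IoU and lists the thresholds a single prediction would pass. It only illustrates the matching criterion; it is not the Ultralytics evaluator, which also ranks predictions by confidence and integrates the precision-recall curve.

```python
import numpy as np

# COCO-style IoU thresholds behind mAP50-95: 0.50, 0.55, ..., 0.95 (ten values)
IOU_THRESHOLDS = np.linspace(0.50, 0.95, 10)


def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two binary masks with the same shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0  # both masks empty: treat as no overlap
    return float(np.logical_and(pred, gt).sum() / union)


def passing_thresholds(iou: float) -> list[float]:
    """IoU thresholds at which this prediction would count as a true positive."""
    return [round(float(t), 2) for t in IOU_THRESHOLDS if iou >= t]


# Toy 4x4 example: the prediction covers the top two rows, the ground truth the top three.
pred = np.zeros((4, 4), dtype=np.uint8)
pred[:2, :] = 1
gt = np.zeros((4, 4), dtype=np.uint8)
gt[:3, :] = 1

iou = mask_iou(pred, gt)
print(f"IoU = {iou:.3f}")        # IoU = 0.667 (8 shared pixels / 12 in the union)
print(passing_thresholds(iou))   # [0.5, 0.55, 0.6, 0.65]
```

This prediction would count as a match at 4 of the 10 thresholds, so it contributes fully to mAP50 but only partially to mAP50-95.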
## Example Results
<table>
<tr>
<td><img src="examples/example2.png" alt="Person segmentation example 2" /></td>
<td><img src="examples/example4.png" alt="Person segmentation example 4" /></td>
</tr>
<tr>
<td><img src="examples/example1.png" alt="Person segmentation example 1" /></td>
<td><img src="examples/example3.png" alt="Person segmentation example 3" /></td>
</tr>
</table>
<div align="center">
<img src="examples/example5.png" alt="Person segmentation example 5" style="max-width:90%;" />
</div>

The model segments people across a range of poses, lighting conditions, and contexts, including
scenes with complex backgrounds. In these examples, the segmentation masks (highlighted in color)
closely follow the outlines of the human subjects, which makes the model well suited to
applications that require detailed person isolation.
## Use Cases
This model is ideal for applications requiring precise person segmentation:
- Human-centric image editing
- Background removal focused on people
- Virtual try-on applications
- People counting and crowd analysis
- Smart surveillance systems
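
For background removal, one approach is to merge all person masks into a single alpha channel. This sketch assumes the masks are already at the original image resolution (for example, by passing `retina_masks=True` at prediction time; confirm against the Ultralytics predict documentation) and that the image is a BGR array as returned by OpenCV. `cut_out_people` is a hypothetical helper, not part of Ultralytics.

```python
import numpy as np


def cut_out_people(image_bgr: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Return a BGRA image where pixels outside every person mask are transparent.

    Hypothetical helper (not an Ultralytics API).
    image_bgr: (H, W, 3) uint8 image.
    masks: (N, H, W) binary masks, assumed to be at the image's resolution.
    """
    if masks.shape[1:] != image_bgr.shape[:2]:
        raise ValueError("masks must match the image height and width")
    # Union of all person instances; with N == 0 every pixel becomes transparent.
    person = np.any(masks.astype(bool), axis=0)
    alpha = np.where(person, 255, 0).astype(np.uint8)
    # Append the alpha channel to the BGR channels -> (H, W, 4)
    return np.dstack([image_bgr, alpha])
```

For example, `rgba = cut_out_people(image, result.masks.data.cpu().numpy())` followed by `cv2.imwrite('cutout.png', rgba)`; PNG is used because it preserves the alpha channel.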
## Usage
The model can be used directly with Ultralytics YOLO:
```python
from ultralytics import YOLO

# Load the model
model = YOLO('path/to/yolo12l-person-seg-extended.pt')  # Or yolo12l-person-seg.pt for the original

# Perform inference
results = model('image.jpg')

# Process results (segmentation masks and bounding boxes)
for result in results:
    boxes = result.boxes  # Bounding boxes; tensor operations can be performed on them
    masks = result.masks  # Segmentation masks, or None if no person was detected
    if masks is not None:
        # Process masks
        for mask in masks:
            # Use the mask for your application
            pass
```
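
Because the model predicts a single class, every detection is a person, so a per-image people count reduces to counting detections above a confidence threshold. A minimal sketch; the helper name `count_people` and the 0.5 threshold are illustrative assumptions (Ultralytics also applies its own default `conf` threshold at prediction time).

```python
import numpy as np


def count_people(confidences, min_conf: float = 0.5) -> int:
    """Count detections whose confidence is at least `min_conf`.

    Hypothetical helper: no class filtering is needed because the model
    is single-class, so every detection is a person.
    """
    return int((np.asarray(confidences, dtype=float) >= min_conf).sum())
```

With a result from the loop above, this could be called as `count_people(result.boxes.conf.cpu().numpy())`.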
For segmentation visualization:
```python
import cv2
import numpy as np
from ultralytics import YOLO

# Load the model and image
model = YOLO('path/to/yolo12l-person-seg-extended.pt')  # Or yolo12l-person-seg.pt for the original
image = cv2.imread('image.jpg')

# Perform inference
results = model(image)

# Process and visualize the first result
result = results[0]
if result.masks is not None:
    # masks.xy holds one polygon per person, already scaled to the original image size.
    # (masks.data is at the model's inference resolution and may not match `image`.)
    for polygon in result.masks.xy:
        if len(polygon) == 0:
            continue  # skip degenerate masks with no contour
        # Create a colored overlay for each mask
        color = tuple(int(c) for c in np.random.randint(0, 256, size=3))
        mask_image = np.zeros_like(image, dtype=np.uint8)
        cv2.fillPoly(mask_image, [polygon.astype(np.int32)], color)
        image = cv2.addWeighted(image, 1.0, mask_image, 0.5, 0)

# Display or save the image
cv2.imwrite('segmented_image.jpg', image)
```
## Limitations
- This model is optimized for person segmentation only and won't detect other classes
- Performance may be reduced in extreme lighting conditions
- Occluded persons may have incomplete segmentation masks
- Small or distant people might not be detected as reliably as those in the foreground
- **GPU Recommended**: As a Large (L) model, real-time inference performance benefits from a
dedicated GPU
- **Edge Device Limitations**: Not optimized for mobile or edge deployment (consider YOLO12n or
YOLO12s for those use cases)
## License
This model is available under the GNU Affero General Public License v3.0 (AGPL-3.0).
### License Note
This model was trained using the Ultralytics YOLO framework, which is licensed under the GNU Affero
General Public License v3.0 (AGPL-3.0). As per the terms of the AGPL-3.0 license, any derivative
works (including trained models) must also be distributed under the same license.