yolo12l-person-seg / README.md

Ryan Pfister


license: agpl-3.0
tags:
  - yolo
  - yolo12
  - segmentation
  - object-detection
  - person-detection
  - instance-segmentation
  - pytorch
  - ultralytics
  - computer-vision
datasets:
  - coco

YOLO12-seg Person Segmentation Model

A YOLO12-large (YOLO12l) instance segmentation model trained specifically for detecting and segmenting people with high precision.

Model Description

This model is a fine-tuned YOLO12-seg model optimized exclusively for person segmentation. It uses the large (L) scale configuration of YOLO12, with 28.76M parameters, 510 layers, and depth and width multipliers of 1.0.

Key Features

Single-Class Focus: Specialized in detecting only people
Detailed Segmentation: Provides a pixel-level segmentation mask for each detected person
High Throughput: Optimized for processing hundreds of images per minute
Quality-Optimized: Trained specifically for accurate boundary delineation
GPU-Optimized: The Large (L) model is designed for GPU deployment, not edge devices or mobile phones

Available Models

This repository contains two model versions:

yolo12l-person-seg.pt: The original model trained for 100 epochs.
yolo12l-person-seg-extended.pt: The improved model after extended training for 300 epochs (recommended).
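
If the weights are hosted in a Hugging Face model repository, they could be fetched programmatically. The sketch below shows one way to do that, assuming the `huggingface_hub` package is installed. The `repo_id` is not stated in this README and must be supplied by you; `weights_filename` and `download_weights` are illustrative helper names, not part of any library.

```python
# The two checkpoints listed above, keyed by an illustrative variant name.
MODEL_FILES = {
    "original": "yolo12l-person-seg.pt",           # 100 epochs
    "extended": "yolo12l-person-seg-extended.pt",  # 300 epochs (recommended)
}


def weights_filename(variant: str = "extended") -> str:
    """Map a variant name to the checkpoint filename in this repository."""
    if variant not in MODEL_FILES:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(MODEL_FILES)}")
    return MODEL_FILES[variant]


def download_weights(repo_id: str, variant: str = "extended") -> str:
    """Download a checkpoint and return its local path.

    repo_id is a placeholder: pass the actual "<user>/<repo>" identifier.
    """
    from huggingface_hub import hf_hub_download  # imported lazily; needs network access

    return hf_hub_download(repo_id=repo_id, filename=weights_filename(variant))
```

The returned path can be passed straight to `YOLO(...)`.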

Training

The model was trained on a filtered version of the COCO dataset containing only images with people:

Training Images: 64,114 images containing people
Validation Images: 2,693 images containing people
Training Details:
- Trained for 100 epochs initially, then continued to a total of 300 epochs
- Input resolution: 640×640
- Class-focused optimization with single_cls=True and classes=0
- Optimized for segmentation with overlap_mask=True and mask_ratio=4
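
The settings listed above map onto Ultralytics training arguments roughly as follows. This is a minimal sketch, not the exact command used for this model: the dataset file `coco-person-seg.yaml` and the starting configuration `yolo12l-seg.yaml` are assumptions, and any hyperparameter not listed above is left at the framework default.

```python
# Training arguments taken from the list above; "data" is a hypothetical
# dataset YAML describing the person-only COCO subset.
TRAIN_ARGS = {
    "data": "coco-person-seg.yaml",  # assumption: not named in this README
    "epochs": 300,                   # total epochs of the extended model
    "imgsz": 640,                    # 640x640 input resolution
    "single_cls": True,              # treat the dataset as a single class
    "classes": [0],                  # keep only COCO class 0 (person)
    "overlap_mask": True,
    "mask_ratio": 4,
}


def train(model_spec: str = "yolo12l-seg.yaml", **overrides):
    """Start a training run with the arguments above.

    model_spec is an assumption; the README does not say which
    checkpoint or config the fine-tuning started from.
    """
    from ultralytics import YOLO  # imported lazily so the dict is usable without it

    model = YOLO(model_spec)
    return model.train(**{**TRAIN_ARGS, **overrides})
```

For a quick smoke test, something like `train(epochs=1)` overrides a single value while keeping the rest.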

Performance

The model achieves the following metrics on the COCO person validation set:

Metric	Value
Box mAP50-95 (COCO)	0.642
Box mAP50 (COCO)	0.851
Mask mAP50-95	0.537
Mask mAP50	0.837
Box Precision	0.840
Box Recall	0.759
Mask Precision	0.843
Mask Recall	0.748

Note: These metrics are for the extended 300-epoch model (yolo12l-person-seg-extended.pt), evaluated on the person-containing images of the COCO val2017 validation set.
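
To check numbers like these on your own data, the Ultralytics validation API can be used. The sketch below assumes a dataset YAML that you provide (its name is not given in this README); `summarize_metrics` and `evaluate` are illustrative helpers that read the box and mask metrics from the object returned by `model.val()`.

```python
def summarize_metrics(metrics) -> dict:
    """Collect the headline box and mask metrics into a plain dict.

    Expects an object with .box and .seg attributes exposing
    map, map50, mp (mean precision) and mr (mean recall), as returned
    by Ultralytics validation for segmentation models.
    """
    return {
        "box_map50_95": round(float(metrics.box.map), 3),
        "box_map50": round(float(metrics.box.map50), 3),
        "mask_map50_95": round(float(metrics.seg.map), 3),
        "mask_map50": round(float(metrics.seg.map50), 3),
        "box_precision": round(float(metrics.box.mp), 3),
        "box_recall": round(float(metrics.box.mr), 3),
        "mask_precision": round(float(metrics.seg.mp), 3),
        "mask_recall": round(float(metrics.seg.mr), 3),
    }


def evaluate(weights: str, data_yaml: str) -> dict:
    """Run validation and return the summary; data_yaml is supplied by the caller."""
    from ultralytics import YOLO  # imported lazily

    metrics = YOLO(weights).val(data=data_yaml, imgsz=640)
    return summarize_metrics(metrics)
```

Results will only be comparable to the table if the validation images and settings match those used here.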

Example Results

The model effectively segments people in various poses, lighting conditions, and contexts, providing accurate masks even with complex backgrounds. As shown in these examples, the segmentation masks (highlighted in color) precisely outline the human subjects, making this model ideal for applications requiring detailed person isolation.

Use Cases

This model is ideal for applications requiring precise person segmentation:

Human-centric image editing
Background removal focused on people
Virtual try-on applications
People counting and crowd analysis
Smart surveillance systems
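
As an illustration of the background removal use case, the sketch below merges all person masks into a single alpha channel. It is a minimal example rather than a production pipeline: `remove_background` is a hypothetical helper, and it assumes the masks already have the same height and width as the image (for example, masks from `result.masks.data` when inference is run with `retina_masks=True`).

```python
import numpy as np


def remove_background(image: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Return a 4-channel copy of image that is opaque only where a person was found.

    image: (H, W, 3) uint8 array, e.g. as loaded by cv2.imread.
    masks: (N, H, W) array of per-person masks with values in [0, 1].
    """
    if masks.ndim != 3 or masks.shape[1:] != image.shape[:2]:
        raise ValueError("masks must have shape (N, H, W) matching the image height and width")

    # Union of all person masks: True wherever at least one mask covers the pixel.
    person = (masks > 0.5).any(axis=0)

    # Alpha is 255 (opaque) on people and 0 (transparent) elsewhere.
    alpha = np.where(person, 255, 0).astype(np.uint8)
    return np.dstack([image, alpha])
```

Saving the result with `cv2.imwrite('people.png', rgba)` keeps the transparency, since PNG supports an alpha channel.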

Usage

The model can be used directly with Ultralytics YOLO:

from ultralytics import YOLO

# Load the model
model = YOLO('path/to/yolo12l-person-seg-extended.pt') # Or yolo12l-person-seg.pt for the original

# Perform inference
results = model('image.jpg')

# Process results (segmentation masks and bounding boxes)
for result in results:
    boxes = result.boxes  # Tensor operations can be performed on boxes
    masks = result.masks  # Segmentation masks

    if masks is not None:
        # Process masks
        for mask in masks:
            # Use the mask for your application
            pass
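
Because the model predicts a single class, counting people reduces to counting detections above a confidence threshold. The helper below is a small sketch of that idea; `count_people` is an illustrative name and the 0.5 threshold is an arbitrary example value, not a tuned recommendation.

```python
def count_people(result, min_conf: float = 0.5) -> int:
    """Count detections in one Ultralytics result with confidence >= min_conf."""
    boxes = result.boxes
    if boxes is None:
        return 0
    # boxes.conf holds one confidence score per detected person.
    return sum(1 for conf in boxes.conf.tolist() if conf >= min_conf)
```

For example, `count_people(results[0])` gives the number of confident person detections in the first image.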

For segmentation visualization:

import cv2
import numpy as np
from ultralytics import YOLO

# Load the model and image
model = YOLO('path/to/yolo12l-person-seg-extended.pt')  # Or yolo12l-person-seg.pt for the original
image = cv2.imread('image.jpg')
if image is None:
    raise FileNotFoundError('image.jpg could not be read')

# Perform inference. retina_masks=True returns masks at the original image
# resolution, so each mask can index the image directly.
results = model(image, retina_masks=True)

# Process and visualize the first result
result = results[0]
if result.masks is not None:
    masks = result.masks.data.cpu().numpy()
    for mask in masks:
        # Blend a random color into the pixels covered by this mask
        color = [int(c) for c in np.random.randint(0, 256, size=3)]
        mask_image = np.zeros_like(image, dtype=np.uint8)
        mask_image[mask.astype(bool)] = color
        image = cv2.addWeighted(image, 1.0, mask_image, 0.5, 0)

    # Save the annotated image
    cv2.imwrite('segmented_image.jpg', image)

Limitations

This model is optimized for person segmentation only and won't detect other classes
Performance may be reduced in extreme lighting conditions
Occluded persons may have incomplete segmentation masks
Small or distant people might not be detected as reliably as those in the foreground
GPU Recommended: As a Large (L) model, real-time inference performance benefits from a dedicated GPU
Edge Device Limitations: Not optimized for mobile or edge deployment (consider YOLO12n or YOLO12s for those use cases)
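
Given the GPU recommendation above, one simple pattern is to choose inference settings based on whether CUDA is available. The sketch below is an assumption about a reasonable setup, not a benchmarked configuration: `inference_kwargs` is a hypothetical helper, and enabling half precision on GPU is a common speed option that may slightly change outputs.

```python
def inference_kwargs(cuda_available: bool) -> dict:
    """Pick Ultralytics predict arguments for the available hardware."""
    if cuda_available:
        # First CUDA device, FP16 inference.
        return {"device": 0, "half": True}
    # CPU fallback: works, but expect much slower inference for a Large model.
    return {"device": "cpu", "half": False}


# Typical use (requires torch and ultralytics):
#   import torch
#   from ultralytics import YOLO
#   model = YOLO('path/to/yolo12l-person-seg-extended.pt')
#   results = model('image.jpg', **inference_kwargs(torch.cuda.is_available()))
```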

License

This model is available under the GNU Affero General Public License v3.0 (AGPL-3.0).

License Note

This model was trained using the Ultralytics YOLO framework, which is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). As per the terms of the AGPL-3.0 license, any derivative works (including trained models) must also be distributed under the same license.