Ryan Pfister committed · Commit 58343f2 · Parent(s): 7c97aba
Add YOLO12l-seg person segmentation model with documentation and example code

Files changed:
- README.md +139 -0
- requirements.txt +5 -0
- sample_inference.py +124 -0
- yolo12l-person-seg.pt +3 -0
README.md
ADDED
@@ -0,0 +1,139 @@
---
license: apache-2.0
tags:
- yolo
- yolo12
- segmentation
- object-detection
- person-detection
- instance-segmentation
- pytorch
- ultralytics
- computer-vision
datasets:
- coco
---

# YOLO12-seg Person Segmentation Model

A YOLO12-large (YOLO12l) instance segmentation model trained specifically for detecting and segmenting people with high precision.

## Model Description

This model is a fine-tuned YOLO12-seg model optimized exclusively for person segmentation. It uses the large (L) scale configuration of YOLO12, with 28.76M parameters and 510 layers at depth and width multipliers of 1.0.
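
If you want to verify these architecture figures for the checkpoint you download, the Ultralytics API can report them. A minimal sketch (the exact numbers printed may vary slightly across `ultralytics` versions):

```python
from ultralytics import YOLO

# Load the checkpoint and print a summary of layers, parameters, and GFLOPs
model = YOLO('yolo12l-person-seg.pt')
model.info()
```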

### Key Features

- **Single-Class Focus**: Specialized in detecting only people
- **Detailed Segmentation**: Provides detailed, pixel-level segmentation masks
- **High Throughput**: Optimized for processing hundreds of images per minute
- **Quality-Optimized**: Trained specifically for accurate boundary delineation
- **GPU-Optimized**: The Large (L) model is designed for GPU deployment, not edge devices or mobile phones

## Training

The model was trained on a filtered version of the COCO dataset containing only images with people:

- **Training Images**: 64,114 images containing people
- **Validation Images**: 2,693 images containing people
- **Training Details** (a comparable launch is sketched after this list):
  - Initially trained for 100 epochs
  - Extended training for an additional 200 epochs (300 total)
  - Input resolution: 640×640
  - Class-focused optimization with `single_cls=True` and `classes=0`
  - Optimized for segmentation with `overlap_mask=True` and `mask_ratio=4`
  - Extended training with a cosine learning rate schedule and `patience=20`
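
A comparable run could be launched with the Ultralytics Python API. This is a minimal sketch under stated assumptions, not the exact command used for this checkpoint: `yolo12l-seg.pt` as the starting weights and `person-coco.yaml` as the dataset config for the filtered COCO split are both hypothetical names.

```python
from ultralytics import YOLO

# Assumed pretrained base; swap in whatever YOLO12l-seg weights you start from
model = YOLO('yolo12l-seg.pt')

model.train(
    data='person-coco.yaml',  # hypothetical config for the person-only COCO split
    epochs=300,               # 100 initial + 200 extended epochs
    imgsz=640,                # input resolution
    single_cls=True,          # treat the dataset as a single class
    classes=[0],              # person is class 0 in COCO
    overlap_mask=True,        # allow overlapping instance masks during training
    mask_ratio=4,             # mask downsampling ratio
    cos_lr=True,              # cosine learning rate schedule
    patience=20,              # early-stopping patience
)
```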

## Performance

The model achieves the following metrics on the COCO person validation set:

| Metric              | Value |
| ------------------- | ----- |
| Box mAP50-95 (COCO) | 0.628 |
| Box mAP50 (COCO)    | 0.840 |
| Mask mAP50-95       | 0.524 |
| Mask mAP50          | 0.821 |
| Box Precision       | 0.835 |
| Box Recall          | 0.745 |
| Mask Precision      | 0.843 |
| Mask Recall         | 0.723 |

These metrics were computed on a validation set of 5,000 images containing 10,777 person instances.
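
These numbers can be re-checked with Ultralytics' built-in validation. A minimal sketch; `person-coco.yaml` is again a hypothetical dataset config, and the exact values will depend on the split and package version:

```python
from ultralytics import YOLO

model = YOLO('yolo12l-person-seg.pt')

# Run validation; for a segmentation model this reports both box and mask metrics
metrics = model.val(data='person-coco.yaml', imgsz=640, split='val')
print(metrics.box.map, metrics.box.map50)  # box mAP50-95 and mAP50
print(metrics.seg.map, metrics.seg.map50)  # mask mAP50-95 and mAP50
```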

## Use Cases

This model is ideal for applications requiring precise person segmentation:

- Human-centric image editing
- Background removal focused on people (see the sketch after this list)
- Virtual try-on applications
- People counting and crowd analysis
- Smart surveillance systems
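
As an illustration of the background-removal use case, the predicted polygons can be rasterized into a keep-mask and applied to the image. A hedged sketch; file names are placeholders, and `masks.xy` polygons are in original image coordinates:

```python
import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO('yolo12l-person-seg.pt')
image = cv2.imread('photo.jpg')  # placeholder input path

result = model(image)[0]
if result.masks is not None:
    # Rasterize every person polygon into a single keep-mask
    keep = np.zeros(image.shape[:2], dtype=np.uint8)
    for polygon in result.masks.xy:
        if len(polygon):
            cv2.fillPoly(keep, [polygon.astype(np.int32)], 255)

    # Zero out everything outside the person masks
    cutout = cv2.bitwise_and(image, image, mask=keep)
    cv2.imwrite('people_only.png', cutout)
```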

## Usage

The model can be used directly with the Ultralytics Python package:

```python
from ultralytics import YOLO

# Load the model
model = YOLO('path/to/yolo12l-person-seg.pt')

# Perform inference
results = model('image.jpg')

# Process results (segmentation masks and bounding boxes)
for result in results:
    boxes = result.boxes  # Bounding boxes; tensor operations can be performed on them
    masks = result.masks  # Segmentation masks (None if nothing was detected)

    if masks is not None:
        # Process masks
        for mask in masks:
            # Use the mask for your application
            pass
```

For segmentation visualization:

```python
import cv2
import numpy as np
from ultralytics import YOLO

# Load the model and image
model = YOLO('path/to/yolo12l-person-seg.pt')
image = cv2.imread('image.jpg')

# Perform inference
results = model(image)

# Process and visualize the first result
result = results[0]
if result.masks is not None:
    masks = result.masks.data.cpu().numpy()
    for i, mask in enumerate(masks):
        # Masks are returned at the model's inference resolution; resize to the image
        mask = cv2.resize(mask, (image.shape[1], image.shape[0]))
        # Create a colored overlay for each mask
        color = [np.random.randint(0, 255) for _ in range(3)]
        mask_image = np.zeros_like(image, dtype=np.uint8)
        mask_image[mask.astype(bool)] = color
        image = cv2.addWeighted(image, 1.0, mask_image, 0.5, 0)

# Display or save the image
cv2.imwrite('segmented_image.jpg', image)
```

## Limitations

- This model is optimized for person segmentation only and will not detect other classes
- Performance may be reduced in extreme lighting conditions
- Occluded people may have incomplete segmentation masks
- Small or distant people might not be detected as reliably as those in the foreground
- **GPU Recommended**: As a Large (L) model, real-time inference performance benefits from a dedicated GPU
- **Edge Device Limitations**: Not optimized for mobile or edge deployment (consider YOLO12n or YOLO12s for those use cases)

## License

This model is available under the Apache 2.0 license.
requirements.txt
ADDED
@@ -0,0 +1,5 @@
ultralytics>=8.3.0
torch>=2.0.0
opencv-python>=4.7.0
numpy>=1.22.0
Pillow>=9.5.0
sample_inference.py
ADDED
@@ -0,0 +1,124 @@
# sample_inference.py
import argparse
import torch
from ultralytics import YOLO
import cv2
import numpy as np
import json
from PIL import Image

def main():
    parser = argparse.ArgumentParser(description='Run person segmentation with YOLO12l-seg model')
    parser.add_argument('--model', type=str, default='yolo12l-person-seg.pt', help='Model path')
    parser.add_argument('--image', type=str, required=True, help='Image path for inference')
    parser.add_argument('--output', type=str, default='output.jpg', help='Output visualization image path')
    parser.add_argument('--json', type=str, default='detections.json', help='JSON output file for detection data')
    parser.add_argument('--conf', type=float, default=0.5, help='Confidence threshold')
    args = parser.parse_args()

    # Load the model
    model = YOLO(args.model)

    # Move to the appropriate device if available; half precision only on CUDA
    if torch.cuda.is_available():
        print(f"Using CUDA device: {torch.cuda.get_device_name(0)}")
        model.to('cuda')
        device = 'cuda'
        use_half = True
    elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
        print("Using Apple Silicon MPS")
        model.to('mps')
        device = 'mps'
        use_half = False
    else:
        print("Using CPU")
        device = None
        use_half = False

    # Load and check input image
    try:
        img = Image.open(args.image)
        img_width, img_height = img.size
        print(f"Image dimensions: {img_width}x{img_height}")
    except Exception as e:
        print(f"Error opening image: {e}")
        return

    # Run inference (classes=0 keeps only the person class)
    if device == 'cuda':
        results = model(args.image, classes=0, conf=args.conf, device=device, half=use_half)
    elif device == 'mps':
        results = model(args.image, classes=0, conf=args.conf, device=device)
    else:
        results = model(args.image, classes=0, conf=args.conf)

    # Process results
    detections = []
    person_count = 0
    visualization_img = cv2.imread(args.image)

    for result in results:
        masks = result.masks
        boxes = result.boxes

        if boxes is None or len(boxes) == 0:
            print("No people detected in the image")
            return

        person_count = len(boxes)
        print(f"Detected {person_count} people")

        # Visualize and extract data (masks.xy are polygons in original image coordinates)
        if masks is not None:
            for i, (mask, box) in enumerate(zip(masks.xy, boxes)):
                confidence = float(box.conf[0])
                x1, y1, x2, y2 = map(int, box.xyxy[0])

                # Extract mask points
                polygon_points = mask.tolist()

                # Calculate percentages of image dimensions
                x_coords = [point[0] for point in polygon_points]
                y_coords = [point[1] for point in polygon_points]
                min_x, max_x = min(x_coords), max(x_coords)
                min_y, max_y = min(y_coords), max(y_coords)
                width_pct = (max_x - min_x) / img_width
                height_pct = (max_y - min_y) / img_height

                # Create detection record
                detection = {
                    "id": i,
                    "confidence": confidence,
                    "box": [x1, y1, x2, y2],
                    "points": polygon_points,
                    "width_pct": width_pct,
                    "height_pct": height_pct,
                }
                detections.append(detection)

                # Draw bounding box
                cv2.rectangle(visualization_img, (x1, y1), (x2, y2), (0, 255, 0), 2)
                cv2.putText(visualization_img, f'Person: {confidence:.2f}', (x1, y1 - 10),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

                # Draw segmentation mask
                color_mask = np.zeros_like(visualization_img, dtype=np.uint8)
                mask_points = np.array(polygon_points, dtype=np.int32)
                cv2.fillPoly(color_mask, [mask_points], (0, 0, 255))

                # Blend the mask with the original image
                visualization_img = cv2.addWeighted(visualization_img, 1.0, color_mask, 0.5, 0)

    # Save visualization
    cv2.imwrite(args.output, visualization_img)
    print(f"Visualization saved to {args.output}")

    # Save detection data to JSON
    with open(args.json, 'w') as f:
        json.dump({
            "person_count": person_count,
            "detections": detections
        }, f, indent=4)
    print(f"Detection data saved to {args.json}")

if __name__ == "__main__":
    main()
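A typical invocation of the script above (file names are placeholders): `python sample_inference.py --image photo.jpg --output seg.jpg --json people.json --conf 0.5` writes the mask overlay to `seg.jpg` and the per-person polygons to `people.json`.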
yolo12l-person-seg.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:abc090155cfe7a883fcc613868f482fa7db04ea67a6b4366c58c07deaa4c2ba1
size 58148802