Ryan Pfister committed on
Commit 58343f2 · Parent: 7c97aba

Add YOLO12l-seg person segmentation model with documentation and example code

Files changed (4)
  1. README.md +139 -0
  2. requirements.txt +5 -0
  3. sample_inference.py +124 -0
  4. yolo12l-person-seg.pt +3 -0
README.md ADDED
---
license: apache-2.0
tags:
- yolo
- yolo12
- segmentation
- object-detection
- person-detection
- instance-segmentation
- pytorch
- ultralytics
- computer-vision
datasets:
- coco
---

# YOLO12l-seg Person Segmentation Model

A YOLO12-large (YOLO12l) instance segmentation model trained specifically for detecting and segmenting people with high precision.

## Model Description

This model is a YOLO12-seg model fine-tuned exclusively for person segmentation. It uses the large (L) scale configuration of YOLO12, with 28.76M parameters and 510 layers at depth and width multipliers of 1.0.

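These architecture figures can be confirmed directly from the checkpoint; a quick sketch (the weight filename is the one shipped in this repo):

```python
from ultralytics import YOLO

# Print a summary of the architecture, including layer and parameter counts
model = YOLO('yolo12l-person-seg.pt')
model.info()
```
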
### Key Features

- **Single-Class Focus**: Detects only people
- **Detailed Segmentation**: Provides pixel-level instance masks
- **High Throughput**: Optimized for processing hundreds of images per minute
- **Quality-Optimized**: Trained for accurate boundary delineation
- **GPU-Optimized**: The large (L) model is designed for GPU deployment, not edge devices or mobile phones

## Training

The model was trained on a filtered version of the COCO dataset containing only images with people:

- **Training Images**: 64,114 images containing people
- **Validation Images**: 2,693 images containing people
- **Training Details** (see the sketch after this list):
  - Initially trained for 100 epochs, then extended by a further 200 epochs (300 total)
  - Input resolution: 640×640
  - Class-focused optimization with `single_cls=True` and `classes=0`
  - Segmentation-oriented settings: `overlap_mask=True` and `mask_ratio=4`
  - Cosine learning rate schedule with `patience=20` during the extended run

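A minimal sketch of the equivalent Ultralytics training call, reconstructed from the settings listed above (the dataset YAML name and starting checkpoint are assumptions, not the author's exact command):

```python
from ultralytics import YOLO

# Hypothetical reconstruction of the training run described above
model = YOLO('yolo12l-seg.pt')       # assumed pretrained starting checkpoint
model.train(
    data='coco-person.yaml',         # assumed person-only COCO dataset config
    epochs=300,                      # 100 initial + 200 extended epochs
    imgsz=640,                       # input resolution
    single_cls=True,                 # treat the dataset as a single class
    classes=[0],                     # COCO class 0 = person
    overlap_mask=True,               # allow overlapping masks during training
    mask_ratio=4,                    # mask downsampling ratio
    cos_lr=True,                     # cosine learning rate schedule
    patience=20,                     # early-stopping patience
)
```
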
## Performance

The model achieves the following metrics on the COCO person validation set:

| Metric              | Value |
| ------------------- | ----- |
| Box mAP50-95 (COCO) | 0.628 |
| Box mAP50 (COCO)    | 0.840 |
| Mask mAP50-95       | 0.524 |
| Mask mAP50          | 0.821 |
| Box Precision       | 0.835 |
| Box Recall          | 0.745 |
| Mask Precision      | 0.843 |
| Mask Recall         | 0.723 |

These metrics were computed on a validation set of 5,000 images containing 10,777 person instances.

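To reproduce a validation run with Ultralytics, something like the following should work (a sketch; the dataset YAML name is an assumption):

```python
from ultralytics import YOLO

# Validate the released checkpoint and read off the headline metrics
model = YOLO('yolo12l-person-seg.pt')
metrics = model.val(data='coco-person.yaml', imgsz=640)

print(metrics.box.map)    # Box mAP50-95
print(metrics.box.map50)  # Box mAP50
print(metrics.seg.map)    # Mask mAP50-95
print(metrics.seg.map50)  # Mask mAP50
```
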
## Use Cases

This model is ideal for applications requiring precise person segmentation:

- Human-centric image editing
- Background removal focused on people (see the sketch at the end of the Usage section)
- Virtual try-on applications
- People counting and crowd analysis
- Smart surveillance systems

## Usage

The model can be used directly with the Ultralytics Python package:

```python
from ultralytics import YOLO

# Load the model
model = YOLO('path/to/yolo12l-person-seg.pt')

# Perform inference
results = model('image.jpg')

# Process results (segmentation masks and bounding boxes)
for result in results:
    boxes = result.boxes    # bounding boxes; supports tensor operations
    masks = result.masks    # segmentation masks, or None if nothing was detected

    if masks is not None:
        for mask in masks:
            # Use the mask for your application
            pass
```

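Each `Masks` object exposes the same masks in two forms, which is worth knowing before post-processing (a short sketch, continuing from a `result` with detections):

```python
# Raw binary masks: torch.Tensor of shape (N, H, W) at the model's output resolution
raw_masks = result.masks.data

# Polygon outlines: list of (K, 2) numpy arrays in original-image pixel coordinates
polygons = result.masks.xy
```
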
For segmentation visualization:

```python
import cv2
import numpy as np
from ultralytics import YOLO

# Load the model and image
model = YOLO('path/to/yolo12l-person-seg.pt')
image = cv2.imread('image.jpg')

# Perform inference
results = model(image)

# Process and visualize the first result
result = results[0]
if result.masks is not None:
    masks = result.masks.data.cpu().numpy()
    for mask in masks:
        # Masks come back at the model's output resolution, so resize
        # each one to the original image size before overlaying
        mask = cv2.resize(mask, (image.shape[1], image.shape[0]))

        # Create a randomly colored overlay for the mask
        color = np.random.randint(0, 255, size=3).tolist()
        mask_image = np.zeros_like(image, dtype=np.uint8)
        mask_image[mask > 0.5] = color
        image = cv2.addWeighted(image, 1.0, mask_image, 0.5, 0)

# Display or save the image
cv2.imwrite('segmented_image.jpg', image)
```

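As an example of the background-removal use case listed above, a minimal sketch that keeps only the detected people (file names are placeholders):

```python
import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO('path/to/yolo12l-person-seg.pt')
image = cv2.imread('image.jpg')
result = model(image)[0]

if result.masks is not None:
    masks = result.masks.data.cpu().numpy()

    # Union of all person masks, resized to the original image dimensions
    combined = np.zeros(image.shape[:2], dtype=np.uint8)
    for mask in masks:
        resized = cv2.resize(mask, (image.shape[1], image.shape[0]))
        combined = np.maximum(combined, (resized > 0.5).astype(np.uint8))

    # Write a BGRA PNG where everything outside the people is transparent
    bgra = cv2.cvtColor(image, cv2.COLOR_BGR2BGRA)
    bgra[:, :, 3] = combined * 255
    cv2.imwrite('people_only.png', bgra)
```
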
## Limitations

- The model is optimized for person segmentation only and will not detect other classes
- Performance may degrade in extreme lighting conditions
- Occluded people may receive incomplete segmentation masks
- Small or distant people may not be detected as reliably as those in the foreground
- **GPU Recommended**: As a large (L) model, real-time inference benefits from a dedicated GPU
- **Edge Device Limitations**: Not optimized for mobile or edge deployment (consider YOLO12n or YOLO12s for those use cases)

## License

This model is available under the Apache 2.0 license.
requirements.txt ADDED
ultralytics>=8.3.0
torch>=2.0.0
opencv-python>=4.7.0
numpy>=1.22.0
Pillow>=9.5.0
sample_inference.py ADDED
# sample_inference.py
import argparse
import json

import cv2
import numpy as np
import torch
from PIL import Image
from ultralytics import YOLO


def main():
    parser = argparse.ArgumentParser(description='Run person segmentation with the YOLO12l-seg model')
    parser.add_argument('--model', type=str, default='yolo12l-person-seg.pt', help='Model path')
    parser.add_argument('--image', type=str, required=True, help='Image path for inference')
    parser.add_argument('--output', type=str, default='output.jpg', help='Output visualization image path')
    parser.add_argument('--json', type=str, default='detections.json', help='JSON output file for detection data')
    parser.add_argument('--conf', type=float, default=0.5, help='Confidence threshold')
    args = parser.parse_args()

    # Load the model
    model = YOLO(args.model)

    # Pick the best available device; half precision is only used on CUDA
    if torch.cuda.is_available():
        print(f"Using CUDA device: {torch.cuda.get_device_name(0)}")
        device = 'cuda'
        use_half = True
    elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
        print("Using Apple Silicon MPS")
        device = 'mps'
        use_half = False
    else:
        print("Using CPU")
        device = 'cpu'
        use_half = False

    # Load and check the input image
    try:
        img = Image.open(args.image)
        img_width, img_height = img.size
        print(f"Image dimensions: {img_width}x{img_height}")
    except Exception as e:
        print(f"Error opening image: {e}")
        return

    # Run inference (person class only)
    results = model(args.image, classes=0, conf=args.conf, device=device, half=use_half)

    # Process results
    detections = []
    visualization_img = cv2.imread(args.image)
    person_count = 0

    for result in results:
        masks = result.masks
        boxes = result.boxes

        if boxes is None or len(boxes) == 0:
            print("No people detected in the image")
            return

        person_count = len(boxes)
        print(f"Detected {person_count} people")

        # Visualize and extract data
        if masks is not None:
            for i, (mask, box) in enumerate(zip(masks.xy, boxes)):
                confidence = float(box.conf[0])
                x1, y1, x2, y2 = map(int, box.xyxy[0])

                # Mask polygon in original-image pixel coordinates
                polygon_points = mask.tolist()

                # Mask extent as a fraction of the image dimensions
                x_coords = [point[0] for point in polygon_points]
                y_coords = [point[1] for point in polygon_points]
                min_x, max_x = min(x_coords), max(x_coords)
                min_y, max_y = min(y_coords), max(y_coords)
                width_pct = (max_x - min_x) / img_width
                height_pct = (max_y - min_y) / img_height

                # Create detection record
                detections.append({
                    "id": i,
                    "confidence": confidence,
                    "box": [x1, y1, x2, y2],
                    "points": polygon_points,
                    "width_pct": width_pct,
                    "height_pct": height_pct,
                })

                # Draw bounding box and label
                cv2.rectangle(visualization_img, (x1, y1), (x2, y2), (0, 255, 0), 2)
                cv2.putText(visualization_img, f'Person: {confidence:.2f}', (x1, y1 - 10),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

                # Draw the segmentation mask as a filled polygon
                color_mask = np.zeros_like(visualization_img, dtype=np.uint8)
                mask_points = np.array(polygon_points, dtype=np.int32)
                cv2.fillPoly(color_mask, [mask_points], (0, 0, 255))

                # Blend the mask with the original image
                visualization_img = cv2.addWeighted(visualization_img, 1.0, color_mask, 0.5, 0)

    # Save visualization
    cv2.imwrite(args.output, visualization_img)
    print(f"Visualization saved to {args.output}")

    # Save detection data to JSON
    with open(args.json, 'w') as f:
        json.dump({
            "person_count": person_count,
            "detections": detections,
        }, f, indent=4)
    print(f"Detection data saved to {args.json}")


if __name__ == "__main__":
    main()
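
A typical invocation, assuming the weights sit alongside the script: `python sample_inference.py --image photo.jpg --output vis.jpg --json people.json`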
yolo12l-person-seg.pt ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:abc090155cfe7a883fcc613868f482fa7db04ea67a6b4366c58c07deaa4c2ba1
size 58148802