YOLO11m Widget Detector

YOLO11m Widget Detector is a 20.1-million-parameter object detector trained on the dataset from the paper CommonForms: A Large, Diverse Dataset for Form Field Detection. The model detects three classes of form widgets: TextBoxes (text_input), ChoiceButtons (choice_button, i.e. checkboxes), and Signature fields (signature).

Results

Model	Text	Choice	Signature	mAP@50 (↑)
YOLO11m v3 (1024px)	81.4	70.9	83.8	78.7
YOLO11m v4 (1024px)	83.9	72.1	86.6	80.9
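
The per-class columns and the mAP@50 column fit together by simple arithmetic. The sketch below assumes that the Text, Choice, and Signature columns are per-class AP@50 values and that mAP@50 is their unweighted mean. The table does not state this, but the numbers are consistent with it. The dictionary name `per_class_ap` and the helper `mean_ap` are made up for illustration.

```python
# Per-class scores copied from the table above (assumed to be AP@50 values).
per_class_ap = {
    "YOLO11m v3 (1024px)": [81.4, 70.9, 83.8],
    "YOLO11m v4 (1024px)": [83.9, 72.1, 86.6],
}


def mean_ap(scores):
    """Unweighted mean of the per-class scores, rounded to one decimal."""
    return round(sum(scores) / len(scores), 1)


for model, scores in per_class_ap.items():
    print(f"{model}: mAP@50 = {mean_ap(scores)}")
```

Under that assumption, the computed means match the reported 78.7 and 80.9.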

Installation

The psynx-widget-detector package can be installed with either uv or pip, whichever package manager you prefer. The uv command:

uv pip install psynx-widget-detector

The pip command:

pip install psynx-widget-detector

Once it's installed, you should be able to run inference on almost any PDF.

Python API

The simplest usage will run inference using the default suggested settings. The model weights will automatically download from Hugging Face on your first run.

from widget_detector import WidgetDetector

# Initialize the detector
# (Downloads PSynx/widget-detector-yolo automatically)
detector = WidgetDetector(
    conf=0.25,        # Confidence threshold
    iou=0.45,         # NMS IoU threshold
    imgsz=1024,       # Inference resolution
    device="cpu"      # "cuda" for GPU, "cpu" for CPU
)

# Process a PDF or Image
result = detector.detect_path("input.pdf")

# Print results
for page in result.pages:
    print(f"Page {page.page}: Found {len(page.widgets)} widgets")
    for w in page.widgets:
        print(f" - {w.class_name} ({w.confidence:.2f})")

# Save output to JSON
result.save("output.json")
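
A common follow-up is to summarize the detections per class. The sketch below relies only on the attribute names used in the example above (`result.pages`, `page.widgets`, `w.class_name`, `w.confidence`). The helper `count_by_class` and the stand-in `demo` object are hypothetical and not part of the package; the stand-in is there so the snippet runs without downloading the model.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_class(result, min_confidence=0.0):
    """Count widgets per class across all pages, skipping low-confidence ones."""
    counts = Counter()
    for page in result.pages:
        for w in page.widgets:
            if w.confidence >= min_confidence:
                counts[w.class_name] += 1
    return counts


# Hypothetical stand-in that mimics the attribute names of a detection result.
demo = SimpleNamespace(pages=[
    SimpleNamespace(page=1, widgets=[
        SimpleNamespace(class_name="text_input", confidence=0.91),
        SimpleNamespace(class_name="choice_button", confidence=0.40),
    ]),
    SimpleNamespace(page=2, widgets=[
        SimpleNamespace(class_name="text_input", confidence=0.77),
        SimpleNamespace(class_name="signature", confidence=0.66),
    ]),
])

for class_name, n in count_by_class(demo, min_confidence=0.5).items():
    print(f"{class_name}: {n}")
```

If the real result object exposes the same attributes, passing the `result` returned by `detect_path` in place of `demo` should work unchanged. Filtering after inference in this way lets you try stricter thresholds without re-running the model.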

Example Output

[Image: the model's output on a sample document]

References

CommonForms: A Large, Diverse Dataset for Form Field Detection
