Anime Style Classifier - EfficientNet-B0

A fine-tuned EfficientNet-B0 model for classifying anime/visual novel images into 6 distinct art styles.

Model Description

  • Model Architecture: EfficientNet-B0 (~5.3M parameters)
  • Base Model: ImageNet pretrained weights
  • Task: Multi-class image classification (6 styles)
  • Input Resolution: 224x224 RGB
  • Framework: PyTorch
  • License: MIT
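
As a quick orientation, the following minimal sketch shows how this architecture can be instantiated with torchvision (illustrative only; the head layout below is the standard torchvision one, and inference.py remains the canonical loading path):

import torch.nn as nn
from torchvision import models

# ImageNet-pretrained EfficientNet-B0 backbone (~5.3M parameters)
model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)

# Swap the 1000-class ImageNet head for a 6-class style head;
# torchvision's classifier is Sequential(Dropout, Linear(1280, 1000)).
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 6)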

Performance

Test Set Results (Holdout)

  • Accuracy: 100.0%
  • Macro F1-Score: 1.000
  • Validation Accuracy: 98.18%

Perfect classification across all 120 holdout images (20 per class). Note: with n = 120, the 95% Wilson confidence interval for this result is approximately 96.90% to 100.00%, so the perfect score should be interpreted cautiously alongside the validation metrics. Taking both validation and holdout results into account, a realistic estimate of the model's true accuracy is likely in the mid-to-high 90s (roughly 96-98%), which is still very strong and, for most applications, likely fit for purpose.
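
For reference, both confidence intervals quoted in this card can be reproduced with the standard Wilson score formula in a few lines of Python (no external dependencies):

import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    center = (p + z * z / (2 * n)) / (1 + z * z / n)
    half = (z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))) / (1 + z * z / n)
    return center - half, center + half

print(wilson_interval(120, 120))  # holdout: ~(0.9690, 1.0000)
print(wilson_interval(162, 165))  # validation: ~(0.9479, 0.9938)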

Per-Class Performance

Style      Precision  Recall  F1-Score  Support
dark       1.000      1.000   1.000     20
flat       1.000      1.000   1.000     20
modern     1.000      1.000   1.000     20
moe        1.000      1.000   1.000     20
painterly  1.000      1.000   1.000     20
retro      1.000      1.000   1.000     20
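
This table is a standard classification report; the sketch below reproduces it from integer label vectors (built synthetically here to mirror the perfect holdout result, assuming scikit-learn is installed and alphabetical class ordering):

from sklearn.metrics import classification_report

styles = ["dark", "flat", "modern", "moe", "painterly", "retro"]

# 20 holdout images per class; identical y_true/y_pred mirrors the perfect score
y_true = [i // 20 for i in range(120)]
y_pred = list(y_true)

print(classification_report(y_true, y_pred, target_names=styles, digits=3))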

Style Definitions

  1. dark: Low-key lighting, chiaroscuro, desaturated palette, high contrast shadows, moody atmosphere
  2. flat: Minimalist flat colors, vector illustration, solid color blocks, no gradients or shading
  3. modern: Clean digital rendering, smooth gradients, glossy finish, contemporary anime aesthetic
  4. moe: Soft pastel colors, rounded features, cute/adorable character focus, gentle shading
  5. painterly: Watercolor or gouache appearance, visible brush strokes, paper texture, artistic feel
  6. retro: 80s/90s anime aesthetic, vintage color palette, classic cel animation style

Training Details

Dataset

  • Training Images: 933 (scene-level split)
  • Validation Images: 165
  • Holdout Images: 120
  • Total Scenes: 203 (perfectly balanced across styles)
  • Images per Style: 183 (training + validation) + 20 holdout = 203 each
  • Source Resolution: 1920x1088
  • Training Resolution: 224x224

Data Split Strategy: Scene-level 90/10 split to prevent data leakage; all 6 style variants of each scene are kept together in either the training or the holdout set, as sketched below.
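
A minimal sketch of such a scene-level split (assuming the {style}_{scene}.png naming used by the published validation images; the actual pipeline may differ):

import random
from collections import defaultdict
from pathlib import Path

random.seed(42)

# Group image paths by scene so all 6 style variants travel together
scenes = defaultdict(list)
for path in Path("images").glob("*.png"):
    style, _, scene = path.stem.partition("_")  # e.g. "retro_bookstore_chance_meeting"
    scenes[scene].append(path)

scene_ids = sorted(scenes)
random.shuffle(scene_ids)

n_holdout = len(scene_ids) // 10  # ~10% of scenes (20 of 203) go to holdout
holdout_files = [p for s in scene_ids[:n_holdout] for p in scenes[s]]
train_files = [p for s in scene_ids[n_holdout:] for p in scenes[s]]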

Data Generation: Synthetic images generated via ComfyUI with Flux diffusion model, validated by Gemma-12B vision-language model. Only scenes with 6/6 style agreement (all variants correctly classified) were included.

Training Regime

Architecture: EfficientNet-B0
Pretrained: ImageNet weights
Optimizer: AdamW
Learning Rate: 0.001
Weight Decay: 1e-05
Batch Size: 16
Epochs: 30 (early stopping at ~12-15 epochs typical)
Scheduler: CosineAnnealingLR
Loss: CrossEntropyLoss
Early Stopping: 10 epochs patience (val accuracy)
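
Put together, the core loop would look roughly like the sketch below (illustrative; model is the EfficientNet-B0 built as in the Model Description sketch, while train_loader, val_loader, and the evaluate helper are placeholders, so the real run's bookkeeping may differ):

import torch
import torch.nn as nn

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=30)
criterion = nn.CrossEntropyLoss()

best_val_acc, patience, bad_epochs = 0.0, 10, 0
for epoch in range(30):
    model.train()
    for images, labels in train_loader:  # placeholder DataLoader, batch_size=16
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()

    val_acc = evaluate(model, val_loader)  # placeholder: returns validation accuracy
    if val_acc > best_val_acc:
        best_val_acc, bad_epochs = val_acc, 0
        torch.save(model.state_dict(), "best.pth")
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # early stopping: no val-accuracy improvement for 10 epochs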

Data Augmentation (Training Only)

  • Resize to 256x256
  • Random crop to 224x224
  • Random horizontal flip (p=0.5)
  • Color jitter (brightness=0.1, saturation=0.1, hue=0.05)
  • ImageNet normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
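
In torchvision terms, that pipeline corresponds roughly to the following (a sketch; the evaluation transform shown is the conventional deterministic counterpart and is an assumption):

from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.1, saturation=0.1, hue=0.05),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Deterministic counterpart typically used for validation/holdout (assumption):
eval_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])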

Hardware

  • GPU: NVIDIA GPU (CUDA)
  • Training Time: ~15 minutes (with early stopping)

Usage

Installation

pip install torch torchvision pillow

Inference

This repository includes a small CLI inference script, inference.py, which auto-detects a .safetensors checkpoint (preferred) or a PyTorch .pth checkpoint and provides a convenient command-line interface for classification. Because inference.py already contains the full, tested loading and preprocessing logic, this README keeps only the minimal usage notes below, plus a short programmatic example that delegates to the script's functions.

Install (optional: include safetensors for safer loading):

pip install torch torchvision pillow safetensors

CLI usage (example):

python inference.py --model model.safetensors --config config.json examples/retro_1.png

This will print a ranked list of predictions and a top-prediction summary.

Programmatic usage (calls the same functions used by the CLI):

# Minimal programmatic example using functions from inference.py
from inference import load_model, classify_image

model, config = load_model('model.safetensors', 'config.json')
results = classify_image(model, config, 'examples/retro_1.png')
print(results[:3])  # top-3 predictions as (style, confidence)

For the full implementation and additional options (e.g., --top-k), see inference.py in the repository.

Limitations

  • Input Resolution: Model processes images at 224x224, which may lose fine texture details from high-resolution sources (1920x1088+)
  • Domain: Trained on synthetically generated anime/visual novel images. May not generalize perfectly to all anime art styles, manga, or hand-drawn artwork
  • Style Ambiguity: Some real-world images may blend multiple styles (e.g., painterly with modern digital techniques)
  • Validation Bias: Ground-truth labels come from the Gemma-12B vision model, so the classifier may inherit some of its biases

Small-sample caution: The internal validation set achieved 98.18% (162/165). The 95% Wilson confidence interval for this is approximately 94.79% to 99.38%. Because the holdout set is relatively small (20 images per class, 120 total), perfect classification on that set is possible by chance and should be reported with its confidence interval (see above).

Decision rule note: The model uses the standard softmax + argmax decision rule by default (choose the class with highest predicted probability). No abstain threshold is applied in the shipped inference.py; if you later want an abstain/human-review mode, adding a --min-conf option is straightforward.
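
If you do add such a mode, the logic is only a few lines on top of the existing softmax output (a sketch; min_conf and the abstain label are hypothetical, not part of the shipped script):

import torch

def predict_with_abstain(logits: torch.Tensor, min_conf: float = 0.7):
    # Standard softmax + argmax, but abstain when the top probability is low
    probs = torch.softmax(logits, dim=-1)
    conf, idx = probs.max(dim=-1)
    if conf.item() < min_conf:
        return "abstain", conf.item()  # route to human review
    return idx.item(), conf.item()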

Model Selection

This model was selected from a hyperparameter sweep of 144+ configurations across 6 architectures:

  • ResNet-18
  • MobileNetV3-Large
  • MobileNetV3-Small
  • EfficientNet-B0 (winner)
  • EfficientNetV2-S
  • ViT-B/16

EfficientNet-B0 achieved perfect 100% holdout accuracy with:

  • Excellent efficiency (~5.3M parameters)
  • Fast inference
  • Strong generalization (98.18% validation to 100% holdout)

Citation

@software{anime_style_classifier_2025,
  author = {Your Name},
  title = {Anime Style Classifier},
  year = {2025},
  url = {https://huggingface.co/Mitchins/anime-style-classifier-efficientnet-b0}
}

Acknowledgments

  • Base Model: EfficientNet-B0 from torchvision (ImageNet pretrained)
  • Synthetic Data Generation: ComfyUI + Flux diffusion model
  • Data Validation: Gemma-12B vision-language model
  • Framework: PyTorch, torchvision

Contact

For questions or feedback, please open an issue on the GitHub repository.

Published validation preview

Below is a list of the published validation scenes. Full-size images live under validation/images/; each scene was rendered in all six styles, with files named {style}_{scene}.png (styles: modern, painterly, retro, moe, flat, dark).

  • bookstore_chance_meeting
  • campus_courtyard_rain
  • cloud_port_landing_pads
  • coral_archway_shore
  • fairy_ring_meadow
  • fey_crossing_footbridge
  • holo_library_steps
  • lantern_catacombs
  • meteor_defense_bunker
  • night_market_skewers
  • park_bench_shared_earbuds
  • rain_on_train_crossing

Note about thumbnails and deployment

Thumbnails are provided only for the convenience of the model README/gallery and are optional. When you publish the model to Hugging Face you may omit validation/thumbs/ (it's listed in .gitignore); the important artifacts for deployment are model.safetensors and config.json (plus inference.py if you want a runnable CLI). Typical deployments fetch only the model weights and config; README/gallery images are not required for inference and can be excluded from the model archive to keep downloads small.
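
For example, a deployment that needs only those artifacts can fetch them directly (a sketch using huggingface_hub; the filenames match those published in this repo):

from huggingface_hub import hf_hub_download

repo = "Mitchins/anime-style-classifier-efficientnet-b0"
weights_path = hf_hub_download(repo_id=repo, filename="model.safetensors")
config_path = hf_hub_download(repo_id=repo, filename="config.json")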
