MAE Waste Classifier

A finetuned MAE (Masked Autoencoder) ViT-Base model for waste classification, reaching 93.27% validation accuracy across 9 waste categories.

Model Details

  • Architecture: Vision Transformer (ViT-Base) with MAE pretraining
  • Parameters: ~86M
  • Input Size: 224x224 RGB images
  • Classes: 9 waste categories
  • Validation Accuracy: 93.27%
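
The ~86M parameter figure above can be sanity-checked directly from the timm backbone used in the Usage section; the snippet below is a small sketch and does not require the fine-tuned checkpoint.

import timm

# Instantiate the same backbone (random weights are enough for counting parameters)
model = timm.create_model('vit_base_patch16_224', pretrained=False, num_classes=9)
num_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {num_params / 1e6:.1f}M")  # ~86M for ViT-Base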

Categories

  1. Cardboard - Flatten and place in recycling bin. Remove any tape or staples.
  2. Food Organics - Compost in organic waste bin or home composter.
  3. Glass - Rinse and place in glass recycling. Remove lids and caps.
  4. Metal - Rinse aluminum/steel cans and place in recycling bin.
  5. Miscellaneous Trash - Dispose in general waste bin. Cannot be recycled.
  6. Paper - Place clean paper in recycling. Remove plastic windows from envelopes.
  7. Plastic - Check recycling number. Rinse containers before recycling.
  8. Textile Trash - Donate if reusable, otherwise dispose in textile recycling.
  9. Vegetation - Compost in organic waste or use for mulch in garden.
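
For convenience, the guidance above can be kept alongside the model as a plain dictionary keyed by the class names used in the Usage section below. This is a minimal sketch; the dictionary and helper names are illustrative, not part of the released code.

# Disposal guidance keyed by the class names returned by the classifier
DISPOSAL_GUIDANCE = {
    'Cardboard': 'Flatten and place in recycling bin. Remove any tape or staples.',
    'Food Organics': 'Compost in organic waste bin or home composter.',
    'Glass': 'Rinse and place in glass recycling. Remove lids and caps.',
    'Metal': 'Rinse aluminum/steel cans and place in recycling bin.',
    'Miscellaneous Trash': 'Dispose in general waste bin. Cannot be recycled.',
    'Paper': 'Place clean paper in recycling. Remove plastic windows from envelopes.',
    'Plastic': 'Check recycling number. Rinse containers before recycling.',
    'Textile Trash': 'Donate if reusable, otherwise dispose in textile recycling.',
    'Vegetation': 'Compost in organic waste or use for mulch in garden.',
}

def disposal_for(label: str) -> str:
    # Look up the instruction for a predicted label, e.g. disposal_for('Plastic')
    return DISPOSAL_GUIDANCE[label]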

Usage

import torch
import timm
from PIL import Image
from torchvision import transforms

# Load model
model = timm.create_model('vit_base_patch16_224', pretrained=False, num_classes=9)
checkpoint = torch.load('best_model.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Preprocessing
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Inference
image = Image.open('waste_item.jpg').convert('RGB')
input_tensor = transform(image).unsqueeze(0)

with torch.no_grad():
    outputs = model(input_tensor)
    probabilities = torch.nn.functional.softmax(outputs, dim=1)
    predicted_class = torch.argmax(probabilities, dim=1).item()

categories = ['Cardboard', 'Food Organics', 'Glass', 'Metal', 'Miscellaneous Trash', 'Paper', 'Plastic', 'Textile Trash', 'Vegetation']
print(f"Predicted: {categories[predicted_class]}")

Training Details

  • Dataset: RealWaste (4,752 images)
  • Pretraining: MAE on ImageNet
  • Finetuning: 15 epochs on RealWaste
  • Optimizer: AdamW
  • Hardware: NVIDIA RTX 3080 Ti
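
The card does not list the full training recipe, so the loop below is only a hedged sketch of the stated setup (MAE-pretrained ViT-Base, AdamW, 15 epochs). The learning rate, weight decay, batch size, cross-entropy loss, the 'realwaste/train' path, and the use of timm's .mae pretrained tag are all assumptions.

import torch
import timm
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Start from an MAE-pretrained ViT-Base backbone with a fresh 9-way head
# (the .mae pretrained tag is one way to obtain MAE weights in timm)
model = timm.create_model('vit_base_patch16_224.mae', pretrained=True, num_classes=9).to(device)

# RealWaste arranged as one subfolder per class; 'realwaste/train' is a placeholder path
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
train_ds = datasets.ImageFolder('realwaste/train', transform=train_tf)
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True, num_workers=4)

# Assumed hyperparameters -- the card only states AdamW and 15 epochs
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
criterion = nn.CrossEntropyLoss()

for epoch in range(15):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()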

Performance

  • Validation Accuracy: 93.27%
  • Training Accuracy: 99.89%
  • Model Size: ~350MB
  • Inference Speed: ~50ms per image (GPU)
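
The ~50ms figure will vary with hardware, batch size, and precision. A simple way to measure latency on your own machine is sketched below; weights do not affect timing, so an uninitialized backbone is sufficient.

import time
import timm
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = timm.create_model('vit_base_patch16_224', pretrained=False, num_classes=9).to(device).eval()

dummy = torch.randn(1, 3, 224, 224, device=device)
with torch.no_grad():
    for _ in range(10):  # warm-up iterations
        model(dummy)
    if device == 'cuda':
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(100):
        model(dummy)
    if device == 'cuda':
        torch.cuda.synchronize()
elapsed_ms = (time.perf_counter() - start) / 100 * 1000
print(f"Average latency: {elapsed_ms:.1f} ms per image")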

Environmental Impact

This model supports recycling workflows by classifying waste items into the 9 categories above and pairing each prediction with the corresponding disposal guidance.
