Evaluating Deep Learning Models for African Wildlife Image Classification: From DenseNet to Vision Transformers
Abstract
A comparative study evaluates deep learning models for African wildlife image classification, highlighting trade-offs between accuracy, resource requirements, and deployability, with DenseNet-201 and ViT-H/14 performing best among convolutional networks and transformers, respectively.
Wildlife populations in Africa face severe threats, with vertebrate numbers declining by over 65% in the past five decades. In response, image classification using deep learning has emerged as a promising tool for biodiversity monitoring and conservation. This paper presents a comparative study of deep learning models for automatically classifying African wildlife images, focusing on transfer learning with frozen feature extractors. Using a public dataset of four species (buffalo, elephant, rhinoceros, and zebra), we evaluate the performance of DenseNet-201, ResNet-152, EfficientNet-B4, and Vision Transformer ViT-H/14. DenseNet-201 achieved the best performance among the convolutional networks (67% accuracy), while ViT-H/14 achieved the highest overall accuracy (99%), albeit at significantly higher computational cost, which raises deployment concerns. Our experiments highlight the trade-offs between accuracy, resource requirements, and deployability. The best-performing CNN (DenseNet-201) was integrated into a Hugging Face Gradio Space for real-time field use, demonstrating the feasibility of deploying lightweight models in conservation settings. This work contributes to African-grounded AI research by offering practical insights into model selection, dataset preparation, and responsible deployment of deep learning tools for wildlife conservation.
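To make the frozen-feature setup concrete, below is a minimal PyTorch/torchvision sketch of how such a transfer-learning pipeline is typically assembled. The pretrained weights, optimizer, and learning rate here are illustrative assumptions, not the paper's exact training configuration.

```python
# Minimal sketch: DenseNet-201 as a frozen feature extractor with a new
# 4-way classification head (assumes PyTorch and torchvision >= 0.13).
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 4  # buffalo, elephant, rhinoceros, zebra

# Load an ImageNet-pretrained backbone and freeze all of its parameters.
model = models.densenet201(weights=models.DenseNet201_Weights.IMAGENET1K_V1)
for param in model.parameters():
    param.requires_grad = False

# Replace the 1000-way ImageNet classifier with a 4-way head; this new
# layer is the only part of the network with trainable parameters.
model.classifier = nn.Linear(model.classifier.in_features, NUM_CLASSES)

# Optimize only the head (illustrative optimizer and learning rate).
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```

Because every backbone parameter has `requires_grad = False`, training updates only the small linear head, which is what keeps this approach cheap enough for the resource-constrained settings the paper targets.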
Community
We present a comparative evaluation of deep learning architectures for African wildlife image classification, focusing on four species (buffalo, elephant, rhinoceros, zebra) from a balanced public dataset. We benchmark DenseNet-201, ResNet-152, EfficientNet-B4, and Vision Transformer (ViT-H/14) using transfer learning with frozen features. ViT-H/14 achieves the highest accuracy (99%) but is computationally expensive. DenseNet-201 offers the best trade-off between accuracy (67%) and deployability, and is deployed as a real-time Hugging Face Gradio Space for conservation use. This work contributes to Africa-grounded AI by addressing ethical deployment, domain shift, and lightweight modeling for ecological monitoring in resource-constrained environments.
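For readers curious what the Gradio deployment might look like, here is a hypothetical sketch of a minimal inference app of the kind such a Space would run. The checkpoint filename, preprocessing constants, and interface layout are assumptions for illustration, not code from the authors' Space.

```python
# Hypothetical Gradio app serving the fine-tuned DenseNet-201 classifier.
import torch
import torch.nn as nn
import gradio as gr
from torchvision import models, transforms

LABELS = ["buffalo", "elephant", "rhinoceros", "zebra"]

# Standard ImageNet preprocessing, matching the backbone's pretraining.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Rebuild the fine-tuned architecture and load trained weights.
# "densenet201_wildlife.pt" is a hypothetical checkpoint path.
model = models.densenet201()
model.classifier = nn.Linear(model.classifier.in_features, len(LABELS))
model.load_state_dict(torch.load("densenet201_wildlife.pt", map_location="cpu"))
model.eval()

def classify(image):
    # `image` arrives as a PIL.Image from the Gradio Image component.
    x = preprocess(image).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)[0]
    return {label: float(p) for label, p in zip(LABELS, probs)}

demo = gr.Interface(
    fn=classify,
    inputs=gr.Image(type="pil"),
    outputs=gr.Label(num_top_classes=4),
    title="African Wildlife Classifier",
)

if __name__ == "__main__":
    demo.launch()
```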
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- AQUA20: A Benchmark Dataset for Underwater Species Classification under Challenging Conditions (2025)
- Comparative Analysis of Vision Transformers and Convolutional Neural Networks for Medical Image Classification (2025)
- Crop Pest Classification Using Deep Learning Techniques: A Review (2025)
- Jellyfish Species Identification: A CNN Based Artificial Neural Network Approach (2025)
- Classification of Tents in Street Bazaars Using CNN (2025)
- From Ground to Air: Noise Robustness in Vision Transformers and CNNs for Event-Based Vehicle Classification with Potential UAV Applications (2025)
- FishDet-M: A Unified Large-Scale Benchmark for Robust Fish Detection and CLIP-Guided Model Selection in Diverse Aquatic Visual Domains (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend
@librarian-bot what's your take on this paper?
How does one apply this in the real world to make at least some difference for the environment?