Evaluating Deep Learning Models for African Wildlife Image Classification: From DenseNet to Vision Transformers
Abstract
A comparative study evaluates deep learning models for African wildlife image classification, highlighting trade-offs between accuracy, resource requirements, and deployability, with DenseNet-201 and ViT-H/14 performing best among convolutional networks and transformers, respectively.
Wildlife populations in Africa face severe threats, with vertebrate numbers declining by over 65% in the past five decades. In response, image classification using deep learning has emerged as a promising tool for biodiversity monitoring and conservation. This paper presents a comparative study of deep learning models for automatically classifying African wildlife images, focusing on transfer learning with frozen feature extractors. Using a public dataset of four species (buffalo, elephant, rhinoceros, and zebra), we evaluate the performance of DenseNet-201, ResNet-152, EfficientNet-B4, and Vision Transformer ViT-H/14. DenseNet-201 achieved the best performance among the convolutional networks (67% accuracy), while ViT-H/14 achieved the highest overall accuracy (99%), albeit at significantly higher computational cost, which raises deployment concerns. Our experiments highlight the trade-offs between accuracy, resource requirements, and deployability. The best-performing CNN (DenseNet-201) was integrated into a Hugging Face Gradio Space for real-time field use, demonstrating the feasibility of deploying lightweight models in conservation settings. This work contributes to African-grounded AI research by offering practical insights into model selection, dataset preparation, and responsible deployment of deep learning tools for wildlife conservation.
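To make the frozen-feature setup concrete, below is a minimal PyTorch/torchvision sketch of how such a transfer-learning pipeline is typically assembled. The pretrained weights, optimizer, and learning rate here are illustrative assumptions, not the paper's exact training configuration.

```python
# Minimal sketch: DenseNet-201 as a frozen feature extractor with a new
# 4-way classification head (assumes PyTorch and torchvision >= 0.13).
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 4  # buffalo, elephant, rhinoceros, zebra

# Load an ImageNet-pretrained backbone and freeze all of its parameters.
model = models.densenet201(weights=models.DenseNet201_Weights.IMAGENET1K_V1)
for param in model.parameters():
    param.requires_grad = False

# Replace the 1000-way ImageNet classifier with a 4-way head; this new
# layer is the only part of the network with trainable parameters.
model.classifier = nn.Linear(model.classifier.in_features, NUM_CLASSES)

# Optimize only the head (illustrative optimizer and learning rate).
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```

Because every backbone parameter has `requires_grad = False`, training updates only the small linear head, which is what keeps this approach cheap enough for the resource-constrained settings the paper targets.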
Community
We present a comparative evaluation of deep learning architectures for African wildlife image classification, focusing on four species (buffalo, elephant, rhinoceros, zebra) from a balanced public dataset. We benchmark DenseNet-201, ResNet-152, EfficientNet-B4, and Vision Transformer (ViT-H/14) using transfer learning with frozen features. ViT-H/14 achieves the highest accuracy (99%) but is computationally expensive. DenseNet-201 offers the best trade-off between accuracy (67%) and deployability, and is deployed as a real-time Hugging Face Gradio Space for conservation use. This work contributes to Africa-grounded AI by addressing ethical deployment, domain shift, and lightweight modeling for ecological monitoring in resource-constrained environments.
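For readers curious what the Gradio deployment might look like, here is a hypothetical sketch of a minimal inference app of the kind such a Space would run. The checkpoint filename, preprocessing constants, and interface layout are assumptions for illustration, not code from the authors' Space.

```python
# Hypothetical Gradio app serving the fine-tuned DenseNet-201 classifier.
import torch
import torch.nn as nn
import gradio as gr
from torchvision import models, transforms

LABELS = ["buffalo", "elephant", "rhinoceros", "zebra"]

# Standard ImageNet preprocessing, matching the backbone's pretraining.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Rebuild the fine-tuned architecture and load trained weights.
# "densenet201_wildlife.pt" is a hypothetical checkpoint path.
model = models.densenet201()
model.classifier = nn.Linear(model.classifier.in_features, len(LABELS))
model.load_state_dict(torch.load("densenet201_wildlife.pt", map_location="cpu"))
model.eval()

def classify(image):
    # `image` arrives as a PIL.Image from the Gradio Image component.
    x = preprocess(image).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)[0]
    return {label: float(p) for label, p in zip(LABELS, probs)}

demo = gr.Interface(
    fn=classify,
    inputs=gr.Image(type="pil"),
    outputs=gr.Label(num_top_classes=4),
    title="African Wildlife Classifier",
)

if __name__ == "__main__":
    demo.launch()
```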
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- AQUA20: A Benchmark Dataset for Underwater Species Classification under Challenging Conditions (2025)
- Comparative Analysis of Vision Transformers and Convolutional Neural Networks for Medical Image Classification (2025)
- Crop Pest Classification Using Deep Learning Techniques: A Review (2025)
- Jellyfish Species Identification: A CNN Based Artificial Neural Network Approach (2025)
- Classification of Tents in Street Bazaars Using CNN (2025)
- From Ground to Air: Noise Robustness in Vision Transformers and CNNs for Event-Based Vehicle Classification with Potential UAV Applications (2025)
- FishDet-M: A Unified Large-Scale Benchmark for Robust Fish Detection and CLIP-Guided Model Selection in Diverse Aquatic Visual Domains (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend
@librarian-bot what's your take on this paper?
How does one apply this in the real world to make at least some difference for the environment?