Papers
arxiv:2507.21364

Evaluating Deep Learning Models for African Wildlife Image Classification: From DenseNet to Vision Transformers

Published on Jul 28
· Submitted by lukmanaj on Jul 30
Abstract

A comparative study evaluates deep learning models for African wildlife image classification, highlighting trade-offs between accuracy, resource requirements, and deployability, with DenseNet-201 and ViT-H/14 performing best among convolutional networks and transformers, respectively.

AI-generated summary

Wildlife populations in Africa face severe threats, with vertebrate numbers declining by over 65% in the past five decades. In response, image classification using deep learning has emerged as a promising tool for biodiversity monitoring and conservation. This paper presents a comparative study of deep learning models for automatically classifying African wildlife images, focusing on transfer learning with frozen feature extractors. Using a public dataset of four species (buffalo, elephant, rhinoceros, and zebra), we evaluate the performance of DenseNet-201, ResNet-152, EfficientNet-B4, and Vision Transformer ViT-H/14. DenseNet-201 achieved the best performance among convolutional networks (67% accuracy), while ViT-H/14 achieved the highest overall accuracy (99%), but at significantly higher computational cost, raising deployment concerns. Our experiments highlight the trade-offs between accuracy, resource requirements, and deployability. The best-performing CNN (DenseNet-201) was integrated into a Hugging Face Gradio Space for real-time field use, demonstrating the feasibility of deploying lightweight models in conservation settings. This work contributes to African-grounded AI research by offering practical insights into model selection, dataset preparation, and responsible deployment of deep learning tools for wildlife conservation.

Community

Paper author · Paper submitter

We present a comparative evaluation of deep learning architectures for African wildlife image classification, focusing on four species (buffalo, elephant, rhinoceros, zebra) from a balanced public dataset. We benchmark DenseNet-201, ResNet-152, EfficientNet-B4, and Vision Transformer (ViT-H/14) using transfer learning with frozen features. ViT-H/14 achieves the highest accuracy (99%) but is computationally expensive. DenseNet-201 offers the best trade-off between accuracy (67%) and deployability, and is deployed as a real-time Hugging Face Gradio Space for conservation use. This work contributes to Africa-grounded AI by addressing ethical deployment, domain shift, and lightweight modeling for ecological monitoring in resource-constrained environments.

@librarian-bot what's your take on this paper?

How does one apply this in the real world to make at least some tangible difference for the environment?

