Toxic-Predict

Toxic-Predict is a machine learning project developed as part of the Cellula Internship, focused on safe and responsible multi-modal toxic content moderation. It classifies text queries and image descriptions into nine toxicity categories, including "Safe", "Violent Crimes", "Non-Violent Crimes", and "Unsafe". The project combines deep learning (Keras/TensorFlow), NLP preprocessing, and benchmarking against modern transformer models to build and evaluate a robust multi-class toxic content classifier.


🚩 Project Context

This project is part of the Cellula Internship proposal:
"Safe and Responsible Multi-Modal Toxic Content Moderation"
The goal is to build a dual-stage moderation pipeline for both text and images, combining hard guardrails (Llama Guard) and soft classification (DistilBERT/Deep Learning) for nuanced, policy-compliant moderation.
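
A rough sketch of how the two stages chain together (the guardrail and classifier below are placeholder stubs, not the project's actual models):

def hard_guardrail(text: str, image_desc: str) -> bool:
    # Placeholder for a Llama Guard call; returns True when content is flagged outright.
    return "attack plan" in text.lower()

def soft_classifier(text: str, image_desc: str) -> tuple[str, float]:
    # Placeholder for the DistilBERT/CNN/LSTM soft classifier.
    return "Safe", 0.99

def moderate(text: str, image_desc: str = "") -> dict:
    # Stage 1: hard filter blocks clearly policy-violating content.
    if hard_guardrail(text, image_desc):
        return {"action": "block", "stage": "hard_guardrail"}
    # Stage 2: soft classifier assigns one of the nine toxicity categories.
    label, score = soft_classifier(text, image_desc)
    return {"action": "allow" if label == "Safe" else "review", "label": label, "score": score}

print(moderate("hello world", "a sunny beach"))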


Features

  • Dual-stage moderation: hard filter (Llama Guard) + soft classifier (DistilBERT/CNN/LSTM)
  • Data cleaning, preprocessing, and label encoding
  • Tokenization and sequence padding for text data
  • Deep learning and transformer-based models for multi-class toxicity classification
  • Evaluation metrics: classification report and confusion matrix
  • Jupyter notebooks for data exploration and model development
  • Streamlit web app for demo and deployment
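
A minimal sketch of the Streamlit demo wiring (the classify stub is a stand-in; the real app would load the trained model and tokenizer as shown under Usage below):

import streamlit as st

def classify(text: str, image_desc: str) -> tuple[str, float]:
    # Stand-in for the trained classifier; replace with the real model call.
    return "Safe", 0.99

st.title("Toxic-Predict demo")
text = st.text_area("Query text")
image_desc = st.text_area("Image description (optional)")
if st.button("Classify") and text:
    label, score = classify(text, image_desc)
    st.write({"label": label, "score": score})

Run it with streamlit run app.py (the file name is assumed).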


Usage

  • Preprocessing and Tokenization:
    See notebooks/Preprocessing.ipynb and notebooks/tokenization.ipynb for step-by-step data cleaning, splitting, and tokenization.
  • Model Training:
    Model architecture and training code are in models/model.py.
  • Inference:
    Load the trained model (models/toxic_classifier.h5 or .keras) and tokenizer (data/tokenizer.pkl) to predict toxicity categories for new samples.
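
A minimal inference sketch, assuming the tokenizer was saved with pickle and the model expects padded sequences (maxlen=100 and the way the two inputs are combined are assumptions; check the notebooks for the exact preprocessing):

import pickle
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences

model = load_model("models/toxic_classifier.keras")
with open("data/tokenizer.pkl", "rb") as f:
    tokenizer = pickle.load(f)

# Combine query text and image description into one input string (assumption).
sample = "This is a dangerous post" + " " + "Knife shown in the image"
seq = tokenizer.texts_to_sequences([sample])
padded = pad_sequences(seq, maxlen=100)  # maxlen=100 is an assumption

probs = model.predict(padded)[0]
print(int(np.argmax(probs)), float(np.max(probs)))  # encoded category and confidence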

Data

  • CSV files with columns: query, image descriptions, Toxic Category, and Toxic Category Encoded.
  • Data splits: train.csv, eval.csv, and test.csv; cleaned.csv holds the processed data.
  • 9 categories: Safe, Violent Crimes, Elections, Sex-Related Crimes, Unsafe, Non-Violent Crimes, Child Sexual Exploitation, Unknown S-Type, Suicide & Self-Harm.
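
A quick way to inspect a split and its label balance (the file path assumes the CSVs sit in a data/ folder; pandas is an extra dependency not listed under Requirements):

import pandas as pd

train = pd.read_csv("data/train.csv")  # path is an assumption
print(train.columns.tolist())                  # query, image descriptions, Toxic Category, Toxic Category Encoded
print(train["Toxic Category"].value_counts())  # distribution over the nine categories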

Model

  • Deep learning model built with Keras (TensorFlow backend).
  • Multi-class classification with label encoding for toxicity categories.
  • Benchmarking with PEFT-LoRA DistilBERT and baseline CNN/LSTM.
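
A minimal sketch of a baseline Keras classifier along the CNN/LSTM lines above (vocabulary size, sequence length, and layer sizes are assumptions, not the actual hyperparameters in models/model.py):

from tensorflow.keras import layers, models

VOCAB_SIZE, MAX_LEN, NUM_CLASSES = 20000, 100, 9  # assumed values

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 128),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),  # one probability per toxicity category
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # labels are integer-encoded
              metrics=["accuracy"])
model.summary()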

Evaluation

  • Classification report and confusion matrix are generated for model evaluation.
  • See the evaluation steps in notebooks/Preprocessing.ipynb.
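
A minimal sketch of the evaluation step (the label arrays below are dummy placeholders; scikit-learn is an extra dependency not listed under Requirements):

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Replace with the encoded test labels and the argmax of the model's softmax output.
y_true = np.array([0, 1, 2, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0])

print(classification_report(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))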


🤗 Hugging Face Inference

This model is available on the Hugging Face Hub: NightPrince/Toxic_Classification

Inference API Usage

You can use the Hugging Face Inference API or widget with two fields:

  • text: The main query or post text
  • image_desc: The image description (if any)

Example (Python):

from huggingface_hub import InferenceClient

client = InferenceClient("NightPrince/Toxic_Classification")

# The repo's custom pipeline.py accepts both fields in a single payload.
result = client.text_classification({
    "text": "This is a dangerous post",
    "image_desc": "Knife shown in the image"
})
print(result)  # e.g. {'label': 'Violent Crimes', 'score': 0.98}

Custom Pipeline Details

  • The model uses a custom pipeline.py for multi-input inference (a sketch follows the file list below).
  • The output is a dictionary with the predicted label (class name) and score (confidence).
  • Class names are mapped using label_map.json.

Files in the repo:

  • pipeline.py (custom inference logic)
  • tokenizer.json (Keras tokenizer)
  • label_map.json (class code to name mapping)
  • TensorFlow SavedModel files (saved_model.pb, variables/)
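
A minimal sketch of the kind of multi-input logic pipeline.py implements; the class name, maxlen, and the string keys of label_map.json are assumptions, and loading a SavedModel directory with load_model assumes a TF 2.x / Keras 2 setup:

import json
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import tokenizer_from_json
from tensorflow.keras.preprocessing.sequence import pad_sequences

class ToxicPipeline:
    def __init__(self, path="."):
        self.model = tf.keras.models.load_model(path)        # SavedModel directory (saved_model.pb, variables/)
        with open(f"{path}/tokenizer.json") as f:
            self.tokenizer = tokenizer_from_json(f.read())   # Keras tokenizer
        with open(f"{path}/label_map.json") as f:
            self.label_map = json.load(f)                    # class code -> class name

    def __call__(self, inputs: dict) -> dict:
        # Combine both fields into one sequence, pad, and classify.
        combined = f"{inputs.get('text', '')} {inputs.get('image_desc', '')}"
        seq = pad_sequences(self.tokenizer.texts_to_sequences([combined]), maxlen=100)  # maxlen assumed
        probs = self.model.predict(seq)[0]
        idx = int(np.argmax(probs))
        return {"label": self.label_map[str(idx)], "score": float(np.max(probs))}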

Requirements:

tensorflow
keras
numpy


License

MIT License


Author: Yahya Muhammad Alnwsany
Contact: [email protected]
Portfolio: https://nightprincey.github.io/Portfolio/
