---
library_name: transformers
tags:
  - text-classification
  - bert
  - bullying-detection
  - hate-speech
  - social-good
---

Model Card for Davephoenix/bert-bullying-detector

A BERT-based binary classifier that detects whether a given English text contains bullying content. It is fine-tuned for use in moderation tools, education platforms, and social media analysis.

Model Details

Model Description

This model is based on bert-base-uncased and fine-tuned for binary text classification. The goal is to distinguish between bullying and non-bullying text, providing a tool to support online safety and moderation.

  • Developed by: Davephoenix
  • Funded by: Independent project
  • Shared by: Davephoenix
  • Model type: Text classification (binary)
  • Language(s) (NLP): English
  • License: Apache 2.0
  • Finetuned from model: bert-base-uncased

Uses

Direct Use

  • Used for classifying short- to medium-length English text as "Bullying" or "Not Bullying".
  • Can be integrated into moderation tools, educational apps, or awareness platforms.

Downstream Use

  • As a building block in broader moderation or digital well-being systems (see the integration sketch after this list).
  • Further fine-tuning is possible for specific platforms or domains.
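
As a rough illustration of how the model can slot into such a system, the sketch below scores a batch of messages so downstream logic can act on the results. The helper name and example messages are illustrative only, not part of the released model.

# Illustrative integration sketch: score a batch of messages so a
# moderation system can act on the resulting probabilities.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "Davephoenix/bert-bullying-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def score_messages(messages):
    # Tokenize the whole batch at once; pad to the longest message.
    inputs = tokenizer(messages, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)
    # Return the probability of the "Bullying" class (index 1) for each message.
    return probs[:, 1].tolist()

scores = score_messages(["You are so dumb.", "Great game last night!"])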

Out-of-Scope Use

  • Multilingual or non-English bullying detection.
  • Misuse in legal or disciplinary decision-making without human oversight.
  • Inference on sarcasm, coded language, or highly contextual text may be unreliable.

Bias, Risks, and Limitations

The model may exhibit limitations in:

  • Cultural or contextual understanding of bullying.
  • Identifying subtle or sarcastic forms of harassment.
  • False positives in emotionally intense or confrontational but non-abusive language.

Recommendations

Users (both direct and downstream) should:

  • Use the model alongside human review, especially in sensitive domains (a confidence-threshold sketch follows this list).
  • Avoid deploying in high-stakes environments without thorough testing.
  • Consider domain-specific fine-tuning if used outside general English online text.
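
One way to keep a human in the loop is to act automatically only on high-confidence predictions and route everything else to review. The sketch below reuses the classify_text helper shown under "How to Get Started" further down; the 0.9 threshold is a placeholder, not a calibrated value.

# Illustrative triage sketch: act automatically only on confident
# predictions and route the rest to human review.
def triage(text, threshold=0.9):
    # classify_text is the helper defined in the getting-started example below.
    pred, confidence = classify_text(text)
    if pred == 1 and confidence >= threshold:
        return "auto-flag"       # confident bullying prediction
    if pred == 1:
        return "human-review"    # predicted bullying, but low confidence
    return "allow"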

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

model_name = "Davephoenix/bert-bullying-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def classify_text(text):
    # Tokenize the input and run a single forward pass without gradients.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Convert logits to probabilities and pick the most likely class.
    probs = F.softmax(outputs.logits, dim=1)
    pred = torch.argmax(probs, dim=1).item()
    return pred, probs[0][pred].item()

label_map = {0: "Not Bullying", 1: "Bullying"}
text = "You are so dumb and nobody likes you."
pred, confidence = classify_text(text)
print(f"Prediction: {label_map[pred]} (Confidence: {confidence:.2f})")
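
The same check can also be run through the transformers pipeline API. Note that unless the repository's config defines an id2label mapping, the pipeline reports generic LABEL_0/LABEL_1 names, which correspond to the label_map above.

from transformers import pipeline

# Pipeline alternative; labels may appear as LABEL_0 / LABEL_1
# unless id2label is set in the model config.
classifier = pipeline("text-classification", model="Davephoenix/bert-bullying-detector")
print(classifier("You are so dumb and nobody likes you."))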

Training Details

Training Data

  • Approximately 20,000 English text samples labeled as "bullying" or "not bullying"
  • Balanced dataset curated from public moderation datasets and synthetic augmentation

Training Procedure

Preprocessing

  • Tokenized using the bert-base-uncased tokenizer
  • Truncation and padding to a max_length of 128 tokens (see the sketch below)
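
For reference, this preprocessing corresponds roughly to the tokenizer call below. The exact training script is not published, and the "text" column name is an assumption.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def preprocess(batch):
    # Pad or truncate every example to 128 tokens, as described above.
    # The "text" column name is assumed, not confirmed by this model card.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)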

Training Hyperparameters

  • Training regime: fp16 mixed precision
  • Epochs: 3
  • Batch size: 32
  • Optimizer: AdamW with linear warmup
  • Learning rate: 2e-5
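
These settings map onto transformers TrainingArguments roughly as sketched below. This is a reconstruction from the list above, not the author's actual training script, and the warmup fraction is an assumption.

from transformers import TrainingArguments

# Approximate reconstruction of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="bert-bullying-detector",
    num_train_epochs=3,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    lr_scheduler_type="linear",  # AdamW with a linear schedule is the Trainer default
    warmup_ratio=0.1,            # warmup fraction is an assumption; not stated above
    fp16=True,                   # mixed-precision training
)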

Speeds, Sizes, Times

  • Training time: ~5 hours on Kaggle GPU
  • Model size: ~420 MB
  • Final checkpoint: checkpoint-34371

Evaluation

Testing Data, Factors & Metrics

Testing Data

  • 10% hold-out split from the training set
  • Similar distribution to training data
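
A 10% hold-out split like this can be reproduced with the datasets library, as in the sketch below; the in-memory data and seed are placeholders for the real labeled corpus.

from datasets import Dataset

# Placeholder data standing in for the real labeled corpus.
data = {"text": [f"example message {i}" for i in range(20)],
        "label": [i % 2 for i in range(20)]}
dataset = Dataset.from_dict(data)

# 10% hold-out split with a fixed seed, mirroring the evaluation setup above.
splits = dataset.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]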

Factors

  • Sentence structure
  • Presence of explicit abusive terms
  • Subtlety of intent

Metrics

  • Accuracy, F1 score, validation loss (a compute_metrics sketch follows)
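
During evaluation, these metrics can be computed with a compute_metrics hook such as the sketch below (using scikit-learn); the original evaluation code is not published.

import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    # The Trainer passes a (logits, labels) pair to this hook.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy_score(labels, preds),
            "f1": f1_score(labels, preds)}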

Results

  • Accuracy: 95.6%
  • F1 Score: 95.6%
  • Validation Loss: 0.151

Summary

The model performs well for binary classification of bullying vs. non-bullying on general English text. Performance may degrade on ambiguous or culturally nuanced examples.

Environmental Impact

Carbon emissions were estimated using the ML CO2 Impact calculator:

  • Hardware Type: NVIDIA P100
  • Hours used: ~5
  • Cloud Provider: Kaggle
  • Compute Region: North America
  • Carbon Emitted: < 2 kg CO₂

Technical Specifications

Model Architecture and Objective

  • Architecture: BERT base uncased (12-layer, 768-hidden, 12-heads, 110M parameters)
  • Objective: Binary sequence classification with cross-entropy loss (see the loading sketch below)
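
Concretely, loading the checkpoint as a sequence-classification model gives a two-way head, and passing labels to the forward call returns the cross-entropy loss. The id2label mapping below mirrors the label_map used earlier and is supplied here only for illustration.

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained(
    "Davephoenix/bert-bullying-detector",
    num_labels=2,
    id2label={0: "Not Bullying", 1: "Bullying"},   # mirrors label_map above
    label2id={"Not Bullying": 0, "Bullying": 1},
)
tokenizer = AutoTokenizer.from_pretrained("Davephoenix/bert-bullying-detector")

inputs = tokenizer("example text", return_tensors="pt")
outputs = model(**inputs, labels=torch.tensor([1]))
print(outputs.loss)  # cross-entropy loss from the binary classification head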

Compute Infrastructure

Hardware

  • Kaggle P100 GPU (free tier)

Software

  • transformers 4.39.3
  • datasets 2.19.1
  • Python 3.11
  • PyTorch 2.x

Citation

BibTeX:

@misc{bert-bullying-detector,
  title={BERT Bullying Detector},
  author={Davephoenix},
  year={2025},
  note={Fine-tuned BERT for binary text classification (bullying detection)},
  howpublished={\url{https://huggingface.co/Davephoenix/bert-bullying-detector}}
}

APA:

Davephoenix. (2025). BERT Bullying Detector [Computer software]. Hugging Face. https://huggingface.co/Davephoenix/bert-bullying-detector

Glossary

  • BERT: Bidirectional Encoder Representations from Transformers
  • FP16: 16-bit floating point precision
  • F1 Score: Harmonic mean of precision and recall, i.e. F1 = 2 · (precision · recall) / (precision + recall)

More Information

To request the training notebook or API wrapper, please contact the model author.

Model Card Authors

  • Davephoenix

Model Card Contact

  • Davephoenix, via the model repository: https://huggingface.co/Davephoenix/bert-bullying-detector