---
library_name: transformers
tags: [text-classification, bert, bullying-detection, hate-speech, social-good]
---

# Model Card for Davephoenix/bert-bullying-detector

A BERT-based binary classifier that detects whether English text contains bullying content. It is fine-tuned for use in moderation tools, education platforms, and social media analysis.

## Model Details

### Model Description

This model is based on `bert-base-uncased` and fine-tuned for binary text classification. The goal is to distinguish between bullying and non-bullying text, providing a tool to support online safety and moderation.

- **Developed by:** Davephoenix
- **Funded by [optional]:** Independent project
- **Shared by [optional]:** Davephoenix
- **Model type:** Text classification (binary)
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model [optional]:** bert-base-uncased

### Model Sources [optional]

- **Repository:** [https://huggingface.co/Davephoenix/bert-bullying-detector](https://huggingface.co/Davephoenix/bert-bullying-detector)
- **Demo [optional]:** API in progress

## Uses

### Direct Use

- Classifies short- to medium-length English text as "Bullying" or "Not Bullying".
- Can be integrated into moderation tools, educational apps, or awareness platforms (a quick-start sketch follows this list).
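
For quick integration, the model can be loaded through the `transformers` `pipeline` API. The sketch below is minimal; note that the raw label names depend on the `id2label` mapping stored in the model config, so reading `LABEL_1` as "Bullying" is an assumption consistent with the label map used later in this card.

```python
from transformers import pipeline

# Load the classifier once; each call returns [{'label': ..., 'score': ...}]
clf = pipeline("text-classification", model="Davephoenix/bert-bullying-detector")

result = clf("You are so dumb and nobody likes you.")[0]
# If the config defines no human-readable names, labels appear as
# LABEL_0 / LABEL_1; index 1 is assumed to mean "Bullying".
print(result["label"], f"{result['score']:.2f}")
```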

### Downstream Use [optional]

- As a building block in broader moderation or digital well-being systems.
- Further fine-tuning is possible for specific platforms or domains.

### Out-of-Scope Use

- Multilingual or non-English bullying detection.
- Misuse in legal or disciplinary decision-making without human oversight.
- Inference on sarcasm, coded language, or highly contextual text may be unreliable.

## Bias, Risks, and Limitations

The model may exhibit limitations in:

- Cultural or contextual understanding of bullying.
- Identifying subtle or sarcastic forms of harassment.
- False positives in emotionally intense or confrontational but non-abusive language.

### Recommendations

Users (both direct and downstream) should:

- Use the model alongside human review, especially in sensitive domains (a triage sketch follows this list).
- Avoid deploying in high-stakes environments without thorough testing.
- Consider domain-specific fine-tuning if used outside general English online text.
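
To make the human-review recommendation concrete, here is a hypothetical triage policy built on top of the `classify_text` helper defined in the next section. The threshold is illustrative, not a tuned or recommended value.

```python
# Hypothetical moderation triage (not part of the model): act automatically
# only on confident predictions and route uncertain ones to a human reviewer.
# `classify_text` is the helper defined under "How to Get Started" below.
REVIEW_THRESHOLD = 0.90  # illustrative cutoff, not a tuned value

def moderate(text: str) -> str:
    pred, confidence = classify_text(text)
    if confidence < REVIEW_THRESHOLD:
        return "human_review"  # low confidence: defer to a person
    return "flag" if pred == 1 else "allow"
```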

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

model_name = "Davephoenix/bert-bullying-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def classify_text(text):
    # Tokenize, truncating/padding to the model's maximum input length
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    # Convert raw logits into class probabilities
    probs = F.softmax(outputs.logits, dim=-1)
    pred = torch.argmax(probs, dim=-1).item()
    return pred, probs[0][pred].item()

label_map = {0: "Not Bullying", 1: "Bullying"}
text = "You are so dumb and nobody likes you."
pred, confidence = classify_text(text)
print(f"Prediction: {label_map[pred]} (Confidence: {confidence:.2f})")
```

## Training Details

### Training Data

* Approximately 20,000 English text samples labeled as "bullying" or "not bullying"
* Balanced dataset curated from public moderation datasets and synthetic augmentation

### Training Procedure

#### Preprocessing [optional]

* Tokenized using the `bert-base-uncased` tokenizer
* Truncation and padding to a `max_length` of 128 tokens (see the sketch below)
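
The exact preprocessing script is not published; the following is a minimal sketch of the tokenization step described above, assuming a `datasets`-style dataset with a `text` column (the column name is an assumption).

```python
from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def preprocess(batch):
    # Fixed-length encoding: truncate and pad every example to 128 tokens
    return tokenizer(
        batch["text"],  # assumed column name
        truncation=True,
        padding="max_length",
        max_length=128,
    )

ds = Dataset.from_dict({"text": ["You are so dumb and nobody likes you."]})
ds = ds.map(preprocess, batched=True)
```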

#### Training Hyperparameters

* **Training regime:** fp16 mixed precision
* **Epochs:** 3
* **Batch size:** 32
* **Optimizer:** AdamW with linear warmup
* **Learning rate:** 2e-5
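
These settings map naturally onto Hugging Face `TrainingArguments`. The sketch below is a hypothetical reconstruction rather than the author's actual training script; `output_dir`, `warmup_ratio`, and the evaluation schedule are assumptions.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="bert-bullying-detector",  # assumed
    num_train_epochs=3,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    optim="adamw_torch",          # AdamW optimizer
    lr_scheduler_type="linear",   # linear decay after warmup
    warmup_ratio=0.1,             # assumed; the card only says "linear warmup"
    fp16=True,                    # mixed-precision training
    evaluation_strategy="epoch",  # assumed evaluation schedule
)
# Trainer(model=model, args=args, train_dataset=..., eval_dataset=...).train()
```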

#### Speeds, Sizes, Times [optional]

* **Training time:** ~5 hours on a Kaggle GPU
* **Model size:** ~420 MB
* **Final checkpoint:** `checkpoint-34371`

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

* 10% hold-out split from the training set
* Similar distribution to training data

#### Factors

* Sentence structure
* Presence of explicit abusive terms
* Subtlety of intent

#### Metrics

* Accuracy, F1 score, and loss (see the sketch below)
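
Accuracy and F1 are typically computed through a `compute_metrics` callback passed to the `Trainer`. The sketch below is a generic example using `scikit-learn` (which this card does not list as a dependency), not the author's actual evaluation code.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds),  # binary F1, positive class = 1
    }

# Trainer(..., compute_metrics=compute_metrics)
```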

### Results

* **Accuracy:** 95.6%
* **F1 Score:** 95.6%
* **Validation Loss:** 0.151

#### Summary

The model performs well for binary classification of bullying vs. non-bullying on general English text. Performance may degrade on ambiguous or culturally nuanced examples.

## Model Examination [optional]

[More Information Needed]

## Environmental Impact

Carbon emissions estimated via [ML CO2 calculator](https://mlco2.github.io/impact):

* **Hardware Type:** NVIDIA P100
* **Hours used:** ~5
* **Cloud Provider:** Kaggle
* **Compute Region:** North America
* **Carbon Emitted:** < 2 kg CO₂

## Technical Specifications [optional]

### Model Architecture and Objective

* **Architecture:** BERT base uncased (12-layer, 768-hidden, 12-heads, 110M parameters)
* **Objective:** Binary sequence classification with cross-entropy loss (illustrated in the sketch below)
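
For illustration, the `transformers` sequence-classification head computes this cross-entropy loss internally whenever `labels` are passed. The snippet below demonstrates the objective on the base model; it is a sketch of the loss computation, not the training code.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # two classes: not bullying / bullying
)

batch = tokenizer(["You are so dumb and nobody likes you."], return_tensors="pt")
out = model(**batch, labels=torch.tensor([1]))
print(out.loss)  # cross-entropy between the classification logits and the label
```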

### Compute Infrastructure

#### Hardware

* Kaggle P100 GPU (free tier)

#### Software

* `transformers` 4.39.3
* `datasets` 2.19.1
* Python 3.11
* PyTorch 2.x

## Citation [optional]

**BibTeX:**

```bibtex
@misc{bert-bullying-detector,
  title={BERT Bullying Detector},
  author={Davephoenix},
  year={2025},
  note={Fine-tuned BERT for binary text classification (bullying detection)},
  howpublished={\url{https://huggingface.co/Davephoenix/bert-bullying-detector}}
}
```

**APA:**

Davephoenix. (2025). *BERT Bullying Detector* [Computer software]. Hugging Face. [https://huggingface.co/Davephoenix/bert-bullying-detector](https://huggingface.co/Davephoenix/bert-bullying-detector)

## Glossary [optional]

* **BERT:** Bidirectional Encoder Representations from Transformers
* **FP16:** 16-bit floating point precision
* **F1 Score:** Harmonic mean of precision and recall

## More Information [optional]

To request the training notebook or API wrapper, please contact the model author.

## Model Card Authors [optional]

* Davephoenix

## Model Card Contact

* [https://huggingface.co/Davephoenix](https://huggingface.co/Davephoenix)
