---
library_name: transformers
tags: [text-classification, bert, bullying-detection, hate-speech, social-good]
---

# Model Card for Davephoenix/bert-bullying-detector

A BERT-based binary classifier that detects whether English text contains bullying content. It is fine-tuned for use in moderation tools, education platforms, and social media analysis.

## Model Details

### Model Description

This model is based on `bert-base-uncased` and fine-tuned for binary text classification. The goal is to distinguish between bullying and non-bullying text, providing a tool to support online safety and moderation.

- **Developed by:** Davephoenix
- **Funded by [optional]:** Independent project
- **Shared by [optional]:** Davephoenix
- **Model type:** Text classification (binary)
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model [optional]:** bert-base-uncased

### Model Sources [optional]

- **Repository:** [https://huggingface.co/Davephoenix/bert-bullying-detector](https://huggingface.co/Davephoenix/bert-bullying-detector)
- **Demo [optional]:** API in progress

## Uses

### Direct Use

- Classifies short- to medium-length English text as "Bullying" or "Not Bullying".
- Can be integrated into moderation tools, educational apps, or awareness platforms (a quick-start sketch follows this list).
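
For quick integration, the model can be loaded through the `transformers` `pipeline` API. The sketch below is minimal; note that the raw label names depend on the `id2label` mapping stored in the model config, so reading `LABEL_1` as "Bullying" is an assumption consistent with the label map used later in this card.

```python
from transformers import pipeline

# Load the classifier once; each call returns [{'label': ..., 'score': ...}]
clf = pipeline("text-classification", model="Davephoenix/bert-bullying-detector")

result = clf("You are so dumb and nobody likes you.")[0]
# If the config defines no human-readable names, labels appear as
# LABEL_0 / LABEL_1; index 1 is assumed to mean "Bullying".
print(result["label"], f"{result['score']:.2f}")
```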

### Downstream Use [optional]

- As a building block in broader moderation or digital well-being systems.
- Further fine-tuning is possible for specific platforms or domains.

### Out-of-Scope Use

- Multilingual or non-English bullying detection.
- Misuse in legal or disciplinary decision-making without human oversight.
- Inference on sarcasm, coded language, or highly contextual text may be unreliable.

## Bias, Risks, and Limitations

The model may exhibit limitations in:

- Cultural or contextual understanding of bullying.
- Identifying subtle or sarcastic forms of harassment.
- False positives in emotionally intense or confrontational but non-abusive language.

### Recommendations

Users (both direct and downstream) should:

- Use the model alongside human review, especially in sensitive domains (a triage sketch follows this list).
- Avoid deploying in high-stakes environments without thorough testing.
- Consider domain-specific fine-tuning if used outside general English online text.
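
To make the human-review recommendation concrete, here is a hypothetical triage policy built on top of the `classify_text` helper defined in the next section. The threshold is illustrative, not a tuned or recommended value.

```python
# Hypothetical moderation triage (not part of the model): act automatically
# only on confident predictions and route uncertain ones to a human reviewer.
# `classify_text` is the helper defined under "How to Get Started" below.
REVIEW_THRESHOLD = 0.90  # illustrative cutoff, not a tuned value

def moderate(text: str) -> str:
    pred, confidence = classify_text(text)
    if confidence < REVIEW_THRESHOLD:
        return "human_review"  # low confidence: defer to a person
    return "flag" if pred == 1 else "allow"
```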

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

model_name = "Davephoenix/bert-bullying-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def classify_text(text):
    # Tokenize, truncating/padding to the model's maximum input length
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    # Convert raw logits into class probabilities
    probs = F.softmax(outputs.logits, dim=-1)
    pred = torch.argmax(probs, dim=-1).item()
    return pred, probs[0][pred].item()

label_map = {0: "Not Bullying", 1: "Bullying"}
text = "You are so dumb and nobody likes you."
pred, confidence = classify_text(text)
print(f"Prediction: {label_map[pred]} (Confidence: {confidence:.2f})")
```

## Training Details

### Training Data

* Approximately 20,000 English text samples labeled as "bullying" or "not bullying"
* Balanced dataset curated from public moderation datasets and synthetic augmentation

### Training Procedure

#### Preprocessing [optional]

* Tokenized using the `bert-base-uncased` tokenizer
* Truncation and padding to a `max_length` of 128 tokens (see the sketch below)
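
The exact preprocessing script is not published; the following is a minimal sketch of the tokenization step described above, assuming a `datasets`-style dataset with a `text` column (the column name is an assumption).

```python
from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def preprocess(batch):
    # Fixed-length encoding: truncate and pad every example to 128 tokens
    return tokenizer(
        batch["text"],  # assumed column name
        truncation=True,
        padding="max_length",
        max_length=128,
    )

ds = Dataset.from_dict({"text": ["You are so dumb and nobody likes you."]})
ds = ds.map(preprocess, batched=True)
```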

#### Training Hyperparameters

* **Training regime:** fp16 mixed precision
* **Epochs:** 3
* **Batch size:** 32
* **Optimizer:** AdamW with linear warmup
* **Learning rate:** 2e-5
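
These settings map naturally onto Hugging Face `TrainingArguments`. The sketch below is a hypothetical reconstruction rather than the author's actual training script; `output_dir`, `warmup_ratio`, and the evaluation schedule are assumptions.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="bert-bullying-detector",  # assumed
    num_train_epochs=3,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    optim="adamw_torch",          # AdamW optimizer
    lr_scheduler_type="linear",   # linear decay after warmup
    warmup_ratio=0.1,             # assumed; the card only says "linear warmup"
    fp16=True,                    # mixed-precision training
    evaluation_strategy="epoch",  # assumed evaluation schedule
)
# Trainer(model=model, args=args, train_dataset=..., eval_dataset=...).train()
```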

#### Speeds, Sizes, Times [optional]

* **Training time:** ~5 hours on a Kaggle GPU
* **Model size:** ~420 MB
* **Final checkpoint:** `checkpoint-34371`

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

* 10% hold-out split from the training set
* Similar distribution to training data

#### Factors

* Sentence structure
* Presence of explicit abusive terms
* Subtlety of intent

#### Metrics

* Accuracy, F1 score, and loss (see the sketch below)
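
Accuracy and F1 are typically computed through a `compute_metrics` callback passed to the `Trainer`. The sketch below is a generic example using `scikit-learn` (which this card does not list as a dependency), not the author's actual evaluation code.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds),  # binary F1, positive class = 1
    }

# Trainer(..., compute_metrics=compute_metrics)
```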

### Results

* **Accuracy:** 95.6%
* **F1 Score:** 95.6%
* **Validation Loss:** 0.151

#### Summary

The model performs well for binary classification of bullying vs. non-bullying on general English text. Performance may degrade on ambiguous or culturally nuanced examples.

## Model Examination [optional]

[More Information Needed]

## Environmental Impact

Carbon emissions estimated via [ML CO2 calculator](https://mlco2.github.io/impact):

* **Hardware Type:** NVIDIA P100
* **Hours used:** ~5
* **Cloud Provider:** Kaggle
* **Compute Region:** North America
* **Carbon Emitted:** < 2 kg CO₂

## Technical Specifications [optional]

### Model Architecture and Objective

* **Architecture:** BERT base uncased (12-layer, 768-hidden, 12-heads, 110M parameters)
* **Objective:** Binary sequence classification with cross-entropy loss (illustrated in the sketch below)
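
For illustration, the `transformers` sequence-classification head computes this cross-entropy loss internally whenever `labels` are passed. The snippet below demonstrates the objective on the base model; it is a sketch of the loss computation, not the training code.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # two classes: not bullying / bullying
)

batch = tokenizer(["You are so dumb and nobody likes you."], return_tensors="pt")
out = model(**batch, labels=torch.tensor([1]))
print(out.loss)  # cross-entropy between the classification logits and the label
```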

### Compute Infrastructure

#### Hardware

* Kaggle P100 GPU (free tier)

#### Software

* `transformers` 4.39.3
* `datasets` 2.19.1
* Python 3.11
* PyTorch 2.x

## Citation [optional]

**BibTeX:**

```bibtex
@misc{bert-bullying-detector,
  title={BERT Bullying Detector},
  author={Davephoenix},
  year={2025},
  note={Fine-tuned BERT for binary text classification (bullying detection)},
  howpublished={\url{https://huggingface.co/Davephoenix/bert-bullying-detector}}
}
```

**APA:**

Davephoenix. (2025). *BERT Bullying Detector* [Computer software]. Hugging Face. [https://huggingface.co/Davephoenix/bert-bullying-detector](https://huggingface.co/Davephoenix/bert-bullying-detector)

## Glossary [optional]

* **BERT:** Bidirectional Encoder Representations from Transformers
* **FP16:** 16-bit floating point precision
* **F1 Score:** Harmonic mean of precision and recall

## More Information [optional]

To request the training notebook or API wrapper, please contact the model author.

## Model Card Authors [optional]

* Davephoenix

## Model Card Contact

* [https://huggingface.co/Davephoenix](https://huggingface.co/Davephoenix)
