Sengil committed on
Commit f1a5777 · verified · 1 Parent(s): e435084

Update README.md

Files changed (1):
  1. README.md +164 -39

README.md CHANGED
@@ -1,9 +1,14 @@
  ---
  library_name: transformers
  license: apache-2.0
  base_model: answerdotai/ModernBERT-base
  tags:
  - generated_from_trainer
  metrics:
  - f1
  model-index:
@@ -11,58 +16,178 @@ model-index:
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # ModernBERT-NewsClassifier-EN-small

- This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 3.1201
- - F1: 0.5475

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 5e-05
- - train_batch_size: 8
- - eval_batch_size: 4
- - seed: 42
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 16
- - optimizer: Use adamw_torch_fused with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 100
- - num_epochs: 5

- ### Training results

- | Training Loss | Epoch  | Step | Validation Loss | F1     |
- |:-------------:|:------:|:----:|:---------------:|:------:|
- | 2.6251        | 1.0    | 1688 | 1.3810          | 0.5543 |
- | 1.9267        | 2.0    | 3376 | 1.4378          | 0.5588 |
- | 0.6349        | 3.0    | 5064 | 2.1705          | 0.5415 |
- | 0.1273        | 4.0    | 6752 | 2.9007          | 0.5402 |
- | 0.0288        | 4.9973 | 8435 | 3.1201          | 0.5475 |

- ### Framework versions

- - Transformers 4.49.0.dev0
- - Pytorch 2.5.1+cu121
- - Datasets 3.2.0
- - Tokenizers 0.21.0
  ---
  library_name: transformers
  license: apache-2.0
  base_model: answerdotai/ModernBERT-base
  tags:
  - generated_from_trainer
+ - text-classification
+ - news-classification
+ - english
+ - modernbert
  metrics:
  - f1
  model-index:
  - name: ModernBERT-NewsClassifier-EN-small
    results: []
  ---

  # ModernBERT-NewsClassifier-EN-small

+ This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an English **News Category** dataset covering 15 distinct topics (e.g., **Politics**, **Sports**, **Business**). It achieves the following results on the evaluation set:
+
+ - **Validation Loss**: `3.1201`
+ - **Weighted F1 Score**: `0.5475`
+
+ ---
+
+ ## Model Description
+
+ **Architecture**: This model is based on [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base), a modern encoder-only Transformer featuring Rotary Position Embeddings (RoPE), Flash Attention, and a native long-context window (up to 8,192 tokens). For the classification task, a linear classification head is added on top of the encoder outputs.
+
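+ A quick way to check these properties from the checkpoint itself is to inspect its configuration. A minimal sketch using standard `transformers` config attributes (actual values come from the hosted config):
+
+ ```python
+ from transformers import AutoConfig
+
+ # Load only the configuration (no weights)
+ config = AutoConfig.from_pretrained("Sengil/ModernBERT-NewsClassifier-EN-small")
+
+ print(config.num_labels)               # expected: 15
+ print(config.id2label)                 # mapping of class ids to category names
+ print(config.max_position_embeddings)  # native context window (8,192 for ModernBERT)
+ ```
+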
+ **Task**: **Multi-class News Classification**
+ - The model classifies English news headlines or short texts into one of 15 categories.
+
+ **Use Cases**:
+ - Automatically tagging news headlines with categories in editorial pipelines.
+ - Classifying short text blurbs for social media or aggregator systems.
+ - Building a quick filter for content-based recommendation engines.
+
+ ---
+
+ ## Intended Uses & Limitations
+
+ - **Intended for**: Users who need to categorize short English news texts into broad topics.
+ - **Language**: Trained primarily on **English** texts. Performance on non-English text is not guaranteed.
+ - **Limitations**:
+   - Certain categories (e.g., `BLACK VOICES`, `QUEER VOICES`) involve nuanced language that can lead to misclassification when context is limited or the text is ambiguous.
+   - Headlines that span multiple topics receive a single category, which can miss other relevant angles of the text.
+
+ ---
+
+ ## Training and Evaluation Data
+
+ - **Dataset**: Curated from an English news-category dataset with 15 labels (e.g., `POLITICS`, `ENTERTAINMENT`, `SPORTS`, `BUSINESS`).
+ - **Data Size**: ~30,000 samples in total, balanced at 2,000 samples per category.
+ - **Split**: 90% training (27,000 samples) and 10% testing (3,000 samples); a preparation sketch follows the category list below.
+
+ ### Categories
+
+ 1. POLITICS
+ 2. WELLNESS
+ 3. ENTERTAINMENT
+ 4. TRAVEL
+ 5. STYLE & BEAUTY
+ 6. PARENTING
+ 7. HEALTHY LIVING
+ 8. QUEER VOICES
+ 9. FOOD & DRINK
+ 10. BUSINESS
+ 11. COMEDY
+ 12. SPORTS
+ 13. BLACK VOICES
+ 14. HOME & LIVING
+ 15. PARENTS
+
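+ A minimal preparation sketch under stated assumptions: the dataset identifier and the `category` column name below are placeholders (the card does not name the source corpus), while the per-class count, split ratio, and seed follow the numbers above:
+
+ ```python
+ from datasets import Dataset, load_dataset
+
+ # Placeholder id; the card does not name the source corpus
+ raw = load_dataset("path/to/news-category-dataset", split="train")
+ df = raw.to_pandas()
+
+ # Balance the data: 2,000 samples per category
+ balanced = df.groupby("category").sample(n=2000, random_state=42)
+
+ # 90% train / 10% test, seed 42 as in the training setup
+ ds = Dataset.from_pandas(balanced, preserve_index=False)
+ splits = ds.train_test_split(test_size=0.1, seed=42)
+ print(len(splits["train"]), len(splits["test"]))  # 27000 / 3000
+ ```
+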
+ ---
+
+ ## Training Procedure
+
+ ### Hyperparameters
+
+ | Hyperparameter                   | Value                                                    |
+ |:---------------------------------|:---------------------------------------------------------|
+ | **learning_rate**                | 5e-05                                                    |
+ | **train_batch_size**             | 8                                                        |
+ | **eval_batch_size**              | 4                                                        |
+ | **seed**                         | 42                                                       |
+ | **gradient_accumulation_steps**  | 2                                                        |
+ | **total_train_batch_size**       | 16 (8 x 2)                                               |
+ | **optimizer**                    | `adamw_torch_fused` (betas=(0.9, 0.999), epsilon=1e-08)  |
+ | **lr_scheduler_type**            | linear                                                   |
+ | **lr_scheduler_warmup_steps**    | 100                                                      |
+ | **num_epochs**                   | 5                                                        |
+
+ **Optimizer**: `AdamW` with fused kernels (`adamw_torch_fused`) for efficiency.
+ **Loss Function**: Cross-entropy; weighted F1 is the reported evaluation metric. A `Trainer` sketch mirroring this configuration follows.
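+
+ A hedged sketch of how these hyperparameters map onto `transformers.TrainingArguments`; it assumes tokenized `train_ds`/`test_ds` splits (e.g., from the preparation sketch above) and is not the exact training script:
+
+ ```python
+ import numpy as np
+ from sklearn.metrics import f1_score
+ from transformers import (AutoModelForSequenceClassification,
+                           Trainer, TrainingArguments)
+
+ def compute_metrics(eval_pred):
+     # Weighted F1, matching the metric reported on this card
+     logits, labels = eval_pred
+     preds = np.argmax(logits, axis=-1)
+     return {"f1": f1_score(labels, preds, average="weighted")}
+
+ model = AutoModelForSequenceClassification.from_pretrained(
+     "answerdotai/ModernBERT-base", num_labels=15
+ )
+
+ args = TrainingArguments(
+     output_dir="modernbert-news",     # placeholder output path
+     learning_rate=5e-5,
+     per_device_train_batch_size=8,
+     per_device_eval_batch_size=4,
+     gradient_accumulation_steps=2,    # effective train batch size 16
+     num_train_epochs=5,
+     lr_scheduler_type="linear",
+     warmup_steps=100,
+     optim="adamw_torch_fused",
+     seed=42,
+     eval_strategy="epoch",
+ )
+
+ trainer = Trainer(
+     model=model,
+     args=args,
+     train_dataset=train_ds,           # tokenized splits assumed
+     eval_dataset=test_ds,
+     compute_metrics=compute_metrics,
+ )
+ trainer.train()
+ ```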
+
+ ---
+
+ ## Training Results
+
+ | Training Loss | Epoch  | Step | Validation Loss | F1 (Weighted) |
+ |:-------------:|:------:|:----:|:---------------:|:-------------:|
+ | 2.6251        | 1.0    | 1688 | 1.3810          | 0.5543        |
+ | 1.9267        | 2.0    | 3376 | 1.4378          | 0.5588        |
+ | 0.6349        | 3.0    | 5064 | 2.1705          | 0.5415        |
+ | 0.1273        | 4.0    | 6752 | 2.9007          | 0.5402        |
+ | 0.0288        | 4.9973 | 8435 | 3.1201          | 0.5475        |
+
+ - Weighted F1 stays near **0.55** throughout training (best: **0.5588** at epoch 2); validation loss rises after epoch 2, indicating overfitting in the later epochs.
+
+ ---
+
+ ## Inference Example
+
+ Below are two ways to use this model: via a **pipeline**, and with the **model & tokenizer** directly.
+
+ ### 1) Quick Start with `pipeline`
+
+ ```python
+ from transformers import pipeline
+
+ # Instantiate the text-classification pipeline
+ classifier = pipeline(
+     "text-classification",
+     model="Sengil/ModernBERT-NewsClassifier-EN-small"
+ )
+
+ # Classify a sample headline
+ text = "The President pledges new infrastructure initiatives amid economic concerns."
+ outputs = classifier(text)
+ print(outputs)
+ # Example output: [{'label': 'POLITICS', 'score': 0.95}]
+ ```
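+
+ To get scores for all 15 categories instead of only the top label, recent `transformers` versions accept a `top_k` argument on text-classification pipelines (a usage sketch continuing from the block above):
+
+ ```python
+ # top_k=None returns one {'label', 'score'} dict per category, sorted by score
+ all_scores = classifier(text, top_k=None)
+ print(all_scores[:3])  # the three most likely categories
+ ```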
+
+ ### 2) Direct Model Usage
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ model_name = "Sengil/ModernBERT-NewsClassifier-EN-small"
+
+ # Load model & tokenizer
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ sample_text = "Local authorities call for better healthcare policies."
+ # Headlines are short, so max_length=512 is ample (ModernBERT itself supports up to 8,192 tokens)
+ inputs = tokenizer(sample_text, return_tensors="pt", truncation=True, max_length=512)
+
+ with torch.no_grad():
+     logits = model(**inputs).logits
+
+ # Convert logits to probabilities
+ probs = F.softmax(logits, dim=1)[0]
+ predicted_label_id = torch.argmax(probs).item()
+
+ # Map the predicted id to its label string
+ id2label = model.config.id2label
+ predicted_label = id2label[predicted_label_id]
+ confidence_score = probs[predicted_label_id].item()
+
+ print(f"Predicted Label: {predicted_label} | Score: {confidence_score:.4f}")
+ ```
+
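+ The same components handle multiple headlines at once; a small sketch (padding enables batching, reusing `tokenizer`/`model` from the block above):
+
+ ```python
+ texts = [
+     "Stocks rally as tech earnings beat expectations.",
+     "Star striker sidelined for six weeks with ankle injury.",
+ ]
+ batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512)
+ with torch.no_grad():
+     batch_logits = model(**batch).logits
+ for text, pred_id in zip(texts, batch_logits.argmax(dim=1).tolist()):
+     print(f"{text} -> {model.config.id2label[pred_id]}")
+ ```
+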
+ ---
+
+ ## Additional Information
+
+ - **Framework Versions**:
+   - **Transformers**: 4.49.0.dev0
+   - **PyTorch**: 2.5.1+cu121
+   - **Datasets**: 3.2.0
+   - **Tokenizers**: 0.21.0
+ - **License**: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
+ - **Intellectual Property**: The original ModernBERT base model is provided by [answerdotai](https://huggingface.co/answerdotai). This fine-tuned checkpoint inherits the same license.
+
+ ---
+
+ **Citation** (if you use or extend this model in your research or applications, please consider citing it):
+
+ ```bibtex
+ @misc{ModernBERTNewsClassifierENsmall,
+   title={ModernBERT-NewsClassifier-EN-small},
+   author={Sengil, Mert},
+   year={2025},
+   howpublished={\url{https://huggingface.co/Sengil/ModernBERT-NewsClassifier-EN-small}},
+ }
+ ```
+
+ Please see the [Hugging Face License](https://huggingface.co/license) for more details, and always review outputs for domain-specific biases or misclassifications.