Sengil committed on
Commit f1a5777 · verified · 1 Parent(s): e435084

Update README.md

Files changed (1):
  1. README.md +164 -39

README.md CHANGED
@@ -1,9 +1,14 @@
  ---
  library_name: transformers
  license: apache-2.0
  base_model: answerdotai/ModernBERT-base
  tags:
  - generated_from_trainer
  metrics:
  - f1
  model-index:
@@ -11,58 +16,178 @@ model-index:
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # ModernBERT-NewsClassifier-EN-small

- This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 3.1201
- - F1: 0.5475

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 5e-05
- - train_batch_size: 8
- - eval_batch_size: 4
- - seed: 42
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 16
- - optimizer: Use adamw_torch_fused with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 100
- - num_epochs: 5

- ### Training results

- | Training Loss | Epoch  | Step | Validation Loss | F1     |
- |:-------------:|:------:|:----:|:---------------:|:------:|
- | 2.6251        | 1.0    | 1688 | 1.3810          | 0.5543 |
- | 1.9267        | 2.0    | 3376 | 1.4378          | 0.5588 |
- | 0.6349        | 3.0    | 5064 | 2.1705          | 0.5415 |
- | 0.1273        | 4.0    | 6752 | 2.9007          | 0.5402 |
- | 0.0288        | 4.9973 | 8435 | 3.1201          | 0.5475 |

- ### Framework versions

- - Transformers 4.49.0.dev0
- - Pytorch 2.5.1+cu121
- - Datasets 3.2.0
- - Tokenizers 0.21.0
  ---
  library_name: transformers
  license: apache-2.0
  base_model: answerdotai/ModernBERT-base
  tags:
  - generated_from_trainer
+ - text-classification
+ - news-classification
+ - english
+ - modernbert
  metrics:
  - f1
  model-index:
  - name: ModernBERT-NewsClassifier-EN-small
    results: []
  ---

  # ModernBERT-NewsClassifier-EN-small

+ This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an English **News Category** dataset covering 15 distinct topics (e.g., **Politics**, **Sports**, **Business**). It achieves the following results on the evaluation set:
+
+ - **Validation Loss**: `3.1201`
+ - **Weighted F1 Score**: `0.5475`
+
+ ---
+
+ ## Model Description
+
+ **Architecture**: This model is based on [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base), a modern encoder-only Transformer featuring Rotary Position Embeddings (RoPE), Flash Attention, and a native long-context window (up to 8,192 tokens). For the classification task, a linear classification head is added on top of the encoder outputs.
+
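+ A quick way to check these properties from the checkpoint itself is to inspect its configuration. A minimal sketch using standard `transformers` config attributes (actual values come from the hosted config):
+
+ ```python
+ from transformers import AutoConfig
+
+ # Load only the configuration (no weights)
+ config = AutoConfig.from_pretrained("Sengil/ModernBERT-NewsClassifier-EN-small")
+
+ print(config.num_labels)               # expected: 15
+ print(config.id2label)                 # mapping of class ids to category names
+ print(config.max_position_embeddings)  # native context window (8,192 for ModernBERT)
+ ```
+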
+ **Task**: **Multi-class News Classification**
+ - The model classifies English news headlines or short texts into one of 15 categories.
+
+ **Use Cases**:
+ - Automatically tagging news headlines with categories in editorial pipelines.
+ - Classifying short text blurbs for social media or aggregator systems.
+ - Building a quick filter for content-based recommendation engines.
+
+ ---
+
+ ## Intended Uses & Limitations
+
+ - **Intended for**: Users who need to categorize short English news texts into broad topics.
+ - **Language**: Trained primarily on **English** texts. Performance on non-English text is not guaranteed.
+ - **Limitations**:
+   - Certain categories (e.g., `BLACK VOICES`, `QUEER VOICES`) involve nuanced language that can lead to misclassification when context is limited or the text is ambiguous.
+   - Headlines that span multiple topics receive a single category, which can miss other relevant angles of the text.
+
+ ---
+
+ ## Training and Evaluation Data
+
+ - **Dataset**: Curated from an English news-category dataset with 15 labels (e.g., `POLITICS`, `ENTERTAINMENT`, `SPORTS`, `BUSINESS`).
+ - **Data Size**: ~30,000 samples in total, balanced at 2,000 samples per category.
+ - **Split**: 90% training (27,000 samples) and 10% testing (3,000 samples); a preparation sketch follows the category list below.
+
+ ### Categories
+
+ 1. POLITICS
+ 2. WELLNESS
+ 3. ENTERTAINMENT
+ 4. TRAVEL
+ 5. STYLE & BEAUTY
+ 6. PARENTING
+ 7. HEALTHY LIVING
+ 8. QUEER VOICES
+ 9. FOOD & DRINK
+ 10. BUSINESS
+ 11. COMEDY
+ 12. SPORTS
+ 13. BLACK VOICES
+ 14. HOME & LIVING
+ 15. PARENTS
+
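+ A minimal preparation sketch under stated assumptions: the dataset identifier and the `category` column name below are placeholders (the card does not name the source corpus), while the per-class count, split ratio, and seed follow the numbers above:
+
+ ```python
+ from datasets import Dataset, load_dataset
+
+ # Placeholder id; the card does not name the source corpus
+ raw = load_dataset("path/to/news-category-dataset", split="train")
+ df = raw.to_pandas()
+
+ # Balance the data: 2,000 samples per category
+ balanced = df.groupby("category").sample(n=2000, random_state=42)
+
+ # 90% train / 10% test, seed 42 as in the training setup
+ ds = Dataset.from_pandas(balanced, preserve_index=False)
+ splits = ds.train_test_split(test_size=0.1, seed=42)
+ print(len(splits["train"]), len(splits["test"]))  # 27000 / 3000
+ ```
+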
+ ---
+
+ ## Training Procedure
+
+ ### Hyperparameters
+
+ | Hyperparameter                   | Value                                                    |
+ |:---------------------------------|:---------------------------------------------------------|
+ | **learning_rate**                | 5e-05                                                    |
+ | **train_batch_size**             | 8                                                        |
+ | **eval_batch_size**              | 4                                                        |
+ | **seed**                         | 42                                                       |
+ | **gradient_accumulation_steps**  | 2                                                        |
+ | **total_train_batch_size**       | 16 (8 x 2)                                               |
+ | **optimizer**                    | `adamw_torch_fused` (betas=(0.9, 0.999), epsilon=1e-08)  |
+ | **lr_scheduler_type**            | linear                                                   |
+ | **lr_scheduler_warmup_steps**    | 100                                                      |
+ | **num_epochs**                   | 5                                                        |
+
+ **Optimizer**: `AdamW` with fused kernels (`adamw_torch_fused`) for efficiency.
+ **Loss Function**: Cross-entropy; weighted F1 is the reported evaluation metric. A `Trainer` sketch mirroring this configuration follows.
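+
+ A hedged sketch of how these hyperparameters map onto `transformers.TrainingArguments`; it assumes tokenized `train_ds`/`test_ds` splits (e.g., from the preparation sketch above) and is not the exact training script:
+
+ ```python
+ import numpy as np
+ from sklearn.metrics import f1_score
+ from transformers import (AutoModelForSequenceClassification,
+                           Trainer, TrainingArguments)
+
+ def compute_metrics(eval_pred):
+     # Weighted F1, matching the metric reported on this card
+     logits, labels = eval_pred
+     preds = np.argmax(logits, axis=-1)
+     return {"f1": f1_score(labels, preds, average="weighted")}
+
+ model = AutoModelForSequenceClassification.from_pretrained(
+     "answerdotai/ModernBERT-base", num_labels=15
+ )
+
+ args = TrainingArguments(
+     output_dir="modernbert-news",     # placeholder output path
+     learning_rate=5e-5,
+     per_device_train_batch_size=8,
+     per_device_eval_batch_size=4,
+     gradient_accumulation_steps=2,    # effective train batch size 16
+     num_train_epochs=5,
+     lr_scheduler_type="linear",
+     warmup_steps=100,
+     optim="adamw_torch_fused",
+     seed=42,
+     eval_strategy="epoch",
+ )
+
+ trainer = Trainer(
+     model=model,
+     args=args,
+     train_dataset=train_ds,           # tokenized splits assumed
+     eval_dataset=test_ds,
+     compute_metrics=compute_metrics,
+ )
+ trainer.train()
+ ```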
+
+ ---
+
+ ## Training Results
+
+ | Training Loss | Epoch  | Step | Validation Loss | F1 (Weighted) |
+ |:-------------:|:------:|:----:|:---------------:|:-------------:|
+ | 2.6251        | 1.0    | 1688 | 1.3810          | 0.5543        |
+ | 1.9267        | 2.0    | 3376 | 1.4378          | 0.5588        |
+ | 0.6349        | 3.0    | 5064 | 2.1705          | 0.5415        |
+ | 0.1273        | 4.0    | 6752 | 2.9007          | 0.5402        |
+ | 0.0288        | 4.9973 | 8435 | 3.1201          | 0.5475        |
+
+ - Weighted F1 stays near **0.55** throughout training (best: **0.5588** at epoch 2); validation loss rises after epoch 2, indicating overfitting in the later epochs.
+
+ ---
+
+ ## Inference Example
+
+ Below are two ways to use this model: via a **pipeline**, and with the **model & tokenizer** directly.
+
+ ### 1) Quick Start with `pipeline`
+
+ ```python
+ from transformers import pipeline
+
+ # Instantiate the text-classification pipeline
+ classifier = pipeline(
+     "text-classification",
+     model="Sengil/ModernBERT-NewsClassifier-EN-small"
+ )
+
+ # Classify a sample headline
+ text = "The President pledges new infrastructure initiatives amid economic concerns."
+ outputs = classifier(text)
+ print(outputs)
+ # Example output: [{'label': 'POLITICS', 'score': 0.95}]
+ ```
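+
+ To get scores for all 15 categories instead of only the top label, recent `transformers` versions accept a `top_k` argument on text-classification pipelines (a usage sketch continuing from the block above):
+
+ ```python
+ # top_k=None returns one {'label', 'score'} dict per category, sorted by score
+ all_scores = classifier(text, top_k=None)
+ print(all_scores[:3])  # the three most likely categories
+ ```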
+
+ ### 2) Direct Model Usage
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ model_name = "Sengil/ModernBERT-NewsClassifier-EN-small"
+
+ # Load model & tokenizer
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ sample_text = "Local authorities call for better healthcare policies."
+ # Headlines are short, so max_length=512 is ample (ModernBERT itself supports up to 8,192 tokens)
+ inputs = tokenizer(sample_text, return_tensors="pt", truncation=True, max_length=512)
+
+ with torch.no_grad():
+     logits = model(**inputs).logits
+
+ # Convert logits to probabilities
+ probs = F.softmax(logits, dim=1)[0]
+ predicted_label_id = torch.argmax(probs).item()
+
+ # Map the predicted id to its label string
+ id2label = model.config.id2label
+ predicted_label = id2label[predicted_label_id]
+ confidence_score = probs[predicted_label_id].item()
+
+ print(f"Predicted Label: {predicted_label} | Score: {confidence_score:.4f}")
+ ```
+
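+ The same components handle multiple headlines at once; a small sketch (padding enables batching, reusing `tokenizer`/`model` from the block above):
+
+ ```python
+ texts = [
+     "Stocks rally as tech earnings beat expectations.",
+     "Star striker sidelined for six weeks with ankle injury.",
+ ]
+ batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512)
+ with torch.no_grad():
+     batch_logits = model(**batch).logits
+ for text, pred_id in zip(texts, batch_logits.argmax(dim=1).tolist()):
+     print(f"{text} -> {model.config.id2label[pred_id]}")
+ ```
+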
+ ---
+
+ ## Additional Information
+
+ - **Framework Versions**:
+   - **Transformers**: 4.49.0.dev0
+   - **PyTorch**: 2.5.1+cu121
+   - **Datasets**: 3.2.0
+   - **Tokenizers**: 0.21.0
+ - **License**: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
+ - **Intellectual Property**: The original ModernBERT base model is provided by [answerdotai](https://huggingface.co/answerdotai). This fine-tuned checkpoint inherits the same license.
+
+ ---
+
+ **Citation** (if you use or extend this model in your research or applications, please consider citing it):
+
+ ```bibtex
+ @misc{ModernBERTNewsClassifierENsmall,
+   title={ModernBERT-NewsClassifier-EN-small},
+   author={Sengil, Mert},
+   year={2025},
+   howpublished={\url{https://huggingface.co/Sengil/ModernBERT-NewsClassifier-EN-small}},
+ }
+ ```
+
+ Please see the [Hugging Face License](https://huggingface.co/license) for more details, and always review outputs for domain-specific biases or misclassifications.