synapti/nci-binary-detector

@@ -1,134 +1,77 @@
 ---
 license: apache-2.0
 base_model: answerdotai/ModernBERT-base
 tags:
-- transformers
-- modernbert
-- text-classification
-- propaganda-detection
-- binary-classification
-- nci-protocol
-datasets:
-- synapti/nci-propaganda-production
 metrics:
 - accuracy
 - f1
 - precision
 - recall
-pipeline_tag: text-classification
 ---
-# NCI Binary Propaganda Detector
-Binary classifier that detects whether text contains propaganda/manipulation techniques.
-## Model Description
-This model is **Stage 1** of the NCI (Narrative Credibility Index) two-stage propaganda detection pipeline:
-- **Stage 1 (this model)**: Fast binary detection - "Does this text contain propaganda?"
-- **Stage 2**: Multi-label technique classification - "Which specific techniques are used?"
-The binary detector is optimized for **high recall** to ensure manipulative content is not missed, while Stage 2 provides detailed technique classification.
-## Intended Uses
-- Fast filtering of content for propaganda presence
-- First-pass screening in content moderation pipelines
-- Real-time detection in social media monitoring
-- Input gating for detailed technique analysis
-## Training Data
-Trained on the [synapti/nci-propaganda-production](https://huggingface.co/datasets/synapti/nci-propaganda-production) dataset:
-- **23,000+ examples** from multiple sources
-- **Positive examples**: SemEval-2020 Task 11 propaganda techniques
-- **Hard negatives**: LIAR2 factual statements, Qbias center-biased news
-- **Train/Val/Test split**: 80/10/10
-## Performance
-| Metric | Score |
-|--------|-------|
-| Accuracy | ~95% |
-| F1 | ~94% |
-| Precision | ~96% |
-| Recall | ~92% |
-## Usage
-```python
-from transformers import pipeline
-# Load the model
-detector = pipeline("text-classification", model="synapti/nci-binary-detector")
-# Detect propaganda
-text = "The radical left wants to DESTROY our country!"
-result = detector(text)
-# Result: {'label': 'LABEL_1', 'score': 0.99}
-# LABEL_0 = no propaganda, LABEL_1 = has propaganda
-```
-### Two-Stage Pipeline
-For complete propaganda analysis, use with the technique classifier:
-```python
-from transformers import pipeline
-binary = pipeline("text-classification", model="synapti/nci-binary-detector")
-technique = pipeline("text-classification", model="synapti/nci-technique-classifier", top_k=None)
-text = "Your text here..."
-# Stage 1: Binary detection
-binary_result = binary(text)[0]
-has_propaganda = binary_result["label"] == "LABEL_1"
-if has_propaganda:
-    # Stage 2: Technique classification
-    techniques = technique(text)[0]
-    detected = [t for t in techniques if t["score"] > 0.3]
-```
-## Model Architecture
-- **Base Model**: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
-- **Parameters**: 149.6M
-- **Max Sequence Length**: 512 tokens
-- **Output**: 2 classes (no_propaganda, has_propaganda)
-## Training Details
-- **Loss Function**: Focal Loss (gamma=2.0, alpha=0.25)
-- **Optimizer**: AdamW
-- **Learning Rate**: 2e-5
-- **Batch Size**: 16 (effective 64 with gradient accumulation)
-- **Epochs**: 5 with early stopping
-- **Hardware**: NVIDIA A10G GPU
-## Limitations
-- Trained primarily on English text
-- May not detect novel propaganda techniques not in training data
-- Optimized for short-to-medium length text (tweets, headlines, paragraphs)
-- Should be used as part of a larger analysis pipeline, not as sole arbiter
-## Citation
-```bibtex
-@misc{nci-binary-detector,
-  author = {NCI Protocol Team},
-  title = {NCI Binary Propaganda Detector},
-  year = {2024},
-  publisher = {HuggingFace},
-  url = {https://huggingface.co/synapti/nci-binary-detector}
-}
-```
-## License
-Apache 2.0
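The removed card lists "Focal Loss (gamma=2.0, alpha=0.25)" under Training Details but does not show how the two parameters act. Below is a minimal sketch of the standard binary focal loss for a single example. It assumes `p` is the predicted probability of the positive class (`has_propaganda`) and that `alpha` weights the positive class and `1 - alpha` the negative class. The card does not say which convention or implementation was used, and the helper names are hypothetical.

```python
import math


def binary_focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss for one example (hypothetical helper, standard formulation).

    p: predicted probability of the positive class (has_propaganda).
    y: gold label, 1 for has_propaganda, 0 for no_propaganda.
    Assumption: alpha weights the positive class, (1 - alpha) the negative class.
    """
    p_t = p if y == 1 else 1.0 - p            # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss on examples the model already gets right.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(p: float, y: int) -> float:
    """Plain binary cross-entropy on the same probability, for comparison."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


# Fraction of the cross-entropy loss each example keeps under focal loss.
easy = binary_focal_loss(0.95, 1) / cross_entropy(0.95, 1)
hard = binary_focal_loss(0.30, 1) / cross_entropy(0.30, 1)
print(f"easy example keeps {easy:.6f} of its CE loss, hard example keeps {hard:.4f}")
```

The ratio printed for each example is just `alpha_t * (1 - p_t) ** gamma`. With `gamma=0` the loss reduces to cross-entropy scaled by `alpha_t`, so `gamma` is what shifts training effort toward hard, misclassified examples.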

 ---
+library_name: transformers
 license: apache-2.0
 base_model: answerdotai/ModernBERT-base
 tags:
+- generated_from_trainer
 metrics:
 - accuracy
 - f1
 - precision
 - recall
+model-index:
+- name: nci-binary-detector
+  results: []
 ---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# nci-binary-detector
+This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.0031
+- Accuracy: 0.9954
+- F1: 0.9959
+- Precision: 0.9919
+- Recall: 1.0
+- Roc Auc: 0.9986
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 2e-05
+- train_batch_size: 16
+- eval_batch_size: 32
+- seed: 42
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 32
+- optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 5
+- mixed_precision_training: Native AMP
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss | Accuracy | F1     | Precision | Recall | Roc Auc |
+|:-------------:|:------:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|:-------:|
+| 0.0093        | 0.1634 | 100  | 0.0043          | 0.9844   | 0.9865 | 0.9763    | 0.9970 | 0.9990  |
+| 0.0021        | 0.3268 | 200  | 0.0036          | 0.9954   | 0.9960 | 0.9930    | 0.9990 | 0.9978  |
+| 0.0001        | 0.4902 | 300  | 0.0011          | 0.9988   | 0.9990 | 0.9980    | 1.0    | 0.9999  |
+| 0.0043        | 0.6536 | 400  | 0.0009          | 0.9959   | 0.9965 | 0.9930    | 1.0    | 1.0000  |
+| 0.0001        | 0.8170 | 500  | 0.0006          | 0.9988   | 0.9990 | 0.9980    | 1.0    | 1.0000  |
+| 0.0006        | 0.9804 | 600  | 0.0010          | 0.9977   | 0.9980 | 0.9980    | 0.9980 | 0.9999  |
+### Framework versions
+- Transformers 4.57.3
+- Pytorch 2.9.1+cu128
+- Datasets 4.4.1
+- Tokenizers 0.22.1
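The new card's hyperparameters and training-results table imply a few derived quantities: the effective batch size, the number of optimizer steps per epoch, and the length of the warmup phase. Below is a minimal sketch of those calculations and of a linear warmup-then-decay learning-rate schedule. It assumes `lr_scheduler_type: linear` means linear warmup over `warmup_ratio` of all scheduled steps (rounded up here) followed by linear decay to zero. It also assumes 612 steps per epoch, a figure inferred from the table (step 600 logged at epoch 0.9804) rather than stated in the card.

```python
import math

# Hyperparameters copied from the new card.
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 2
NUM_EPOCHS = 5
WARMUP_RATIO = 0.1
BASE_LR = 2e-5

# Assumption: optimizer steps per epoch inferred from the training-results table,
# where step 600 was logged at epoch 0.9804 (600 / 612 = 0.98039...).
STEPS_PER_EPOCH = 612

effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS   # should equal total_train_batch_size
total_steps = STEPS_PER_EPOCH * NUM_EPOCHS              # assumes all 5 epochs are scheduled
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)    # first 10% of scheduled steps


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    return BASE_LR * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


for step in (0, 100, warmup_steps, 600, total_steps):
    print(f"step {step:>4}: lr = {linear_lr(step):.3e}")
```

The table stops at epoch 0.98 (step 600), so if training ended there, only the warmup and the early part of the decay were used.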

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a024037033ebcb1cc2672d604b7ad65c9bb5cf40f6b885f9aaa31033514c7b18
 size 598439784

 version https://git-lfs.github.com/spec/v1
+oid sha256:e48724b649fbcc9f55d6b989782528171959b58f41052e547603338a6b75baa5
 size 598439784

test_results.json ADDED

@@ -0,0 +1,12 @@
+{
+  "eval_loss": 0.003097335109487176,
+  "eval_accuracy": 0.995373048004627,
+  "eval_f1": 0.9959432048681541,
+  "eval_precision": 0.9919191919191919,
+  "eval_recall": 1.0,
+  "eval_roc_auc": 0.998592468993421,
+  "eval_runtime": 10.1758,
+  "eval_samples_per_second": 169.913,
+  "eval_steps_per_second": 5.405,
+  "epoch": 0.9803921568627451
+}
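The values in `test_results.json` can be checked against each other. Below is a minimal sketch that confirms F1 is the harmonic mean of the reported precision and recall. It then infers confusion-matrix counts under two assumptions: binary metrics computed with `has_propaganda` as the positive class, and a test-set size approximately equal to `eval_runtime × eval_samples_per_second`. These counts are inferred, not reported in the commit.

```python
import math

# Values copied from test_results.json.
eval_accuracy = 0.995373048004627
eval_f1 = 0.9959432048681541
eval_precision = 0.9919191919191919
eval_recall = 1.0
eval_runtime = 10.1758
eval_samples_per_second = 169.913

# F1 is the harmonic mean of precision and recall.
f1 = 2 * eval_precision * eval_recall / (eval_precision + eval_recall)

# Assumption: test-set size is runtime x throughput, rounded to an integer.
n = round(eval_runtime * eval_samples_per_second)
errors = round(n * (1 - eval_accuracy))

# Recall of 1.0 means no false negatives, so every error is a false positive.
fn = 0
fp = errors
# precision = tp / (tp + fp)  =>  tp = precision * fp / (1 - precision)
tp = round(eval_precision * fp / (1 - eval_precision))
tn = n - tp - fp - fn

# Recompute the reported metrics from the inferred counts as a consistency check.
assert math.isclose(f1, eval_f1, rel_tol=1e-9)
assert math.isclose((tp + tn) / n, eval_accuracy, rel_tol=1e-9)
assert math.isclose(tp / (tp + fp), eval_precision, rel_tol=1e-9)
print(f"n={n} tp={tp} fp={fp} tn={tn} fn={fn}")
```

Under these assumptions the reported metrics reconcile to 1,729 test examples: 982 true positives, 8 false positives, 739 true negatives and no false negatives. Every test error would therefore be a text flagged as propaganda that the labels mark as clean.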