---
license: apache-2.0
datasets:
- synapti/nci-propaganda-production
base_model: answerdotai/ModernBERT-base
tags:
- transformers
- modernbert
- text-classification
- propaganda-detection
- multi-label-classification
- nci-protocol
- semeval-2020
- onnx
library_name: transformers
pipeline_tag: text-classification
---

# NCI Technique Classifier

Multi-label classifier that identifies specific propaganda techniques in text.

## Model Description

This model is **Stage 2** of the NCI (Narrative Credibility Index) two-stage propaganda detection pipeline:

- **Stage 1**: Fast binary detection - "Does this text contain propaganda?"
- **Stage 2 (this model)**: Multi-label technique classification - "Which specific techniques are used?"

The classifier identifies **18 propaganda techniques** from the SemEval-2020 Task 11 taxonomy.

## Propaganda Techniques

| # | Technique | F1 Score | Optimal Threshold |
|---|-----------|----------|-------------------|
| 0 | Loaded_Language | 95.3% | 0.3 |
| 1 | Appeal_to_fear-prejudice | 85.1% | 0.3 |
| 2 | Exaggeration,Minimisation | 49.0% | 0.4 |
| 3 | Repetition | 55.9% | 0.4 |
| 4 | Flag-Waving | 50.9% | 0.4 |
| 5 | Name_Calling,Labeling | 79.0% | 0.1 |
| 6 | Reductio_ad_hitlerum | 82.4% | 0.3 |
| 7 | Black-and-White_Fallacy | 68.8% | 0.5 |
| 8 | Causal_Oversimplification | 67.9% | 0.4 |
| 9 | Whataboutism,Straw_Men,Red_Herring | 47.7% | 0.3 |
| 10 | Straw_Man | 60.3% | 0.5 |
| 11 | Red_Herring | 86.3% | 0.5 |
| 12 | Doubt | 63.4% | 0.3 |
| 13 | Appeal_to_Authority | 50.0% | 0.3 |
| 14 | Thought-terminating_Cliches | 71.2% | 0.5 |
| 15 | Bandwagon | 46.7% | 0.5 |
| 16 | Slogans | 46.0% | 0.3 |
| 17 | Obfuscation,Intentional_Vagueness,Confusion | 86.3% | 0.5 |

## Performance

**Test Set Results (1,729 samples):**

| Metric | Default (0.5) | Optimized Thresholds |
|--------|---------------|----------------------|
| Micro F1 | 72.7% | **80.3%** |
| Macro F1 | 62.5% | **68.3%** |
| ECE (Expected Calibration Error) | - | **0.0096** |
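The gap between the two columns comes entirely from replacing the flat 0.5 cutoff with a per-technique decision threshold. The exact sweep used for this model is not documented here, but the idea is a simple per-label grid search on a validation split. A minimal sketch, assuming hypothetical `val_probs` and `val_labels` arrays from your own data:

```python
import numpy as np
from sklearn.metrics import f1_score

def sweep_thresholds(val_probs, val_labels, grid=np.arange(0.05, 0.95, 0.05)):
    """For each label, pick the threshold that maximizes F1 on validation data.

    val_probs:  (n_samples, n_labels) predicted probabilities
    val_labels: (n_samples, n_labels) multi-hot ground truth
    """
    best = {}
    for j in range(val_probs.shape[1]):
        f1s = [
            f1_score(val_labels[:, j], (val_probs[:, j] >= t).astype(int), zero_division=0)
            for t in grid
        ]
        best[j] = float(grid[int(np.argmax(f1s))])
    return best
```

For rare techniques a per-label sweep can overfit a small validation set, so prefer the shipped thresholds in `calibration_config.json` unless you have enough labeled data of your own.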
## Usage

### Basic Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="synapti/nci-technique-classifier",
    top_k=None  # Return all labels
)

text = "The radical left is DESTROYING our country!"
results = classifier(text)[0]

# Get detected techniques (using the default 0.5 threshold)
detected = [r for r in results if r["score"] > 0.5]
for d in detected:
    print(f"{d['label']}: {d['score']:.2%}")
```

### With Calibration Config (Recommended)

The model ships with a `calibration_config.json` file containing optimized per-technique thresholds and a temperature-scaling factor for better-calibrated confidence scores.

```python
import json
from transformers import pipeline
from huggingface_hub import hf_hub_download

# Load calibration config
config_path = hf_hub_download(
    repo_id="synapti/nci-technique-classifier",
    filename="calibration_config.json"
)
with open(config_path) as f:
    config = json.load(f)

temperature = config["temperature"]  # 0.75
thresholds = config["thresholds"]
labels = config["technique_labels"]

classifier = pipeline(
    "text-classification",
    model="synapti/nci-technique-classifier",
    top_k=None
)

text = "Your text here..."
results = classifier(text)[0]

# Apply per-technique thresholds (pipeline labels look like "LABEL_0")
detected = []
for r in results:
    idx = int(r["label"].split("_")[1])
    technique = labels[idx]
    threshold = thresholds.get(technique, 0.5)
    if r["score"] > threshold:
        detected.append((technique, r["score"]))
```

### ONNX Inference (Faster)

The model is also available in ONNX format for optimized inference:

```python
import onnxruntime as ort
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
import numpy as np

# Download ONNX model
onnx_path = hf_hub_download(
    repo_id="synapti/nci-technique-classifier",
    filename="onnx/model.onnx"
)

# Load tokenizer and ONNX session
tokenizer = AutoTokenizer.from_pretrained("synapti/nci-technique-classifier")
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

# Inference
text = "Your text here..."
inputs = tokenizer(text, padding="max_length", truncation=True, max_length=512, return_tensors="np")
onnx_inputs = {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
}
logits = session.run(None, onnx_inputs)[0]
probs = 1 / (1 + np.exp(-logits))  # Sigmoid for multi-label
```

### Two-Stage Pipeline

For best results, pair this model with the Stage 1 binary detector:

```python
from transformers import pipeline

# Stage 1: Binary detection (fast filter)
detector = pipeline("text-classification", model="synapti/nci-binary-detector")

# Stage 2: Technique classification
classifier = pipeline("text-classification", model="synapti/nci-technique-classifier", top_k=None)

text = "Your text to analyze..."

# Quick check first
detection = detector(text)[0]
if detection["label"] == "has_propaganda" and detection["score"] > 0.5:
    # Detailed technique analysis
    techniques = classifier(text)[0]
    detected = [t for t in techniques if t["score"] > 0.3]
    for t in detected:
        print(f"{t['label']}: {t['score']:.2%}")
else:
    print("No propaganda detected")
```

## Calibration Config

The `calibration_config.json` file contains:

```json
{
  "temperature": 0.75,
  "thresholds": {
    "Loaded_Language": 0.3,
    "Appeal_to_fear-prejudice": 0.3,
    "Name_Calling,Labeling": 0.1,
    ...
  },
  "metrics": {
    "ece": 0.0096,
    "micro_f1_optimized": 0.803,
    "macro_f1_optimized": 0.683
  }
}
```
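Note that the `pipeline` examples above return plain sigmoid scores and never apply the temperature. To obtain temperature-scaled probabilities, divide the raw logits by `temperature` before the sigmoid. A minimal sketch under that assumption (the config documents the value, not the exact recipe):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "synapti/nci-technique-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

inputs = tokenizer("Your text here...", truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 18)

temperature = 0.75  # from calibration_config.json
probs = torch.sigmoid(logits / temperature)[0]  # temperature-scaled per-technique probabilities
```

Because T = 0.75 is below 1, the scaling sharpens rather than softens the sigmoid, pushing scores away from 0.5; the same division works on the ONNX logits above. The config does not state whether the shipped thresholds expect raw or temperature-scaled scores, so apply thresholds to the same kind of score consistently.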
## Training Data

Trained on [synapti/nci-propaganda-production](https://huggingface.co/datasets/synapti/nci-propaganda-production):

- **23,000+ examples** with multi-hot technique labels
- **Augmented data** for minority techniques (MLSMOTE)
- **Hard negatives** from LIAR2 and Qbias datasets
- **Class-weighted Focal Loss** to handle imbalance

## Model Architecture

- **Base Model**: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- **Parameters**: 149.6M
- **Max Sequence Length**: 512 tokens
- **Output**: 18 labels (multi-label sigmoid)
- **Calibration Temperature**: 0.75

## Available Files

| File | Description |
|------|-------------|
| `model.safetensors` | PyTorch model weights |
| `calibration_config.json` | Optimized thresholds & temperature |
| `onnx/model.onnx` | ONNX model for fast inference |
| `config.json` | Model configuration |

## Training Details

- **Loss Function**: Class-weighted Focal Loss (gamma=2.0)
- **Class Weights**: Inverse frequency weighting
- **Optimizer**: AdamW
- **Learning Rate**: 2e-5
- **Batch Size**: 8 (effective 32 with gradient accumulation)
- **Epochs**: 5 with early stopping (patience=3)
- **Hardware**: NVIDIA A10G GPU

## Limitations

- Trained primarily on English text
- Performance varies by technique (see table above)
- Some techniques overlap semantically
- Should be used with the binary detector for best results
- Threshold optimization recommended for specific use cases

## Related Models

- [synapti/nci-binary-detector](https://huggingface.co/synapti/nci-binary-detector) - Stage 1 binary detector

## Citation

```bibtex
@inproceedings{da-san-martino-etal-2020-semeval,
    title = "{S}em{E}val-2020 Task 11: Detection of Propaganda Techniques in News Articles",
    author = "Da San Martino, Giovanni and others",
    booktitle = "Proceedings of SemEval-2020",
    year = "2020",
}

@misc{nci-technique-classifier,
    author = {NCI Protocol Team},
    title = {NCI Technique Classifier},
    year = {2024},
    publisher = {HuggingFace},
    url = {https://huggingface.co/synapti/nci-technique-classifier}
}
```

## License

Apache 2.0