metadata
license: cc-by-4.0
language: en
tags:
  - text-classification
  - linguistics
  - ukraine
  - toponymy
datasets:
  - KyivNotKiev/corpus
pipeline_tag: text-classification

Toponym Context Classifier

DeBERTa-v3-large fine-tuned for discourse context classification of Ukrainian toponym mentions.

Part of #KyivNotKiev.

Performance

| Model | Macro F1 | Accuracy |
|---|---|---|
| DeBERTa-v3-large (this model) | 0.857 ± 0.013 | 0.901 |
| XLM-RoBERTa-large | 0.846 ± 0.011 | 0.892 |
| mDeBERTa-v3-base | 0.807 ± 0.007 | 0.864 |
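Macro F1 and accuracy can diverge when classes are imbalanced, which is why both are reported. The sketch below is only an illustration of how the two metrics are computed; the labels are made-up toy data, not taken from the corpus, and `macro_f1_and_accuracy` is a hypothetical helper rather than the evaluation code used for this model.

```python
def macro_f1_and_accuracy(y_true, y_pred):
    """Return (macro F1, accuracy) for two equal-length label sequences."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # Per-class F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    # Macro averaging: every class counts equally, regardless of its frequency.
    macro_f1 = sum(f1_scores) / len(f1_scores)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return macro_f1, accuracy


# Toy example (invented): a frequent class and a rare class.
y_true = ["politics"] * 8 + ["religion"] * 2
y_pred = ["politics"] * 8 + ["politics", "religion"]
macro_f1, accuracy = macro_f1_and_accuracy(y_true, y_pred)
print(f"macro F1 = {macro_f1:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case one error on the rare class lowers macro F1 much more than it lowers accuracy, because each class contributes equally to the macro average.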

11 context classes: war_conflict, academic_science, history, politics, sports, culture_arts, food_cuisine, travel_tourism, religion, business_economy, general_news.
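As a rough illustration of how an 11-way classifier output maps to these class names, the sketch below applies a softmax to one logit vector and picks the most probable class. It assumes the index order follows the order in which the classes are listed here; that is an assumption, and the actual mapping should be read from the checkpoint's configuration. The logits are invented, and `predict_label` is a hypothetical helper.

```python
import math

# ASSUMPTION: index order mirrors the listing in this card.
# The authoritative mapping is whatever the checkpoint's config defines.
CONTEXT_CLASSES = [
    "war_conflict", "academic_science", "history", "politics", "sports",
    "culture_arts", "food_cuisine", "travel_tourism", "religion",
    "business_economy", "general_news",
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Map one logit vector to (class name, probability)."""
    if len(logits) != len(CONTEXT_CLASSES):
        raise ValueError("expected one logit per context class")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CONTEXT_CLASSES[best], probs[best]


# Invented logits for a single toponym mention.
example_logits = [0.1, -1.2, 0.3, 2.5, -0.4, 0.0, -2.0, 0.2, -0.7, 0.6, 1.1]
label, prob = predict_label(example_logits)
print(label, round(prob, 3))
```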

Training

  • Corpus: 36,791 texts, 59 toponym pairs, 5 sources
  • Annotation: labels generated by Claude Haiku 4.5 (validated against human annotation: kappa = 0.56-0.69; human consensus 86.2%)
  • Loss: class-weighted cross-entropy
  • Config: learning rate 1e-5, 3 epochs, batch size 16, fp16, seed 456
  • Benchmark: 21 runs (12 hyperparameter runs + 9 architecture-comparison runs)
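The class-weighted cross-entropy loss listed above can be sketched in plain Python as follows. The card does not say how the class weights were derived, so the inverse-frequency scheme here is an assumption chosen for illustration, and both function names are hypothetical. The loss is averaged by the sum of the target weights, which is also how PyTorch's `CrossEntropyLoss` behaves with `weight` set and the default `reduction="mean"`.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """ASSUMED scheme: w_c = N / (K * n_c), so rare classes get larger weights."""
    counts = Counter(labels)
    n = len(labels)
    return [
        n / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def weighted_cross_entropy(logits_batch, targets, weights):
    """Weighted mean of per-example negative log-likelihoods."""
    total_loss = 0.0
    total_weight = 0.0
    for logits, target in zip(logits_batch, targets):
        peak = max(logits)
        # log-sum-exp, shifted by the max logit for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        nll = log_sum_exp - logits[target]
        total_loss += weights[target] * nll
        total_weight += weights[target]
    return total_loss / total_weight


# Toy batch (invented): class 0 is three times as frequent as class 1.
targets = [0, 0, 0, 1]
weights = inverse_frequency_weights(targets, num_classes=2)
logits_batch = [[2.0, 0.0], [1.5, 0.5], [0.2, 0.1], [1.0, 0.0]]
loss = weighted_cross_entropy(logits_batch, targets, weights)
print(weights, round(loss, 4))
```

With these toy weights, the single misclassified rare-class example contributes more to the loss than any one frequent-class example, which is the intended effect of class weighting.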

License

CC-BY-4.0