---
base_model:
- Unbabel/wmt22-comet-da
- xlm-roberta-large
language:
- en
- rw
library_name: comet
license: apache-2.0
pipeline_tag: text-classification
tags:
- kinyarwanda
- english
- translation
- quality-estimation
- comet
- mt-evaluation
- african-languages
- low-resource-languages
- multilingual
metrics:
- pearson
- spearman
- mae
- rmse

model-index:
- name: KinyCOMET
  results:
  - task:
      type: translation-quality-estimation
      name: Translation Quality Estimation
    dataset:
      type: custom
      name: Kinyarwanda-English QE Dataset
    metrics:
    - type: pearson
      value: 0.751
      name: Pearson Correlation
    - type: spearman
      value: 0.593
      name: Spearman Correlation
    - type: system_score
      value: 0.896
      name: System Score
---

# KinyCOMET — Translation Quality Estimation for Kinyarwanda ↔ English

![KinyCOMET Banner](https://huggingface.co/chrismazii/kinycomet_unbabel/resolve/main/banner.png)

## Model Description

KinyCOMET is a neural translation quality estimation model for Kinyarwanda-English translation pairs. It addresses the poor correlation between BLEU scores and human judgments in Kinyarwanda translation evaluation, reaching a Pearson correlation of 0.75 with human assessments.

The model was built on 4,323 translation pairs annotated by 15 linguistics students with Direct Assessment (DA) scores, following WMT evaluation standards.


## Model Variants & Performance

| Variant | Base Model | Pearson | Spearman | Kendall's τ | MAE |
|---------|------------|---------|----------|-------------|-----|
| **KinyCOMET-Unbabel** | Unbabel/wmt22-comet-da | **0.75** | **0.59** | **0.42** | **0.07** |
| **KinyCOMET-XLM** | XLM-RoBERTa-large | 0.73 | 0.50 | 0.35 | 0.07 |
| Unbabel (baseline) | wmt22-comet-da | 0.54 | 0.55 | 0.39 | 0.17 |
| AfriCOMET STL 1.1 | AfriCOMET base | 0.52 | 0.35 | 0.24 | 0.18 |
| BLEU | N/A | 0.30 | 0.34 | 0.23 | 0.62 |
| chrF | N/A | 0.38 | 0.30 | 0.21 | 0.34 |

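Both KinyCOMET variants outperform all baselines on Pearson correlation and MAE. KinyCOMET-Unbabel shows the strongest correlation overall, while KinyCOMET-XLM trails the Unbabel baseline on Spearman and Kendall's τ. Performance also varies by translation direction, as the directional analysis below shows.

## Performance Highlights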

### Comprehensive Evaluation Results

**Overall Performance (Both Directions)**
- **Pearson Correlation**: 0.75 (KinyCOMET-Unbabel) vs 0.30 (BLEU) - **2.5x improvement**
- **Spearman Correlation**: 0.59 vs 0.34 (BLEU) - **73% improvement**  
- **Mean Absolute Error**: 0.07 vs 0.62 (BLEU) - **89% reduction**
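The relative figures above follow directly from the table values. A quick check in Python shows the unrounded numbers behind the headline claims (inputs copied from the Model Variants table):

```python
# Headline scores copied from the Model Variants table
kinycomet = {"pearson": 0.75, "spearman": 0.59, "mae": 0.07}
bleu = {"pearson": 0.30, "spearman": 0.34, "mae": 0.62}

# Ratio of Pearson correlations ("2.5x")
pearson_ratio = kinycomet["pearson"] / bleu["pearson"]
# Relative gain in Spearman correlation, in percent
spearman_gain = (kinycomet["spearman"] - bleu["spearman"]) / bleu["spearman"] * 100
# Relative reduction in MAE, in percent (lower MAE is better)
mae_reduction = (bleu["mae"] - kinycomet["mae"]) / bleu["mae"] * 100

print(f"Pearson: {pearson_ratio:.1f}x")
print(f"Spearman: +{spearman_gain:.1f}%")
print(f"MAE: -{mae_reduction:.1f}%")
```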

### Directional Analysis
| Direction | Model | Pearson | Spearman | Kendall's τ |
|-----------|-------|---------|----------|-------------|
| **English → Kinyarwanda** | KinyCOMET-XLM | **0.76** | 0.52 | 0.37 |
| **English → Kinyarwanda** | KinyCOMET-Unbabel | 0.75 | **0.56** | **0.40** |
| **Kinyarwanda → English** | KinyCOMET-Unbabel | **0.63** | **0.47** | **0.33** |
| **Kinyarwanda → English** | KinyCOMET-XLM | 0.37 | 0.29 | 0.21 |

**Key Insights:**
- KinyCOMET correlates better with human judgments on English→Kinyarwanda than on Kinyarwanda→English, for both variants and across all metrics
- Both KinyCOMET variants significantly outperform the AfriCOMET baselines, even though AfriCOMET's training data includes Kinyarwanda
- Surprisingly, the Unbabel baseline (not trained on Kinyarwanda) also outperforms the AfriCOMET variants

## Installation

Make sure you have Python ≥ 3.8 and install COMET via pip:

```bash
pip install unbabel-comet
```

You can verify the CLI tool is installed:

```bash
which comet-score
# should print something like: /usr/local/bin/comet-score
```
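The same checks can be run from Python, for example inside a notebook. This is a small sketch; `check_comet_environment` is a hypothetical helper, and it only tests that Python is recent enough, that the `comet` module (installed by `unbabel-comet`) is importable, and that `comet-score` is on the `PATH`:

```python
import importlib.util
import shutil
import sys


def check_comet_environment():
    """Report whether the stated requirements for KinyCOMET appear to be met."""
    python_ok = sys.version_info >= (3, 8)
    # The unbabel-comet package is imported under the module name `comet`
    comet_importable = importlib.util.find_spec("comet") is not None
    # Python equivalent of `which comet-score`; None if the CLI is not on PATH
    cli_path = shutil.which("comet-score")
    return python_ok, comet_importable, cli_path


python_ok, comet_importable, cli_path = check_comet_environment()
print(f"Python >= 3.8: {python_ok}")
print(f"comet importable: {comet_importable}")
print(f"comet-score: {cli_path or 'not found on PATH'}")
```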

For more details on COMET, see the [official documentation](https://unbabel.github.io/COMET/html/index.html).

## Usage

### Load and Use the Model in Python

Here's a simple example to score translations directly in Python:

```python
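from comet import download_model, load_from_checkpoint

# Download the public KinyCOMET checkpoint from the Hugging Face Hub, then load it
# (load_from_checkpoint expects a local checkpoint path, not a Hub model ID)
model_path = download_model("chrismazii/kinycomet_unbabel")
model = load_from_checkpoint(model_path)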

# Example translations
samples = [
    {
        "src": "Umugabo ararya.",
        "mt": "The man is eating.",
        "ref": "The man is eating."
    },
    {
        "src": "Umwana arasinzira.",
        "mt": "A dog sleeps.",
        "ref": "The child is sleeping."
    }
]

# Predict scores (gpus=0 runs inference on the CPU)
pred = model.predict(samples, gpus=0)
print(pred)
```

**Output Example:**

```python
Prediction({
  'scores': [0.9899, 0.8813],
  'system_score': 0.9356
})
```
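`model.predict` returns one score per input, in input order, in `pred.scores`, and their average in `pred.system_score`. For error analysis it can help to list the weakest segments first. A sketch with a hypothetical `rank_segments` helper, using the scores from the output example so it runs without loading the model:

```python
def rank_segments(samples, scores):
    """Attach each segment score to a copy of its sample and sort lowest-scoring first."""
    if len(samples) != len(scores):
        raise ValueError("expected exactly one score per sample")
    scored = [dict(sample, score=score) for sample, score in zip(samples, scores)]
    return sorted(scored, key=lambda item: item["score"])


samples = [
    {"src": "Umugabo ararya.", "mt": "The man is eating.", "ref": "The man is eating."},
    {"src": "Umwana arasinzira.", "mt": "A dog sleeps.", "ref": "The child is sleeping."},
]
# Segment scores from the output example above; with a loaded model, use pred.scores
scores = [0.9899, 0.8813]

for item in rank_segments(samples, scores):
    print(f"{item['score']:.4f}  {item['src']} -> {item['mt']}")
```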

### Using the Command Line Interface (CLI)

You can also evaluate translations directly using the terminal.

**Step 1: Create the text files**

```bash
cat > source.txt <<'SRC'
Umugabo ararya.
Umwana arasinzira.
Uyu mwanya neza cyane.
SRC

cat > reference.txt <<'REF'
The man is eating.
The child is sleeping.
This place is very nice.
REF

cat > hypothesis.txt <<'HYP'
The man is eating.
A dog sleeps.
This place is very nice.
HYP
```

**Step 2: Run KinyCOMET**

```bash
comet-score -s source.txt -r reference.txt -t hypothesis.txt \
  --model chrismazii/kinycomet_unbabel --gpus 0 --to_json results.json
```

**Step 3: View the results**

```bash
cat results.json
```
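To score the same three files from Python instead of the CLI, they can be read into the `src`/`mt`/`ref` dictionaries that `model.predict` expects (the hypothesis file maps to `mt`). The `load_samples` helper below is hypothetical and assumes UTF-8, line-aligned files like the ones created in Step 1:

```python
from pathlib import Path


def load_samples(src_path, mt_path, ref_path):
    """Read line-aligned source/hypothesis/reference files into COMET-style dicts."""
    def read_lines(path):
        return Path(path).read_text(encoding="utf-8").splitlines()

    src, mt, ref = read_lines(src_path), read_lines(mt_path), read_lines(ref_path)
    if not (len(src) == len(mt) == len(ref)):
        raise ValueError("source, hypothesis and reference files must have the same number of lines")
    return [{"src": s, "mt": m, "ref": r} for s, m, r in zip(src, mt, ref)]


# Usage with the files from Step 1:
# samples = load_samples("source.txt", "hypothesis.txt", "reference.txt")
# pred = model.predict(samples, gpus=0)
```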


### Score Interpretation

- **Score range**: Scores typically fall between 0 and 1; higher scores indicate better translation quality
- **System score**: The average of the segment scores across all translations
- **Segment scores**: Individual quality scores for each translation pair
- **Threshold guidance**: Scores above 0.8 typically indicate high-quality translations
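The system score and the threshold rule of thumb can be applied to any list of segment scores. A minimal sketch; `summarize_scores` is a hypothetical helper, the 0.8 default is the rough guidance above rather than a calibrated cut-off, and the third score (0.72) is invented for illustration:

```python
from statistics import mean


def summarize_scores(scores, threshold=0.8):
    """Return the system score (mean of segment scores) and indices of segments below threshold."""
    system_score = mean(scores)
    flagged = [i for i, score in enumerate(scores) if score < threshold]
    return system_score, flagged


system_score, flagged = summarize_scores([0.9899, 0.8813, 0.7200])
print(f"system score: {system_score:.4f}, segments below 0.8: {flagged}")
```

With only the two scores from the Python example, the mean is 0.9356, matching the `system_score` shown there.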

## Training Details

### Data
- 4,323 human-annotated Kinyarwanda-English translation pairs
- Annotations collected from 15 linguistics students
- Direct Assessment scoring following WMT standards
- Split: 80% train (3,497) / 10% validation (404) / 10% test (422)
- Domains: education and tourism
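The split sizes add up to the full 4,323 pairs, and work out to roughly 80.9% / 9.3% / 9.8%. The card does not say how the split was drawn, so the sketch below assumes a seeded random shuffle; `split_dataset` and the seed value are hypothetical:

```python
import random


def split_dataset(items, n_val=404, n_test=422, seed=42):
    """Shuffle a copy with a fixed seed and cut it into train/validation/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = split_dataset(range(4323))
for name, part in [("train", train), ("validation", val), ("test", test)]:
    print(f"{name}: {len(part)} ({len(part) / 4323:.1%})")
```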
  
### Model Architecture
- **Base Models**: XLM-RoBERTa-large and Unbabel/wmt22-comet-da
- **Framework**: COMET quality estimation framework

### Training Configuration
- **Methodology**: COMET framework with Direct Assessment supervision
- **Evaluation Metrics**: Kendall's τ and Spearman ρ correlation with human DA scores
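The segment-level correlations used for evaluation can be computed from paired metric and human DA scores. The pure-Python sketch below spells out Pearson's r, Spearman's ρ (average ranks for ties) and Kendall's τ-b; in practice `scipy.stats.pearsonr`, `spearmanr` and `kendalltau` (whose default variant is τ-b) compute the same quantities. Which Kendall variant the authors used is not stated, so τ-b is an assumption here, and the toy scores are invented:

```python
import math


def pearson(x, y):
    """Pearson's r between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects for ties; counts all O(n^2) pairs."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both sides: excluded from both denominator factors
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / math.sqrt((pairs + ties_x_only) * (pairs + ties_y_only))


# Hypothetical toy data, not the KinyCOMET test set
human_da = [0.90, 0.20, 0.60, 0.40, 0.75]
metric = [0.85, 0.30, 0.70, 0.35, 0.65]
print(pearson(metric, human_da), spearman(metric, human_da), kendall_tau_b(metric, human_da))
```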

### MT System Benchmarking Results

We evaluated several production MT systems using KinyCOMET:

| MT System | Kinyarwanda→English | English→Kinyarwanda | Overall |
|-----------|:-------------------:|:-------------------:|:-------:|
| **GPT-4o** | **93.10%** ± 7.77 | 87.83% ± 11.15 | 90.69% ± 9.82 |
| **GPT-4.1** | 93.08% ± 6.62 | 87.92% ± 10.38 | 90.75% ± 8.90 |
| **Gemini Flash 2.0** | 91.46% ± 11.39 | **90.02%** ± 8.92 | **90.80%** ± 10.35 |
| **Claude 3.7** | 92.48% ± 8.32 | 85.75% ± 11.28 | 89.43% ± 10.33 |
| **NLLB-1.3B** | 89.42% ± 12.04 | 83.96% ± 16.31 | 86.78% ± 14.52 |
| **NLLB-600M** | 88.87% ± 12.11 | 75.46% ± 28.49 | 82.71% ± 22.27 |

**Key Findings:**
- LLM-based systems (GPT-4o, GPT-4.1, Gemini Flash 2.0, Claude 3.7) outperform the NLLB neural MT models in both directions
- All systems perform better on Kinyarwanda→English than English→Kinyarwanda
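A table like this can be produced by scoring each system's outputs and aggregating the segment scores per direction. The sketch below assumes the ± values are sample standard deviations of segment-level KinyCOMET scores and that "Overall" pools both directions; the card does not state either, and `benchmark_table` and its records are hypothetical:

```python
from collections import defaultdict
from statistics import mean, stdev


def benchmark_table(records):
    """Map (system, direction) to (mean %, sample standard deviation %) of segment scores.

    records: iterable of (system, direction, segment_score) tuples; each score is
    also pooled under (system, "overall"). Every group needs at least two scores.
    """
    groups = defaultdict(list)
    for system, direction, score in records:
        groups[(system, direction)].append(score)
        groups[(system, "overall")].append(score)
    return {key: (mean(vals) * 100, stdev(vals) * 100) for key, vals in groups.items()}


# Hypothetical segment scores for one system, not real benchmark data
records = [
    ("sys-a", "rw-en", 0.95), ("sys-a", "rw-en", 0.91),
    ("sys-a", "en-rw", 0.88), ("sys-a", "en-rw", 0.84),
]
for (system, direction), (avg, sd) in sorted(benchmark_table(records).items()):
    print(f"{system} {direction}: {avg:.2f}% ± {sd:.2f}")
```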



## Dataset Access

The training dataset is available separately. See the [KinyCOMET Dataset Card](https://huggingface.co/datasets/chrismazii/kinycomet_dataset) for details on accessing the human-annotated quality estimation data.

## Citation & Research

If you use KinyCOMET in your research, please cite:

```bibtex
@misc{kinycomet2025,
    title={KinyCOMET: Translation Quality Estimation for Kinyarwanda-English},
    author={Prince Chris Mazimpaka and Jan Nehring},
    year={2025},
    publisher={Hugging Face},
    howpublished={\url{https://huggingface.co/chrismazii/kinycomet_unbabel}}
}
```


## License

This model is released under the Apache 2.0 License.

## Acknowledgments

- **COMET Framework**: Built on the excellent [COMET quality estimation framework](https://unbabel.github.io/COMET/html/index.html)
- **Base Models**: Leverages XLM-RoBERTa and Unbabel's WMT22 COMET-DA models  
- **African NLP Community**: Inspired by ongoing efforts to advance African language technologies
- **Contributors**: Thanks to the 15 linguistics students and all researchers who made this work possible

---

**Resources:**
- [COMET Documentation](https://unbabel.github.io/COMET/html/index.html)
- [Dataset Card](https://huggingface.co/datasets/chrismazii/kinycomet_dataset)
- [Model Files](https://huggingface.co/chrismazii/kinycomet_unbabel/tree/main)