---
language:
- az
tags:
- token-classification
- ner
- bert
- multilingual
license: mit
datasets:
- LocalDoc/azerbaijani-ner-dataset
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: mBERT Azerbaijani NER Model
results:
- task:
name: Named Entity Recognition
type: token-classification
dataset:
name: Azerbaijani NER Dataset
type: LocalDoc/azerbaijani-ner-dataset
metrics:
- name: Precision
type: precision
value: 0.704872
- name: Recall
type: recall
value: 0.650684
- name: F1
type: f1
value: 0.676695
- name: Accuracy
type: accuracy
value: 0.920898
---
# mBERT Azerbaijani NER Model
[Model on the Hugging Face Hub](https://huggingface.co/IsmatS/mbert-az-ner)
This model is a fine-tuned version of **mBERT** (Multilingual BERT) for Named Entity Recognition (NER) in the Azerbaijani language. It recognizes entity types commonly found in Azerbaijani text, such as personal names, locations, organizations, and dates, and provides solid performance on entity extraction tasks.
## Model Details
- **Base Model**: `bert-base-multilingual-cased`
- **Fine-tuned on**: [Azerbaijani Named Entity Recognition Dataset](https://huggingface.co/datasets/LocalDoc/azerbaijani-ner-dataset)
- **Task**: Named Entity Recognition (NER)
- **Language**: Azerbaijani (az)
- **Dataset**: Custom Azerbaijani NER dataset with entity tags such as `PERSON`, `LOCATION`, `ORGANISATION`, `DATE`, etc.
### Data Source
The model was trained on the [Azerbaijani NER Dataset](https://huggingface.co/datasets/LocalDoc/azerbaijani-ner-dataset), which provides annotated data with 25 distinct entity types specifically for the Azerbaijani language. This dataset is an invaluable resource for improving NLP tasks in Azerbaijani, including entity recognition and language understanding.
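The dataset can also be loaded directly with the Hugging Face `datasets` library for inspection or further training. A minimal sketch (the split and column names here are assumptions and may differ from the actual dataset layout):
```python
from datasets import load_dataset

# Load the Azerbaijani NER dataset from the Hugging Face Hub
dataset = load_dataset("LocalDoc/azerbaijani-ner-dataset")

# Inspect the available splits and a sample record
print(dataset)
print(dataset["train"][0])  # assumes a "train" split exists
```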
### Entity Types
The model recognizes the following entities:
- **PERSON**: Names of people
- **LOCATION**: Geographical locations
- **ORGANISATION**: Companies, institutions
- **DATE**: Dates and periods
- **MONEY**: Monetary values
- **TIME**: Time expressions
- **GPE**: Countries, cities, states
- **FACILITY**: Buildings, landmarks, etc.
- **EVENT**: Events and occurrences
- **...and more**
For the full list of entities, refer to the dataset description, or inspect the model's label mapping directly as shown below.
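A quick sketch for listing the supported entity types from the model configuration (assumes the labels follow the usual `B-`/`I-` BIO prefixes):
```python
from transformers import AutoConfig

# Read the id-to-label mapping stored with the fine-tuned model
config = AutoConfig.from_pretrained("IsmatS/mbert-az-ner")

# Strip the B-/I- prefixes to get the distinct entity types
entity_types = sorted({label.split("-")[-1] for label in config.id2label.values() if label != "O"})
print(entity_types)
```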
## Performance Metrics
### Epoch-wise Performance
| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|-------|---------------|-----------------|-----------|--------|--------|----------|
| 1 | 0.295200 | 0.265711 | 0.715424 | 0.622853 | 0.665937 | 0.919136 |
| 2 | 0.248600 | 0.252083 | 0.721036 | 0.637979 | 0.676970 | 0.921439 |
| 3 | 0.206800 | 0.253372 | 0.704872 | 0.650684 | 0.676695 | 0.920898 |
### Evaluation Summary (Epoch 3)
- **Evaluation Loss**: 0.253372
- **Evaluation Precision**: 0.704872
- **Evaluation Recall**: 0.650684
- **Evaluation F1**: 0.676695
- **Evaluation Accuracy**: 0.920898
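The exact evaluation script is not included in this card, but entity-level precision, recall, and F1 of this kind are commonly computed with `seqeval` (`pip install seqeval`). A minimal sketch with placeholder predictions, not the actual evaluation data:
```python
from seqeval.metrics import accuracy_score, f1_score, precision_score, recall_score

# Placeholder gold labels and predictions in BIO format (one list per sentence)
y_true = [["B-PERSON", "I-PERSON", "O", "B-LOCATION", "O"]]
y_pred = [["B-PERSON", "I-PERSON", "O", "O", "O"]]

print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
print("Accuracy:", accuracy_score(y_true, y_pred))
```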
## Usage
You can use this model with the Hugging Face `transformers` library to perform NER on Azerbaijani text. Here’s an example:
### Installation
Make sure the `transformers` library and a backend such as PyTorch are installed:
```bash
pip install transformers torch
```
### Inference Example
Load the model and tokenizer, then run the NER pipeline on Azerbaijani text:
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the model and tokenizer
model_name = "IsmatS/mbert-az-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Set up the NER pipeline with simple aggregation of word pieces
nlp_ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Example sentence
sentence = "Bakı şəhərində Azərbaycan Respublikasının prezidenti İlham Əliyev."
entities = nlp_ner(sentence)

# Display the recognized entities
for entity in entities:
    print(f"Entity: {entity['word']}, Label: {entity['entity_group']}, Score: {entity['score']}")
```
### Sample Output
The pipeline returns a list of aggregated entities; the scores and character offsets below are illustrative:
```json
[
{
"entity_group": "PERSON",
"score": 0.97,
"word": "İlham Əliyev",
"start": 34,
"end": 46
},
{
"entity_group": "LOCATION",
"score": 0.95,
"word": "Bakı",
"start": 0,
"end": 4
}
]
```
## Training Details
- **Training Data**: This model was fine-tuned on the [Azerbaijani NER Dataset](https://huggingface.co/datasets/LocalDoc/azerbaijani-ner-dataset) with 25 entity types.
- **Training Framework**: Hugging Face `transformers`
- **Optimizer**: AdamW
- **Epochs**: 3
- **Batch Size**: 64
- **Evaluation Metric**: F1-score
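The hyperparameters above roughly correspond to a standard `Trainer` setup. A hypothetical sketch of such a configuration (not the exact training script; dataset tokenization, label alignment, and the `compute_metrics` function are omitted):
```python
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical label list: the real dataset defines 25 entity types in a BIO scheme
label_list = ["O", "B-PERSON", "I-PERSON", "B-LOCATION", "I-LOCATION"]

base_model = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForTokenClassification.from_pretrained(base_model, num_labels=len(label_list))

training_args = TrainingArguments(
    output_dir="mbert-az-ner",
    num_train_epochs=3,              # 3 epochs, as reported above
    per_device_train_batch_size=64,  # batch size 64
    evaluation_strategy="epoch",     # evaluate after each epoch, as in the metrics table
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",      # F1 as the evaluation metric
)

# AdamW is the Trainer's default optimizer, so no extra configuration is needed.
trainer = Trainer(
    model=model,
    args=training_args,
    # train_dataset=tokenized_train,   # tokenized, label-aligned training split (not shown)
    # eval_dataset=tokenized_val,      # tokenized, label-aligned validation split (not shown)
    # compute_metrics=compute_metrics, # seqeval-based metric function (not shown)
)
# trainer.train()
```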
## Limitations
- The model is trained specifically for the Azerbaijani language and may not generalize well to other languages.
- Certain rare entities may be misclassified due to limited training data in those categories.
## Citation
If you use this model in your research or application, please consider citing:
```
@misc{ismats_mbert_az_ner_2024,
title={mBERT Azerbaijani NER Model},
author={Ismat Samadov},
year={2024},
publisher={Hugging Face},
url={https://huggingface.co/IsmatS/mbert-az-ner}
}
```
## License
This model is available under the [MIT License](https://opensource.org/licenses/MIT). |