---
license: apache-2.0
language:
- tr
base_model:
- dbmdz/bert-base-turkish-cased
pipeline_tag: token-classification
tags:
- e-commerce
- ner
- named-entity-recognition
- bert
- nlp
---
# Turkish BERT for Aspect Term Extraction
This model is a fine-tuned version of [`dbmdz/bert-base-turkish-cased`](https://huggingface.co/dbmdz/bert-base-turkish-cased), trained for aspect term extraction from Turkish e-commerce product reviews.
## Model Description
- Base Model: dbmdz/bert-base-turkish-cased
- Task: Token Classification (Aspect Term Extraction)
- Language: Turkish
- Domain: E-commerce product reviews
## Model Performance
- F1 Score: 83% on test set
- Test Set Size: 2,000 samples
- Training Set Size: ~16,000 samples
## Training Details
### Training Data
- Dataset Size: 16,000 reviews
- Data Source: Private e-commerce product review dataset
- Domain: E-commerce product reviews in Turkish
- Coverage: Over 500 product categories
### Training Configuration
- Epochs: 5
- Task Type: Token Classification
- Label Scheme: BIO tagging
  - `B-ASPECT`: Beginning of an aspect term
  - `I-ASPECT`: Inside/continuation of an aspect term
  - `O`: Outside (not an aspect term)
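
The exact training script is not published (the dataset is private). As a rough illustration only, a comparable fine-tune could be set up with the `transformers` Trainer; everything below, including the output directory name and the omitted dataset preparation, is an assumption rather than the author's actual code:

```python
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

id2label = {0: "O", 1: "B-ASPECT", 2: "I-ASPECT"}
label2id = {label: i for i, label in id2label.items()}

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "dbmdz/bert-base-turkish-cased",
    num_labels=3,
    id2label=id2label,
    label2id=label2id,
)

args = TrainingArguments(
    output_dir="bert-turkish-aspect",  # hypothetical name
    num_train_epochs=5,                # matches the reported configuration
)

# train_dataset / eval_dataset would come from the private review corpus,
# with BIO labels aligned to the WordPiece tokens the tokenizer produces.
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```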
### Training Loss
Training loss decreased across epochs:
| Epoch | Loss   |
|-------|--------|
| 1     | 0.1758 |
| 2     | 0.1749 |
| 3     | 0.1217 |
| 4     | 0.1079 |
| 5     | 0.0699 |
## Usage
### Option 1: Using Pipeline
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")
model = AutoModelForTokenClassification.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")

# Create pipeline
aspect_extractor = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

# Example usage
text = "Bu telefonun kamerası çok iyi ama bataryası yetersiz."
results = aspect_extractor(text)
print(results)
```
Expected Output:
```python
[{'entity_group': 'ASPECT', 'score': 0.99498886, 'word': 'kamerası', 'start': 13, 'end': 21},
 {'entity_group': 'ASPECT', 'score': 0.9970175, 'word': 'bataryası', 'start': 34, 'end': 43}]
```
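
If you only need the aspect strings, they can be read straight from the pipeline output (continuing from `results` above):

```python
# Each entry is one aggregated entity; "word" holds the surface form
aspect_terms = [entity["word"] for entity in results]
print(aspect_terms)  # ['kamerası', 'bataryası']
```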
### Option 2: Manual Inference
```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")
model = AutoModelForTokenClassification.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")

# Example text
text = "Bu telefonun kamerası çok iyi ama bataryası yetersiz."

# Tokenize input
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class_ids = predictions.argmax(dim=-1)

# Convert predictions to labels
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
predicted_labels = [model.config.id2label[class_id.item()] for class_id in predicted_class_ids[0]]

# Display results
for token, label in zip(tokens, predicted_labels):
    if token not in ['[CLS]', '[SEP]', '[PAD]']:
        print(f"{token}: {label}")
```
Expected Output:
```
Bu: O
telefonun: O
kamerası: B-ASPECT
çok: O
iyi: O
ama: O
batarya: B-ASPECT
##sı: I-ASPECT
yetersiz: O
.: O
```
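
Note how the tokenizer splits `bataryası` into `batarya` + `##sı`. If you want whole aspect terms rather than per-token labels, the subword pieces have to be merged back together. A minimal sketch continuing from `tokens` and `predicted_labels` above; the `merge_aspect_spans` helper is illustrative, not part of the model:

```python
def merge_aspect_spans(tokens, labels):
    """Merge B-ASPECT/I-ASPECT token runs back into surface strings."""
    aspects, current = [], []
    for token, label in zip(tokens, labels):
        if token in ("[CLS]", "[SEP]", "[PAD]"):
            continue
        if label == "B-ASPECT":
            if current:
                aspects.append("".join(current))
            current = [token]
        elif label == "I-ASPECT" and current:
            # "##" marks a WordPiece continuation of the previous token
            current.append(token[2:] if token.startswith("##") else " " + token)
        else:
            if current:
                aspects.append("".join(current))
                current = []
    if current:
        aspects.append("".join(current))
    return aspects

print(merge_aspect_spans(tokens, predicted_labels))
# ['kamerası', 'bataryası']
```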
### Option 3: Batch Inference
```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")
model = AutoModelForTokenClassification.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")

# Example texts for batch processing
texts = [
    "Bu telefonun kamerası çok iyi ama bataryası yetersiz.",
    "Ürünün fiyatı uygun ancak kalitesi düşük.",
    "Teslimat hızı mükemmel, ambalaj da gayet sağlam."
]

# Tokenize all texts
inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True)

# Get predictions for all texts
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class_ids = predictions.argmax(dim=-1)

# Process results for each text
for i, text in enumerate(texts):
    print(f"\nText {i+1}: {text}")
    print("-" * 50)

    # Get tokens for this specific text
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][i])
    predicted_labels = [model.config.id2label[class_id.item()] for class_id in predicted_class_ids[i]]

    # Display results
    for token, label in zip(tokens, predicted_labels):
        if token not in ['[CLS]', '[SEP]', '[PAD]']:
            print(f"{token}: {label}")
```
Expected Output:
```
Text 1: Bu telefonun kamerası çok iyi ama bataryası yetersiz.
--------------------------------------------------
Bu: O
telefonun: O
kamerası: B-ASPECT
çok: O
iyi: O
ama: O
batarya: B-ASPECT
##sı: I-ASPECT
yetersiz: O
.: O

Text 2: Ürünün fiyatı uygun ancak kalitesi düşük.
--------------------------------------------------
Ürünün: O
fiyatı: B-ASPECT
uygun: O
ancak: O
kalitesi: B-ASPECT
düşük: O
.: O

Text 3: Teslimat hızı mükemmel, ambalaj da gayet sağlam.
--------------------------------------------------
Teslim: B-ASPECT
##at: I-ASPECT
hızı: I-ASPECT
mükemmel: O
,: O
ambalaj: B-ASPECT
da: O
gayet: O
sağlam: O
.: O
```
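
In a batch, shorter texts are padded to the longest one. Filtering on the `[PAD]` string works for this tokenizer, but filtering by the attention mask is a more general alternative (it still keeps `[CLS]` and `[SEP]`, which you can skip as above). A small sketch for row `i` of the batch:

```python
# Keep only positions the model actually attended to (drops padding)
mask = inputs["attention_mask"][i].bool()
valid_tokens = [t for t, m in zip(tokens, mask) if m]
valid_labels = [l for l, m in zip(predicted_labels, mask) if m]
```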
## Label Mapping
```python
id2label = {
    0: "O",
    1: "B-ASPECT",
    2: "I-ASPECT"
}

label2id = {
    "O": 0,
    "B-ASPECT": 1,
    "I-ASPECT": 2
}
```
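
These mappings are stored in the model's configuration, so you normally do not need to define them yourself:

```python
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")
print(model.config.id2label)  # {0: 'O', 1: 'B-ASPECT', 2: 'I-ASPECT'}
print(model.config.label2id)  # {'O': 0, 'B-ASPECT': 1, 'I-ASPECT': 2}
```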
## Intended Use
This model is designed for:
- Extracting aspect terms from Turkish e-commerce product reviews
- Identifying product features and attributes mentioned in reviews
- Supporting aspect-based sentiment analysis pipelines (a sketch follows below)
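
A hedged sketch of how this model could feed an aspect-based sentiment analysis pipeline; the downstream sentiment step is deliberately left abstract, since this model only extracts aspects:

```python
from transformers import pipeline

# Aspect extraction stage of a hypothetical ABSA pipeline
aspect_extractor = pipeline(
    "token-classification",
    model="opdullah/bert-turkish-ecomm-aspect-extraction",
    aggregation_strategy="simple",
)

text = "Bu telefonun kamerası çok iyi ama bataryası yetersiz."
aspects = [e["word"] for e in aspect_extractor(text)]

# Downstream (not provided by this model): score each extracted aspect
# with any Turkish sentiment classifier, e.g. on a context window.
for aspect in aspects:
    print(f"aspect: {aspect}")  # 'kamerası', 'bataryası'
```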
## Limitations
- Trained specifically on e-commerce domain data
- Performance may vary on other domains or text types
- Limited to Turkish language
- Trained on a private dataset, so independent reproduction is limited
## Citation
If you use this model, please cite:
```bibtex
@misc{turkish-bert-aspect-extraction,
  title={Turkish BERT for Aspect Term Extraction},
  author={Abdullah Koçak},
  year={2025},
  url={https://huggingface.co/opdullah/bert-turkish-ecomm-aspect-extraction}
}
```
### Base Model Citation
```bibtex
@misc{schweter2020bertbase,
  title={BERTurk - BERT models for Turkish},
  author={Stefan Schweter},
  year={2020},
  publisher={Hugging Face},
  url={https://huggingface.co/dbmdz/bert-base-turkish-cased}
}
```