---
library_name: transformers
tags:
- Aspect Term Extraction
- transformers
- t5
language:
- tr
metrics:
- micro-f1
base_model:
- Turkish-NLP/t5-efficient-base-turkish
pipeline_tag: text2text-generation
widget:
- text: "Pilav çok lezzetliydi ama servis yavaştı."
  example_title: "Demo"
  output:
    text: "pilav, servis"
---

# **Sengil/t5-turkish-aspect-term-extractor** 🇹🇷

A Turkish sequence-to-sequence model based on `Turkish-NLP/t5-efficient-base-turkish`, fine-tuned for **Aspect Term Extraction (ATE)** from customer reviews and sentences.

Given a Turkish sentence, the model generates a list of **aspect terms** (e.g., *kahve*, *servis*, *fiyatlar*) that name the main entities or features discussed in it.

---

## ✨ Example

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch
import re
from collections import Counter

# Load model and tokenizer
MODEL_ID = "Sengil/t5-turkish-aspect-term-extractor"
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID).to(DEVICE)
model.eval()

# Common Turkish function words that should never count as aspect terms
TURKISH_STOPWORDS = {
    "ve", "çok", "ama", "bir", "bu", "daha", "gibi", "ile", "için",
    "de", "da", "ki", "o", "şu", "sen", "biz", "siz", "onlar"
}

def is_valid_aspect(word):
    # Keep purely alphabetic, non-stopword candidates longer than one character
    word = word.strip().lower()
    return (
        len(word) > 1 and
        word not in TURKISH_STOPWORDS and
        word.isalpha()
    )

def extract_and_rank_aspects(text, max_tokens=64, beams=5):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(DEVICE)

    # Generate one candidate sequence per beam
    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_tokens,
            num_beams=beams,
            num_return_sequences=beams,
            early_stopping=True
        )

    all_predictions = [
        tokenizer.decode(output, skip_special_tokens=True)
        for output in outputs
    ]

    all_terms = []
    for pred in all_predictions:
        # Split on commas, semicolons, and dashes (this also splits hyphenated words)
        candidates = re.split(r"[;,–—\-]|(?:\s*,\s*)", pred)
        all_terms.extend([w.strip().lower() for w in candidates if is_valid_aspect(w)])

    # Rank terms by how many times they occur across the returned sequences
    ranked = Counter(all_terms).most_common()
    return ranked

# Inference
text = (
    "Artılar: Göl manzarasıyla harika bir atmosfer, Ipoh'un her zaman sıcak olan havası "
    "nedeniyle iyi bir klima olan restoran, iyi ve hızlı hizmet sunan garsonlar, "
    "temassız ödeme kabul eden e-cüzdan, ücretsiz otopark ama sıcak güneş altında açık, "
    "yemeklerin tadı güzel."
)
ranked_aspects = extract_and_rank_aspects(text)

print("Sorted Aspect Terms:")
for term, score in ranked_aspects:
    print(f"{term:<15} score: {score}")
```

**Output:**

```
Sorted Aspect Terms:
atmosfer        score: 1
servis          score: 1
restoran        score: 1
hizmet          score: 1
```

---

## 📌 Model Details

| Detail               | Value                                        |
| -------------------- | -------------------------------------------- |
| **Model Type**       | `AutoModelForSeq2SeqLM` (T5-style)           |
| **Base Model**       | `Turkish-NLP/t5-efficient-base-turkish`      |
| **Languages**        | `tr` (Turkish)                               |
| **Fine-tuning Task** | Aspect Term Extraction (sequence generation) |
| **Framework**        | 🤗 Transformers                              |
| **License**          | Apache-2.0                                   |
| **Tokenizer**        | SentencePiece (T5-style)                     |

---

## 📊 Dataset & Training

* Total samples: 37,000+ Turkish review sentences
* Input: Raw sentence (e.g., `"Pilav çok lezzetliydi ama servis yavaştı."`)
* Target: Comma-separated aspect terms (e.g., `"pilav, servis"`)

### Training Configuration

| Setting               | Value              |
| --------------------- | ------------------ |
| **Epochs**            | 3                  |
| **Batch size**        | 8                  |
| **Max input length**  | 128 tokens         |
| **Max output length** | 64 tokens          |
| **Optimizer**         | AdamW              |
| **Learning rate**     | 3e-5               |
| **Scheduler**         | Linear             |
| **Precision**         | FP32               |
| **Hardware**          | 1× Tesla T4 / P100 |

---

### 🔍 Evaluation

The model was evaluated on a held-out test set using exact-match micro-F1.

| Metric          | Score |
| --------------- | ----: |
| **Micro-F1**    | 0.84+ |
| **Exact Match** |  ~78% |

---

## 💡 Use Cases

* 💬 Opinion mining in Turkish product or service reviews
* 🧾 Aspect-level sentiment analysis preprocessing
* 📊 Feature-based review summarization in NLP pipelines

---

## 📦 Model Card / Citation

```bibtex
@misc{Sengil2025T5AspectTR,
  title  = {Sengil/t5-turkish-aspect-term-extractor: Turkish Aspect Term Extraction with T5},
  author = {Şengil, Mert},
  year   = {2025},
  url    = {https://huggingface.co/Sengil/t5-turkish-aspect-term-extractor}
}
```

---

For contributions, improvements, or issue reporting, feel free to open a GitHub/Hugging Face issue or contact **[Mert Şengil](https://www.linkedin.com/in/mertsengil/)**.
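
---

## 🧪 Evaluation Sketch

The card reports micro-F1 and exact match but does not say how they were computed. The sketch below shows one plausible way to score comma-separated predictions against comma-separated targets. It assumes that terms are compared as lowercase sets per sentence and that exact match means the predicted set equals the reference set. The helper names `parse_terms` and `micro_f1_and_exact_match` and the toy data are hypothetical and are not taken from the training code.

```python
def parse_terms(text):
    """Split a comma-separated string into a set of normalized terms."""
    return {t.strip().lower() for t in text.split(",") if t.strip()}


def micro_f1_and_exact_match(predictions, references):
    """Return (micro-F1, exact-match rate) over paired term strings.

    Assumption: counts are pooled over all sentences before computing
    precision and recall, which is the usual meaning of "micro".
    """
    tp = fp = fn = exact = 0
    for pred_text, ref_text in zip(predictions, references):
        pred, ref = parse_terms(pred_text), parse_terms(ref_text)
        tp += len(pred & ref)   # terms predicted and present in the reference
        fp += len(pred - ref)   # predicted terms missing from the reference
        fn += len(ref - pred)   # reference terms the prediction missed
        exact += int(pred == ref)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    exact_rate = exact / len(predictions) if predictions else 0.0
    return f1, exact_rate


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from the real test set.
    preds = ["pilav, servis", "kahve", "fiyatlar, garson"]
    refs = ["pilav, servis", "kahve, fiyatlar", "fiyatlar"]
    f1, em = micro_f1_and_exact_match(preds, refs)
    print(f"micro-F1: {f1:.2f}, exact match: {em:.2f}")
    # prints "micro-F1: 0.80, exact match: 0.33"
```

Comparing sets rather than ordered lists means term order and duplicates in the generated string do not affect the score. If the original evaluation was order-sensitive or used a different normalization, the numbers would differ.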