Tags: Text Classification · Transformers · Safetensors · xlm-roberta · sentiment-analysis · thai · multilingual · fine-tuned · southeast-asian

🎯 MultiSent-E5-Pro: Advanced Thai Sentiment Classifier


🇹🇭 State-of-the-art Thai sentiment analysis with multilingual capabilities

📋 Quick Overview

MultiSent-E5-Pro is a fine-tuned sentiment analysis model based on intfloat/multilingual-e5-large, optimized specifically for Thai text while retaining support for multilingual inputs. The model classifies text into four categories: Positive, Negative, Neutral, and Question.

🎯 Key Features

  • Handles Thai-specific expressions, colloquialisms, and sarcasm effectively
  • Performs well on real-world social media & review data
  • Multilingual support for Southeast and East Asian languages

🏆 Benchmark Summary

| Rank | Model | Accuracy | F1-Macro | Notes |
|------|-------|----------|----------|-------|
| 🥇 1st | MultiSent-E5-Pro | 84.61% | 84.61% | Best overall |
| 2nd | MultiSent-E5 | 80.62% | 80.62% | Baseline model |
| 3rd | sentiment-103 | 57.40% | 49.87% | Moderate baseline |

📊 Detailed Metrics (2,183 samples)

| Metric | Score |
|--------|-------|
| Accuracy | 84.61% |
| F1-Macro | 84.61% |
| F1-Weighted | 84.75% |
| Avg. Confidence | 98.53% |
| Low-Confidence Rate (<60%) | 0.96% |
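Given the near-saturated average confidence and the overconfidence noted under Known Limitations below, it can be useful to route low-confidence predictions to human review. Below is a minimal sketch, assuming the tokenizer, model, and labels are loaded as in the Quick Start section; the 0.60 cutoff mirrors the <60% bucket above, and the helper name is illustrative, not part of the model's API:

```python
import torch

LOW_CONFIDENCE_THRESHOLD = 0.60  # mirrors the <60% bucket reported above

def classify_with_flag(text, tokenizer, model, labels):
    """Return (label, confidence, needs_review) for one text (illustrative helper)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    confidence, idx = probs.max(dim=-1)
    # Flag predictions whose top-class probability falls below the threshold
    return labels[idx.item()], confidence.item(), confidence.item() < LOW_CONFIDENCE_THRESHOLD
```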

Per-Class Performance

| Class | Precision | Recall | F1 | Notes |
|-------|-----------|--------|-----|-------|
| Negative | 91.0% | 84.6% | 87.7% | Excellent |
| Positive | 83.0% | 94.3% | 88.3% | Excellent |
| Neutral | 71.9% | 81.6% | 76.4% | Moderate |
| Question | 94.4% | 79.0% | 86.0% | Good |
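Per-class precision, recall, and F1 like those above can be reproduced from a labeled evaluation set with scikit-learn's classification_report; a minimal sketch, using hypothetical gold labels and predictions for illustration only:

```python
from sklearn.metrics import classification_report

# Hypothetical gold labels vs. model predictions, for illustration only
y_true = ["Positive", "Negative", "Neutral", "Question", "Positive"]
y_pred = ["Positive", "Negative", "Neutral", "Neutral", "Positive"]

# Prints per-class precision/recall/F1 plus the macro and weighted
# averages reported in the tables above
print(classification_report(y_true, y_pred, digits=3))
```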

🌍 Language Support

| Region | Languages | Performance |
|--------|-----------|-------------|
| Thai | Thai | 🟢 Excellent |
| Southeast Asia | Indonesian (ID), Vietnamese (VI), Malay (MS) | 🟡 Good |
| East Asia | Chinese (ZH), Japanese (JA), Korean (KO) | 🟠 Moderate |
| Europe | English (EN), Spanish (ES), French (FR) | 🔴 Low |

⚡ Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "ZombitX64/MultiSent-E5-Pro"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "ผลิตภัณฑ์นี้ดีมาก ใช้งานง่าย"  # "This product is great, easy to use."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted = torch.argmax(probs, dim=-1)

# Label order as documented for this model
labels = ["Question", "Negative", "Neutral", "Positive"]
print(f"Sentiment: {labels[predicted.item()]} (Confidence: {probs[0][predicted].item():.2%})")
```

🌟 Use Cases

| Application | Suitability |
|-------------|-------------|
| Product Reviews | 🟢 Excellent |
| Social Media | 🟢 Excellent |
| Customer Support | 🟢 Excellent |
| Content Moderation | 🟡 Good |
| Research Analysis | 🟡 Good |

⚠ Known Limitations

  • Sarcasm is sometimes misclassified, especially in Chinese text
  • Texts with mixed sentiments tend to be labeled Neutral
  • Low recall for the Question class due to limited training data
  • Bias toward Positive predictions due to class imbalance
  • Overconfidence in some multilingual predictions

🛠 Technical Info

| Config | Value |
|--------|-------|
| Base Model | intfloat/multilingual-e5-large |
| Parameters | ~560M |
| Precision | FP32 |
| Classes | 4 |
| Max Sequence Length | 512 |
| Training Time | ~27 min |

Data Summary:

  • Training: 2,456 samples
  • Validation: 273 samples
  • Evaluation: 2,183 samples

📄 Citation

```bibtex
@misc{MultiSent-E5-Pro-2024,
  title={MultiSent-E5-Pro: Advanced Thai Sentiment Analysis},
  author={ZombitX64 and Janutsaha, Krittanut and Saengwichain, Chanyut},
  year={2024},
  url={https://huggingface.co/ZombitX64/MultiSent-E5-Pro},
  note={Hugging Face Model Card}
}

@article{wang2024multilingual,
  title={Multilingual E5 Text Embeddings: A Technical Report},
  author={Wang, Liang and Yang, Nan and Huang, Xiaolong and Yang, Linjun and Majumder, Rangan and Wei, Furu},
  journal={arXiv preprint arXiv:2402.05672},
  year={2024}
}
```

👨‍💼 Authors

| Role | Name |
|------|------|
| Lead Developer | ZombitX64 |
| Data Scientist | Krittanut Janutsaha |
| Engineer | Chanyut Saengwichain |

😊 Feedback & Contributions


Last Updated: Dec 2024 | Version: 1.1 | Docs: v2.0