agentlans / multilingual-e5-small-quality-v3

Text Classification

Generated from Trainer

agentlans committed on Jul 20

Commit d92c5a7 · verified · 1 Parent(s): 4f5289a

Update README.md

Files changed (1)

README.md +33 -15

README.md CHANGED

@@ -13,29 +13,47 @@ datasets:
 - agentlans/en-translations-quality-v3
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# multilingual-e5-small-aligned-v2-text-quality-v3
-This model is a fine-tuned version of [agentlans/multilingual-e5-small-aligned-v2](https://huggingface.co/agentlans/multilingual-e5-small-aligned-v2) on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.0641
-- Mse: 0.0641
-- Combined Score: 0.0641
-- Num Input Tokens Seen: 1109813760
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
+# Multilingual Text Quality Model
+This model rates the **quality of non-English text** as material for training AI models.
+Given a text string, it outputs a single numeric quality score reflecting the text's overall informativeness and usefulness.
+## Performance
+On the evaluation set, it achieved:
+- Loss: 0.0641
+- MSE: 0.0641
+- Combined Score: 0.0641
+- Tokens processed during training: 1,109,813,760
+## Usage Example
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import torch
+model_name = "agentlans/multilingual-e5-small-quality-v3"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSequenceClassification.from_pretrained(model_name).to("cuda" if torch.cuda.is_available() else "cpu")
+# Higher scores indicate higher text quality.
+# The sign of the score has no particular meaning.
+# For example, a negative score doesn't necessarily mean that the text is low quality.
+def quality(text):
+    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(model.device)
+    with torch.no_grad():
+        score = model(**inputs).logits.squeeze().cpu().item()
+    return score
+print(quality("Your text here."))
+```
+## Limitations
+- Works best on non-fiction and general-purpose texts.
+- Scores give an overall quality estimate but don’t explain why.
+- Unlike the other `quality-v3` models, this model was trained only on short non-English sentences.
+- Check the model for biases and for suitability to your use case before deploying it.
 ## Training procedure
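The updated README's Performance section reports the same value (0.0641) for Loss and MSE. That is consistent with a single-output regression head: the usage example's `.squeeze().item()` implies one output value, and in `transformers` a sequence-classification model with `num_labels=1` defaults to regression trained with mean squared error, so the loss is the MSE. The minimal, standard-library sketch below shows how MSE and its square root (RMSE) relate. The helpers `mse` and `rmse` are hypothetical names, not part of the model repository, and the toy numbers are illustrative only. An MSE of 0.0641 corresponds to an RMSE of about 0.253, expressed in the units of the training targets, whose scale the card does not state.

```python
import math

def mse(predictions, targets):
    """Mean squared error between two equal-length sequences of numbers."""
    if not predictions or len(predictions) != len(targets):
        raise ValueError("inputs must be non-empty and the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)

def rmse(predictions, targets):
    """Root mean squared error: the square root of the MSE, in target units."""
    return math.sqrt(mse(predictions, targets))

# Toy numbers for illustration only; these are not the model's evaluation data.
preds = [0.5, -0.2, 1.1]
labels = [0.3, 0.0, 1.0]
print(mse(preds, labels))
print(rmse(preds, labels))

# RMSE implied by the reported evaluation MSE of 0.0641.
print(round(math.sqrt(0.0641), 3))  # 0.253
```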
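The README's usage example notes that higher scores indicate higher quality but that the sign of a score has no particular meaning. One way to act on that, sketched here as an assumption rather than a recommendation from the card, is to filter a corpus by rank instead of by a fixed cutoff such as zero. `keep_top_fraction` is a hypothetical helper; `scores` could come from the README's `quality()` function or any other scorer.

```python
import math

def keep_top_fraction(texts, scores, fraction=0.5):
    """Return the texts whose scores fall in the top `fraction`, highest first.

    Only the ordering of the scores is used, so the result does not depend
    on whether individual scores are positive or negative.
    """
    if len(texts) != len(scores):
        raise ValueError("texts and scores must have the same length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = math.ceil(len(texts) * fraction)
    # Sort by score only; ties keep their original relative order.
    ranked = sorted(zip(scores, texts), key=lambda pair: pair[0], reverse=True)
    return [text for _, text in ranked[:k]]

# Hypothetical scores: all negative, yet the better half is still selected.
texts = ["a", "b", "c", "d"]
scores = [-1.2, -0.3, -2.0, -0.8]
print(keep_top_fraction(texts, scores, 0.5))  # ['b', 'd']
```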
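The README's Limitations section says this model was trained only on short non-English sentences. If you need a single score for a longer passage, one possible workaround (an assumption, not something the card evaluates) is to split the passage into sentences, score each one, and average the results. The splitter below is deliberately naive: it breaks after `.`, `!`, or `?` followed by whitespace, or directly after `。`, `！`, or `？`, and it will mis-split abbreviations and decimals. `split_sentences`, `score_passage`, and `fake_score` are hypothetical names; in practice `score_fn` could be the README's `quality()` function.

```python
import re
from statistics import mean

# Naive boundary: whitespace after . ! ? or directly after CJK 。！？
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])")

def split_sentences(passage):
    """Naively split text into sentences, dropping empty pieces."""
    return [s.strip() for s in _SENTENCE_BOUNDARY.split(passage) if s.strip()]

def score_passage(passage, score_fn):
    """Average score_fn over the sentences of passage (hypothetical helper)."""
    sentences = split_sentences(passage)
    if not sentences:
        raise ValueError("passage contains no sentences")
    return mean(score_fn(s) for s in sentences)

# Stand-in scorer for illustration only; swap in the README's quality() function.
def fake_score(sentence):
    return float(len(sentence))

print(split_sentences("Hola. ¿Qué tal? Bien!"))  # ['Hola.', '¿Qué tal?', 'Bien!']
print(score_passage("One. Two three!", fake_score))  # 7.0
```

Averaging treats every sentence equally; weighting by sentence length is an equally plausible choice, and neither has been validated for this model.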