Commit d92c5a7 by agentlans (verified) · 1 parent: 4f5289a

Update README.md

  1. README.md +33 -15
README.md CHANGED
@@ -13,29 +13,47 @@ datasets:
  - agentlans/en-translations-quality-v3
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # multilingual-e5-small-aligned-v2-text-quality-v3

- This model is a fine-tuned version of [agentlans/multilingual-e5-small-aligned-v2](https://huggingface.co/agentlans/multilingual-e5-small-aligned-v2) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.0641
- - Mse: 0.0641
- - Combined Score: 0.0641
- - Num Input Tokens Seen: 1109813760

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

  ## Training procedure
  - agentlans/en-translations-quality-v3
  ---

+ # Multilingual Text Quality Model

+ This model rates the **quality of non-English text** for AI learning.
+ Input a text string, and it outputs a numeric quality score reflecting overall informativeness and usefulness.

+ ## Performance

+ On the evaluation set, it achieved:
+ - Loss: 0.0641
+ - MSE: 0.0641
+ - Combined Score: 0.0641
+ - Tokens processed during training: 1,109,813,760

+ ## Usage Example
 
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch

+ model_name = "agentlans/multilingual-e5-small-quality-v3"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name).to("cuda" if torch.cuda.is_available() else "cpu")

+ # Higher scores indicate higher text quality.
+ # The sign of the score has no particular meaning.
+ # For example, a negative score doesn't necessarily mean that the text is low quality.
+ def quality(text):
+     inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(model.device)
+     with torch.no_grad():
+         score = model(**inputs).logits.squeeze().cpu().item()
+     return score

+ print(quality("Your text here."))
+ ```
+
+ ## Limitations
+
+ - Works best on non-fiction and general-purpose texts.
+ - Scores give an overall quality estimate but don’t explain why.
+ - Unlike the other `quality-v3` models, this model is only trained on short non-English sentences.
+ - Check for biases and suitability before use.
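
A side note on the usage example in this diff: its comments say that only the relative order of scores is meaningful and that the sign carries no meaning, so one natural use is to rank candidate texts against each other. The sketch below is not part of the commit. `rank_by_quality` and `toy_score` are hypothetical names; `toy_score` is a stand-in scorer so the snippet runs without downloading the model, and in practice the README's `quality` function would presumably be passed in its place.

```python
from typing import Callable, Iterable, List, Tuple


def rank_by_quality(
    texts: Iterable[str], score_fn: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Return (text, score) pairs sorted from highest to lowest score."""
    scored = [(text, score_fn(text)) for text in texts]
    # Only the ordering is used; the sign of a score is never inspected.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(text: str) -> float:
    """Illustrative stand-in scorer (NOT the model): word count minus 5."""
    return len(text.split()) - 5.0


if __name__ == "__main__":
    samples = ["a b", "a b c d e f g", "a"]
    for text, score in rank_by_quality(samples, toy_score):
        print(f"{score:+.1f}  {text}")
```

With the real model, a call such as `rank_by_quality(candidates, quality)` would return the highest-scoring text first, even when every score is negative.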
 
  ## Training procedure