cnmoro committed on
Commit c36b919 · verified · 1 Parent(s): bf76ea8

Update README.md

Files changed (1)
  1. README.md +32 -3
README.md CHANGED
@@ -1,3 +1,32 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ datasets:
+ - cnmoro/AllTripletsMsMarco-PTBR
+ - Tevatron/msmarco-passage-corpus
+ language:
+ - en
+ - pt
+ library_name: model2vec
+ base_model:
+ - nomic-ai/nomic-embed-text-v2-moe
+ pipeline_tag: feature-extraction
+ ---
+
+ This [Model2Vec](https://github.com/MinishLab/model2vec) model was created using [Tokenlearn](https://github.com/MinishLab/tokenlearn), with [nomic-embed-text-v2-moe](https://huggingface.co/nomic-ai/nomic-embed-text-v2-moe) as a base, trained on around 20M passages (English and Portuguese); a rough sketch of the distillation step follows after the diff.
+
+ I have yet to run any formal benchmarks on it, but it easily outperforms [potion-multilingual-128M](https://huggingface.co/minishlab/potion-multilingual-128M) on my own custom Portuguese test workload.
+
+ The output dimension is 768.
+
+ ## Usage
+
+ Load this model using the `from_pretrained` method:
+ ```python
+ from model2vec import StaticModel
+
+ # Load a pretrained Model2Vec model
+ model = StaticModel.from_pretrained("cnmoro/static-nomic-eng-ptbr-large")
+
+ # Compute text embeddings
+ embeddings = model.encode(["Example sentence"])
+ ```
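
As a follow-up to the usage snippet in the commit above, here is a minimal sketch of comparing the resulting embeddings with cosine similarity. It assumes `encode` returns a NumPy array (as in current model2vec releases); the example sentences are illustrative and not from the model card:

```python
import numpy as np
from model2vec import StaticModel

model = StaticModel.from_pretrained("cnmoro/static-nomic-eng-ptbr-large")

# Encode one English and one Portuguese sentence (illustrative inputs)
embeddings = model.encode([
    "The weather is nice today.",
    "O tempo está agradável hoje.",
])

# Cosine similarity between the two 768-dimensional vectors
a, b = embeddings
similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"cosine similarity: {similarity:.3f}")
```

Since the two sentences above are paraphrases of each other, a bilingual model like this one should score them well above an unrelated pair.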
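For context on how such a model is built, below is a rough sketch of the first step of a Tokenlearn-style pipeline: distilling a static Model2Vec model from the transformer base with model2vec's `distill` function. This is a sketch under assumptions, not the author's actual training script: the `pca_dims` value is chosen to match the card's stated 768-dimensional output, `trust_remote_code` is assumed to be needed for the Nomic base, the output path is hypothetical, and the subsequent Tokenlearn training on the ~20M passages (the featurize/train steps in the Tokenlearn repo) is not reproduced here.

```python
from model2vec.distill import distill

# Distill a static Model2Vec model from the transformer base.
# This is only the distillation step; the released model was then
# further trained with Tokenlearn on ~20M EN/PT passages.
m2v = distill(
    model_name="nomic-ai/nomic-embed-text-v2-moe",
    pca_dims=768,            # assumption: matches the 768-dim output stated above
    trust_remote_code=True,  # assumption: the Nomic base ships custom modeling code
)
m2v.save_pretrained("static-nomic-distilled")  # hypothetical local path
```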