cnmoro committed on
Commit c36b919 · verified · 1 Parent(s): bf76ea8

Update README.md

Files changed (1)
  1. README.md +32 -3
README.md CHANGED
@@ -1,3 +1,32 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ datasets:
+ - cnmoro/AllTripletsMsMarco-PTBR
+ - Tevatron/msmarco-passage-corpus
+ language:
+ - en
+ - pt
+ library_name: model2vec
+ base_model:
+ - nomic-ai/nomic-embed-text-v2-moe
+ pipeline_tag: feature-extraction
+ ---
+
+ This [Model2Vec](https://github.com/MinishLab/model2vec) model was created using [Tokenlearn](https://github.com/MinishLab/tokenlearn), with [nomic-embed-text-v2-moe](https://huggingface.co/nomic-ai/nomic-embed-text-v2-moe) as a base, trained on around 20M passages (English and Portuguese); a rough sketch of the distillation step follows after the diff.
+
+ I have yet to run any formal benchmarks on it, but it easily outperforms [potion-multilingual-128M](https://huggingface.co/minishlab/potion-multilingual-128M) on my own custom Portuguese test workload.
+
+ The output dimension is 768.
+
+ ## Usage
+
+ Load this model using the `from_pretrained` method:
+ ```python
+ from model2vec import StaticModel
+
+ # Load a pretrained Model2Vec model
+ model = StaticModel.from_pretrained("cnmoro/static-nomic-eng-ptbr-large")
+
+ # Compute text embeddings
+ embeddings = model.encode(["Example sentence"])
+ ```
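
As a follow-up to the usage snippet in the commit above, here is a minimal sketch of comparing the resulting embeddings with cosine similarity. It assumes `encode` returns a NumPy array (as in current model2vec releases); the example sentences are illustrative and not from the model card:

```python
import numpy as np
from model2vec import StaticModel

model = StaticModel.from_pretrained("cnmoro/static-nomic-eng-ptbr-large")

# Encode one English and one Portuguese sentence (illustrative inputs)
embeddings = model.encode([
    "The weather is nice today.",
    "O tempo está agradável hoje.",
])

# Cosine similarity between the two 768-dimensional vectors
a, b = embeddings
similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"cosine similarity: {similarity:.3f}")
```

Since the two sentences above are paraphrases of each other, a bilingual model like this one should score them well above an unrelated pair.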
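For context on how such a model is built, below is a rough sketch of the first step of a Tokenlearn-style pipeline: distilling a static Model2Vec model from the transformer base with model2vec's `distill` function. This is a sketch under assumptions, not the author's actual training script: the `pca_dims` value is chosen to match the card's stated 768-dimensional output, `trust_remote_code` is assumed to be needed for the Nomic base, the output path is hypothetical, and the subsequent Tokenlearn training on the ~20M passages (the featurize/train steps in the Tokenlearn repo) is not reproduced here.

```python
from model2vec.distill import distill

# Distill a static Model2Vec model from the transformer base.
# This is only the distillation step; the released model was then
# further trained with Tokenlearn on ~20M EN/PT passages.
m2v = distill(
    model_name="nomic-ai/nomic-embed-text-v2-moe",
    pca_dims=768,            # assumption: matches the 768-dim output stated above
    trust_remote_code=True,  # assumption: the Nomic base ships custom modeling code
)
m2v.save_pretrained("static-nomic-distilled")  # hypothetical local path
```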