Add Sentence Transformers snippet to README (#2)
Commit 03ce83e345bce72437e8ffa8935594a14a93fad9

README.md CHANGED

@@ -3223,7 +3223,7 @@ Jina Embeddings V2 [technical report](https://arxiv.org/abs/2310.19923)

### Why mean pooling?

-`mean
+`mean pooling` takes all token embeddings from the model output and averages them at sentence/paragraph level.
It has been proved to be the most effective way to produce high-quality sentence embeddings.
We offer an `encode` function to deal with this.

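As background for the reworded line above: mean pooling is an attention-mask-weighted average of the per-token embeddings, which is what the README's `encode` function handles; the `embeddings = F.normalize(embeddings, p=2, dim=1)` line visible in the next hunk's context then L2-normalizes the result. Below is a minimal sketch of that pooling step; the function and variable names are illustrative, not taken from the README.

```python
import torch
import torch.nn.functional as F


def mean_pooling(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings at sentence/paragraph level, ignoring padding tokens."""
    # token_embeddings: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    summed = (token_embeddings * mask).sum(dim=1)  # sum over real (non-padding) tokens
    counts = mask.sum(dim=1).clamp(min=1e-9)       # number of real tokens per sentence
    return summed / counts


# Typical use alongside the README's transformers snippet (names are illustrative):
# sentence_embeddings = mean_pooling(model_output[0], encoded_input["attention_mask"])
# sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
```
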
@@ -3256,7 +3256,7 @@ embeddings = F.normalize(embeddings, p=2, dim=1)
</p>
</details>

-You can use Jina Embedding models directly from transformers package:
+You can use Jina Embedding models directly from the `transformers` package:
```python
!pip install transformers
from transformers import AutoModel
@@ -3277,7 +3277,22 @@ embeddings = model.encode(
)
```

-
+Or you can use the model with the `sentence-transformers` package:
+```python
+from sentence_transformers import SentenceTransformer, util
+
+model = SentenceTransformer("jinaai/jina-embeddings-v2-base-es", trust_remote_code=True)
+embeddings = model.encode(['How is the weather today?', '¿Qué tiempo hace hoy?'])
+print(util.cos_sim(embeddings[0], embeddings[1]))
+```
+
+And if you only want to handle shorter sequences, such as 2k tokens, you can set `model.max_seq_length`:
+
+```python
+model.max_seq_length = 2048
+```
+
+## Alternatives to Transformers and Sentence Transformers

1. _Managed SaaS_: Get started with a free key on Jina AI's [Embedding API](https://jina.ai/embeddings/).
2. _Private and high-performance deployment_: Get started by picking from our suite of models and deploy them on [AWS Sagemaker](https://aws.amazon.com/marketplace/seller-profile?id=seller-stch2ludm6vgy).
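The second hunk above only shows the edges of the README's `transformers` snippet (the `AutoModel` import plus the opening and closing of the `model.encode(...)` call). For orientation, that usage looks roughly like the sketch below; the `trust_remote_code=True` flag and the `max_length` argument are assumptions inferred from the custom `encode` method and the new `model.max_seq_length` note, not lines visible in this diff.

```python
# Rough sketch of the transformers-based usage referenced by the second hunk.
# Assumptions (not visible in this diff): trust_remote_code=True exposes the
# model's custom encode() method, and encode() accepts a max_length cap.
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v2-base-es", trust_remote_code=True
)

embeddings = model.encode(
    ["How is the weather today?", "¿Qué tiempo hace hoy?"],
    max_length=2048,  # assumed argument; mirrors the model.max_seq_length note above
)
```
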
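For the managed-SaaS route in the closing list, embeddings come from an HTTPS endpoint rather than a local model. The sketch below is only a guess at the request shape: the endpoint URL, payload fields, and response layout are assumptions and should be verified against the documentation at https://jina.ai/embeddings/ before use.

```python
# Hedged sketch of calling Jina AI's Embedding API with the requests library.
# Endpoint, payload fields, and response layout are assumptions; verify them
# against the current documentation at https://jina.ai/embeddings/.
import requests

API_KEY = "jina_..."  # placeholder: free key from https://jina.ai/embeddings/

response = requests.post(
    "https://api.jina.ai/v1/embeddings",  # assumed endpoint
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "jina-embeddings-v2-base-es",
        "input": ["How is the weather today?", "¿Qué tiempo hace hoy?"],
    },
    timeout=30,
)
response.raise_for_status()

# Assumed response layout: one embedding per input, under a "data" key.
embeddings = [item["embedding"] for item in response.json()["data"]]
print(len(embeddings), len(embeddings[0]))
```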