---
license: apache-2.0
base_model: microsoft/MiniLM-L6-v2
tags:
- transformers
- sentence-transformers
- sentence-similarity
- feature-extraction
- text-embeddings-inference
- information-retrieval
- knowledge-distillation
- transformers.js
language:
- en
---
<div style="display: flex; justify-content: center;">
    <div style="display: flex; align-items: center; gap: 10px;">
        <img src="logo.webp" alt="MongoDB Logo" style="height: 36px; width: auto; border-radius: 4px;">
        <span style="font-size: 32px; font-weight: bold">MongoDB/mdbr-leaf-mt</span>
    </div>
</div>

# Content

1. [Introduction](#introduction)
2. [Technical Report](#technical-report)
3. [Highlights](#highlights)
4. [Benchmarks](#benchmark-comparison)
5. [Quickstart](#quickstart)
6. [Citation](#citation)

# Introduction

`mdbr-leaf-mt` is a compact, high-performance text embedding model designed for classification, clustering, semantic sentence similarity, and summarization tasks.

To enable even greater efficiency, `mdbr-leaf-mt` supports [flexible asymmetric architectures](#asymmetric-retrieval-setup) and is robust to [vector quantization](#vector-quantization) and [MRL truncation](#mrl-truncation).

If you are looking to perform semantic search / information retrieval (e.g., for RAG), please check out our [`mdbr-leaf-ir`](https://huggingface.co/MongoDB/mdbr-leaf-ir) model, which is specifically trained for these tasks.

> [!NOTE]
> **Note**: this model has been developed by the ML team of MongoDB Research. At the time of writing, it is not used in any of MongoDB's commercial product or service offerings.

# Technical Report

A technical report detailing our proposed `LEAF` training procedure is [available here](https://arxiv.org/abs/2509.12539).

# Highlights

* **State-of-the-Art Performance**: `mdbr-leaf-mt` achieves new state-of-the-art results for compact embedding models, **ranking #1** on the [public MTEB v2 (Eng) benchmark leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for models with ≤30M parameters.
* **Flexible Architecture Support**: `mdbr-leaf-mt` supports asymmetric retrieval architectures, enabling even better retrieval quality. [See below](#asymmetric-retrieval-setup) for more information.
* **MRL and Quantization Support**: embedding vectors generated by `mdbr-leaf-mt` compress well when truncated (MRL) and can be stored using more efficient types like `int8` and `binary`. [See below](#mrl-truncation) for more information.

## Benchmark Comparison

The table below shows the scores for `mdbr-leaf-mt` on the MTEB v2 (English) benchmark, compared to other embedding models.

`mdbr-leaf-mt` ranks #1 on this benchmark for models with ≤30M parameters.

| Model                              | Size    | MTEB v2 (Eng) |
|------------------------------------|---------|---------------|
| OpenAI text-embedding-3-large      | Unknown | 66.43         |
| OpenAI text-embedding-3-small      | Unknown | 64.56         |
| **mdbr-leaf-mt**                   | 23M     | **63.97**     |
| gte-small                          | 33M     | 63.22         |
| snowflake-arctic-embed-s           | 32M     | 61.59         |
| e5-small-v2                        | 33M     | 61.32         |
| granite-embedding-small-english-r2 | 47M     | 61.07         |
| all-MiniLM-L6-v2                   | 22M     | 59.03         |

72
+
73
+ ## Sentence Transformers
74
+
75
+ ```python
76
+ from sentence_transformers import SentenceTransformer
77
+
78
+ # Load the model
79
+ model = SentenceTransformer("MongoDB/mdbr-leaf-mt")
80
+
81
+ # Example queries and documents
82
+ queries = [
83
+ "What is machine learning?",
84
+ "How does neural network training work?"
85
+ ]
86
+
87
+ documents = [
88
+ "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.",
89
+ "Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors."
90
+ ]
91
+
92
+ # Encode queries and documents
93
+ query_embeddings = model.encode(queries, prompt_name="query")
94
+ document_embeddings = model.encode(documents)
95
+
96
+ # Compute similarity scores
97
+ scores = model.similarity(query_embeddings, document_embeddings)
98
+
99
+ # Print results
100
+ for i, query in enumerate(queries):
101
+ print(f"Query: {query}")
102
+ for j, doc in enumerate(documents):
103
+ print(f" Similarity: {scores[i, j]:.4f} | Document {j}: {doc[:80]}...")
104
+ ```
105
+
106
+ <details>
107
+
108
+ <summary>See example output</summary>
109
+
110
+ ```
111
+ Query: What is machine learning?
112
+ Similarity: 0.9063 | Document 0: Machine learning is a subset of ...
113
+ Similarity: 0.7287 | Document 1: Neural networks are trained ...
114
+
115
+ Query: How does neural network training work?
116
+ Similarity: 0.6725 | Document 0: Machine learning is a subset of ...
117
+ Similarity: 0.8287 | Document 1: Neural networks are trained ...
118
+ ```
119
+ </details>
120
+
## Transformers.js

If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:
```bash
npm i @huggingface/transformers
```

You can then use the model to compute embeddings like this:

```js
import { AutoModel, AutoTokenizer, matmul } from "@huggingface/transformers";

// Download from the 🤗 Hub
const model_id = "MongoDB/mdbr-leaf-mt";
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const model = await AutoModel.from_pretrained(model_id, {
  dtype: "fp32", // Options: "fp32" | "fp16" | "q8" | "q4" | "q4f16"
});

// Prepare queries and documents
const queries = [
  "What is machine learning?",
  "How does neural network training work?",
];
const documents = [
  "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.",
  "Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors.",
];
const inputs = await tokenizer([
  ...queries.map((x) => "Represent this sentence for searching relevant passages: " + x),
  ...documents,
], { padding: true });

// Generate embeddings
const { sentence_embedding } = await model(inputs);
const normalized_sentence_embedding = sentence_embedding.normalize();

// Compute similarities
const scores = await matmul(
  normalized_sentence_embedding.slice([0, queries.length]),
  normalized_sentence_embedding.slice([queries.length, null]).transpose(1, 0),
);
const scores_list = scores.tolist();

for (let i = 0; i < queries.length; ++i) {
  console.log(`Query: ${queries[i]}`);
  for (let j = 0; j < documents.length; ++j) {
    console.log(`  Similarity: ${scores_list[i][j].toFixed(4)} | Document ${j}: ${documents[j]}`);
  }
  console.log();
}
```

<details>

<summary>See example output</summary>

```
Query: What is machine learning?
  Similarity: 0.9063 | Document 0: Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.
  Similarity: 0.7287 | Document 1: Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors.

Query: How does neural network training work?
  Similarity: 0.6725 | Document 0: Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.
  Similarity: 0.8287 | Document 1: Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors.
```
</details>


## Transformers Usage

See [here](https://huggingface.co/MongoDB/mdbr-leaf-mt/blob/main/transformers_example_mt.ipynb).
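
For convenience, the snippet below is a minimal sketch of what usage with the plain 🤗 Transformers API might look like. It assumes mean pooling over non-padding tokens followed by L2 normalization, and reuses the query prompt from the Transformers.js example above; the actual pooling configuration of this model may differ, so treat the linked notebook as the reference.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "MongoDB/mdbr-leaf-mt"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

queries = ["What is machine learning?"]
# Query prompt, as used in the Transformers.js example above
prompt = "Represent this sentence for searching relevant passages: "

inputs = tokenizer(
    [prompt + q for q in queries],
    padding=True, truncation=True, return_tensors="pt",
)

with torch.no_grad():
    outputs = model(**inputs)

# Mean pooling over non-padding tokens (assumed; see the notebook for the exact pooling used)
mask = inputs["attention_mask"].unsqueeze(-1).float()
query_embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
query_embeddings = F.normalize(query_embeddings, p=2, dim=1)

print(query_embeddings.shape)  # (num_queries, embedding_dim)
```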

## Asymmetric Retrieval Setup

> [!NOTE]
> **Note**: a version of this asymmetric setup, conveniently packaged into a single model, is [available here](https://huggingface.co/MongoDB/mdbr-leaf-mt-asym).

`mdbr-leaf-mt` is *aligned* to [`mxbai-embed-large-v1`](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1), the model it has been distilled from, making the asymmetric system below possible:

```python
# Use mdbr-leaf-mt for query encoding (real-time, low latency)
query_model = SentenceTransformer("MongoDB/mdbr-leaf-mt")
query_embeddings = query_model.encode(queries, prompt_name="query")

# Use a larger model for document encoding (one-time, at index time)
doc_model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
document_embeddings = doc_model.encode(documents)

# Compute similarities
scores = query_model.similarity(query_embeddings, document_embeddings)
```

Retrieval results from asymmetric mode are usually superior to those of the [standard mode above](#sentence-transformers).

## MRL Truncation

Embeddings have been trained via [MRL](https://arxiv.org/abs/2205.13147) and can be truncated for more efficient storage:
```python
query_embeds = model.encode(queries, prompt_name="query", truncate_dim=256)
doc_embeds = model.encode(documents, truncate_dim=256)

similarities = model.similarity(query_embeds, doc_embeds)

print('After MRL:')
print(f"* Embeddings dimension: {query_embeds.shape[1]}")
print(f"* Similarities: \n\t{similarities}")
```

<details>

<summary>See example output</summary>

```
After MRL:
* Embeddings dimension: 256
* Similarities: 
        tensor([[0.9164, 0.7219],
                [0.6682, 0.8393]], device='cuda:0')
```
</details>

## Vector Quantization

Vector quantization, for example to `int8` or `binary`, can be performed as follows:

**Note**: for vector quantization to types other than binary, we suggest performing a calibration to determine the optimal ranges; [see here](https://sbert.net/examples/sentence_transformer/applications/embedding-quantization/README.html#scalar-int8-quantization). Good initial values are -1.0 and +1.0.

```python
from sentence_transformers.quantization import quantize_embeddings
import torch

query_embeds = model.encode(queries, prompt_name="query")
doc_embeds = model.encode(documents)

# Quantize embeddings to int8 using -1.0 and +1.0 as calibration ranges
ranges = torch.tensor([[-1.0], [+1.0]]).expand(2, query_embeds.shape[1]).cpu().numpy()
query_embeds = quantize_embeddings(query_embeds, "int8", ranges=ranges)
doc_embeds = quantize_embeddings(doc_embeds, "int8", ranges=ranges)

# Calculate similarities; cast to int64 to avoid under/overflow
similarities = query_embeds.astype(int) @ doc_embeds.astype(int).T

print('After quantization:')
print(f"* Embeddings type: {query_embeds.dtype}")
print(f"* Similarities: \n{similarities}")
```

<details>

<summary>See example output</summary>

```
After quantization:
* Embeddings type: int8
* Similarities: 
[[2202032 1422868]
 [1421197 1845580]]
```
</details>
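
Binary quantization follows the same pattern. The sketch below is not part of the original examples: it reuses `model`, `queries`, and `documents` from the quickstart, quantizes to packed bit vectors with the `ubinary` precision, and ranks documents by Hamming distance (lower means more similar).

```python
import numpy as np
from sentence_transformers.quantization import quantize_embeddings

query_embeds = model.encode(queries, prompt_name="query")
doc_embeds = model.encode(documents)

# Quantize to packed binary vectors (1 bit per dimension, 8 dimensions per byte)
binary_query_embeds = quantize_embeddings(query_embeds, "ubinary")
binary_doc_embeds = quantize_embeddings(doc_embeds, "ubinary")

print(f"* Embeddings type: {binary_doc_embeds.dtype}, shape: {binary_doc_embeds.shape}")

# Hamming distance between packed bit vectors (lower = more similar)
xor = binary_query_embeds[:, None, :] ^ binary_doc_embeds[None, :, :]
hamming = np.unpackbits(xor, axis=-1).sum(axis=-1)
print(f"* Hamming distances:\n{hamming}")
```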

## Evaluation

Please [see here](https://huggingface.co/MongoDB/mdbr-leaf-mt/blob/main/evaluate_models.ipynb).
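
If you want to reproduce the benchmark numbers directly, the sketch below shows one possible setup using the [`mteb`](https://github.com/embeddings-benchmark/mteb) package; the benchmark name and API calls are assumptions that may vary between `mteb` versions, so the linked notebook remains the reference.

```python
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("MongoDB/mdbr-leaf-mt")

# English MTEB v2 benchmark (name assumed; check the mteb documentation for your version)
benchmark = mteb.get_benchmark("MTEB(eng, v2)")
evaluation = mteb.MTEB(tasks=benchmark)
results = evaluation.run(model, output_folder="results/mdbr-leaf-mt")
```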

# Citation

If you use this model in your work, please cite:

```bibtex
@misc{mdbr_leaf,
  title={LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations},
  author={Robin Vujanic and Thomas Rueckstiess},
  year={2025},
  eprint={2509.12539},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2509.12539},
}
```

# License

This model is released under the Apache 2.0 license.

# Contact

For questions or issues, please open an issue or pull request. You can also contact the MongoDB ML Research team at [email protected].