amentaphd committed · Commit 190f9ed · verified · 1 Parent(s): 0200b44

Upload fine-tuned EU regulation embeddings model

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "word_embedding_dimension": 768,
3
+ "pooling_mode_cls_token": true,
4
+ "pooling_mode_mean_tokens": false,
5
+ "pooling_mode_max_tokens": false,
6
+ "pooling_mode_mean_sqrt_len_tokens": false,
7
+ "pooling_mode_weightedmean_tokens": false,
8
+ "pooling_mode_lasttoken": false,
9
+ "include_prompt": true
10
+ }
README.md ADDED
@@ -0,0 +1,849 @@
1
+ ---
2
+ tags:
3
+ - sentence-transformers
4
+ - sentence-similarity
5
+ - feature-extraction
6
+ - generated_from_trainer
7
+ - dataset_size:46338
8
+ - loss:MatryoshkaLoss
9
+ - loss:MultipleNegativesRankingLoss
10
+ base_model: Snowflake/snowflake-arctic-embed-m-v2.0
11
+ widget:
12
+ - source_sentence: What are the anticipated financial effects that could arise from
13
+ material risks associated with resource use and circular economy, and how might
14
+ these risks impact the financial position, performance, and cash flows of an undertaking
15
+ over different time frames?
16
+ sentences:
17
+ - '(a)
18
+
19
+
20
+ anticipated financial effects due to material risks arising from material resource
21
+ use and circular economy -related impacts and dependencies and how these risks
22
+ have or could reasonably be expected to have) a material influence on the undertaking’s
23
+ financial position, financial performance performance, and cash flows over the
24
+ short-, medium- and long-term; and
25
+
26
+
27
+ (b)
28
+
29
+
30
+ anticipated financial effects due to material opportunities related to resource
31
+ use and circular economy.
32
+
33
+
34
+ The disclosure shall include:
35
+
36
+
37
+ (a)'
38
+ - combination of hydrocarbons obtained as a raffinate from a sulphuric acid treating
39
+ process. It consists of hydrocarbons having carbon numbers predominantly in the
40
+ range of C7 through C12 and boiling in the range of approximately 90 °C to 230
41
+ °C.) 649-351-00-7 265-115-2 64742-15-0 P Naphtha (petroleum), chemically neutralised
42
+ heavy; Low boiling point naphtha — unspecified (A complex combination of hydrocarbons
43
+ produced by a treating process to remove acidic materials. It consists of hydrocarbons
44
+ having carbon numbers predominantly in the range of C6 through C12 and boiling
45
+ in the range of approximately 65 °C to 230 °C.) 649-352-00-2 265-122-0 64742-22-9
46
+ P Naphtha (petroleum), chemically neutralised light; Low boiling point naphtha
47
+
48
+ - '2. Member States shall require any investment firm wishing to establish a branch
49
+ within the territory of another Member State or to use tied agents established
50
+ in another Member State in which it has not established a branch, first to notify
51
+ the competent authority of its home Member State and to provide it with the following
52
+ information:
53
+
54
+
55
+ (a) the Member States within the territory of which it plans to establish a branch
56
+ or the Member States in which it has not established a branch but plans to use
57
+ tied agents established there;
58
+
59
+
60
+ (b) a programme of operations setting out, inter alia, the investment services
61
+ and/or activities as well as the ancillary services to be offered;
62
+
63
+
64
+ (c) where established, the organisational structure of the branch and indicating
65
+ whether the branch intends to use tied agents and the identity of those tied agents;
66
+
67
+
68
+ (d) where tied agents are to be used in a Member State in which an investment
69
+ firm has not established a branch, a description of the intended use of the tied
70
+ agent(s) and an organisational structure, including reporting lines, indicating
71
+ how the agent(s) fit into the corporate structure of the investment firm;
72
+
73
+
74
+ (e) the address in the host Member State from which documents may be obtained;
75
+
76
+
77
+ (f) the names of those responsible for the management of the branch or of the
78
+ tied agent.
79
+
80
+
81
+ Where an investment firm uses a tied agent established in a Member State outside
82
+ its home Member State, such tied agent shall be assimilated to the branch, where
83
+ one is established, and shall in any event be subject to the provisions of this
84
+ Directive relating to branches.'
85
+ - source_sentence: What steps must the single point of contact take if the project
86
+ promoter submits an incomplete application for a Strategic Project, and how does
87
+ this affect the permit-granting process timeline?
88
+ sentences:
89
+ - '(1)
90
+
91
+
92
+ ‘cooling’ means the extraction of heat from an enclosed or indoor space (comfort
93
+ application) or from a process in order to reduce the space or process temperature
94
+ to, or maintain it at, a specified temperature (set point); for cooling systems,
95
+ the extracted heat is rejected into and absorbed by the ambient air, ambient water
96
+ or the ground, where the environment (air, ground, and water) provides a sink
97
+ for the heat extracted and thus functions as a cold source;
98
+
99
+
100
+ (2)'
101
+ - '1. Suppliers shall provide the manufacturer with all the information and documentation
102
+ necessary for the manufacturer to demonstrate the conformity of the packaging
103
+ and the packaging materials with this Regulation, including the technical documentation
104
+ referred to in Annex VII and required under or pursuant to Articles 5 to 11, in
105
+ one or more languages which can be easily understood by the manufacturer. That
106
+ information and documentation shall be provided in either paper or electronic
107
+ form.
108
+
109
+
110
+ 2. Where appropriate, the documentation and information required under Union legal
111
+ acts applicable to contact-sensitive packaging shall be part of the information
112
+ and documentation to be provided to the manufacturer pursuant to paragraph 1.'
113
+ - '6.
114
+
115
+
116
+ No later than 45 days following the receipt of a permit-granting application related
117
+ to a Strategic Project, the single point of contact concerned shall acknowledge
118
+ that the application is complete or, if the project promoter has not sent all
119
+ the information required to process an application, request the project promoter
120
+ to submit a complete application without undue delay, specifying which information
121
+ is missing. Where the application submitted is deemed to be incomplete a second
122
+ time, the single point of contact concerned shall not request information in areas
123
+ not covered in the first request for additional information and shall be entitled
124
+ only to request further evidence to complete the identified missing information.
125
+
126
+
127
+ The date of the acknowledgement referred to in the first subparagraph shall serve
128
+ as the start of the permit-granting process.
129
+
130
+
131
+ 7.
132
+
133
+
134
+ No later than one month from the date of acknowledgement referred to in paragraph
135
+ 6 of this Article, the single point of contact concerned shall draw up, in close
136
+ cooperation with the project promoter and other competent authorities concerned,
137
+ a detailed schedule for the permit-granting process. The schedule shall be published
138
+ by the project promoter on the website referred to in Article 8(5). The single
139
+ point of contact concerned shall update the schedule in the event that there are
140
+ significant changes that potentially affect the timing of the comprehensive decision.
141
+
142
+
143
+ 8.
144
+
145
+
146
+ The single point of contact concerned shall notify the project promoter when the
147
+ environmental impact assessment report referred in Article 5(1) of Directive 2011/92/EU
148
+ is due, taking into account the organisation of the permit-granting process in
149
+ the Member State concerned and the need to allow sufficient time to assess the
150
+ report. The period between the deadline for the submission of the environmental
151
+ impact assessment report and the actual submission of that report shall not be
152
+ counted towards the duration of the permit-granting process referred to in paragraphs
153
+ 1 and 2 of this Article.
154
+
155
+
156
+ 9.'
157
+ - source_sentence: What are the requirements for energy audits to be considered compliant
158
+ with the specified paragraph, and what role do voluntary agreements play in this
159
+ process?
160
+ sentences:
161
+ - '8. Member States shall develop programmes to encourage enterprises that are not
162
+ SMEs and that are not subject to paragraph 1 or 2 to undergo energy audits and
163
+ to subsequently implement the recommendations arising from those audits.
164
+
165
+
166
+ 9. Energy audits shall be considered to comply with paragraph 2 where they are:
167
+
168
+
169
+ (a) carried out in an independent manner, on the basis of the minimum criteria
170
+ set out in Annex VI; (b) implemented under voluntary agreements concluded between
171
+ organisations of stakeholders and a body appointed and supervised by the Member
172
+ State concerned, by another body to which the competent authorities have delegated
173
+ the responsibility concerned or by the Commission. --- ---'
174
+ - '3.1.1. The evaluation of all available information shall comprise:
175
+
176
+
177
+ the hazard identification based on all available information,
178
+
179
+
180
+ the establishment of the quantitative dose (concentration)-response (effect) relationship.
181
+
182
+
183
+ 3.1.2. When it is not possible to establish the quantitative dose (concentration)-response
184
+ (effect) relationship, then this should be justified and a semi-quantitative or
185
+ qualitative analysis shall be included.
186
+
187
+
188
+ 3.1.3. All information used to assess the effects on a specific environmental
189
+ sphere shall be briefly presented, if possible in the form of a table or tables.
190
+ The relevant test results (e.g. LC50 or NOEC) and test conditions (e.g. test duration,
191
+ route of administration) and other relevant information shall be presented, in
192
+ internationally recognised units of measurement for that effect.
193
+
194
+
195
+ 3.1.4. All information used to assess the environmental fate of the substance
196
+ shall be briefly presented, if possible in the form of a table or tables. The
197
+ relevant test results and test conditions and other relevant information shall
198
+ be presented, in internationally recognised units of measurement for that effect.
199
+
200
+
201
+ 3.1.5. If one study is available then a robust study summary should be prepared
202
+ for that study. Where there is more than one study addressing the same effect,
203
+ then the study or studies giving rise to the highest concern shall be used to
204
+ draw a conclusion and a robust study summary shall be prepared for that study
205
+ or studies and included as part of the technical dossier. Robust summaries will
206
+ be required of all key data used in the hazard assessment. If the study or studies
207
+ giving rise to the highest concern are not used, then this shall be fully justified
208
+ and included as part of the technical dossier, not only for the study being used
209
+ but also for all studies reaching a higher concern than the study being used.
210
+ For substances where all available studies indicate no hazards an overall assessment
211
+ of the validity of all studies should be performed.
212
+
213
+
214
+ 3.2. Step 2 : Classification and Labelling
215
+
216
+
217
+ ▼M51'
218
+ - impact of single-use packaging, in particular plastic carrier bags; --- --- (f)
219
+ the composting properties and appropriate waste management options for compostable
220
+ packaging in accordance with Article 9(2) of this Regulation; consumers shall
221
+ be informed that compostable packaging is not suitable for home composting and
222
+ that compostable packaging is not to be discarded in nature. --- ---
223
+ - source_sentence: In what scenario should information on toxic effects be listed
224
+ only once for a mixture?
225
+ sentences:
226
+ - 'In determining the energy savings from taxation-related policy measures introduced
227
+ under Article 10, the following principles shall apply: (a) credit shall be given
228
+ only for energy savings from taxation measures exceeding the minimum levels of
229
+ taxation applicable to fuels as required in Council Directive 2003/96/EC (2) or
230
+ 2006/112/EC (3); (b) short-run price elasticities for the calculation of the impact
231
+ of the energy taxation measures shall represent the responsiveness of energy demand
232
+ to price changes, and shall be estimated on the basis of recent and representative
233
+ official data sources, which are applicable for the Member State, and, where applicable,
234
+ on the basis of accompanying studies from an independent institute. If a different'
235
+ - 'Article 13
236
+
237
+
238
+ Project development assistance
239
+
240
+
241
+ 1.
242
+
243
+
244
+ The Commission shall, after consulting the Member States in accordance with Article
245
+ 21(2), point (c), determine the maximum amount of Innovation Fund support available
246
+ for project development assistance.
247
+
248
+
249
+ 2.
250
+
251
+
252
+ The Commission may award project development assistance in the form of technical
253
+ assistance to any project that falls within the scope of the Innovation Fund,
254
+ as set out in Article 10a(8), first and sixth subparagraphs of Directive 2003/87/EC.
255
+
256
+
257
+ 3.
258
+
259
+
260
+ The following activities may be funded by way of project development assistance:
261
+
262
+
263
+ (a)
264
+
265
+
266
+ improvement and development of project documentation or of components of the project
267
+ design with a view to ensuring the sufficient maturity of the project;
268
+
269
+
270
+ (b)
271
+
272
+
273
+ assessment of the feasibility of the project, including technical and economic
274
+ studies;
275
+
276
+
277
+ (c)
278
+
279
+
280
+ advice on the financial and legal structure of the project;
281
+
282
+
283
+ (d)
284
+
285
+
286
+ capacity building of the project proponent.
287
+
288
+
289
+ 4.
290
+
291
+
292
+ If project development assistance is implemented under indirect management, the
293
+ implementing entity shall carry out the selection procedure and take the decision
294
+ to award the project development assistance after having consulted the Commission.
295
+ The award criteria shall take into account the degree of innovation compared to
296
+ the state of the art, the potential to significantly reduce climate impacts and
297
+ to support widespread application, the maturity as well as the geographical and
298
+ sectoral balance in relation to the portfolio of funded projects.'
299
+ - 'effects of the mixture. The information on toxic effects shall be presented for
300
+ each substance, except for the following cases: (a) if the information is duplicated,
301
+ it shall be listed only once for the mixture overall, such as when two substances
302
+ both cause vomiting and diarrhoea; (b) if it is unlikely that these effects will
303
+ occur at the concentrations present, such as when a mild irritant is diluted to
304
+ below a certain concentration in a non-irritant solution; (c) where information
305
+ on interactions between substances in a mixture is not available, assumptions
306
+ shall not be made and instead the health effects of each substance shall be listed
307
+ separately. --- ---'
308
+ - source_sentence: How does the text suggest addressing the social aspects related
309
+ to low- and middle-income transport users in the context of zero-emission vehicle
310
+ initiatives?
311
+ sentences:
312
+ - '(b)
313
+
314
+
315
+ measures intended to accelerate the uptake of zero-emission vehicles or to provide
316
+ financial support for the deployment of fully interoperable refuelling and recharging
317
+ infrastructure for zero-emission vehicles, or measures to encourage a shift to
318
+ public transport and improve multimodality, or to provide financial support in
319
+ order to address social aspects concerning low- and middle-income transport users;
320
+
321
+
322
+ (c)
323
+
324
+
325
+ to finance their Social Climate Plan in accordance with Article 15 of Regulation
326
+ (EU) 2023/955;
327
+
328
+
329
+ (d)'
330
+ - If the planned change is implemented notwithstanding the first and second subparagraphs,
331
+ or if an unplanned change has taken place pursuant to which the AIFM’s management
332
+ of the AIF no longer complies with this Directive or the AIFM otherwise no longer
333
+ complies with this Directive, the competent authorities of the Member State of
334
+ reference of the AIFM shall take all due measures in accordance with Article 46,
335
+ including, if necessary, the express prohibition of marketing of the AIF.
336
+ - '(d)
337
+
338
+
339
+ for gas discharge lamps, 80 % shall be recycled.
340
+
341
+
342
+ Part 2: Minimum targets applicable by category from 15 August 2015 until 14 August
343
+ 2018 with reference to the categories listed in Annex I:
344
+
345
+
346
+ (a)
347
+
348
+
349
+ for WEEE falling within category 1 or 10 of Annex I,
350
+
351
+
352
+ 85 % shall be recovered, and
353
+
354
+
355
+ 80 % shall be prepared for re-use and recycled;
356
+
357
+
358
+ (b)
359
+
360
+
361
+ for WEEE falling within category 3 or 4 of Annex I,
362
+
363
+
364
+ 80 % shall be recovered, and
365
+
366
+
367
+ 70 % shall be prepared for re-use and recycled;
368
+
369
+
370
+ (c)
371
+
372
+
373
+ for WEEE falling within category 2, 5, 6, 7, 8 or 9 of Annex I,
374
+
375
+
376
+ 75 % shall be recovered, and
377
+
378
+
379
+ 55 % shall be prepared for re-use and recycled;
380
+
381
+
382
+ (d)
383
+
384
+
385
+ for gas discharge lamps, 80 % shall be recycled.'
386
+ pipeline_tag: sentence-similarity
387
+ library_name: sentence-transformers
388
+ metrics:
389
+ - cosine_accuracy@1
390
+ - cosine_accuracy@3
391
+ - cosine_accuracy@5
392
+ - cosine_accuracy@10
393
+ - cosine_precision@1
394
+ - cosine_precision@3
395
+ - cosine_precision@5
396
+ - cosine_precision@10
397
+ - cosine_recall@1
398
+ - cosine_recall@3
399
+ - cosine_recall@5
400
+ - cosine_recall@10
401
+ - cosine_ndcg@10
402
+ - cosine_mrr@10
403
+ - cosine_map@100
404
+ model-index:
405
+ - name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-m-v2.0
406
+ results:
407
+ - task:
408
+ type: information-retrieval
409
+ name: Information Retrieval
410
+ dataset:
411
+ name: Unknown
412
+ type: unknown
413
+ metrics:
414
+ - type: cosine_accuracy@1
415
+ value: 0.7058518902123252
416
+ name: Cosine Accuracy@1
417
+ - type: cosine_accuracy@3
418
+ value: 0.9067840497151735
419
+ name: Cosine Accuracy@3
420
+ - type: cosine_accuracy@5
421
+ value: 0.9447609183497324
422
+ name: Cosine Accuracy@5
423
+ - type: cosine_accuracy@10
424
+ value: 0.9730709476954945
425
+ name: Cosine Accuracy@10
426
+ - type: cosine_precision@1
427
+ value: 0.7058518902123252
428
+ name: Cosine Precision@1
429
+ - type: cosine_precision@3
430
+ value: 0.3022613499050578
431
+ name: Cosine Precision@3
432
+ - type: cosine_precision@5
433
+ value: 0.18895218366994648
434
+ name: Cosine Precision@5
435
+ - type: cosine_precision@10
436
+ value: 0.09730709476954946
437
+ name: Cosine Precision@10
438
+ - type: cosine_recall@1
439
+ value: 0.7058518902123252
440
+ name: Cosine Recall@1
441
+ - type: cosine_recall@3
442
+ value: 0.9067840497151735
443
+ name: Cosine Recall@3
444
+ - type: cosine_recall@5
445
+ value: 0.9447609183497324
446
+ name: Cosine Recall@5
447
+ - type: cosine_recall@10
448
+ value: 0.9730709476954945
449
+ name: Cosine Recall@10
450
+ - type: cosine_ndcg@10
451
+ value: 0.851314896054128
452
+ name: Cosine Ndcg@10
453
+ - type: cosine_mrr@10
454
+ value: 0.8109469830857718
455
+ name: Cosine Mrr@10
456
+ - type: cosine_map@100
457
+ value: 0.8122768308333804
458
+ name: Cosine Map@100
459
+ ---
460
+
461
+ # SentenceTransformer based on Snowflake/snowflake-arctic-embed-m-v2.0
462
+
463
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-m-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
464
+
465
+ ## Model Details
466
+
467
+ ### Model Description
468
+ - **Model Type:** Sentence Transformer
469
+ - **Base model:** [Snowflake/snowflake-arctic-embed-m-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0) <!-- at revision 95c2741480856aa9666782eb4afe11959938017f -->
470
+ - **Maximum Sequence Length:** 8192 tokens
471
+ - **Output Dimensionality:** 768 dimensions
472
+ - **Similarity Function:** Cosine Similarity
473
+ <!-- - **Training Dataset:** Unknown -->
474
+ <!-- - **Language:** Unknown -->
475
+ <!-- - **License:** Unknown -->
476
+
477
+ ### Model Sources
478
+
479
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
480
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
481
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
482
+
483
+ ### Full Model Architecture
484
+
485
+ ```
486
+ SentenceTransformer(
487
+ (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: GteModel
488
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
489
+ (2): Normalize()
490
+ )
491
+ ```
492
+
493
+ ## Usage
494
+
495
+ ### Direct Usage (Sentence Transformers)
496
+
497
+ First install the Sentence Transformers library:
498
+
499
+ ```bash
500
+ pip install -U sentence-transformers
501
+ ```
502
+
503
+ Then you can load this model and run inference.
504
+ ```python
505
+ from sentence_transformers import SentenceTransformer
506
+
507
+ # Download from the 🤗 Hub
508
+ model = SentenceTransformer("sentence_transformers_model_id")
509
+ # Run inference
510
+ sentences = [
511
+ 'How does the text suggest addressing the social aspects related to low- and middle-income transport users in the context of zero-emission vehicle initiatives?',
512
+ '(b)\n\nmeasures intended to accelerate the uptake of zero-emission vehicles or to provide financial support for the deployment of fully interoperable refuelling and recharging infrastructure for zero-emission vehicles, or measures to encourage a shift to public transport and improve multimodality, or to provide financial support in order to address social aspects concerning low- and middle-income transport users;\n\n(c)\n\nto finance their Social Climate Plan in accordance with Article 15 of Regulation (EU) 2023/955;\n\n(d)',
513
+ 'If the planned change is implemented notwithstanding the first and second subparagraphs, or if an unplanned change has taken place pursuant to which the AIFM’s management of the AIF no longer complies with this Directive or the AIFM otherwise no longer complies with this Directive, the competent authorities of the Member State of reference of the AIFM shall take all due measures in accordance with Article 46, including, if necessary, the express prohibition of marketing of the AIF.',
514
+ ]
515
+ embeddings = model.encode(sentences)
516
+ print(embeddings.shape)
517
+ # [3, 768]
518
+
519
+ # Get the similarity scores for the embeddings
520
+ similarities = model.similarity(embeddings, embeddings)
521
+ print(similarities.shape)
522
+ # [3, 3]
523
+ ```
524
+
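+ Since this repository's `config_sentence_transformers.json` defines a `"query"` prompt (`"query: "`), a natural retrieval pattern is to encode questions with `prompt_name="query"` and encode regulation passages without a prompt. The snippet below is a minimal sketch of that pattern; `"sentence_transformers_model_id"` remains a placeholder for this model's Hub id and the example texts are illustrative.
+ 
+ ```python
+ from sentence_transformers import SentenceTransformer
+ 
+ model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id
+ 
+ queries = ["What role do voluntary agreements play in energy audits?"]
+ passages = [
+     "9. Energy audits shall be considered to comply with paragraph 2 where they are: ...",
+     "Article 13 Project development assistance ...",
+ ]
+ 
+ # Queries use the "query" prompt from config_sentence_transformers.json; passages are encoded as-is.
+ query_embeddings = model.encode(queries, prompt_name="query")
+ passage_embeddings = model.encode(passages)
+ 
+ # Cosine similarity (the configured similarity function) ranks passages for each query.
+ scores = model.similarity(query_embeddings, passage_embeddings)
+ print(scores.shape)
+ # [1, 2]
+ ```
+ 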
525
+ <!--
526
+ ### Direct Usage (Transformers)
527
+
528
+ <details><summary>Click to see the direct usage in Transformers</summary>
529
+
530
+ </details>
531
+ -->
532
+
533
+ <!--
534
+ ### Downstream Usage (Sentence Transformers)
535
+
536
+ You can finetune this model on your own dataset.
537
+
538
+ <details><summary>Click to expand</summary>
539
+
540
+ </details>
541
+ -->
542
+
543
+ <!--
544
+ ### Out-of-Scope Use
545
+
546
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
547
+ -->
548
+
549
+ ## Evaluation
550
+
551
+ ### Metrics
552
+
553
+ #### Information Retrieval
554
+
555
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
556
+
557
+ | Metric | Value |
558
+ |:--------------------|:-----------|
559
+ | cosine_accuracy@1 | 0.7059 |
560
+ | cosine_accuracy@3 | 0.9068 |
561
+ | cosine_accuracy@5 | 0.9448 |
562
+ | cosine_accuracy@10 | 0.9731 |
563
+ | cosine_precision@1 | 0.7059 |
564
+ | cosine_precision@3 | 0.3023 |
565
+ | cosine_precision@5 | 0.189 |
566
+ | cosine_precision@10 | 0.0973 |
567
+ | cosine_recall@1 | 0.7059 |
568
+ | cosine_recall@3 | 0.9068 |
569
+ | cosine_recall@5 | 0.9448 |
570
+ | cosine_recall@10 | 0.9731 |
571
+ | **cosine_ndcg@10** | **0.8513** |
572
+ | cosine_mrr@10 | 0.8109 |
573
+ | cosine_map@100 | 0.8123 |
574
+
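+ The table above was produced with the evaluator linked above. A comparable evaluation on a held-out query/passage split could be reproduced roughly as sketched below; the ids and texts are purely illustrative, the evaluator name is hypothetical, and the model id is a placeholder.
+ 
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import InformationRetrievalEvaluator
+ 
+ # Illustrative held-out split: query ids -> questions, document ids -> passages,
+ # and each query id -> the set of relevant document ids.
+ queries = {"q1": "What role do voluntary agreements play in energy audits?"}
+ corpus = {
+     "d1": "9. Energy audits shall be considered to comply with paragraph 2 where they are: ...",
+     "d2": "Article 13 Project development assistance ...",
+ }
+ relevant_docs = {"q1": {"d1"}}
+ 
+ model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id
+ evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="eu-regulations-dev")
+ print(evaluator(model))  # accuracy@k, precision@k, recall@k, MRR@10, NDCG@10, MAP@100
+ ```
+ 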
575
+ <!--
576
+ ## Bias, Risks and Limitations
577
+
578
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
579
+ -->
580
+
581
+ <!--
582
+ ### Recommendations
583
+
584
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
585
+ -->
586
+
587
+ ## Training Details
588
+
589
+ ### Training Dataset
590
+
591
+ #### Unnamed Dataset
592
+
593
+ * Size: 46,338 training samples
594
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
595
+ * Approximate statistics based on the first 1000 samples:
596
+ | | sentence_0 | sentence_1 |
597
+ |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
598
+ | type | string | string |
599
+ | details | <ul><li>min: 9 tokens</li><li>mean: 39.98 tokens</li><li>max: 286 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 248.72 tokens</li><li>max: 1315 tokens</li></ul> |
600
+ * Samples:
601
+ | sentence_0 | sentence_1 |
602
+ |:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
603
+ | <code>What is the maximum allowable reduction in excise duty for mixtures used as motor fuels containing biodiesel in Italy until 30 June 2004?</code> | <code>for waste oils which are reused as fuel, either directly after recovery or following a recycling process for waste oils, and where the reuse is subject to duty.<br><br>8. ITALY:<br><br>for differentiated rates of excise duty on mixtures used as motor fuels containing 5 % or 25 % of biodiesel until 30 June 2004. The reduction in excise duty may not be greater than the amount of excise duty payable on the volume of biofuels present in the products eligible for the reduction. The reduction in excise duty shall be adjusted to take account of changes in the price of raw materials to avoid overcompensating for the extra costs involved in the manufacture of biofuels;</code> |
604
+ | <code>What are the minimum indicative share percentages for the years 2023 to 2030, and how do these percentages relate to the interconnectivity levels of the Member States?</code> | <code>Such indicative shares may, in each year, amount to at least 5 % from 2023 to 2026 and at least 10 % from 2027 to 2030, or, where lower, to the level of interconnectivity of the Member State concerned in any given year.<br><br>In order to acquire further implementation experience, Member States may organise one or more pilot schemes where support is open to producers located in other Member States.<br><br>2.</code> |
605
+ | <code>What is the significance of the one-month period mentioned in the context?</code> | <code>one month after its notification, in accordance with the arrangements provided for in Article 23.</code> |
606
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
607
+ ```json
608
+ {
609
+ "loss": "MultipleNegativesRankingLoss",
610
+ "matryoshka_dims": [
611
+ 768,
612
+ 512,
613
+ 256,
614
+ 128,
615
+ 64
616
+ ],
617
+ "matryoshka_weights": [
618
+ 1,
619
+ 1,
620
+ 1,
621
+ 1,
622
+ 1
623
+ ],
624
+ "n_dims_per_step": -1
625
+ }
626
+ ```
627
+
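+ Because the loss above applies MultipleNegativesRankingLoss at 768, 512, 256, 128 and 64 dimensions, the embeddings can plausibly be truncated to one of those sizes for cheaper storage and search. The sketch below uses the `truncate_dim` argument of `SentenceTransformer`; the model id is a placeholder and the quality trade-off at each dimension has not been measured here.
+ 
+ ```python
+ from sentence_transformers import SentenceTransformer
+ 
+ # Keep only the first 256 dimensions of each embedding (one of the trained Matryoshka sizes).
+ model = SentenceTransformer("sentence_transformers_model_id", truncate_dim=256)  # placeholder id
+ 
+ embeddings = model.encode([
+     "What is the significance of the one-month period mentioned in the context?",
+     "one month after its notification, in accordance with the arrangements provided for in Article 23.",
+ ])
+ print(embeddings.shape)
+ # [2, 256]
+ ```
+ 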
628
+ ### Training Hyperparameters
629
+ #### Non-Default Hyperparameters
630
+
631
+ - `eval_strategy`: steps
632
+ - `num_train_epochs`: 4
633
+ - `fp16`: True
634
+ - `multi_dataset_batch_sampler`: round_robin
635
+
636
+ #### All Hyperparameters
637
+ <details><summary>Click to expand</summary>
638
+
639
+ - `overwrite_output_dir`: False
640
+ - `do_predict`: False
641
+ - `eval_strategy`: steps
642
+ - `prediction_loss_only`: True
643
+ - `per_device_train_batch_size`: 8
644
+ - `per_device_eval_batch_size`: 8
645
+ - `per_gpu_train_batch_size`: None
646
+ - `per_gpu_eval_batch_size`: None
647
+ - `gradient_accumulation_steps`: 1
648
+ - `eval_accumulation_steps`: None
649
+ - `torch_empty_cache_steps`: None
650
+ - `learning_rate`: 5e-05
651
+ - `weight_decay`: 0.0
652
+ - `adam_beta1`: 0.9
653
+ - `adam_beta2`: 0.999
654
+ - `adam_epsilon`: 1e-08
655
+ - `max_grad_norm`: 1
656
+ - `num_train_epochs`: 4
657
+ - `max_steps`: -1
658
+ - `lr_scheduler_type`: linear
659
+ - `lr_scheduler_kwargs`: {}
660
+ - `warmup_ratio`: 0.0
661
+ - `warmup_steps`: 0
662
+ - `log_level`: passive
663
+ - `log_level_replica`: warning
664
+ - `log_on_each_node`: True
665
+ - `logging_nan_inf_filter`: True
666
+ - `save_safetensors`: True
667
+ - `save_on_each_node`: False
668
+ - `save_only_model`: False
669
+ - `restore_callback_states_from_checkpoint`: False
670
+ - `no_cuda`: False
671
+ - `use_cpu`: False
672
+ - `use_mps_device`: False
673
+ - `seed`: 42
674
+ - `data_seed`: None
675
+ - `jit_mode_eval`: False
676
+ - `use_ipex`: False
677
+ - `bf16`: False
678
+ - `fp16`: True
679
+ - `fp16_opt_level`: O1
680
+ - `half_precision_backend`: auto
681
+ - `bf16_full_eval`: False
682
+ - `fp16_full_eval`: False
683
+ - `tf32`: None
684
+ - `local_rank`: 0
685
+ - `ddp_backend`: None
686
+ - `tpu_num_cores`: None
687
+ - `tpu_metrics_debug`: False
688
+ - `debug`: []
689
+ - `dataloader_drop_last`: False
690
+ - `dataloader_num_workers`: 0
691
+ - `dataloader_prefetch_factor`: None
692
+ - `past_index`: -1
693
+ - `disable_tqdm`: False
694
+ - `remove_unused_columns`: True
695
+ - `label_names`: None
696
+ - `load_best_model_at_end`: False
697
+ - `ignore_data_skip`: False
698
+ - `fsdp`: []
699
+ - `fsdp_min_num_params`: 0
700
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
701
+ - `fsdp_transformer_layer_cls_to_wrap`: None
702
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
703
+ - `deepspeed`: None
704
+ - `label_smoothing_factor`: 0.0
705
+ - `optim`: adamw_torch
706
+ - `optim_args`: None
707
+ - `adafactor`: False
708
+ - `group_by_length`: False
709
+ - `length_column_name`: length
710
+ - `ddp_find_unused_parameters`: None
711
+ - `ddp_bucket_cap_mb`: None
712
+ - `ddp_broadcast_buffers`: False
713
+ - `dataloader_pin_memory`: True
714
+ - `dataloader_persistent_workers`: False
715
+ - `skip_memory_metrics`: True
716
+ - `use_legacy_prediction_loop`: False
717
+ - `push_to_hub`: False
718
+ - `resume_from_checkpoint`: None
719
+ - `hub_model_id`: None
720
+ - `hub_strategy`: every_save
721
+ - `hub_private_repo`: None
722
+ - `hub_always_push`: False
723
+ - `gradient_checkpointing`: False
724
+ - `gradient_checkpointing_kwargs`: None
725
+ - `include_inputs_for_metrics`: False
726
+ - `include_for_metrics`: []
727
+ - `eval_do_concat_batches`: True
728
+ - `fp16_backend`: auto
729
+ - `push_to_hub_model_id`: None
730
+ - `push_to_hub_organization`: None
731
+ - `mp_parameters`:
732
+ - `auto_find_batch_size`: False
733
+ - `full_determinism`: False
734
+ - `torchdynamo`: None
735
+ - `ray_scope`: last
736
+ - `ddp_timeout`: 1800
737
+ - `torch_compile`: False
738
+ - `torch_compile_backend`: None
739
+ - `torch_compile_mode`: None
740
+ - `dispatch_batches`: None
741
+ - `split_batches`: None
742
+ - `include_tokens_per_second`: False
743
+ - `include_num_input_tokens_seen`: False
744
+ - `neftune_noise_alpha`: None
745
+ - `optim_target_modules`: None
746
+ - `batch_eval_metrics`: False
747
+ - `eval_on_start`: False
748
+ - `use_liger_kernel`: False
749
+ - `eval_use_gather_object`: False
750
+ - `average_tokens_across_devices`: False
751
+ - `prompts`: None
752
+ - `batch_sampler`: batch_sampler
753
+ - `multi_dataset_batch_sampler`: round_robin
754
+
755
+ </details>
756
+
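+ The exact training script is not included in this commit; the sketch below only illustrates how a comparable run could be assembled from the hyperparameters and losses listed above (4 epochs, batch size 8, fp16, MatryoshkaLoss wrapping MultipleNegativesRankingLoss). The example pair and the output directory are hypothetical.
+ 
+ ```python
+ from datasets import Dataset
+ from sentence_transformers import (
+     SentenceTransformer,
+     SentenceTransformerTrainer,
+     SentenceTransformerTrainingArguments,
+ )
+ from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
+ 
+ # trust_remote_code is needed because the base model ships custom GTE modeling code (see config.json auto_map).
+ model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m-v2.0", trust_remote_code=True)
+ 
+ # Stand-in for the 46,338 (sentence_0, sentence_1) question/passage training pairs.
+ train_dataset = Dataset.from_dict({
+     "sentence_0": ["What role do voluntary agreements play in energy audits?"],
+     "sentence_1": ["9. Energy audits shall be considered to comply with paragraph 2 where they are: ..."],
+ })
+ 
+ loss = MatryoshkaLoss(
+     model,
+     MultipleNegativesRankingLoss(model),
+     matryoshka_dims=[768, 512, 256, 128, 64],
+ )
+ 
+ args = SentenceTransformerTrainingArguments(
+     output_dir="arctic-embed-eu-regulations",  # hypothetical path
+     num_train_epochs=4,
+     per_device_train_batch_size=8,
+     fp16=True,
+ )
+ 
+ trainer = SentenceTransformerTrainer(model=model, args=args, train_dataset=train_dataset, loss=loss)
+ trainer.train()
+ ```
+ 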
757
+ ### Training Logs
758
+ | Epoch | Step | Training Loss | cosine_ndcg@10 |
759
+ |:------:|:-----:|:-------------:|:--------------:|
760
+ | 0.0863 | 500 | 0.225 | - |
761
+ | 0.1726 | 1000 | 0.1337 | - |
762
+ | 0.2589 | 1500 | 0.1195 | - |
763
+ | 0.3452 | 2000 | 0.0803 | - |
764
+ | 0.4316 | 2500 | 0.0775 | - |
765
+ | 0.5179 | 3000 | 0.0714 | - |
766
+ | 0.6042 | 3500 | 0.0852 | - |
767
+ | 0.6905 | 4000 | 0.0718 | - |
768
+ | 0.7768 | 4500 | 0.0499 | - |
769
+ | 0.8631 | 5000 | 0.0665 | 0.8371 |
770
+ | 0.9494 | 5500 | 0.0674 | - |
771
+ | 1.0 | 5793 | - | 0.8416 |
772
+ | 1.0357 | 6000 | 0.0538 | - |
773
+ | 1.1220 | 6500 | 0.0606 | - |
774
+ | 1.2084 | 7000 | 0.0294 | - |
775
+ | 1.2947 | 7500 | 0.0129 | - |
776
+ | 1.3810 | 8000 | 0.0101 | - |
777
+ | 1.4673 | 8500 | 0.0072 | - |
778
+ | 1.5536 | 9000 | 0.0211 | - |
779
+ | 1.6399 | 9500 | 0.0133 | - |
780
+ | 1.7262 | 10000 | 0.0063 | 0.8513 |
781
+
782
+
783
+ ### Framework Versions
784
+ - Python: 3.10.15
785
+ - Sentence Transformers: 4.0.2
786
+ - Transformers: 4.49.0
787
+ - PyTorch: 2.6.0+cu126
788
+ - Accelerate: 0.26.0
789
+ - Datasets: 3.5.0
790
+ - Tokenizers: 0.21.1
791
+
792
+ ## Citation
793
+
794
+ ### BibTeX
795
+
796
+ #### Sentence Transformers
797
+ ```bibtex
798
+ @inproceedings{reimers-2019-sentence-bert,
799
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
800
+ author = "Reimers, Nils and Gurevych, Iryna",
801
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
802
+ month = "11",
803
+ year = "2019",
804
+ publisher = "Association for Computational Linguistics",
805
+ url = "https://arxiv.org/abs/1908.10084",
806
+ }
807
+ ```
808
+
809
+ #### MatryoshkaLoss
810
+ ```bibtex
811
+ @misc{kusupati2024matryoshka,
812
+ title={Matryoshka Representation Learning},
813
+ author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
814
+ year={2024},
815
+ eprint={2205.13147},
816
+ archivePrefix={arXiv},
817
+ primaryClass={cs.LG}
818
+ }
819
+ ```
820
+
821
+ #### MultipleNegativesRankingLoss
822
+ ```bibtex
823
+ @misc{henderson2017efficient,
824
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
825
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
826
+ year={2017},
827
+ eprint={1705.00652},
828
+ archivePrefix={arXiv},
829
+ primaryClass={cs.CL}
830
+ }
831
+ ```
832
+
833
+ <!--
834
+ ## Glossary
835
+
836
+ *Clearly define terms in order to be accessible across audiences.*
837
+ -->
838
+
839
+ <!--
840
+ ## Model Card Authors
841
+
842
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
843
+ -->
844
+
845
+ <!--
846
+ ## Model Card Contact
847
+
848
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
849
+ -->
config.json ADDED
@@ -0,0 +1,39 @@
1
+ {
2
+ "_name_or_path": "Snowflake/snowflake-arctic-embed-m-v2.0",
3
+ "architectures": [
4
+ "GteModel"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.0,
7
+ "auto_map": {
8
+ "AutoConfig": "configuration_hf_alibaba_nlp_gte.GteConfig",
9
+ "AutoModel": "Snowflake/snowflake-arctic-embed-m-v2.0--modeling_hf_alibaba_nlp_gte.GteModel"
10
+ },
11
+ "classifier_dropout": 0.1,
12
+ "hidden_act": "gelu",
13
+ "hidden_dropout_prob": 0.1,
14
+ "hidden_size": 768,
15
+ "initializer_range": 0.02,
16
+ "intermediate_size": 3072,
17
+ "layer_norm_eps": 1e-12,
18
+ "layer_norm_type": "layer_norm",
19
+ "logn_attention_clip1": false,
20
+ "logn_attention_scale": false,
21
+ "matryoshka_dimensions": [
22
+ 256
23
+ ],
24
+ "max_position_embeddings": 8192,
25
+ "model_type": "gte",
26
+ "num_attention_heads": 12,
27
+ "num_hidden_layers": 12,
28
+ "pack_qkv": true,
29
+ "pad_token_id": 1,
30
+ "position_embedding_type": "rope",
31
+ "rope_scaling": null,
32
+ "rope_theta": 160000,
33
+ "torch_dtype": "float32",
34
+ "transformers_version": "4.49.0",
35
+ "type_vocab_size": 1,
36
+ "unpad_inputs": "true",
37
+ "use_memory_efficient_attention": "true",
38
+ "vocab_size": 250048
39
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,12 @@
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "4.0.2",
4
+ "transformers": "4.49.0",
5
+ "pytorch": "2.6.0+cu126"
6
+ },
7
+ "prompts": {
8
+ "query": "query: "
9
+ },
10
+ "default_prompt_name": null,
11
+ "similarity_fn_name": "cosine"
12
+ }
configuration_hf_alibaba_nlp_gte.py ADDED
@@ -0,0 +1,145 @@
1
+ # coding=utf-8
2
+ # Copyright 2024 The GTE Team Authors and Alibaba Group.
3
+ # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
4
+ #
5
+ # Licensed under the Apache License, Version 2.0 (the "License");
6
+ # you may not use this file except in compliance with the License.
7
+ # You may obtain a copy of the License at
8
+ #
9
+ # http://www.apache.org/licenses/LICENSE-2.0
10
+ #
11
+ # Unless required by applicable law or agreed to in writing, software
12
+ # distributed under the License is distributed on an "AS IS" BASIS,
13
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
+ # See the License for the specific language governing permissions and
15
+ # limitations under the License.
16
+ """ GTE model configuration"""
17
+ from transformers.configuration_utils import PretrainedConfig
18
+ from transformers.utils import logging
19
+
20
+ logger = logging.get_logger(__name__)
21
+
22
+
23
+ class GteConfig(PretrainedConfig):
24
+ r"""
25
+ This is the configuration class to store the configuration of a [`NewModel`] or a [`TFNewModel`]. It is used to
26
+ instantiate a NEW model according to the specified arguments, defining the model architecture. Instantiating a
27
+ configuration with the defaults will yield a similar configuration to that of the NEW
28
+ [izhx/new-base-en](https://huggingface.co/izhx/new-base-en) architecture.
29
+
30
+ Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
31
+ documentation from [`PretrainedConfig`] for more information.
32
+
33
+
34
+ Args:
35
+ vocab_size (`int`, *optional*, defaults to 30522):
36
+ Vocabulary size of the NEW model. Defines the number of different tokens that can be represented by the
37
+ `inputs_ids` passed when calling [`NewModel`] or [`TFNewModel`].
38
+ hidden_size (`int`, *optional*, defaults to 768):
39
+ Dimensionality of the encoder layers and the pooler layer.
40
+ num_hidden_layers (`int`, *optional*, defaults to 12):
41
+ Number of hidden layers in the Transformer encoder.
42
+ num_attention_heads (`int`, *optional*, defaults to 12):
43
+ Number of attention heads for each attention layer in the Transformer encoder.
44
+ intermediate_size (`int`, *optional*, defaults to 3072):
45
+ Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
46
+ hidden_act (`str` or `Callable`, *optional*, defaults to `"gelu"`):
47
+ The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
48
+ `"relu"`, `"silu"` and `"gelu_new"` are supported.
49
+ hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
50
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
51
+ attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
52
+ The dropout ratio for the attention probabilities.
53
+ max_position_embeddings (`int`, *optional*, defaults to 512):
54
+ The maximum sequence length that this model might ever be used with. Typically set this to something large
55
+ just in case (e.g., 512 or 1024 or 2048).
56
+ type_vocab_size (`int`, *optional*, defaults to 2):
57
+ The vocabulary size of the `token_type_ids` passed when calling [`NewModel`] or [`TFNewModel`].
58
+ initializer_range (`float`, *optional*, defaults to 0.02):
59
+ The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
60
+ layer_norm_eps (`float`, *optional*, defaults to 1e-12):
61
+ The epsilon used by the layer normalization layers.
62
+ position_embedding_type (`str`, *optional*, defaults to `"rope"`):
63
+ Type of position embedding. Choose one of `"absolute"`, `"rope"`.
64
+ rope_theta (`float`, *optional*, defaults to 10000.0):
65
+ The base period of the RoPE embeddings.
66
+ rope_scaling (`Dict`, *optional*):
67
+ Dictionary containing the scaling configuration for the RoPE embeddings. Currently supports two scaling
68
+ strategies: linear and dynamic. Their scaling factor must be a float greater than 1. The expected format is
69
+ `{"type": strategy name, "factor": scaling factor}`. When using this flag, don't update
70
+ `max_position_embeddings` to the expected new maximum. See the following thread for more information on how
71
+ these scaling strategies behave:
72
+ https://www.reddit.com/r/LocalLLaMA/comments/14mrgpr/dynamically_scaled_rope_further_increases/. This is an
73
+ experimental feature, subject to breaking API changes in future versions.
74
+ classifier_dropout (`float`, *optional*):
75
+ The dropout ratio for the classification head.
76
+
77
+ Examples:
78
+
79
+ ```python
80
+ >>> from transformers import NewConfig, NewModel
81
+
82
+ >>> # Initializing a NEW izhx/new-base-en style configuration
83
+ >>> configuration = NewConfig()
84
+
85
+ >>> # Initializing a model (with random weights) from the izhx/new-base-en style configuration
86
+ >>> model = NewModel(configuration)
87
+
88
+ >>> # Accessing the model configuration
89
+ >>> configuration = model.config
90
+ ```"""
91
+
92
+ model_type = "gte"
93
+
94
+ def __init__(
95
+ self,
96
+ vocab_size=30528,
97
+ hidden_size=768,
98
+ num_hidden_layers=12,
99
+ num_attention_heads=12,
100
+ intermediate_size=3072,
101
+ hidden_act="gelu",
102
+ hidden_dropout_prob=0.1,
103
+ attention_probs_dropout_prob=0.0,
104
+ max_position_embeddings=2048,
105
+ type_vocab_size=1,
106
+ initializer_range=0.02,
107
+ layer_norm_type='layer_norm',
108
+ layer_norm_eps=1e-12,
109
+ # pad_token_id=0,
110
+ position_embedding_type="rope",
111
+ rope_theta=10000.0,
112
+ rope_scaling=None,
113
+ classifier_dropout=None,
114
+ pack_qkv=True,
115
+ unpad_inputs=False,
116
+ use_memory_efficient_attention=False,
117
+ logn_attention_scale=False,
118
+ logn_attention_clip1=False,
119
+ **kwargs,
120
+ ):
121
+ super().__init__(**kwargs)
122
+
123
+ self.vocab_size = vocab_size
124
+ self.hidden_size = hidden_size
125
+ self.num_hidden_layers = num_hidden_layers
126
+ self.num_attention_heads = num_attention_heads
127
+ self.hidden_act = hidden_act
128
+ self.intermediate_size = intermediate_size
129
+ self.hidden_dropout_prob = hidden_dropout_prob
130
+ self.attention_probs_dropout_prob = attention_probs_dropout_prob
131
+ self.max_position_embeddings = max_position_embeddings
132
+ self.type_vocab_size = type_vocab_size
133
+ self.initializer_range = initializer_range
134
+ self.layer_norm_type = layer_norm_type
135
+ self.layer_norm_eps = layer_norm_eps
136
+ self.position_embedding_type = position_embedding_type
137
+ self.rope_theta = rope_theta
138
+ self.rope_scaling = rope_scaling
139
+ self.classifier_dropout = classifier_dropout
140
+
141
+ self.pack_qkv = pack_qkv
142
+ self.unpad_inputs = unpad_inputs
143
+ self.use_memory_efficient_attention = use_memory_efficient_attention
144
+ self.logn_attention_scale = logn_attention_scale
145
+ self.logn_attention_clip1 = logn_attention_clip1
eval/Information-Retrieval_evaluation_results.csv ADDED
@@ -0,0 +1,5 @@
1
+ epoch,steps,cosine-Accuracy@1,cosine-Accuracy@3,cosine-Accuracy@5,cosine-Accuracy@10,cosine-Precision@1,cosine-Recall@1,cosine-Precision@3,cosine-Recall@3,cosine-Precision@5,cosine-Recall@5,cosine-Precision@10,cosine-Recall@10,cosine-MRR@10,cosine-NDCG@10,cosine-MAP@100
2
+ 1.0,5793,0.6904885206283445,0.9026411185914034,0.9366476782323494,0.967892283790782,0.6904885206283445,0.6904885206283445,0.30088037286380115,0.9026411185914034,0.18732953564646984,0.9366476782323494,0.09678922837907819,0.967892283790782,0.7996936642198161,0.8416020003069878,0.8012900708593229
3
+ 2.0,11586,0.6941135853616434,0.8993612981184188,0.9375107888831348,0.9696185050923528,0.6941135853616434,0.6941135853616434,0.2997870993728063,0.8993612981184188,0.18750215777662693,0.9375107888831348,0.09696185050923527,0.9696185050923528,0.8014224200526644,0.8432305431101866,0.8029183926314174
4
+ 3.0,17379,0.693423096841015,0.8972898325565337,0.9387191437942344,0.9696185050923528,0.693423096841015,0.693423096841015,0.2990966108521779,0.8972898325565337,0.18774382875884688,0.9387191437942344,0.09696185050923527,0.9696185050923528,0.8008940730328616,0.8428217313706017,0.802428493797126
5
+ 4.0,23172,0.6903158984981874,0.8933195235629208,0.9368203003625065,0.9684101501812532,0.6903158984981874,0.6903158984981874,0.29777317452097357,0.8933195235629208,0.18736406007250128,0.9368203003625065,0.09684101501812532,0.9684101501812532,0.7984443458032272,0.8406723510269695,0.80003685068108
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e5180b42e713060bcf65a1ff5f11f8b27dca0230fc31b3f6512cfa7c99fd0726
3
+ size 1221487872
modules.json ADDED
@@ -0,0 +1,20 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ },
14
+ {
15
+ "idx": 2,
16
+ "name": "2",
17
+ "path": "2_Normalize",
18
+ "type": "sentence_transformers.models.Normalize"
19
+ }
20
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 8192,
3
+ "do_lower_case": false
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<s>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "cls_token": {
10
+ "content": "<s>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "eos_token": {
17
+ "content": "</s>",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "mask_token": {
24
+ "content": "<mask>",
25
+ "lstrip": true,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "pad_token": {
31
+ "content": "<pad>",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ },
37
+ "sep_token": {
38
+ "content": "</s>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false
43
+ },
44
+ "unk_token": {
45
+ "content": "<unk>",
46
+ "lstrip": false,
47
+ "normalized": false,
48
+ "rstrip": false,
49
+ "single_word": false
50
+ }
51
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:aa7a6ad87a7ce8fe196787355f6af7d03aee94d19c54a5eb1392ed18c8ef451a
3
+ size 17082988
tokenizer_config.json ADDED
@@ -0,0 +1,62 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "<s>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "1": {
12
+ "content": "<pad>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "2": {
20
+ "content": "</s>",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "3": {
28
+ "content": "<unk>",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "250001": {
36
+ "content": "<mask>",
37
+ "lstrip": true,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "bos_token": "<s>",
45
+ "clean_up_tokenization_spaces": true,
46
+ "cls_token": "<s>",
47
+ "eos_token": "</s>",
48
+ "extra_special_tokens": {},
49
+ "mask_token": "<mask>",
50
+ "max_length": 512,
51
+ "model_max_length": 32768,
52
+ "pad_to_multiple_of": null,
53
+ "pad_token": "<pad>",
54
+ "pad_token_type_id": 0,
55
+ "padding_side": "right",
56
+ "sep_token": "</s>",
57
+ "stride": 0,
58
+ "tokenizer_class": "XLMRobertaTokenizerFast",
59
+ "truncation_side": "right",
60
+ "truncation_strategy": "longest_first",
61
+ "unk_token": "<unk>"
62
+ }