aaa961 committed
Commit a71b03d · verified · 1 Parent(s): 9f8181d

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
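The flags above encode a single pooling choice: with `pooling_mode_cls_token` set to `true` and every other mode `false`, the sentence embedding is simply the transformer's first ([CLS]) token vector. A minimal sketch of that selection logic — the `pool` helper below is illustrative, not the sentence-transformers API:

```python
import json

# The 1_Pooling config added in this commit (verbatim content).
CONFIG = json.loads("""
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": true,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
""")

def pool(token_embeddings, config):
    """Reduce a [seq_len x dim] list of token vectors to one sentence vector,
    following whichever pooling_mode_* flag is set (CLS and mean shown here)."""
    if config.get("pooling_mode_cls_token"):
        # CLS pooling: keep only the first token's embedding.
        return token_embeddings[0]
    if config.get("pooling_mode_mean_tokens"):
        # Mean pooling: average each dimension over all tokens.
        dim = len(token_embeddings[0])
        n = len(token_embeddings)
        return [sum(tok[d] for tok in token_embeddings) / n for d in range(dim)]
    raise ValueError("unsupported pooling mode")

tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 tokens, dim 2
print(pool(tokens, CONFIG))  # CLS pooling keeps the first token: [1.0, 2.0]
```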
README.md ADDED
@@ -0,0 +1,760 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - dense
+ - generated_from_trainer
+ - dataset_size:5424
+ - loss:BatchSemiHardTripletLoss
+ base_model: BAAI/bge-base-en
+ widget:
+ - source_sentence: Tab previews often seem to be captured too early Quite often when
+ I look at tab previews I find a preview that looks half rendered. Maybe the screenshot
+ was taking while the page was still loading or something? Here is an example of
+ the preview for WhatsApp's web interface. I see this for other sites too but WhatsApp
+ seems to be reasonably reproducible.
+ sentences:
+ - Firefox should not fetch keys in oauth flows even if it gets the keyFetchToken
+ The FxA server was accidentally sending us its keyFetchToken during oauth flows.
+ This caused a race between who was going to use it. While a PR for that has been
+ merged in FxA, we still might as well avoid even trying to use the token if we
+ get it some how.
+ - "Tab doesn't receive focus outline on keyboard navigation I don't yet have good\
+ \ steps to reproduce but while using the keyboard to navigate tabs, sometimes\
+ \ one of my tabs doesn't get the focus outline. \n\nTo reproduce, open several\
+ \ tabs and navigate to the tab strip with the keyboard by focusing the addressbar\
+ \ (ctrl+L) and hitting shift+tab a few times. Then arrow through your tabs and\
+ \ notice that one of them doesn't get the focus indicator.\n\nI'm missing some\
+ \ step because I can't consistently reproduce but I've seen it several times now\
+ \ so getting a bug on file."
+ - 'Options menu no longer accepts mouse click after changing default zoom. STR:
+
+ * Open Options Menu
+
+ * Change default zoom from 100% to 90%
+
+
+ Expected Behavior:
+
+ * Zoom is applied in options menu and user is able to interact further with Options
+ menu
+
+
+ Actual behavior (Firefox 77.0a1 20200407214402):
+
+ * Zoom is applied in options menu, but user is no longer able to interact with
+ Options menu using mouse. Clicking any button/dropdown/... no longer works.
+
+ Using keyboard (tab to navigate options) still works.'
+ - source_sentence: '"Open Video in New Tab" should copy Mute settings STR:
+
+
+ 1) Mute a tab.
+
+ 2) Play a video on the tab.
+
+ 3) Right click, "Open Video in New Tab"
+
+ 4) MUSIC INSTANTLY BLASTING THROUGH THE OFFICE
+
+
+ The new tab should copy the mute settings from the original tab, particularly
+ because autoplay is enabled by default.'
+ sentences:
+ - 'The ingest logic for detecting newly enabled Suggest suggestions is broken In
+ bug 1907696 I added some logic to `BaseFeature` that tries to ingest only newly
+ enabled suggestion types when `update()` is called. It doesn''t work right because
+ by the time it [gets the old enabled suggestion types](https://searchfox.org/mozilla-central/rev/7c7e11a8e0352b0110923e86b873e4a26e3b0650/browser/components/urlbar/private/BaseFeature.sys.mjs#217-226),
+ prefs/variables have already changed, so `isRustSuggestionTypeEnabled()` reflects
+ the *new* enabled status, not the old. So later when it [tries to ingest only
+ newly enabled suggestion types](https://searchfox.org/mozilla-central/rev/7c7e11a8e0352b0110923e86b873e4a26e3b0650/browser/components/urlbar/private/BaseFeature.sys.mjs#237-249),
+ it won''t because it will think those types were already enabled.
+
+
+ This isn''t a huge problem because it only affects features that manage more than
+ one suggestion type, and only when one type was already enabled, i.e., only when
+ the feature itself was already enabled. `AdmWikipedia` is the only feature that
+ manages multiple types. So to trigger this, the user would have to have sponsored
+ enabled, nonsponsored disabled, and then turn on nonsponsored (or vice versa).
+ And even then, ingest on startup isn''t affected, so the worst case is that they
+ might end up with slightly outdated AMP or Wikipedia suggestions.'
+ - Possible to add multiple of the same tag in the Bookmarks Panel tag list betsymikal
+ was able to reproduce this (see screenshot), but I was not. needinfo'ing betsymikal
+ for steps to reproduce.
+ - '[Colorway Closet] Fix up illustration sizing Couple of things:
+
+ - We want to allow illustrations up to 300\*300px instead of the current 288\*288px.
+
+ - We currently use the **height** variable to set the max-**width** and the **width**
+ variable to set the max-**height**. We should fix that :)
+
+ - We also need to set a minimum container size so that layout doesn''t change
+ due to different colorway illustrations having different sizes.'
+ - source_sentence: 'Proton Menu Update Does Not Allow Easily Restoring Session User
+ Agent: Mozilla/5.0 (X11; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0
+
+
+ Steps to reproduce:
+
+
+ Enabled browser.proton.appmenu.
+
+
+ Restarted my browser.
+
+
+ Attempted to Access Restore Recent Session Item, either on the main menu, or in
+ a sub-menu.
+
+
+
+ Actual results:
+
+
+ I was unable to do so.
+
+
+
+ Expected results:
+
+
+ In order to easily facilitate restoring an old session, a menu item should be
+ included, either in the main menu, or within the history sub-menu.'
+ sentences:
+ - 'When there are fluent .orig files Storybook won''t start The [webpackInclude
+ comments](https://searchfox.org/mozilla-central/rev/a64647a2125cf3d334451051491fef6772e8eb57/browser/components/storybook/.storybook/preview.js#33,40)
+ for the fluent files is over-permissive and catching .orig files which then cause
+ webpack to throw an error and prevent storybook from starting.
+
+
+ STR
+
+
+ 1. `cp browser/locales/en-US/browser/browser.ftl{,.orig}`
+
+ 2. `./mach storybook`
+
+
+ Expected results: Storybook starts
+
+ Actual results: Webpack error :('
+ - '[Experiment] The “expose” event is not registered on the treatment branches of
+ the "Test Window modal vs Tab modal on about:welcome" experiment **[Affected versions]:**
+
+ - Firefox Release candidate 110.0 (Build ID: 20230206190557)
+
+
+ **[Affected Platforms]:**
+
+ - Windows 10 x64
+
+ - Windows 11 x64
+
+
+ **[Prerequisites]:**
+
+ - Have the latest version of Firefox Beta 110 installed.
+
+ - Have the Firefox browser pinned to Taskbar.
+
+ - Have the [user.js](https://drive.google.com/file/d/1rE_QlyzmNhqL598EWgoOA5Y4HsJEHamk/view?usp=share_link)
+ file saved to your PC.
+
+
+
+ **[Steps to reproduce]:**
+
+ 1. Create a new Firefox profile but do not open it.
+
+ 2. Navigate to the Firefox profile folder and paste the user.js file from the
+ prerequisites.
+
+ 3. Open the browser using the previously created profile and the “--first-startup”
+ syntax.
+
+ 4. Make sure that the window modal is shown.
+
+ 5. Navigate to the “about:telemetry#events” page and search for the “expose” event.
+
+
+ **[Expected result]:**
+
+ - The “expose” event is displayed.
+
+
+ **[Actual result]:**
+
+ - No “expose” event is registered.
+
+
+ **[Notes]:**
+
+ - This issue is not reproducible on the Control branch of the experiment.'
+ - 'Search mode chiclet can get overloaded with Switch to tab text STR
+
+ 1. Open Tabs search mode by typing `% ` or clicking the tabs search shortcut.
+
+ 2. Press the down arrow until a non-remote Switch-to-Tab result is selected.
+
+ 3. Press Esc.
+
+
+ Expected results: The search mode chiclet reads "Tabs"
+
+ Actual results: The search mode chiclet reads "Switch to tab:Tabs"'
+ - source_sentence: "Limit address bar clipboard result for new tabs **Found in**\n\
+ * Fx 121.0a1\n\n**Affected versions**\n* Fx 121.0a1\n\n**Affected platforms**\n\
+ * Windows 10\n* Ubuntu\n* macOS\n\n**Preconditions**\n* Set browser.urlbar.clipboard.featureGate\
+ \ to true.\n\n**Steps to reproduce**\n1. Launch Firefox.\n2. Copy a website url.\n\
+ 3. Click the address bar and inspect the clipboard result suggestion.\n4. Open\
+ \ a new tab and click the address bar - the clipboard result is shown.\n5. Redo\
+ \ the previous step several times.\n\n**Expected result**\n* Clipboard result\
+ \ is no longer shown after a few tries. \n\n**Actual result**\n* Clipboard result\
+ \ suggestion is displayed on every new tab. It seems that the impression for the\
+ \ clipboard result is not registered.\n\n**Regression range**\n* Not a regression.\n\
+ \n**Additional notes**\n* Clipboard result only appears twice on the same tab."
+ sentences:
+ - '[Docs] Feature callout doc incorrectly says randomize is a property of tiles
+ rather than MultiSelectItem If you search for randomize in [the doc](https://firefox-source-docs.mozilla.org/browser/components/asrouter/docs/feature-callout.html),
+ you''ll find it is on the `tiles` object. But in reality, it was moved from there
+ to `MultiSelectItem`, so that each item has its own `randomize` property, allowing
+ some items to keep a static position at the bottom/top, or more complicated layouts.
+ So we just need to reflect that in the documentation. Adding a code comment could
+ help explain how that works - I added an explanation to the summary of [the patch](https://phabricator.services.mozilla.com/D202513)
+ where I added per-item randomization, so we can use that.'
+ - 'PlacesFeed module is doing expensive work during Places notifications I took
+ a profile from the suggestion in bug 1533061
+
+ https://share.firefox.dev/3XWswU0
+
+
+ on notifications this is doing a "dispatch" that doesn''t look like being an actual
+ async dispatch https://searchfox.org/mozilla-central/source/browser/components/newtab/lib/PlacesFeed.jsm#64,160,178-193
+
+ That apparently accounts for a good 30% of the time.
+
+
+ Since this listener is pretty much always active, it is slowing down various bookmarks
+ (and maybe history) operations, it should do the minimum necessary in the notification
+ handler, and redispatch/batch any expensive work.'
+ - 'Server errors (503/429/...non-200) leave page in loading state Based on [the
+ xpcshell test](https://searchfox.org/mozilla-central/rev/7499890dc8f116a9e40f4a689a251a0311a9f461/toolkit/components/shopping/test/xpcshell/test_product.js#374-384)
+ the current expectation is that failing requests will be retried, and after a
+ number of retries, return `null` if they keep failing.
+
+
+ Unfortunately, the [actor code will then just pass the null data to the UI code](https://searchfox.org/mozilla-central/rev/7499890dc8f116a9e40f4a689a251a0311a9f461/browser/components/shopping/ShoppingSidebarChild.sys.mjs#259-262,268-277),
+ which will treat it as a "loading" state. This means that if the request fails
+ permanently, we never show anything other than the loading state.
+
+
+ We [already have generic error messaging](https://docs.google.com/spreadsheets/d/1FQUU2pqhKkAXRHNbmK9dhObwvevWoR3N6BOh3fu2skE/edit#gid=64342838&range=A50:D52)
+ for this, but it doesn''t appear that this exists in the ftl or is being used.
+
+
+ Assuming I''m not missing anything, this seems like something we should ideally
+ fix sooner rather than later. Ania?'
+ - source_sentence: "Loss of bookmarks state after OS hibernation User Agent: Mozilla/5.0\
+ \ (Windows NT 6.1; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0\n\nSteps to\
+ \ reproduce:\n\nKept Firefox 88.0 open for several days, using hibernate on Windows\
+ \ inbetween.\nCreated several bookmarks using Ctrl-D and Shift-Ctrl-D both for\
+ \ private as well as non-private sites.\nAlso created some bookmarks by dragging\
+ \ from address bar into subfolders of bookmark window.\nI was wondering, that\
+ \ on some windows star icon an address bar did not become blue.\nVerified, that\
+ \ i already had bookmarked this site by pressing Ctrl-D. The appropriate bookmarks\
+ \ subfolder was shown. Star still did not become true.\nAnything else seem ok.\n\
+ Did quit Firefox using File->Quit menu command, saving session with all open tabs.\n\
+ Did start Firefox with restoring previous session using menu command.\n\n\nActual\
+ \ results:\n\nSome of the bookmarks created above are lost. Can't remember most\
+ \ of them.\nHowever i remember a few sites, which i definitely had created a bookmark\
+ \ for.\nNo bookmark exists.\nAlso there is no entry in history for these sites.\
+ \ Missing entry in history might be due to using private window. (Can't remember\
+ \ if i was using a private window for opening this site.)\n\nI repeatedly had\
+ \ lost both bookmarks and history entries some month ago and then created a new\
+ \ profile. Did not import anything from previous profiles. \n\n\nExpected results:\n\
+ \nBookmarks should have been saved as expected."
+ sentences:
+ - 'Urlbar docs: In index.rst, the Architecture Overview link in the Where to Start
+ section is broken In index.rst, the Architecture Overview link in the Where to
+ Start section links to an unrelated page in the devtools docs.'
+ - 'Messages with triggers cause trigger listeners to initialize multiple times [Currently](https://searchfox.org/mozilla-central/rev/07342ce09126c513540c1c343476e026cfa907bf/browser/components/newtab/lib/ASRouter.jsm#827)
+ we call the `.init` function of every trigger (when defined).
+
+ This makes sense for `openURL` where we accumulate more URLs but for others it
+ is a bug
+
+ * moments trigger will add multiple setIntervals
+
+ * whatsNew will evaluate multiple times which is wasteful).'
+ - 'Firefox provides the option to install search engines as open search even if
+ they''re already installed **Affected versions**
+
+ * Fx99.0a1
+
+ * Fx97.0.1
+
+
+ **Affected platforms**
+
+ * Windows 10
+
+ * macOS
+
+ * Ubuntu 20.04
+
+
+ **Preconditions**
+
+ Have a zh-CN build downloaded.
+
+ Have a user.js in the profile root folder containing the following:
+
+ ``` user_pref("browser.search.region", "CN"); ```
+
+
+ **Steps to reproduce**
+
+ 1. Launch Firefox with the profile containing the user.js.
+
+ 2. Go to about:preferences#search and add the search bar. (optional)
+
+ 3. Using the address bar, perform a search.
+
+ 4. Observe the search bar magnifying glass icon.
+
+ 5. Open the address bar drop-down and inspect the one-offs section.
+
+
+ **Expected result**
+
+ * The shouldn''t be any indication to add an open search engine, as the default
+ baidu engine is already installed in Firefox.
+
+
+ **Actual result**
+
+ * There is a button to install the open search engine.
+
+
+ **Regression range**
+
+ * This doesn''t seem to be a recent regression as we could reproduce the issue
+ with an Fx93 build.
+
+
+ **Additional notes**
+
+ * If the engine is added and then clicked in the search bar drop-down it will
+ take the user to www.baidu.com
+
+ * This seems to be happening regardless of used locale build and with amazon as
+ well, but under a bit different steps:
+
+ 1. Open the searchbar dropdown.
+
+ 2. Click the amazon one-off from the dropdown
+
+ 3. After the website is loaded inspect the searchbar magnifying glass icon.
+
+
+ * See the screenshots for more details.'
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy
+ model-index:
+ - name: SentenceTransformer based on BAAI/bge-base-en
+ results:
+ - task:
+ type: triplet
+ name: Triplet
+ dataset:
+ name: bge base en train
+ type: bge-base-en-train
+ metrics:
+ - type: cosine_accuracy
+ value: 0.49041298031806946
+ name: Cosine Accuracy
+ - type: cosine_accuracy
+ value: 0.5141874551773071
+ name: Cosine Accuracy
+ ---
+
+ # SentenceTransformer based on BAAI/bge-base-en
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en) <!-- at revision b737bf5dcc6ee8bdc530531266b4804a5d77b5d8 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True, 'architecture': 'BertModel'})
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
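The three modules run in sequence: the Transformer produces per-token vectors, the Pooling module keeps only the [CLS] token (per the pooling config above), and Normalize scales the result to unit L2 norm. A toy sketch of stages (1) and (2), with the transformer stubbed out — `encode` here is a hypothetical stand-in, not the library's method:

```python
import math

def normalize(vec):
    """Module (2): scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in vec))
    return [x / n for x in vec]

def encode(token_embeddings):
    """Mimic the module stack after the transformer: (1) CLS pooling,
    then (2) L2 normalization. token_embeddings stands in for the
    per-token output of module (0)."""
    sentence_vec = token_embeddings[0]  # CLS pooling: first token only
    return normalize(sentence_vec)

print(encode([[3.0, 4.0], [1.0, 1.0]]))  # -> [0.6, 0.8] (unit length)
```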
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("aaa961/finetuned-bge-base-en-firefox")
+ # Run inference
+ sentences = [
+     "Loss of bookmarks state after OS hibernation User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0\n\nSteps to reproduce:\n\nKept Firefox 88.0 open for several days, using hibernate on Windows inbetween.\nCreated several bookmarks using Ctrl-D and Shift-Ctrl-D both for private as well as non-private sites.\nAlso created some bookmarks by dragging from address bar into subfolders of bookmark window.\nI was wondering, that on some windows star icon an address bar did not become blue.\nVerified, that i already had bookmarked this site by pressing Ctrl-D. The appropriate bookmarks subfolder was shown. Star still did not become true.\nAnything else seem ok.\nDid quit Firefox using File->Quit menu command, saving session with all open tabs.\nDid start Firefox with restoring previous session using menu command.\n\n\nActual results:\n\nSome of the bookmarks created above are lost. Can't remember most of them.\nHowever i remember a few sites, which i definitely had created a bookmark for.\nNo bookmark exists.\nAlso there is no entry in history for these sites. Missing entry in history might be due to using private window. (Can't remember if i was using a private window for opening this site.)\n\nI repeatedly had lost both bookmarks and history entries some month ago and then created a new profile. Did not import anything from previous profiles. \n\n\nExpected results:\n\nBookmarks should have been saved as expected.",
+     'Messages with triggers cause trigger listeners to initialize multiple times [Currently](https://searchfox.org/mozilla-central/rev/07342ce09126c513540c1c343476e026cfa907bf/browser/components/newtab/lib/ASRouter.jsm#827) we call the `.init` function of every trigger (when defined).\nThis makes sense for `openURL` where we accumulate more URLs but for others it is a bug\n* moments trigger will add multiple setIntervals\n* whatsNew will evaluate multiple times which is wasteful).',
+     'Urlbar docs: In index.rst, the Architecture Overview link in the Where to Start section is broken In index.rst, the Architecture Overview link in the Where to Start section links to an unrelated page in the devtools docs.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities)
+ # tensor([[1.0000, 1.0000, 1.0000],
+ #         [1.0000, 1.0000, 1.0000],
+ #         [1.0000, 1.0000, 1.0000]])
+ ```
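With Cosine Similarity as the configured similarity function, `model.similarity` scores each pair of embeddings by cosine similarity, which for the unit-normalized vectors this model outputs reduces to a dot product. A dependency-free sketch of that computation (standard math, not the library's internal code):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors:
    dot(a, b) / (||a|| * ||b||), in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```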
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Triplet
+
+ * Dataset: `bge-base-en-train`
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | **cosine_accuracy** | **0.4904** |
+
+ #### Triplet
+
+ * Dataset: `bge-base-en-train`
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | **cosine_accuracy** | **0.5142** |
+
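`cosine_accuracy` is the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative; random embeddings score about 0.5. A toy sketch of the metric — the `triplet_accuracy` helper below is illustrative, not the `TripletEvaluator` implementation:

```python
def triplet_accuracy(triplets, similarity):
    """Fraction of (anchor, positive, negative) triplets where the
    anchor is more similar to the positive than to the negative."""
    correct = sum(
        1 for a, p, n in triplets if similarity(a, p) > similarity(a, n)
    )
    return correct / len(triplets)

# Toy 1-D "embeddings"; negated distance serves as a similarity score.
sim = lambda x, y: -abs(x - y)
trips = [
    (0.0, 0.1, 5.0),   # positive much closer than negative -> correct
    (1.0, 0.9, 1.05),  # negative slightly closer -> incorrect
]
print(triplet_accuracy(trips, sim))  # -> 0.5
```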
496
+ <!--
497
+ ## Bias, Risks and Limitations
498
+
499
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
500
+ -->
501
+
502
+ <!--
503
+ ### Recommendations
504
+
505
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
506
+ -->
507
+
508
+ ## Training Details
509
+
510
+ ### Training Dataset
511
+
512
+ #### Unnamed Dataset
513
+
514
+ * Size: 5,424 training samples
515
+ * Columns: <code>texts</code> and <code>label</code>
516
+ * Approximate statistics based on the first 1000 samples:
517
+ | | texts | label |
518
+ |:--------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------|
519
+ | type | string | int |
520
+ | details | <ul><li>min: 17 tokens</li><li>mean: 211.84 tokens</li><li>max: 511 tokens</li></ul> | <ul><li>1: ~26.20%</li><li>2: ~21.70%</li><li>3: ~42.90%</li><li>4: ~0.80%</li><li>5: ~8.40%</li></ul> |
521
+ * Samples:
522
+ | texts | label |
523
+ |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
524
+ | <code>The "X" button of the opt-in modal is only read as "button" using a screen reader **[Affected Versions]:**<br>- Firefox Beta 97.0b2 (Build ID: 20220111185943)<br>- Firefox Nightly 98.a1 (Build ID: 20220111093827)<br><br>**[Affected Platforms]:**<br>- Windows 10 x64<br>- Ubuntu 20.04 x64<br>- macOS 10.15.7<br><br>**[Prerequisites]:**<br>- Have Firefox Beta 97.0b2 downloaded on your computer.<br>- Have the "browser.search.region" set to "US".<br>- Have one of the [treatment user.js](https://drive.google.com/drive/folders/1R_Kl51yPO9RCiuUmCHNOCqs3FQQAU2bD?usp=sharing) on your computer.<br>- Make sure there is no other modal displayed when starting the browser (browser default window, onboarding for new users etc).<br>- Have a screen reader application opened.<br><br>**[Steps to reproduce]:**<br>1. Open Firefox Beta 97.0b2.<br>2. Navigate to the “about:support” page and paste the user.js file into the Profile folder.<br>3. Restart the browser.<br>4. Focus on the "X" button of the modal.<br>5. Listen to what the screen reader application reads.<br><br>**[Exp...</code> | <code>1</code> |
525
+ | <code>Text and radio buttons are overlapping on PDF Tested with:<br>Nightly 91.0a1 (2021-06-23)<br><br>Tested on:<br>Win 10<br><br>Preconditions:<br>In about:config, set pdfjs.enableXfa = true<br><br>Steps:<br><br>1. Launch Firefox<br>2. Open the attached pdf.<br>3. Go to "Taille de l'entreprise"<br><br>Actual result:<br>Radio buttons and text are overlapping.<br><br>Expected result:<br>Text and radio buttons should be properly displayed</code> | <code>1</code> |
526
+ | <code>opening a new private window shows the old private window start page for a moment with browser.privatebrowsing.felt-privacy-v1, causing a bad perceived performance STR:<br><br>1. Set browser.privatebrowsing.felt-privacy-v1 to true<br>2. Open a private window<br><br>Actual:<br><br>The content of the old private window start page is visible during the window creation. This causes a very visible jump of the content and the window creation feels slow. Please see the following screencast:<br><br>https://www.youtube.com/watch?v=Fax2VqefuuY<br><br>You can change the speed to 0,25 on YouTube to see it even better.<br><br>Expected:<br><br>No felt performance issue.</code> | <code>3</code> |
527
+ * Loss: [<code>BatchSemiHardTripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#batchsemihardtripletloss)
528
+
529
+ ### Evaluation Dataset
530
+
531
+ #### Unnamed Dataset
532
+
533
+ * Size: 1,162 evaluation samples
534
+ * Columns: <code>texts</code> and <code>label</code>
535
+ * Approximate statistics based on the first 1000 samples:
536
+ | | texts | label |
537
+ |:--------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------|
538
+ | type | string | int |
539
+ | details | <ul><li>min: 16 tokens</li><li>mean: 209.86 tokens</li><li>max: 508 tokens</li></ul> | <ul><li>1: ~26.30%</li><li>2: ~21.00%</li><li>3: ~42.50%</li><li>4: ~0.70%</li><li>5: ~9.50%</li></ul> |
540
+ * Samples:
541
+ | texts | label |
542
+ |:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
+ | <code>(Proton) (a11y) Selected tabs almost not to see STR:<br><br>1. select multiple tabs<br><br>Expected:<br><br>You have no difficulties to see your selected tabs.<br><br>Actual:<br><br>It's almost not possible (at least to me) to see the selected tabs due to a insufficient contrast between the tab bar color and the tab color.<br><br>Marking as regression since it has a11y implications compared to pre Proton.</code> | <code>2</code> |
+ | <code>The Not Now button from the Fakespot Onboarding sidebar is missing the Clicked State **Found in**<br>* Nightly 118.0a1 (2023-08-18)<br><br>**Affected versions**<br>* Nightly 118.0a1 (2023-08-18)<br><br>**Affected platforms**<br>* ALL<br><br>**Preconditions:**<br>Set the browser.shopping.experience2023.enabled - TRUE<br>Set the toolkit.shopping.useOHTTP - TRUE<br><br>**Steps to reproduce**<br>1. Reach about:preferences and turn off feature recommendations.<br>2. Reach the Amazon https://www.amazon.com/dp/B09B6ZXD2V/ref=sbl_dpx_office-desks_B0B4CYW8FB_0 link <br>3. Click and Hold the Not Now button from the Onboarding Shopping sidebar.<br><br>**Expected result**<br>* The Not now button from the Onboarding Shopping sidebar should change its state when Clicked.<br><br>**Actual result**<br>* The Not now button from the Onboarding Shopping sidebar is missing the Clicked state.<br><br>**Regression range**<br>Not Applicable</code> | <code>1</code> |
+ | <code>Missing data from table Tested with:<br>Nightly 91.0a1 (2021-06-22)<br><br>Tested on:<br>Win 10<br><br>Preconditions:<br>In about:config, set pdfjs.enableXfa = true<br><br>Steps:<br><br>1. Launch firefox<br>2. Open the attached PDF<br><br>Actual result:<br>No data in table is displayed<br><br>Expected result:<br>A table with data should be displayed</code> | <code>1</code> |
+ * Loss: [<code>BatchSemiHardTripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#batchsemihardtripletloss)
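As a reading aid for the loss named above: `BatchSemiHardTripletLoss` builds triplets within each batch and keeps negatives that are *semi-hard*, i.e. farther from the anchor than the positive but still inside the margin, `d(a, p) < d(a, n) < d(a, p) + margin`. The following is a minimal illustrative sketch of that selection rule on a toy distance matrix, not the library's actual implementation (which operates on embeddings inside the loss):

```python
import numpy as np

def semi_hard_negatives(dist, labels, margin=1.0):
    """Collect (anchor, positive, negative) index triplets where the
    negative is semi-hard: d(a, p) < d(a, n) < d(a, p) + margin."""
    triplets = []
    n = len(labels)
    for a in range(n):
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue  # positives share the anchor's label
            for neg in range(n):
                if labels[neg] == labels[a]:
                    continue  # negatives come from other labels
                if dist[a, p] < dist[a, neg] < dist[a, p] + margin:
                    triplets.append((a, p, neg))
    return triplets

# Toy setup: items 0,1 share one label (e.g. bug severity), 2,3 another.
labels = np.array([0, 0, 1, 1])
dist = np.array([
    [0.0, 0.5, 1.2, 3.0],
    [0.5, 0.0, 1.1, 2.9],
    [1.2, 1.1, 0.0, 0.4],
    [3.0, 2.9, 0.4, 0.0],
])
print(semi_hard_negatives(dist, labels))
# [(0, 1, 2), (1, 0, 2), (2, 3, 0), (2, 3, 1)]
```

Note how negative 3 is never selected for anchor 0: at distance 3.0 it is already "easy" (outside the margin), so it contributes no gradient under this scheme.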
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 2
+ - `per_device_eval_batch_size`: 2
+ - `gradient_accumulation_steps`: 8
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 5
+ - `warmup_ratio`: 0.1
+ - `fp16`: True
+ - `batch_sampler`: group_by_label
+
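The small per-device batch size above is compensated by gradient accumulation: gradients from several micro-batches are summed before each optimizer step. A quick sketch of the effective batch size implied by these hyperparameters (the device count is not stated in the card, so single-device training is assumed):

```python
# Effective optimizer-step batch size implied by the hyperparameters above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
num_devices = 1  # assumption: single-device training

effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * num_devices)
print(effective_batch_size)  # 16
```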
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 2
+ - `per_device_eval_batch_size`: 2
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 8
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 5
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `hub_revision`: None
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `liger_kernel_config`: None
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: group_by_label
+ - `multi_dataset_batch_sampler`: proportional
+ - `router_mapping`: {}
+ - `learning_rate_mapping`: {}
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss | Validation Loss | bge-base-en-train_cosine_accuracy |
+ |:------:|:----:|:-------------:|:---------------:|:---------------------------------:|
+ | 0.2950 | 100 | 5.4327 | 5.0655 | 0.5050 |
+ | 0.5900 | 200 | 5.2042 | 5.0281 | 0.4919 |
+ | 0.8850 | 300 | 5.1393 | 5.0186 | 0.5009 |
+ | 1.1829 | 400 | 5.1552 | 5.0137 | 0.5004 |
+ | 1.4779 | 500 | 5.0807 | 5.0111 | 0.4989 |
+ | 1.7729 | 600 | 5.0634 | 5.0092 | 0.4982 |
+ | 2.0708 | 700 | 5.1017 | 5.0079 | 0.4985 |
+ | 2.3658 | 800 | 5.0431 | 5.0067 | 0.4913 |
+ | 2.6608 | 900 | 5.0378 | 5.0058 | 0.4888 |
+ | 2.9558 | 1000 | 5.0332 | 5.0051 | 0.4904 |
+ | 3.2537 | 1100 | 5.08 | 5.0048 | 0.4897 |
+ | 3.5487 | 1200 | 5.0277 | 5.0043 | 0.4899 |
+ | 3.8437 | 1300 | 5.0257 | 5.0041 | 0.4888 |
+ | 4.1416 | 1400 | 5.075 | 5.0039 | 0.4924 |
+ | 4.4366 | 1500 | 5.0235 | 5.0038 | 0.4937 |
+ | 4.7316 | 1600 | 5.0224 | 5.0038 | 0.4904 |
+ | -1 | -1 | - | - | 0.5142 |
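The epoch/step columns above are internally consistent and let us back out an approximate dataset size. All derived figures below are approximations computed from the table, not values stated in the card:

```python
# Rough consistency check on the training-log table: 100 optimizer steps
# correspond to epoch 0.2950, so one epoch is ~339 optimizer steps; with an
# effective batch of 16 (batch 2 x grad accum 8, single device assumed),
# that implies roughly 5.4k training samples.
steps_logged = 100
epoch_fraction = 0.2950
steps_per_epoch = steps_logged / epoch_fraction   # ~338.98
effective_batch = 2 * 8
approx_samples = steps_per_epoch * effective_batch
print(round(steps_per_epoch), round(approx_samples))  # 339 5424
```

The `-1 | -1` row is the final evaluation after training completes, which is why it reports only the accuracy metric.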
+
+
+ ### Framework Versions
+ - Python: 3.10.10
+ - Sentence Transformers: 5.1.0
+ - Transformers: 4.55.3
+ - PyTorch: 2.7.1+cu128
+ - Accelerate: 1.10.0
+ - Datasets: 4.0.0
+ - Tokenizers: 0.21.4
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### BatchSemiHardTripletLoss
+ ```bibtex
+ @misc{hermans2017defense,
+     title={In Defense of the Triplet Loss for Person Re-Identification},
+     author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
+     year={2017},
+     eprint={1703.07737},
+     archivePrefix={arXiv},
+     primaryClass={cs.CV}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.55.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "__version__": {
+     "sentence_transformers": "5.1.0",
+     "transformers": "4.55.3",
+     "pytorch": "2.7.1+cu128"
+   },
+   "model_type": "SentenceTransformer",
+   "prompts": {
+     "query": "",
+     "document": ""
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:83ba7e31b57d9ccf2d8b56a7c5142e23f05b160af6101847e8b52e2ce5a7a282
+ size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
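This `modules.json` chains three stages: the BERT transformer produces per-token embeddings, the pooling module (configured with `pooling_mode_cls_token: true` in `1_Pooling/config.json`) keeps only the `[CLS]` token's vector, and `2_Normalize` scales it to unit L2 norm, so the cosine similarity declared in `config_sentence_transformers.json` reduces to a dot product. A minimal numpy sketch of the last two stages (toy 4-dim vectors; the real model uses 768 dims):

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings):
    """Mimic Pooling (CLS mode) + Normalize: take the first token's
    vector, then scale it to unit L2 norm."""
    cls_vec = token_embeddings[0]             # pooling_mode_cls_token
    return cls_vec / np.linalg.norm(cls_vec)  # 2_Normalize

# Toy "token embeddings": 3 tokens x 4 dims; row 0 plays the [CLS] slot.
tokens = np.array([
    [3.0, 4.0, 0.0, 0.0],   # [CLS]
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
])
emb = cls_pool_and_normalize(tokens)
print(emb)                  # [0.6, 0.8, 0.0, 0.0] -- unit length
print(np.linalg.norm(emb))  # 1.0
```

After this normalization, cosine similarity between two sentence embeddings is just `emb_a @ emb_b`.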
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff