aaa961 committed
Commit a71b03d · verified · 1 Parent(s): 9f8181d

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
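The flags above encode a single pooling choice: with `pooling_mode_cls_token` set to `true` and every other mode `false`, the sentence embedding is simply the transformer's first ([CLS]) token vector. A minimal sketch of that selection logic — the `pool` helper below is illustrative, not the sentence-transformers API:

```python
import json

# The 1_Pooling config added in this commit (verbatim content).
CONFIG = json.loads("""
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": true,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
""")

def pool(token_embeddings, config):
    """Reduce a [seq_len x dim] list of token vectors to one sentence vector,
    following whichever pooling_mode_* flag is set (CLS and mean shown here)."""
    if config.get("pooling_mode_cls_token"):
        # CLS pooling: keep only the first token's embedding.
        return token_embeddings[0]
    if config.get("pooling_mode_mean_tokens"):
        # Mean pooling: average each dimension over all tokens.
        dim = len(token_embeddings[0])
        n = len(token_embeddings)
        return [sum(tok[d] for tok in token_embeddings) / n for d in range(dim)]
    raise ValueError("unsupported pooling mode")

tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 tokens, dim 2
print(pool(tokens, CONFIG))  # CLS pooling keeps the first token: [1.0, 2.0]
```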
README.md ADDED
@@ -0,0 +1,760 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - dense
+ - generated_from_trainer
+ - dataset_size:5424
+ - loss:BatchSemiHardTripletLoss
+ base_model: BAAI/bge-base-en
+ widget:
+ - source_sentence: Tab previews often seem to be captured too early Quite often when
+ I look at tab previews I find a preview that looks half rendered. Maybe the screenshot
+ was taking while the page was still loading or something? Here is an example of
+ the preview for WhatsApp's web interface. I see this for other sites too but WhatsApp
+ seems to be reasonably reproducible.
+ sentences:
+ - Firefox should not fetch keys in oauth flows even if it gets the keyFetchToken
+ The FxA server was accidentally sending us its keyFetchToken during oauth flows.
+ This caused a race between who was going to use it. While a PR for that has been
+ merged in FxA, we still might as well avoid even trying to use the token if we
+ get it some how.
+ - "Tab doesn't receive focus outline on keyboard navigation I don't yet have good\
+ \ steps to reproduce but while using the keyboard to navigate tabs, sometimes\
+ \ one of my tabs doesn't get the focus outline. \n\nTo reproduce, open several\
+ \ tabs and navigate to the tab strip with the keyboard by focusing the addressbar\
+ \ (ctrl+L) and hitting shift+tab a few times. Then arrow through your tabs and\
+ \ notice that one of them doesn't get the focus indicator.\n\nI'm missing some\
+ \ step because I can't consistently reproduce but I've seen it several times now\
+ \ so getting a bug on file."
+ - 'Options menu no longer accepts mouse click after changing default zoom. STR:
+
+ * Open Options Menu
+
+ * Change default zoom from 100% to 90%
+
+
+ Expected Behavior:
+
+ * Zoom is applied in options menu and user is able to interact further with Options
+ menu
+
+
+ Actual behavior (Firefox 77.0a1 20200407214402):
+
+ * Zoom is applied in options menu, but user is no longer able to interact with
+ Options menu using mouse. Clicking any button/dropdown/... no longer works.
+
+ Using keyboard (tab to navigate options) still works.'
+ - source_sentence: '"Open Video in New Tab" should copy Mute settings STR:
+
+
+ 1) Mute a tab.
+
+ 2) Play a video on the tab.
+
+ 3) Right click, "Open Video in New Tab"
+
+ 4) MUSIC INSTANTLY BLASTING THROUGH THE OFFICE
+
+
+ The new tab should copy the mute settings from the original tab, particularly
+ because autoplay is enabled by default.'
+ sentences:
+ - 'The ingest logic for detecting newly enabled Suggest suggestions is broken In
+ bug 1907696 I added some logic to `BaseFeature` that tries to ingest only newly
+ enabled suggestion types when `update()` is called. It doesn''t work right because
+ by the time it [gets the old enabled suggestion types](https://searchfox.org/mozilla-central/rev/7c7e11a8e0352b0110923e86b873e4a26e3b0650/browser/components/urlbar/private/BaseFeature.sys.mjs#217-226),
+ prefs/variables have already changed, so `isRustSuggestionTypeEnabled()` reflects
+ the *new* enabled status, not the old. So later when it [tries to ingest only
+ newly enabled suggestion types](https://searchfox.org/mozilla-central/rev/7c7e11a8e0352b0110923e86b873e4a26e3b0650/browser/components/urlbar/private/BaseFeature.sys.mjs#237-249),
+ it won''t because it will think those types were already enabled.
+
+
+ This isn''t a huge problem because it only affects features that manage more than
+ one suggestion type, and only when one type was already enabled, i.e., only when
+ the feature itself was already enabled. `AdmWikipedia` is the only feature that
+ manages multiple types. So to trigger this, the user would have to have sponsored
+ enabled, nonsponsored disabled, and then turn on nonsponsored (or vice versa).
+ And even then, ingest on startup isn''t affected, so the worst case is that they
+ might end up with slightly outdated AMP or Wikipedia suggestions.'
+ - Possible to add multiple of the same tag in the Bookmarks Panel tag list betsymikal
+ was able to reproduce this (see screenshot), but I was not. needinfo'ing betsymikal
+ for steps to reproduce.
+ - '[Colorway Closet] Fix up illustration sizing Couple of things:
+
+ - We want to allow illustrations up to 300\*300px instead of the current 288\*288px.
+
+ - We currently use the **height** variable to set the max-**width** and the **width**
+ variable to set the max-**height**. We should fix that :)
+
+ - We also need to set a minimum container size so that layout doesn''t change
+ due to different colorway illustrations having different sizes.'
+ - source_sentence: 'Proton Menu Update Does Not Allow Easily Restoring Session User
+ Agent: Mozilla/5.0 (X11; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0
+
+
+ Steps to reproduce:
+
+
+ Enabled browser.proton.appmenu.
+
+
+ Restarted my browser.
+
+
+ Attempted to Access Restore Recent Session Item, either on the main menu, or in
+ a sub-menu.
+
+
+
+ Actual results:
+
+
+ I was unable to do so.
+
+
+
+ Expected results:
+
+
+ In order to easily facilitate restoring an old session, a menu item should be
+ included, either in the main menu, or within the history sub-menu.'
+ sentences:
+ - 'When there are fluent .orig files Storybook won''t start The [webpackInclude
+ comments](https://searchfox.org/mozilla-central/rev/a64647a2125cf3d334451051491fef6772e8eb57/browser/components/storybook/.storybook/preview.js#33,40)
+ for the fluent files is over-permissive and catching .orig files which then cause
+ webpack to throw an error and prevent storybook from starting.
+
+
+ STR
+
+
+ 1. `cp browser/locales/en-US/browser/browser.ftl{,.orig}`
+
+ 2. `./mach storybook`
+
+
+ Expected results: Storybook starts
+
+ Actual results: Webpack error :('
+ - '[Experiment] The “expose” event is not registered on the treatment branches of
+ the "Test Window modal vs Tab modal on about:welcome" experiment **[Affected versions]:**
+
+ - Firefox Release candidate 110.0 (Build ID: 20230206190557)
+
+
+ **[Affected Platforms]:**
+
+ - Windows 10 x64
+
+ - Windows 11 x64
+
+
+ **[Prerequisites]:**
+
+ - Have the latest version of Firefox Beta 110 installed.
+
+ - Have the Firefox browser pinned to Taskbar.
+
+ - Have the [user.js](https://drive.google.com/file/d/1rE_QlyzmNhqL598EWgoOA5Y4HsJEHamk/view?usp=share_link)
+ file saved to your PC.
+
+
+
+ **[Steps to reproduce]:**
+
+ 1. Create a new Firefox profile but do not open it.
+
+ 2. Navigate to the Firefox profile folder and paste the user.js file from the
+ prerequisites.
+
+ 3. Open the browser using the previously created profile and the “--first-startup”
+ syntax.
+
+ 4. Make sure that the window modal is shown.
+
+ 5. Navigate to the “about:telemetry#events” page and search for the “expose” event.
+
+
+ **[Expected result]:**
+
+ - The “expose” event is displayed.
+
+
+ **[Actual result]:**
+
+ - No “expose” event is registered.
+
+
+ **[Notes]:**
+
+ - This issue is not reproducible on the Control branch of the experiment.'
+ - 'Search mode chiclet can get overloaded with Switch to tab text STR
+
+ 1. Open Tabs search mode by typing `% ` or clicking the tabs search shortcut.
+
+ 2. Press the down arrow until a non-remote Switch-to-Tab result is selected.
+
+ 3. Press Esc.
+
+
+ Expected results: The search mode chiclet reads "Tabs"
+
+ Actual results: The search mode chiclet reads "Switch to tab:Tabs"'
+ - source_sentence: "Limit address bar clipboard result for new tabs **Found in**\n\
+ * Fx 121.0a1\n\n**Affected versions**\n* Fx 121.0a1\n\n**Affected platforms**\n\
+ * Windows 10\n* Ubuntu\n* macOS\n\n**Preconditions**\n* Set browser.urlbar.clipboard.featureGate\
+ \ to true.\n\n**Steps to reproduce**\n1. Launch Firefox.\n2. Copy a website url.\n\
+ 3. Click the address bar and inspect the clipboard result suggestion.\n4. Open\
+ \ a new tab and click the address bar - the clipboard result is shown.\n5. Redo\
+ \ the previous step several times.\n\n**Expected result**\n* Clipboard result\
+ \ is no longer shown after a few tries. \n\n**Actual result**\n* Clipboard result\
+ \ suggestion is displayed on every new tab. It seems that the impression for the\
+ \ clipboard result is not registered.\n\n**Regression range**\n* Not a regression.\n\
+ \n**Additional notes**\n* Clipboard result only appears twice on the same tab."
+ sentences:
+ - '[Docs] Feature callout doc incorrectly says randomize is a property of tiles
+ rather than MultiSelectItem If you search for randomize in [the doc](https://firefox-source-docs.mozilla.org/browser/components/asrouter/docs/feature-callout.html),
+ you''ll find it is on the `tiles` object. But in reality, it was moved from there
+ to `MultiSelectItem`, so that each item has its own `randomize` property, allowing
+ some items to keep a static position at the bottom/top, or more complicated layouts.
+ So we just need to reflect that in the documentation. Adding a code comment could
+ help explain how that works - I added an explanation to the summary of [the patch](https://phabricator.services.mozilla.com/D202513)
+ where I added per-item randomization, so we can use that.'
+ - 'PlacesFeed module is doing expensive work during Places notifications I took
+ a profile from the suggestion in bug 1533061
+
+ https://share.firefox.dev/3XWswU0
+
+
+ on notifications this is doing a "dispatch" that doesn''t look like being an actual
+ async dispatch https://searchfox.org/mozilla-central/source/browser/components/newtab/lib/PlacesFeed.jsm#64,160,178-193
+
+ That apparently accounts for a good 30% of the time.
+
+
+ Since this listener is pretty much always active, it is slowing down various bookmarks
+ (and maybe history) operations, it should do the minimum necessary in the notification
+ handler, and redispatch/batch any expensive work.'
+ - 'Server errors (503/429/...non-200) leave page in loading state Based on [the
+ xpcshell test](https://searchfox.org/mozilla-central/rev/7499890dc8f116a9e40f4a689a251a0311a9f461/toolkit/components/shopping/test/xpcshell/test_product.js#374-384)
+ the current expectation is that failing requests will be retried, and after a
+ number of retries, return `null` if they keep failing.
+
+
+ Unfortunately, the [actor code will then just pass the null data to the UI code](https://searchfox.org/mozilla-central/rev/7499890dc8f116a9e40f4a689a251a0311a9f461/browser/components/shopping/ShoppingSidebarChild.sys.mjs#259-262,268-277),
+ which will treat it as a "loading" state. This means that if the request fails
+ permanently, we never show anything other than the loading state.
+
+
+ We [already have generic error messaging](https://docs.google.com/spreadsheets/d/1FQUU2pqhKkAXRHNbmK9dhObwvevWoR3N6BOh3fu2skE/edit#gid=64342838&range=A50:D52)
+ for this, but it doesn''t appear that this exists in the ftl or is being used.
+
+
+ Assuming I''m not missing anything, this seems like something we should ideally
+ fix sooner rather than later. Ania?'
+ - source_sentence: "Loss of bookmarks state after OS hibernation User Agent: Mozilla/5.0\
+ \ (Windows NT 6.1; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0\n\nSteps to\
+ \ reproduce:\n\nKept Firefox 88.0 open for several days, using hibernate on Windows\
+ \ inbetween.\nCreated several bookmarks using Ctrl-D and Shift-Ctrl-D both for\
+ \ private as well as non-private sites.\nAlso created some bookmarks by dragging\
+ \ from address bar into subfolders of bookmark window.\nI was wondering, that\
+ \ on some windows star icon an address bar did not become blue.\nVerified, that\
+ \ i already had bookmarked this site by pressing Ctrl-D. The appropriate bookmarks\
+ \ subfolder was shown. Star still did not become true.\nAnything else seem ok.\n\
+ Did quit Firefox using File->Quit menu command, saving session with all open tabs.\n\
+ Did start Firefox with restoring previous session using menu command.\n\n\nActual\
+ \ results:\n\nSome of the bookmarks created above are lost. Can't remember most\
+ \ of them.\nHowever i remember a few sites, which i definitely had created a bookmark\
+ \ for.\nNo bookmark exists.\nAlso there is no entry in history for these sites.\
+ \ Missing entry in history might be due to using private window. (Can't remember\
+ \ if i was using a private window for opening this site.)\n\nI repeatedly had\
+ \ lost both bookmarks and history entries some month ago and then created a new\
+ \ profile. Did not import anything from previous profiles. \n\n\nExpected results:\n\
+ \nBookmarks should have been saved as expected."
+ sentences:
+ - 'Urlbar docs: In index.rst, the Architecture Overview link in the Where to Start
+ section is broken In index.rst, the Architecture Overview link in the Where to
+ Start section links to an unrelated page in the devtools docs.'
+ - 'Messages with triggers cause trigger listeners to initialize multiple times [Currently](https://searchfox.org/mozilla-central/rev/07342ce09126c513540c1c343476e026cfa907bf/browser/components/newtab/lib/ASRouter.jsm#827)
+ we call the `.init` function of every trigger (when defined).
+
+ This makes sense for `openURL` where we accumulate more URLs but for others it
+ is a bug
+
+ * moments trigger will add multiple setIntervals
+
+ * whatsNew will evaluate multiple times which is wasteful).'
+ - 'Firefox provides the option to install search engines as open search even if
+ they''re already installed **Affected versions**
+
+ * Fx99.0a1
+
+ * Fx97.0.1
+
+
+ **Affected platforms**
+
+ * Windows 10
+
+ * macOS
+
+ * Ubuntu 20.04
+
+
+ **Preconditions**
+
+ Have a zh-CN build downloaded.
+
+ Have a user.js in the profile root folder containing the following:
+
+ ``` user_pref("browser.search.region", "CN"); ```
+
+
+ **Steps to reproduce**
+
+ 1. Launch Firefox with the profile containing the user.js.
+
+ 2. Go to about:preferences#search and add the search bar. (optional)
+
+ 3. Using the address bar, perform a search.
+
+ 4. Observe the search bar magnifying glass icon.
+
+ 5. Open the address bar drop-down and inspect the one-offs section.
+
+
+ **Expected result**
+
+ * The shouldn''t be any indication to add an open search engine, as the default
+ baidu engine is already installed in Firefox.
+
+
+ **Actual result**
+
+ * There is a button to install the open search engine.
+
+
+ **Regression range**
+
+ * This doesn''t seem to be a recent regression as we could reproduce the issue
+ with an Fx93 build.
+
+
+ **Additional notes**
+
+ * If the engine is added and then clicked in the search bar drop-down it will
+ take the user to www.baidu.com
+
+ * This seems to be happening regardless of used locale build and with amazon as
+ well, but under a bit different steps:
+
+ 1. Open the searchbar dropdown.
+
+ 2. Click the amazon one-off from the dropdown
+
+ 3. After the website is loaded inspect the searchbar magnifying glass icon.
+
+
+ * See the screenshots for more details.'
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy
+ model-index:
+ - name: SentenceTransformer based on BAAI/bge-base-en
+ results:
+ - task:
+ type: triplet
+ name: Triplet
+ dataset:
+ name: bge base en train
+ type: bge-base-en-train
+ metrics:
+ - type: cosine_accuracy
+ value: 0.49041298031806946
+ name: Cosine Accuracy
+ - type: cosine_accuracy
+ value: 0.5141874551773071
+ name: Cosine Accuracy
+ ---
+
+ # SentenceTransformer based on BAAI/bge-base-en
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en) <!-- at revision b737bf5dcc6ee8bdc530531266b4804a5d77b5d8 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True, 'architecture': 'BertModel'})
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
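The three modules run in sequence: the Transformer produces per-token vectors, the Pooling module keeps only the [CLS] token (per the pooling config above), and Normalize scales the result to unit L2 norm. A toy sketch of stages (1) and (2), with the transformer stubbed out — `encode` here is a hypothetical stand-in, not the library's method:

```python
import math

def normalize(vec):
    """Module (2): scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in vec))
    return [x / n for x in vec]

def encode(token_embeddings):
    """Mimic the module stack after the transformer: (1) CLS pooling,
    then (2) L2 normalization. token_embeddings stands in for the
    per-token output of module (0)."""
    sentence_vec = token_embeddings[0]  # CLS pooling: first token only
    return normalize(sentence_vec)

print(encode([[3.0, 4.0], [1.0, 1.0]]))  # -> [0.6, 0.8] (unit length)
```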
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("aaa961/finetuned-bge-base-en-firefox")
+ # Run inference
+ sentences = [
+     "Loss of bookmarks state after OS hibernation User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0\n\nSteps to reproduce:\n\nKept Firefox 88.0 open for several days, using hibernate on Windows inbetween.\nCreated several bookmarks using Ctrl-D and Shift-Ctrl-D both for private as well as non-private sites.\nAlso created some bookmarks by dragging from address bar into subfolders of bookmark window.\nI was wondering, that on some windows star icon an address bar did not become blue.\nVerified, that i already had bookmarked this site by pressing Ctrl-D. The appropriate bookmarks subfolder was shown. Star still did not become true.\nAnything else seem ok.\nDid quit Firefox using File->Quit menu command, saving session with all open tabs.\nDid start Firefox with restoring previous session using menu command.\n\n\nActual results:\n\nSome of the bookmarks created above are lost. Can't remember most of them.\nHowever i remember a few sites, which i definitely had created a bookmark for.\nNo bookmark exists.\nAlso there is no entry in history for these sites. Missing entry in history might be due to using private window. (Can't remember if i was using a private window for opening this site.)\n\nI repeatedly had lost both bookmarks and history entries some month ago and then created a new profile. Did not import anything from previous profiles. \n\n\nExpected results:\n\nBookmarks should have been saved as expected.",
+     'Messages with triggers cause trigger listeners to initialize multiple times [Currently](https://searchfox.org/mozilla-central/rev/07342ce09126c513540c1c343476e026cfa907bf/browser/components/newtab/lib/ASRouter.jsm#827) we call the `.init` function of every trigger (when defined).\nThis makes sense for `openURL` where we accumulate more URLs but for others it is a bug\n* moments trigger will add multiple setIntervals\n* whatsNew will evaluate multiple times which is wasteful).',
+     'Urlbar docs: In index.rst, the Architecture Overview link in the Where to Start section is broken In index.rst, the Architecture Overview link in the Where to Start section links to an unrelated page in the devtools docs.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities)
+ # tensor([[1.0000, 1.0000, 1.0000],
+ #         [1.0000, 1.0000, 1.0000],
+ #         [1.0000, 1.0000, 1.0000]])
+ ```
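With Cosine Similarity as the configured similarity function, `model.similarity` scores each pair of embeddings by cosine similarity, which for the unit-normalized vectors this model outputs reduces to a dot product. A dependency-free sketch of that computation (standard math, not the library's internal code):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors:
    dot(a, b) / (||a|| * ||b||), in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```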
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Triplet
+
+ * Dataset: `bge-base-en-train`
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | **cosine_accuracy** | **0.4904** |
+
+ #### Triplet
+
+ * Dataset: `bge-base-en-train`
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | **cosine_accuracy** | **0.5142** |
+
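`cosine_accuracy` is the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative; random embeddings score about 0.5. A toy sketch of the metric — the `triplet_accuracy` helper below is illustrative, not the `TripletEvaluator` implementation:

```python
def triplet_accuracy(triplets, similarity):
    """Fraction of (anchor, positive, negative) triplets where the
    anchor is more similar to the positive than to the negative."""
    correct = sum(
        1 for a, p, n in triplets if similarity(a, p) > similarity(a, n)
    )
    return correct / len(triplets)

# Toy 1-D "embeddings"; negated distance serves as a similarity score.
sim = lambda x, y: -abs(x - y)
trips = [
    (0.0, 0.1, 5.0),   # positive much closer than negative -> correct
    (1.0, 0.9, 1.05),  # negative slightly closer -> incorrect
]
print(triplet_accuracy(trips, sim))  # -> 0.5
```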
496
+ <!--
497
+ ## Bias, Risks and Limitations
498
+
499
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
500
+ -->
501
+
502
+ <!--
503
+ ### Recommendations
504
+
505
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
506
+ -->
507
+
508
+ ## Training Details
509
+
510
+ ### Training Dataset
511
+
512
+ #### Unnamed Dataset
513
+
514
+ * Size: 5,424 training samples
515
+ * Columns: <code>texts</code> and <code>label</code>
516
+ * Approximate statistics based on the first 1000 samples:
517
+ | | texts | label |
518
+ |:--------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------|
519
+ | type | string | int |
520
+ | details | <ul><li>min: 17 tokens</li><li>mean: 211.84 tokens</li><li>max: 511 tokens</li></ul> | <ul><li>1: ~26.20%</li><li>2: ~21.70%</li><li>3: ~42.90%</li><li>4: ~0.80%</li><li>5: ~8.40%</li></ul> |
521
+ * Samples:
522
+ | texts | label |
523
+ |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
524
+ | <code>The "X" button of the opt-in modal is only read as "button" using a screen reader **[Affected Versions]:**<br>- Firefox Beta 97.0b2 (Build ID: 20220111185943)<br>- Firefox Nightly 98.a1 (Build ID: 20220111093827)<br><br>**[Affected Platforms]:**<br>- Windows 10 x64<br>- Ubuntu 20.04 x64<br>- macOS 10.15.7<br><br>**[Prerequisites]:**<br>- Have Firefox Beta 97.0b2 downloaded on your computer.<br>- Have the "browser.search.region" set to "US".<br>- Have one of the [treatment user.js](https://drive.google.com/drive/folders/1R_Kl51yPO9RCiuUmCHNOCqs3FQQAU2bD?usp=sharing) on your computer.<br>- Make sure there is no other modal displayed when starting the browser (browser default window, onboarding for new users etc).<br>- Have a screen reader application opened.<br><br>**[Steps to reproduce]:**<br>1. Open Firefox Beta 97.0b2.<br>2. Navigate to the “about:support” page and paste the user.js file into the Profile folder.<br>3. Restart the browser.<br>4. Focus on the "X" button of the modal.<br>5. Listen to what the screen reader application reads.<br><br>**[Exp...</code> | <code>1</code> |
525
+ | <code>Text and radio buttons are overlapping on PDF Tested with:<br>Nightly 91.0a1 (2021-06-23)<br><br>Tested on:<br>Win 10<br><br>Preconditions:<br>In about:config, set pdfjs.enableXfa = true<br><br>Steps:<br><br>1. Launch Firefox<br>2. Open the attached pdf.<br>3. Go to "Taille de l'entreprise"<br><br>Actual result:<br>Radio buttons and text are overlapping.<br><br>Expected result:<br>Text and radio buttons should be properly displayed</code> | <code>1</code> |
526
+ | <code>opening a new private window shows the old private window start page for a moment with browser.privatebrowsing.felt-privacy-v1, causing a bad perceived performance STR:<br><br>1. Set browser.privatebrowsing.felt-privacy-v1 to true<br>2. Open a private window<br><br>Actual:<br><br>The content of the old private window start page is visible during the window creation. This causes a very visible jump of the content and the window creation feels slow. Please see the following screencast:<br><br>https://www.youtube.com/watch?v=Fax2VqefuuY<br><br>You can change the speed to 0,25 on YouTube to see it even better.<br><br>Expected:<br><br>No felt performance issue.</code> | <code>3</code> |
527
+ * Loss: [<code>BatchSemiHardTripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#batchsemihardtripletloss)
528
+
529
+ ### Evaluation Dataset
530
+
531
+ #### Unnamed Dataset
532
+
533
+ * Size: 1,162 evaluation samples
534
+ * Columns: <code>texts</code> and <code>label</code>
535
+ * Approximate statistics based on the first 1000 samples:
536
+ | | texts | label |
537
+ |:--------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------|
538
+ | type | string | int |
539
+ | details | <ul><li>min: 16 tokens</li><li>mean: 209.86 tokens</li><li>max: 508 tokens</li></ul> | <ul><li>1: ~26.30%</li><li>2: ~21.00%</li><li>3: ~42.50%</li><li>4: ~0.70%</li><li>5: ~9.50%</li></ul> |
540
+ * Samples:
541
+ | texts | label |
542
+ |:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
+ | <code>(Proton) (a11y) Selected tabs almost not to see STR:<br><br>1. select multiple tabs<br><br>Expected:<br><br>You have no difficulties to see your selected tabs.<br><br>Actual:<br><br>It's almost not possible (at least to me) to see the selected tabs due to a insufficient contrast between the tab bar color and the tab color.<br><br>Marking as regression since it has a11y implications compared to pre Proton.</code> | <code>2</code> |
+ | <code>The Not Now button from the Fakespot Onboarding sidebar is missing the Clicked State **Found in**<br>* Nightly 118.0a1 (2023-08-18)<br><br>**Affected versions**<br>* Nightly 118.0a1 (2023-08-18)<br><br>**Affected platforms**<br>* ALL<br><br>**Preconditions:**<br>Set the browser.shopping.experience2023.enabled - TRUE<br>Set the toolkit.shopping.useOHTTP - TRUE<br><br>**Steps to reproduce**<br>1. Reach about:preferences and turn off feature recommendations.<br>2. Reach the Amazon https://www.amazon.com/dp/B09B6ZXD2V/ref=sbl_dpx_office-desks_B0B4CYW8FB_0 link <br>3. Click and Hold the Not Now button from the Onboarding Shopping sidebar.<br><br>**Expected result**<br>* The Not now button from the Onboarding Shopping sidebar should change its state when Clicked.<br><br>**Actual result**<br>* The Not now button from the Onboarding Shopping sidebar is missing the Clicked state.<br><br>**Regression range**<br>Not Applicable</code> | <code>1</code> |
+ | <code>Missing data from table Tested with:<br>Nightly 91.0a1 (2021-06-22)<br><br>Tested on:<br>Win 10<br><br>Preconditions:<br>In about:config, set pdfjs.enableXfa = true<br><br>Steps:<br><br>1. Launch firefox<br>2. Open the attached PDF<br><br>Actual result:<br>No data in table is displayed<br><br>Expected result:<br>A table with data should be displayed</code> | <code>1</code> |
+ * Loss: [<code>BatchSemiHardTripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#batchsemihardtripletloss)
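As a reading aid for the loss named above: `BatchSemiHardTripletLoss` builds triplets within each batch and keeps negatives that are *semi-hard*, i.e. farther from the anchor than the positive but still inside the margin, `d(a, p) < d(a, n) < d(a, p) + margin`. The following is a minimal illustrative sketch of that selection rule on a toy distance matrix, not the library's actual implementation (which operates on embeddings inside the loss):

```python
import numpy as np

def semi_hard_negatives(dist, labels, margin=1.0):
    """Collect (anchor, positive, negative) index triplets where the
    negative is semi-hard: d(a, p) < d(a, n) < d(a, p) + margin."""
    triplets = []
    n = len(labels)
    for a in range(n):
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue  # positives share the anchor's label
            for neg in range(n):
                if labels[neg] == labels[a]:
                    continue  # negatives come from other labels
                if dist[a, p] < dist[a, neg] < dist[a, p] + margin:
                    triplets.append((a, p, neg))
    return triplets

# Toy setup: items 0,1 share one label (e.g. bug severity), 2,3 another.
labels = np.array([0, 0, 1, 1])
dist = np.array([
    [0.0, 0.5, 1.2, 3.0],
    [0.5, 0.0, 1.1, 2.9],
    [1.2, 1.1, 0.0, 0.4],
    [3.0, 2.9, 0.4, 0.0],
])
print(semi_hard_negatives(dist, labels))
# [(0, 1, 2), (1, 0, 2), (2, 3, 0), (2, 3, 1)]
```

Note how negative 3 is never selected for anchor 0: at distance 3.0 it is already "easy" (outside the margin), so it contributes no gradient under this scheme.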
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 2
+ - `per_device_eval_batch_size`: 2
+ - `gradient_accumulation_steps`: 8
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 5
+ - `warmup_ratio`: 0.1
+ - `fp16`: True
+ - `batch_sampler`: group_by_label
+
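The small per-device batch size above is compensated by gradient accumulation: gradients from several micro-batches are summed before each optimizer step. A quick sketch of the effective batch size implied by these hyperparameters (the device count is not stated in the card, so single-device training is assumed):

```python
# Effective optimizer-step batch size implied by the hyperparameters above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
num_devices = 1  # assumption: single-device training

effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * num_devices)
print(effective_batch_size)  # 16
```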
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 2
+ - `per_device_eval_batch_size`: 2
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 8
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 5
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `hub_revision`: None
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `liger_kernel_config`: None
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: group_by_label
+ - `multi_dataset_batch_sampler`: proportional
+ - `router_mapping`: {}
+ - `learning_rate_mapping`: {}
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss | Validation Loss | bge-base-en-train_cosine_accuracy |
+ |:------:|:----:|:-------------:|:---------------:|:---------------------------------:|
+ | 0.2950 | 100 | 5.4327 | 5.0655 | 0.5050 |
+ | 0.5900 | 200 | 5.2042 | 5.0281 | 0.4919 |
+ | 0.8850 | 300 | 5.1393 | 5.0186 | 0.5009 |
+ | 1.1829 | 400 | 5.1552 | 5.0137 | 0.5004 |
+ | 1.4779 | 500 | 5.0807 | 5.0111 | 0.4989 |
+ | 1.7729 | 600 | 5.0634 | 5.0092 | 0.4982 |
+ | 2.0708 | 700 | 5.1017 | 5.0079 | 0.4985 |
+ | 2.3658 | 800 | 5.0431 | 5.0067 | 0.4913 |
+ | 2.6608 | 900 | 5.0378 | 5.0058 | 0.4888 |
+ | 2.9558 | 1000 | 5.0332 | 5.0051 | 0.4904 |
+ | 3.2537 | 1100 | 5.08 | 5.0048 | 0.4897 |
+ | 3.5487 | 1200 | 5.0277 | 5.0043 | 0.4899 |
+ | 3.8437 | 1300 | 5.0257 | 5.0041 | 0.4888 |
+ | 4.1416 | 1400 | 5.075 | 5.0039 | 0.4924 |
+ | 4.4366 | 1500 | 5.0235 | 5.0038 | 0.4937 |
+ | 4.7316 | 1600 | 5.0224 | 5.0038 | 0.4904 |
+ | -1 | -1 | - | - | 0.5142 |
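The epoch/step columns above are internally consistent and let us back out an approximate dataset size. All derived figures below are approximations computed from the table, not values stated in the card:

```python
# Rough consistency check on the training-log table: 100 optimizer steps
# correspond to epoch 0.2950, so one epoch is ~339 optimizer steps; with an
# effective batch of 16 (batch 2 x grad accum 8, single device assumed),
# that implies roughly 5.4k training samples.
steps_logged = 100
epoch_fraction = 0.2950
steps_per_epoch = steps_logged / epoch_fraction   # ~338.98
effective_batch = 2 * 8
approx_samples = steps_per_epoch * effective_batch
print(round(steps_per_epoch), round(approx_samples))  # 339 5424
```

The `-1 | -1` row is the final evaluation after training completes, which is why it reports only the accuracy metric.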
+
+
+ ### Framework Versions
+ - Python: 3.10.10
+ - Sentence Transformers: 5.1.0
+ - Transformers: 4.55.3
+ - PyTorch: 2.7.1+cu128
+ - Accelerate: 1.10.0
+ - Datasets: 4.0.0
+ - Tokenizers: 0.21.4
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### BatchSemiHardTripletLoss
+ ```bibtex
+ @misc{hermans2017defense,
+     title={In Defense of the Triplet Loss for Person Re-Identification},
+     author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
+     year={2017},
+     eprint={1703.07737},
+     archivePrefix={arXiv},
+     primaryClass={cs.CV}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.55.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "__version__": {
+     "sentence_transformers": "5.1.0",
+     "transformers": "4.55.3",
+     "pytorch": "2.7.1+cu128"
+   },
+   "model_type": "SentenceTransformer",
+   "prompts": {
+     "query": "",
+     "document": ""
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:83ba7e31b57d9ccf2d8b56a7c5142e23f05b160af6101847e8b52e2ce5a7a282
+ size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
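This `modules.json` chains three stages: the BERT transformer produces per-token embeddings, the pooling module (configured with `pooling_mode_cls_token: true` in `1_Pooling/config.json`) keeps only the `[CLS]` token's vector, and `2_Normalize` scales it to unit L2 norm, so the cosine similarity declared in `config_sentence_transformers.json` reduces to a dot product. A minimal numpy sketch of the last two stages (toy 4-dim vectors; the real model uses 768 dims):

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings):
    """Mimic Pooling (CLS mode) + Normalize: take the first token's
    vector, then scale it to unit L2 norm."""
    cls_vec = token_embeddings[0]             # pooling_mode_cls_token
    return cls_vec / np.linalg.norm(cls_vec)  # 2_Normalize

# Toy "token embeddings": 3 tokens x 4 dims; row 0 plays the [CLS] slot.
tokens = np.array([
    [3.0, 4.0, 0.0, 0.0],   # [CLS]
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
])
emb = cls_pool_and_normalize(tokens)
print(emb)                  # [0.6, 0.8, 0.0, 0.0] -- unit length
print(np.linalg.norm(emb))  # 1.0
```

After this normalization, cosine similarity between two sentence embeddings is just `emb_a @ emb_b`.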
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff