rasyosef/splade-tiny

@@ -9,30 +9,37 @@ tags:
 - loss:SpladeLoss
 - loss:SparseMarginMSELoss
 - loss:FlopsLoss
-base_model: yosefw/SPLADE-BERT-Tiny-BS256
 widget:
-- text: 'Most Referenced:report - Return to the USDOJ/OIG Home Page - US Department
-    of JusticeReturn to the USDOJ/OIG Home Page - US Department of Justice. Opinion:Roberts:
-    Feds to stop using private prisons.'
-- text: 'Paul O''Neill, the founder of the Trans-Siberian Orchestra (pictured) has
-    died at age 61. Paul O''Neill, the founder of the popular Christmas-themed rock
-    ensemble Trans-Siberian Orchestra has died. A statement on the group''s Facebook
-    page reads: The entire Trans-Siberian Orchestra family, past and present, is heartbroken
-    to share the devastating news that Paul O’Neill has passed away from chronic illness.'
 - text: meaning for concern
-- text: 'Additional Tips. 1  Do not rub the ink stains as it can spread the stains
-    further. 2  Make sure you test the cleaning solution on a small, hidden area to
-    check if it is suitable for the material. 3  In case an ink stain has become old
-    and dried, the above mentioned home remedies may not be effective.arpet: For ink
-    stained spots on a carpet, you may apply a paste of cornstarch and milk. Leave
-    it for a few hours before brushing it off. Finally, clean the residue with a vacuum
-    cleaner. Leather: Try using a leather shampoo or a leather ink remover for removing
-    ink stains from leather items.'
-- text: 'See below: 1. Get your marriage license. Before you can change your name,
-    you''ll need the original (or certified) marriage license with the raised seal
-    and your new last name on it. Call the clerk''s office where your license was
-    filed to get copies if one wasn''t automatically sent to you. 2. Change your Social
-    Security card.'
 pipeline_tag: feature-extraction
 library_name: sentence-transformers
 metrics:
@@ -122,38 +129,32 @@ model-index:
     - type: corpus_sparsity_ratio
       value: 0.9960142510179831
       name: Corpus Sparsity Ratio
 ---
 # SPLADE Sparse Encoder
-This is a [SPLADE Sparse Encoder](https://www.sbert.net/docs/sparse_encoder/usage/usage.html) model finetuned from [yosefw/SPLADE-BERT-Tiny-BS256](https://huggingface.co/yosefw/SPLADE-BERT-Tiny-BS256) using the [sentence-transformers](https://www.SBERT.net) library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space   and can be used for semantic search and sparse retrieval.
-## Model Details
-### Model Description
-- **Model Type:** SPLADE Sparse Encoder
-- **Base model:** [yosefw/SPLADE-BERT-Tiny-BS256](https://huggingface.co/yosefw/SPLADE-BERT-Tiny-BS256) <!-- at revision 239bb34bbfcf6cc8b465eb5b94c76a20c574b47f -->
-- **Maximum Sequence Length:** 512 tokens
-- **Output Dimensionality:** 30522 dimensions
-- **Similarity Function:** Dot Product
-<!-- - **Training Dataset:** Unknown -->
-<!-- - **Language:** Unknown -->
-<!-- - **License:** Unknown -->
-### Model Sources
-- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
-- **Documentation:** [Sparse Encoder Documentation](https://www.sbert.net/docs/sparse_encoder/usage/usage.html)
-- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
-- **Hugging Face:** [Sparse Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=sparse-encoder)
-### Full Model Architecture
-```
-SparseEncoder(
-  (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertForMaskedLM'})
-  (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
-)
-```
 ## Usage
@@ -170,7 +171,7 @@ Then you can load this model and run inference.
 from sentence_transformers import SparseEncoder
 # Download from the 🤗 Hub
-model = SparseEncoder("yosefw/SPLADE-BERT-Tiny-BS256-distil-v3")
 # Run inference
 queries = [
     "what do i need to change my name on my license in ma",
@@ -215,6 +216,37 @@ You can finetune this model on your own dataset.
 *List how the model may foreseeably be misused and address what users ought not to do with the model.*
 -->
 ## Evaluation
 ### Metrics
@@ -510,4 +542,5 @@ You can finetune this model on your own dataset.
 ## Model Card Contact
 *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
--->

 - loss:SpladeLoss
 - loss:SparseMarginMSELoss
 - loss:FlopsLoss
+base_model:
+- prajjwal1/bert-tiny
 widget:
+- text: >-
+    Most Referenced:report - Return to the USDOJ/OIG Home Page - US Department
+    of JusticeReturn to the USDOJ/OIG Home Page - US Department of Justice.
+    Opinion:Roberts: Feds to stop using private prisons.
+- text: >-
+    Paul O'Neill, the founder of the Trans-Siberian Orchestra (pictured) has
+    died at age 61. Paul O'Neill, the founder of the popular Christmas-themed
+    rock ensemble Trans-Siberian Orchestra has died. A statement on the group's
+    Facebook page reads: The entire Trans-Siberian Orchestra family, past and
+    present, is heartbroken to share the devastating news that Paul O’Neill has
+    passed away from chronic illness.
 - text: meaning for concern
+- text: >-
+    Additional Tips. 1  Do not rub the ink stains as it can spread the stains
+    further. 2  Make sure you test the cleaning solution on a small, hidden area
+    to check if it is suitable for the material. 3  In case an ink stain has
+    become old and dried, the above mentioned home remedies may not be
+    effective.arpet: For ink stained spots on a carpet, you may apply a paste of
+    cornstarch and milk. Leave it for a few hours before brushing it off.
+    Finally, clean the residue with a vacuum cleaner. Leather: Try using a
+    leather shampoo or a leather ink remover for removing ink stains from
+    leather items.
+- text: >-
+    See below: 1. Get your marriage license. Before you can change your name,
+    you'll need the original (or certified) marriage license with the raised
+    seal and your new last name on it. Call the clerk's office where your
+    license was filed to get copies if one wasn't automatically sent to you. 2.
+    Change your Social Security card.
 pipeline_tag: feature-extraction
 library_name: sentence-transformers
 metrics:
     - type: corpus_sparsity_ratio
       value: 0.9960142510179831
       name: Corpus Sparsity Ratio
+datasets:
+- microsoft/ms_marco
+language:
+- en
 ---
 # SPLADE Sparse Encoder
+This is a SPLADE sparse retrieval model based on BERT-Tiny (4.4M parameters) that was trained by distilling a cross-encoder on the MSMARCO dataset. The cross-encoder used was [ms-marco-MiniLM-L6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2).
+This Tiny SPLADE model beats `BM25` by `65.6%` on the MSMARCO benchmark. Although it is `15x` smaller than Naver's official `splade-v3-distilbert`, it retains `80%` of its performance on MSMARCO. The model is small enough to be used without a GPU on a dataset of a few thousand documents.
+- `Collection:` https://huggingface.co/collections/rasyosef/splade-tiny-msmarco-687c548c0691d95babf65b70
+- `Distillation Dataset:` https://huggingface.co/datasets/yosefw/msmarco-train-distil-v2
+- `Code:` https://github.com/rasyosef/splade-tiny-msmarco
+## Performance
+The SPLADE models were evaluated on 55 thousand queries and 8.84 million documents from the [MSMARCO](https://huggingface.co/datasets/microsoft/ms_marco) dataset.
+|Model|Size (# Params)|MRR@10 (MS MARCO dev)|
+|:---|:----|:-------------------|
+|`BM25`|-|18.6|
+|`rasyosef/splade-tiny`|4.4M|30.9|
+|`rasyosef/splade-mini`|11.2M|33.2|
+|`naver/splade-v3-distilbert`|67.0M|38.7|
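
The headline ratios quoted in the card can be roughly checked against the table. The snippet below is only a sanity check on the rounded table values; the dictionary and function names are illustrative, not part of the repository. Because the inputs are rounded, the BM25 gain comes out near 66% rather than exactly the 65.6% quoted, which presumably derives from unrounded scores.

```python
# Rounded values copied from the performance table above.
mrr_at_10 = {
    "BM25": 18.6,
    "rasyosef/splade-tiny": 30.9,
    "rasyosef/splade-mini": 33.2,
    "naver/splade-v3-distilbert": 38.7,
}
params_millions = {
    "rasyosef/splade-tiny": 4.4,
    "rasyosef/splade-mini": 11.2,
    "naver/splade-v3-distilbert": 67.0,
}


def relative_gain(new: float, base: float) -> float:
    """Percentage improvement of `new` over `base`."""
    return (new - base) / base * 100.0


tiny = "rasyosef/splade-tiny"
big = "naver/splade-v3-distilbert"

gain_over_bm25 = relative_gain(mrr_at_10[tiny], mrr_at_10["BM25"])
retained_share = mrr_at_10[tiny] / mrr_at_10[big] * 100.0
size_ratio = params_millions[big] / params_millions[tiny]

print(f"gain over BM25: {gain_over_bm25:.1f}%")
print(f"share of splade-v3-distilbert MRR@10: {retained_share:.1f}%")
print(f"size ratio: {size_ratio:.1f}x")
```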
 ## Usage
 from sentence_transformers import SparseEncoder
 # Download from the 🤗 Hub
+model = SparseEncoder("rasyosef/splade-tiny")
 # Run inference
 queries = [
     "what do i need to change my name on my license in ma",
 *List how the model may foreseeably be misused and address what users ought not to do with the model.*
 -->
+## Model Details
+### Model Description
+- **Model Type:** SPLADE Sparse Encoder
+- **Base model:** [prajjwal1/bert-tiny](https://huggingface.co/prajjwal1/bert-tiny)
+- **Maximum Sequence Length:** 512 tokens
+- **Output Dimensionality:** 30522 dimensions
+- **Similarity Function:** Dot Product
+<!-- - **Training Dataset:** Unknown -->
+<!-- - **Language:** Unknown -->
+<!-- - **License:** Unknown -->
+### Model Sources
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Documentation:** [Sparse Encoder Documentation](https://www.sbert.net/docs/sparse_encoder/usage/usage.html)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+- **Hugging Face:** [Sparse Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=sparse-encoder)
+### Full Model Architecture
+```
+SparseEncoder(
+  (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertForMaskedLM'})
+  (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
+)
+```
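
To make the architecture listing more concrete, here is a minimal NumPy sketch of what a max-pooling SPLADE head does with the masked-language-model logits, and how two resulting vectors are compared with a dot product. It is not the sentence-transformers implementation. The `relu` activation and `max` pooling come from the configuration shown above, while the `log(1 + x)` saturation follows the published SPLADE formulation and is an assumption here. The function names and the toy 3-term vocabulary are hypothetical.

```python
import numpy as np


def splade_pool(mlm_logits: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Collapse per-token MLM logits of shape (seq_len, vocab_size)
    into a single non-negative vector of shape (vocab_size,)."""
    # ReLU followed by log(1 + x); the log saturation is assumed, see lead-in.
    activated = np.log1p(np.maximum(mlm_logits, 0.0))
    # Zero out padding positions so they cannot win the max.
    activated = activated * attention_mask[:, None]
    # Max over the sequence dimension: one weight per vocabulary term.
    return activated.max(axis=0)


def dot_score(query_vec: np.ndarray, doc_vec: np.ndarray) -> float:
    """Dot-product similarity between two pooled vectors."""
    return float(query_vec @ doc_vec)


# Toy example: 3 token positions (the last one is padding), vocabulary of 3 terms.
logits = np.array([
    [1.0, -2.0, 0.0],
    [3.0, 0.5, -1.0],
    [9.0, 9.0, 9.0],  # padding position, ignored via the mask
])
mask = np.array([1, 1, 0])

vec = splade_pool(logits, mask)
score = dot_score(vec, vec)
print(vec, score)
```

With the real model the vocabulary dimension is 30522, and most entries of the pooled vector are zero, which is what makes the representation sparse.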
+## More
+<details><summary>Click to expand</summary>
 ## Evaluation
 ### Metrics
 ## Model Card Contact
 *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->
+</details>