rasyosef committed
Commit 962fd50 · verified · 1 Parent(s): c2f9933

Update README.md

Files changed (1)
  1. README.md +80 -47
README.md CHANGED
@@ -9,30 +9,37 @@ tags:
9
  - loss:SpladeLoss
10
  - loss:SparseMarginMSELoss
11
  - loss:FlopsLoss
12
- base_model: yosefw/SPLADE-BERT-Tiny-BS256
 
13
  widget:
14
- - text: 'Most Referenced:report - Return to the USDOJ/OIG Home Page - US Department
15
- of JusticeReturn to the USDOJ/OIG Home Page - US Department of Justice. Opinion:Roberts:
16
- Feds to stop using private prisons.'
17
- - text: 'Paul O''Neill, the founder of the Trans-Siberian Orchestra (pictured) has
18
- died at age 61. Paul O''Neill, the founder of the popular Christmas-themed rock
19
- ensemble Trans-Siberian Orchestra has died. A statement on the group''s Facebook
20
- page reads: The entire Trans-Siberian Orchestra family, past and present, is heartbroken
21
- to share the devastating news that Paul O’Neill has passed away from chronic illness.'
 
 
 
22
  - text: meaning for concern
23
- - text: 'Additional Tips. 1 Do not rub the ink stains as it can spread the stains
24
- further. 2 Make sure you test the cleaning solution on a small, hidden area to
25
- check if it is suitable for the material. 3 In case an ink stain has become old
26
- and dried, the above mentioned home remedies may not be effective.arpet: For ink
27
- stained spots on a carpet, you may apply a paste of cornstarch and milk. Leave
28
- it for a few hours before brushing it off. Finally, clean the residue with a vacuum
29
- cleaner. Leather: Try using a leather shampoo or a leather ink remover for removing
30
- ink stains from leather items.'
31
- - text: 'See below: 1. Get your marriage license. Before you can change your name,
32
- you''ll need the original (or certified) marriage license with the raised seal
33
- and your new last name on it. Call the clerk''s office where your license was
34
- filed to get copies if one wasn''t automatically sent to you. 2. Change your Social
35
- Security card.'
 
 
 
36
  pipeline_tag: feature-extraction
37
  library_name: sentence-transformers
38
  metrics:
@@ -122,38 +129,32 @@ model-index:
122
  - type: corpus_sparsity_ratio
123
  value: 0.9960142510179831
124
  name: Corpus Sparsity Ratio
 
 
 
 
125
  ---
126
 
127
  # SPLADE Sparse Encoder
128
 
129
- This is a [SPLADE Sparse Encoder](https://www.sbert.net/docs/sparse_encoder/usage/usage.html) model finetuned from [yosefw/SPLADE-BERT-Tiny-BS256](https://huggingface.co/yosefw/SPLADE-BERT-Tiny-BS256) using the [sentence-transformers](https://www.SBERT.net) library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.
130
- ## Model Details
131
 
132
- ### Model Description
133
- - **Model Type:** SPLADE Sparse Encoder
134
- - **Base model:** [yosefw/SPLADE-BERT-Tiny-BS256](https://huggingface.co/yosefw/SPLADE-BERT-Tiny-BS256) <!-- at revision 239bb34bbfcf6cc8b465eb5b94c76a20c574b47f -->
135
- - **Maximum Sequence Length:** 512 tokens
136
- - **Output Dimensionality:** 30522 dimensions
137
- - **Similarity Function:** Dot Product
138
- <!-- - **Training Dataset:** Unknown -->
139
- <!-- - **Language:** Unknown -->
140
- <!-- - **License:** Unknown -->
141
 
142
- ### Model Sources
 
 
143
 
144
- - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
145
- - **Documentation:** [Sparse Encoder Documentation](https://www.sbert.net/docs/sparse_encoder/usage/usage.html)
146
- - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
147
- - **Hugging Face:** [Sparse Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=sparse-encoder)
148
 
149
- ### Full Model Architecture
150
 
151
- ```
152
- SparseEncoder(
153
- (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertForMaskedLM'})
154
- (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
155
- )
156
- ```
157
 
158
  ## Usage
159
 
@@ -170,7 +171,7 @@ Then you can load this model and run inference.
170
  from sentence_transformers import SparseEncoder
171
 
172
  # Download from the 🤗 Hub
173
- model = SparseEncoder("yosefw/SPLADE-BERT-Tiny-BS256-distil-v3")
174
  # Run inference
175
  queries = [
176
  "what do i need to change my name on my license in ma",
@@ -215,6 +216,37 @@ You can finetune this model on your own dataset.
215
  *List how the model may foreseeably be misused and address what users ought not to do with the model.*
216
  -->
217
 
218
  ## Evaluation
219
 
220
  ### Metrics
@@ -510,4 +542,5 @@ You can finetune this model on your own dataset.
510
  ## Model Card Contact
511
 
512
  *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
513
- -->
 
 
9
  - loss:SpladeLoss
10
  - loss:SparseMarginMSELoss
11
  - loss:FlopsLoss
12
+ base_model:
13
+ - prajjwal1/bert-tiny
14
  widget:
15
+ - text: >-
16
+ Most Referenced:report - Return to the USDOJ/OIG Home Page - US Department
17
+ of JusticeReturn to the USDOJ/OIG Home Page - US Department of Justice.
18
+ Opinion:Roberts: Feds to stop using private prisons.
19
+ - text: >-
20
+ Paul O'Neill, the founder of the Trans-Siberian Orchestra (pictured) has
21
+ died at age 61. Paul O'Neill, the founder of the popular Christmas-themed
22
+ rock ensemble Trans-Siberian Orchestra has died. A statement on the group's
23
+ Facebook page reads: The entire Trans-Siberian Orchestra family, past and
24
+ present, is heartbroken to share the devastating news that Paul O’Neill has
25
+ passed away from chronic illness.
26
  - text: meaning for concern
27
+ - text: >-
28
+ Additional Tips. 1 Do not rub the ink stains as it can spread the stains
29
+ further. 2 Make sure you test the cleaning solution on a small, hidden area
30
+ to check if it is suitable for the material. 3 In case an ink stain has
31
+ become old and dried, the above mentioned home remedies may not be
32
+ effective.arpet: For ink stained spots on a carpet, you may apply a paste of
33
+ cornstarch and milk. Leave it for a few hours before brushing it off.
34
+ Finally, clean the residue with a vacuum cleaner. Leather: Try using a
35
+ leather shampoo or a leather ink remover for removing ink stains from
36
+ leather items.
37
+ - text: >-
38
+ See below: 1. Get your marriage license. Before you can change your name,
39
+ you'll need the original (or certified) marriage license with the raised
40
+ seal and your new last name on it. Call the clerk's office where your
41
+ license was filed to get copies if one wasn't automatically sent to you. 2.
42
+ Change your Social Security card.
43
  pipeline_tag: feature-extraction
44
  library_name: sentence-transformers
45
  metrics:
 
129
  - type: corpus_sparsity_ratio
130
  value: 0.9960142510179831
131
  name: Corpus Sparsity Ratio
132
+ datasets:
133
+ - microsoft/ms_marco
134
+ language:
135
+ - en
136
  ---
137
 
138
  # SPLADE Sparse Encoder
139
 
140
+ This is a SPLADE sparse retrieval model based on BERT-Tiny (4M parameters) that was trained by distilling a cross-encoder on the MSMARCO dataset. The cross-encoder used was [ms-marco-MiniLM-L6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2).
 
141
 
142
+ This Tiny SPLADE model beats `BM25` by `65.6%` on the MSMARCO benchmark. While this model is `15x` smaller than Naver's official `splade-v3-distilbert`, it possesses `80%` of its performance on MSMARCO. This model is small enough to be used without a GPU on a dataset of a few thousand documents.
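
The distillation objective named in the tags (`SparseMarginMSELoss`) can be illustrated with a short sketch: the student's dot-product score margin between a positive and a negative passage is regressed onto the margin given by the cross-encoder teacher. This is only an illustrative formulation under assumed tensor names and shapes, not the actual training code; in training it is wrapped by `SpladeLoss` together with a FLOPS regularizer (`FlopsLoss`) that keeps the query and document vectors sparse.

```python
import torch
import torch.nn.functional as F

def margin_mse_distillation(
    query_emb: torch.Tensor,    # (batch, 30522) sparse query vectors from the student
    pos_emb: torch.Tensor,      # (batch, 30522) positive-passage vectors
    neg_emb: torch.Tensor,      # (batch, 30522) negative-passage vectors
    teacher_pos: torch.Tensor,  # (batch,) cross-encoder scores for (query, positive)
    teacher_neg: torch.Tensor,  # (batch,) cross-encoder scores for (query, negative)
) -> torch.Tensor:
    # Student relevance is the dot product in the sparse vocabulary space.
    student_pos = (query_emb * pos_emb).sum(dim=-1)
    student_neg = (query_emb * neg_emb).sum(dim=-1)
    # Match the student's score margin to the teacher's score margin.
    return F.mse_loss(student_pos - student_neg, teacher_pos - teacher_neg)
```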
 
 
 
 
 
 
 
 
143
 
144
+ - `Collection:` https://huggingface.co/collections/rasyosef/splade-tiny-msmarco-687c548c0691d95babf65b70
145
+ - `Distillation Dataset:` https://huggingface.co/datasets/yosefw/msmarco-train-distil-v2
146
+ - `Code:` https://github.com/rasyosef/splade-tiny-msmarco
147
 
148
+ ## Performance
 
 
 
149
 
150
+ The SPLADE models were evaluated on 55 thousand queries and 8.84 million documents from the [MSMARCO](https://huggingface.co/datasets/microsoft/ms_marco) dataset.
151
 
152
+ |Model|Size (# Params)|MRR@10 (MS MARCO dev)|
153
+ |:---|:----|:-------------------|
154
+ |`BM25`|-|18.6|
155
+ |`rasyosef/splade-tiny`|4.4M|30.9|
156
+ |`rasyosef/splade-mini`|11.2M|33.2|
157
+ |`naver/splade-v3-distilbert`|67.0M|38.7|
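
For context, MRR@10 rewards ranking a relevant passage within the first ten results and averages the reciprocal of that rank over all queries. A minimal self-contained sketch of the metric (the query and passage IDs below are hypothetical placeholders, not the actual evaluation harness):

```python
def mrr_at_10(rankings: dict[str, list[str]], qrels: dict[str, set[str]]) -> float:
    """rankings maps a query id to its ranked passage ids; qrels maps it to the relevant ids."""
    total = 0.0
    for qid, ranked in rankings.items():
        for rank, pid in enumerate(ranked[:10], start=1):
            if pid in qrels.get(qid, set()):
                total += 1.0 / rank
                break
    return total / len(rankings)

# Hypothetical example: the only relevant passage is ranked 2nd, so MRR@10 = 0.5.
print(mrr_at_10({"q1": ["d3", "d7", "d1"]}, {"q1": {"d7"}}))
```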
158
 
159
  ## Usage
160
 
 
171
  from sentence_transformers import SparseEncoder
172
 
173
  # Download from the 🤗 Hub
174
+ model = SparseEncoder("rasyosef/splade-tiny")
175
  # Run inference
176
  queries = [
177
  "what do i need to change my name on my license in ma",
 
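The usage snippet above is truncated by the diff context; a complete, runnable variant is sketched below. The candidate documents and the top-term inspection are illustrative, and the sketch assumes the embeddings come back as sparse PyTorch tensors; `encode_query`, `encode_document`, and `similarity` are the standard `SparseEncoder` methods in recent sentence-transformers releases.

```python
from sentence_transformers import SparseEncoder

model = SparseEncoder("rasyosef/splade-tiny")

queries = ["what do i need to change my name on my license in ma"]
# Hypothetical candidate passages, for illustration only.
documents = [
    "To change the name on your license, bring your certified marriage certificate to the registry.",
    "Paul O'Neill, the founder of the Trans-Siberian Orchestra, has died at age 61.",
]

query_embeddings = model.encode_query(queries)        # sparse vectors over the 30522-term vocabulary
document_embeddings = model.encode_document(documents)

# Dot-product relevance scores, shape (num_queries, num_documents).
scores = model.similarity(query_embeddings, document_embeddings)
print(scores)

# Inspect the highest-weighted vocabulary terms of the query vector
# (assumes the embeddings are returned as sparse torch tensors).
dense_query = query_embeddings.to_dense()[0]
top = dense_query.topk(10)
print([(model.tokenizer.convert_ids_to_tokens(i), round(v, 2))
       for i, v in zip(top.indices.tolist(), top.values.tolist())])
```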
216
  *List how the model may foreseeably be misused and address what users ought not to do with the model.*
217
  -->
218
 
219
+ ## Model Details
220
+
221
+ ### Model Description
222
+ - **Model Type:** SPLADE Sparse Encoder
223
+ - **Base model:** [prajjwal1/bert-tiny](https://huggingface.co/prajjwal1/bert-tiny)
224
+ - **Maximum Sequence Length:** 512 tokens
225
+ - **Output Dimensionality:** 30522 dimensions
226
+ - **Similarity Function:** Dot Product
227
+ <!-- - **Training Dataset:** Unknown -->
228
+ <!-- - **Language:** Unknown -->
229
+ <!-- - **License:** Unknown -->
230
+
231
+ ### Model Sources
232
+
233
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
234
+ - **Documentation:** [Sparse Encoder Documentation](https://www.sbert.net/docs/sparse_encoder/usage/usage.html)
235
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
236
+ - **Hugging Face:** [Sparse Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=sparse-encoder)
237
+
238
+ ### Full Model Architecture
239
+
240
+ ```
241
+ SparseEncoder(
242
+ (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertForMaskedLM'})
243
+ (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
244
+ )
245
+ ```
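
These two modules implement the standard SPLADE construction: the BERT MLM logits are passed through a log-saturated ReLU, log(1 + ReLU(logits)), and max-pooled over the token positions, giving one non-negative weight per vocabulary term. A rough sketch of that computation with plain transformers follows; loading this repo's weights with `AutoModelForMaskedLM` and the exact activation are assumptions based on the architecture printed above, not a drop-in replacement for `SpladePooling`.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Assumed to load the BertForMaskedLM weights stored in this repository.
name = "rasyosef/splade-tiny"
tokenizer = AutoTokenizer.from_pretrained(name)
mlm = AutoModelForMaskedLM.from_pretrained(name)

batch = tokenizer(["splade turns text into sparse lexical vectors"],
                  return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    logits = mlm(**batch).logits                           # (batch, seq_len, 30522)

weights = torch.log1p(torch.relu(logits))                  # SPLADE activation: log(1 + ReLU)
weights = weights * batch["attention_mask"].unsqueeze(-1)  # ignore padding positions
sparse_vec = weights.max(dim=1).values                     # max-pool over tokens -> (batch, 30522)

# Most dimensions are zero; the largest weights typically fall on terms related to the input.
top = sparse_vec[0].topk(10)
print(list(zip(tokenizer.convert_ids_to_tokens(top.indices.tolist()),
               [round(v, 2) for v in top.values.tolist()])))
```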
246
+
247
+ ## More
248
+ <details><summary>Click to expand</summary>
249
+
250
  ## Evaluation
251
 
252
  ### Metrics
 
542
  ## Model Card Contact
543
 
544
  *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
545
+ -->
546
+ </details>