This is a SPLADE sparse retrieval model based on BERT-Tiny (4M) that was trained by distilling a Cross-Encoder on the MSMARCO dataset. The cross-encoder used was ms-marco-MiniLM-L6-v2.
This Tiny SPLADE model beats BM25 by 65.6% (relative MRR@10) on the MSMARCO benchmark. Although it is 15x smaller than Naver's official splade-v3-distilbert, it retains 80% of that model's performance on MSMARCO. It is small enough to run without a GPU on a dataset of a few thousand documents.
Collection: https://huggingface.co/collections/rasyosef/splade-tiny-msmarco-687c548c0691d95babf65b70
Distillation Dataset: https://huggingface.co/datasets/yosefw/msmarco-train-distil-v2
Code: https://github.com/rasyosef/splade-tiny-msmarco
The splade models were evaluated on 55 thousand queries and 8 million documents from the MSMARCO dataset.
Model | Size (# Params) | MRR@10 (MS MARCO dev) |
---|---|---|
BM25 | - | 18.6 |
rasyosef/splade-tiny | 4.4M | 30.8 |
rasyosef/splade-mini | 11.2M | 32.8 |
naver/splade-v3-distilbert | 67.0M | 38.7 |
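
The relative figures quoted above can be re-derived from this table. The snippet below is only a quick arithmetic check on the reported numbers; the variable names are illustrative and not part of any library.

```python
# MRR@10 and parameter counts (in millions), copied from the table above.
mrr = {"bm25": 18.6, "splade-tiny": 30.8, "splade-v3-distilbert": 38.7}
params_m = {"splade-tiny": 4.4, "splade-v3-distilbert": 67.0}

# Relative MRR@10 improvement of splade-tiny over BM25, in percent.
gain_over_bm25 = (mrr["splade-tiny"] - mrr["bm25"]) / mrr["bm25"] * 100

# How many times larger splade-v3-distilbert is than splade-tiny.
size_ratio = params_m["splade-v3-distilbert"] / params_m["splade-tiny"]

# Share of splade-v3-distilbert's MRR@10 that splade-tiny reaches, in percent.
retained = mrr["splade-tiny"] / mrr["splade-v3-distilbert"] * 100

print(f"gain over BM25: {gain_over_bm25:.1f}%")
print(f"size ratio: {size_ratio:.1f}x")
print(f"retained performance: {retained:.1f}%")
```
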
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SparseEncoder
# Download from the 🤗 Hub
model = SparseEncoder("rasyosef/splade-tiny")
# Run inference
queries = [
"what is eus appointment",
]
documents = [
"Endoscopic Ultrasound (EUS). You've been referred to have an endoscopic ultrasound, or EUS, which will help your doctor, evaluate or treat your condition. This brochure will give you a basic understanding of the procedure-how it is performed, how it can help, and what side effects you might experience.our doctor can use EUS to diagnose the cause of conditions such as abdominal pain or abnormal weight loss. Or, if your doctor has ruled out certain conditions, EUS can confirm your diagnosis and give you a clean bill of health.",
'About EUS (endoscopic ultrasound). An EUS, or endoscopic ultrasound, is an outpatient procedure used to closely examine the tissues in the digestive tract. The procedure is done using a standard endoscope and a tiny ultrasound device.The ultrasound sensor sends back visual images of the digestive tract to a screen, allowing the physician to see deeper into the tissues and the organs beneath the surface of the intestines.. In general, an EUS is a very safe procedure. If your procedure is being done on the upper GI tract, you may have a sore throat for a few days. As a result of the sedation, you should not drive, operate heavy machinery or make any important decisions for up to six hours following the procedure.',
'Endoscopic Ultrasound (EUS) allows your doctor to examine the lining and the walls of your upper and lower gastrointestinal tract.The upper tract is the esophagus, stomach, and duodenum; the lower tract includes your colon and rectum.Doctors also use EUS to study internal organs that lie next to the gastrointestinal tract, such as the gall bladder and the pancreas. Your endoscopist will use a thin, flexible tube called an endoscope.he upper tract is the esophagus, stomach, and duodenum; the lower tract includes your colon and rectum. Doctors also use EUS to study internal organs that lie next to the gastrointestinal tract, such as the gall bladder and the pancreas.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 30522] [3, 30522]
# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[12.9370, 14.3277, 12.9725]])
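
To turn the similarity scores into a ranking, sort the document indices by descending score. This is a minimal sketch that reuses the three scores printed above as plain Python floats, so it runs without downloading the model.

```python
# Scores for the single query, copied from the printed tensor above.
scores = [12.9370, 14.3277, 12.9725]

# Document indices ordered from most to least relevant.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [1, 2, 0]
```

Here the second document ("About EUS ...") ranks first for the query.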
SparseEncoder(
(0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertForMaskedLM'})
(1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
)
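
As a rough illustration of what the pooling step does, the sketch below collapses per-token vocabulary logits into one sparse vector. It assumes the 'relu' activation corresponds to the log(1 + ReLU(x)) saturation described in the SPLADE papers, followed by a max over tokens. Attention masking is omitted, `splade_pool` is a hypothetical helper, and the library's internals may differ.

```python
import math


def splade_pool(token_logits):
    """Max over tokens of log(1 + relu(logit)) for every vocabulary entry.

    token_logits: one row per input token, each row of vocabulary size.
    """
    vocab_size = len(token_logits[0])
    pooled = [0.0] * vocab_size
    for row in token_logits:
        for j, logit in enumerate(row):
            weight = math.log1p(max(logit, 0.0))
            if weight > pooled[j]:
                pooled[j] = weight
    return pooled


# Toy input: 2 tokens and a 4-term vocabulary (the real model has 30522 terms).
logits = [
    [-1.0, 2.0, 0.0, 0.5],
    [-3.0, 1.0, -0.2, 3.0],
]
vector = splade_pool(logits)

# Vocabulary entries with a non-zero weight are the "active dimensions".
active = [j for j, w in enumerate(vector) if w > 0]
print(vector)
print(active)
```

Terms whose logits are never positive stay at exactly zero, which is what makes the resulting vector sparse.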
Evaluated with SparseInformationRetrievalEvaluator:
Metric | Value |
---|---|
dot_accuracy@1 | 0.4602 |
dot_accuracy@3 | 0.7768 |
dot_accuracy@5 | 0.885 |
dot_accuracy@10 | 0.9548 |
dot_precision@1 | 0.4602 |
dot_precision@3 | 0.2653 |
dot_precision@5 | 0.1839 |
dot_precision@10 | 0.1002 |
dot_recall@1 | 0.4462 |
dot_recall@3 | 0.7631 |
dot_recall@5 | 0.8761 |
dot_recall@10 | 0.95 |
dot_ndcg@10 | 0.7094 |
dot_mrr@10 | 0.6345 |
dot_map@100 | 0.6307 |
query_active_dims | 16.7756 |
query_sparsity_ratio | 0.9995 |
corpus_active_dims | 102.4796 |
corpus_sparsity_ratio | 0.9966 |
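
The sparsity ratios are consistent with the active-dimension counts and the 30522-term vocabulary. The check below assumes the ratio is defined as one minus the fraction of active dimensions, which matches the reported values; `sparsity_ratio` is an illustrative helper, not a library function.

```python
VOCAB_SIZE = 30522  # word_embedding_dimension reported in the model architecture


def sparsity_ratio(active_dims, vocab_size=VOCAB_SIZE):
    """Fraction of vocabulary dimensions that are zero on average."""
    return 1.0 - active_dims / vocab_size


query_sparsity = sparsity_ratio(16.7756)
corpus_sparsity = sparsity_ratio(102.4796)
print(round(query_sparsity, 4), round(corpus_sparsity, 4))
```
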
Columns: query, positive, negative_1, negative_2, negative_3, negative_4, and label
Property | query | positive | negative_1 | negative_2 | negative_3 | negative_4 | label |
---|---|---|---|---|---|---|---|
type | string | string | string | string | string | string | list |
query | positive | negative_1 | negative_2 | negative_3 | negative_4 | label |
---|---|---|---|---|---|---|
could Nexium antacid cause sweating | Summary: Sweating-excessive is found among people who take Nexium, especially for people who are 60+ old, have been taking the drug for.Personalized health information: on eHealthMe you can find out what patients like me (same gender, age) reported their drugs and conditions on FDA and social media since 1977. I am a 56 year old female who has been taking Nexium for 13 years and has been plagued by shingles.. 2 Support group for people who have Sweating-Excessive. 3 Been on warfarin for 6 days and having sweating at times. | More questions for: Nexium, Sweating-excessive. You may be interested at these reviews (Write a review): 1 Xarelto caused shortness of breath. 2 After taking Xarelto for 3 years I suddently experienced shortness of breath, sweating and pain in my arms. 3 Myrbetriq & hyperhidrosis (night sweats). I am a 56 year old female who has been taking Nexium for 13 years and has been plagued by shingles.. 2 Support group for people who have Sweating-Excessive. 3 Been on warfarin for 6 days and having sweating at times. | NEXIUM may help your acid-related symptoms, but you could still have serious stomach problems. Talk with your doctor. NEXIUM can cause serious side effects, including: 1 Diarrhea. 2 NEXIUM may increase your risk of getting severe diarrhea.3 This diarrhea may be caused by an infection (Clostridium difficile) in your intestines.EXIUM can cause serious side effects, including: 1 Diarrhea. 2 NEXIUM may increase your risk of getting severe diarrhea. 3 This diarrhea may be caused by an infection (Clostridium difficile) in your intestines. | Treatment for sweating. The treatment you have will depend on the cause of your sweating. If you have an infection, antibiotics will treat the infection and stop the sweating. If your sweating is due to cancer, treating the cancer can get rid of the sweating.If you have sweating because treatment has changed your hormone levels, it may settle down after a few weeks or months, once your body is used to the treatment. Talk to your doctor or nurse about your sweats.nfection. Infection is one of the most common causes of sweating in people who have cancer. Infection can give you a high temperature and your body sweats to try and reduce it. Treating the infection can control or stop the sweating. | Esomeprazole is used to treat certain stomach and esophagus problems (such as acid reflux, ulcers). It works by decreasing the amount of acid your stomach makes.ide Effects. See also Precautions section. Headache or abdominal pain may occur. If any of these effects persist or worsen, tell your doctor or pharmacist promptly. Remember that your doctor has prescribed this medication because he or she has judged that the benefit to you is greater than the risk of side effects. | [0.5, 6.390576362609863, 11.97206974029541, 16.409034729003906] |
what is electronic document access | Electronic Document Access (EDA) is a web-based system that provides secure online access, storage, and retrieval of contracts, contract modifications, Government Bills of Lading (GBLs), DFAS Transactions for Others (E110), vouchers, and Contract Deficiency Reports (CDR) to authorized users throughout the Department of Defense (DoD). | An electronic document management system (EDMS) is a software system for organizing and storing different kinds of documents. This type of system is a more particular kind of document management system, a more general type of storage system that helps users to organize and store paper or digital documents. | In many cases, the specific documentation for original storage protocols is a major part of what makes an electronic document management system so valuable to a business or organization. | Benefits derived from DoD EDA include: 1 Single-source, timely information. 2 Electronic search and retrieval – 24/7 access/retrieval capability. 3 Increased visibility of all procurement & payment actions. Reduction in data entry/human 1 error. Lower postage, handling, retention and document management costs. | If YES, go to www.docusign.net and log in with your email and password. On the DocuSign Web Application, select the Documents tab. Your documents are listed there. If NO, you can access the document by opening the DocuSign Completed email. This email is sent to you once you have finished signing a DocuSign document. See the instructions below. Note: In some cases, your documents might be attached to the Completed email. 1. Open the DocuSign Completed email. | [4.681269645690918, 9.322907447814941, 14.813400268554688, 20.356698989868164] |
does hpv cause uti | So now you get in the acidic environment can hpv cause urinary tract infection for the area of the blockage of the fruits and fiber as a completely eliminate urinate at all. Spending money on prescription of antibiotics will kill all of the bacterial infection keeps happening to your veterinarian will work to cure the condition. | HPV & Urinary Tract Infections. Human Papillomavirus (HPV) is a group of viruses that can cause warts and cancers of the cervix, anus and genitals. Urinary tract infection (UTI) occurs when bacteria multiply within the bladder, causing pain and urinary urgency. (Thomas Northcut/Digital Vision/Getty Images) Other People Are Reading. | Some types of the HPV virus can infect the genital epithelial cells (skin and mucous membranes). Some types of HPV virus cause warts that appear on the genitals (vagina, vulva, penis, etc.) and anus of women and men. | Most women with HPV have no signs of infection. Since most HPV infections go away on their own within two years, many women never know they had an infection. Some HPV infections cause genital warts that can be seen or felt. The only way to know if you have HPV is to ask your health care provider to do an HPV test. | Genital warts are caused by low-risk types of human papillomavirus (HPV). These viruses may not cause warts in everyone. Women can get genital warts from sexual contact with someone who has HPV. Genital warts are spread by skin-to-skin contact, usually from contact with the warts. It can be spread by vaginal, anal, oral, or handgenital sexual contact. Genital warts will spread HPV while visible, and after recent treatment. | [0.5, 2.4958395957946777, 3.76273775100708, 4.114340305328369] |
SpladeLoss with these parameters:
{
"loss": "SparseMarginMSELoss",
"document_regularizer_weight": 0.3,
"query_regularizer_weight": 0.5
}
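
To make the role of the two regularizer weights concrete, here is a pure-Python sketch of a SPLADE-style objective: a margin-MSE distillation term plus FLOPS-style sparsity penalties on documents and queries. It assumes a FLOPS regularizer on both sides and one negative per query. All function names are hypothetical and this is not the library's implementation.

```python
def dot(u, v):
    """Dot product of two equally sized weight vectors."""
    return sum(a * b for a, b in zip(u, v))


def flops(batch):
    """FLOPS-style penalty: sum over terms of (mean activation in the batch)^2."""
    n = len(batch)
    return sum((sum(column) / n) ** 2 for column in zip(*batch))


def margin_mse(queries, positives, negatives, teacher_margins):
    """Mean squared error between student and teacher score margins."""
    errors = [
        (dot(q, p) - dot(q, neg) - t) ** 2
        for q, p, neg, t in zip(queries, positives, negatives, teacher_margins)
    ]
    return sum(errors) / len(errors)


def splade_style_loss(queries, positives, negatives, teacher_margins,
                      document_regularizer_weight=0.3,
                      query_regularizer_weight=0.5):
    """Ranking loss plus weighted sparsity penalties (weights from the config above)."""
    ranking = margin_mse(queries, positives, negatives, teacher_margins)
    document_penalty = flops(positives + negatives)
    query_penalty = flops(queries)
    return (ranking
            + document_regularizer_weight * document_penalty
            + query_regularizer_weight * query_penalty)


# Toy batch: one query and a 3-term vocabulary.
queries = [[1.0, 0.0, 2.0]]
positives = [[2.0, 0.0, 1.0]]
negatives = [[0.0, 1.0, 1.0]]
teacher_margins = [3.0]  # cross-encoder score(positive) - score(negative)

loss = splade_style_loss(queries, positives, negatives, teacher_margins)
print(loss)
```

Raising either weight pushes the corresponding vectors toward fewer active terms, at some cost to the ranking term.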
Non-default hyperparameters:
eval_strategy: epoch
per_device_train_batch_size: 48
per_device_eval_batch_size: 48
learning_rate: 8e-05
num_train_epochs: 6
lr_scheduler_type: cosine
warmup_ratio: 0.025
fp16: True
load_best_model_at_end: True
optim: adamw_torch_fused
All hyperparameters:
overwrite_output_dir: False
do_predict: False
eval_strategy: epoch
prediction_loss_only: True
per_device_train_batch_size: 48
per_device_eval_batch_size: 48
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 8e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 6
max_steps: -1
lr_scheduler_type: cosine
lr_scheduler_kwargs: {}
warmup_ratio: 0.025
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}
Epoch | Step | Training Loss | dot_ndcg@10 |
---|---|---|---|
1.0 | 10336 | 16309.8824 | 0.6698 |
2.0 | 20672 | 14.4047 | 0.6920 |
3.0 | 31008 | 13.0742 | 0.7004 |
4.0 | 41344 | 11.8023 | 0.7060 |
5.0 | 51680 | 11.0464 | 0.7085 |
6.0 | 62016 | 10.6766 | 0.7094 |
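
The step counts in this log can be combined with the hyperparameters above to sketch the learning-rate schedule. The snippet assumes a linear warmup followed by a half-cosine decay to zero (the usual shape of a cosine scheduler with warmup) and a single training device. `lr_at` is an illustrative helper, not the trainer's own function.

```python
import math

BASE_LR = 8e-05          # learning_rate
WARMUP_RATIO = 0.025     # warmup_ratio
TOTAL_STEPS = 62016      # last step in the training log (6 epochs)
STEPS_PER_EPOCH = 10336  # step reached at the end of epoch 1
BATCH_SIZE = 48          # per_device_train_batch_size

# Number of optimizer steps spent ramping the learning rate up.
warmup_steps = math.ceil(WARMUP_RATIO * TOTAL_STEPS)


def lr_at(step):
    """Learning rate at a given step: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    progress = (step - warmup_steps) / (TOTAL_STEPS - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Upper bound on training examples per epoch (the last batch may be partial).
samples_per_epoch = STEPS_PER_EPOCH * BATCH_SIZE

print(warmup_steps, samples_per_epoch)
print(lr_at(warmup_steps), lr_at(TOTAL_STEPS // 2), lr_at(TOTAL_STEPS))
```

Under these assumptions the warmup covers only a small fraction of the first epoch, and the learning rate reaches zero at the final step.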
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{formal2022distillationhardnegativesampling,
title={From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
author={Thibault Formal and Carlos Lassance and Benjamin Piwowarski and Stéphane Clinchant},
year={2022},
eprint={2205.04733},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2205.04733},
}
@misc{hofstätter2021improving,
title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation},
author={Sebastian Hofstätter and Sophia Althammer and Michael Schröder and Mete Sertkan and Allan Hanbury},
year={2021},
eprint={2010.02666},
archivePrefix={arXiv},
primaryClass={cs.IR}
}
@article{paria2020minimizing,
title={Minimizing flops to learn efficient sparse representations},
author={Paria, Biswajit and Yeh, Chih-Kuan and Yen, Ian EH and Xu, Ning and Ravikumar, Pradeep and P{\'o}czos, Barnab{\'a}s},
journal={arXiv preprint arXiv:2004.05665},
year={2020}
}