SentenceTransformer based on facebook/esm2_t6_8M_UR50D

This is a sentence-transformers model finetuned from facebook/esm2_t6_8M_UR50D. It maps input sequences (here, space-separated amino-acid sequences such as TCRβ chains) to a 320-dimensional dense vector space and can be used for semantic similarity, semantic search, paraphrase mining, classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: facebook/esm2_t6_8M_UR50D
  • Maximum Sequence Length: 1026 tokens
  • Output Dimensionality: 320 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1026, 'do_lower_case': False}) with Transformer model: EsmModel 
  (1): Pooling({'word_embedding_dimension': 320, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
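
The Pooling module averages the per-residue token embeddings produced by the ESM-2 encoder (ignoring padding) to obtain the 320-dimensional sequence embedding. Below is a minimal sketch of that computation, assuming the standard sentence-transformers internals (model.tokenize and model[0].auto_model); the shortened input sequence is purely illustrative.

import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("HassanCS/TCRb_HLA_peptide_esm2_t6_8M_UR50D_up_to_epoch_8", device="cpu")

# A shortened, illustrative sequence; real inputs are full space-separated chains
sequence = "D A G V T Q S P T H L I K T R G Q Q V T L R C"

# Tokenize with the model's own tokenizer and run the underlying EsmModel
features = model.tokenize([sequence])
with torch.no_grad():
    token_embeddings = model[0].auto_model(
        input_ids=features["input_ids"],
        attention_mask=features["attention_mask"],
    ).last_hidden_state  # shape: [1, seq_len, 320]

# Mean pooling: average the token embeddings, masking out padding positions
mask = features["attention_mask"].unsqueeze(-1).float()
manual_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)

# This should closely match the embedding produced by the full pipeline
pipeline_embedding = model.encode([sequence], convert_to_tensor=True)
print(torch.allclose(manual_embedding, pipeline_embedding, atol=1e-4))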

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("HassanCS/TCRb_HLA_peptide_esm2_t6_8M_UR50D_up_to_epoch_8")
# Run inference
sentences = [
    'D A G V T Q S P T H L I K T R G Q Q V T L R C S P I S G H K S V S W Y Q Q V L G Q G P Q F I F Q Y Y E K E E R G R G N F P D R F S A R Q F P N Y S S E L N V N A L L L G D S A L Y L C C A S S P G T D Y G Y T F F G S G T R L T V V E',
    'E T G V T Q S P T H L I K T R G Q Q V T L R C S S Q S G H N T V S W Y Q Q A L G Q G P Q F I F Q Y Y R E E E N G R G N F P P R F S G L Q F P N Y S S E L N V N A L E L D D S A L Y L C C A S S S R T S G I N E Q F F F G P G T R L T V L E',
    'G A G V S Q S L R H K V A K K G K D V A L R Y D P I S G H N A L Y W Y R Q S L G Q G L E F P I Y F Q G K D A A D K S G L P R D R F S A Q R S E G S I S T L K F Q R T Q Q G D L A V Y L C A S S S T R G S R G E Q F F G P G T R L T V L E',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 320]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
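
Note that the inputs above are single-letter amino-acid codes separated by spaces, matching the training data. If your sequences are stored without spaces, a small helper such as the hypothetical to_spaced_residues below can convert them before encoding; the truncated raw sequence is only for illustration.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("HassanCS/TCRb_HLA_peptide_esm2_t6_8M_UR50D_up_to_epoch_8")

def to_spaced_residues(sequence: str) -> str:
    """Insert spaces between single-letter amino-acid codes."""
    return " ".join(sequence.replace(" ", "").upper())

raw = "DAGVTQSPTHLIKTRGQQVTLRC"  # truncated example, for illustration only
embedding = model.encode(to_spaced_residues(raw))
print(embedding.shape)
# (320,)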

Evaluation

Metrics

Semantic Similarity

Metric Value
pearson_cosine 0.9311
spearman_cosine 0.9742
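
These figures are the Pearson and Spearman correlations between the model's cosine similarities and the gold scores on held-out pairs (the all-dev evaluator in the training logs below). The following is a minimal sketch of how such an evaluation can be run with the sentence-transformers EmbeddingSimilarityEvaluator; the three pairs and their scores are placeholders, not the actual dev data.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("HassanCS/TCRb_HLA_peptide_esm2_t6_8M_UR50D_up_to_epoch_8")

# Placeholder pairs and similarity labels; substitute your own evaluation split
sentences1 = ["D A G V T Q S P T H L I K", "E T G V T Q S P T H L I K", "G A G V S Q S L R H K V A K"]
sentences2 = ["E T G V T Q S P T H L I K", "G A G V S Q S L R H K V A K", "D A G V T Q S P T H L I K"]
gold_scores = [0.85, 0.40, 0.10]

evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, gold_scores, name="all-dev")
results = evaluator(model)
print(results)
# e.g. {'all-dev_pearson_cosine': ..., 'all-dev_spearman_cosine': ...}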

Training Details

Training Dataset

Unnamed Dataset

  • Size: 504,071 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min 111 tokens, mean 117.87 tokens, max 125 tokens
    • sentence2: string; min 109 tokens, mean 117.88 tokens, max 132 tokens
    • score: float; min 0.0, mean 0.36, max 1.0
  • Samples:
    • sentence1: N A G V T Q T P K F Q V L K T G Q S M T L Q C A Q D M N H N S M Y W Y R Q D P G M G L R L I Y Y S A S E G T T D K G E V P N G Y N V S R L N K R E F S L R L E S A A P S Q T S V Y F C A S R S G S G T N Y N E Q F F G P G T R L T V L E
      sentence2: G A V V S Q H P S W V I C K S G T S V K I E C R S L D F Q A T T M F W Y R Q F P K Q S L M L M A T S N E G S K A T Y E Q G V E K D K F L I N H A S L T L S T L T V T S A H P E D S S F Y I C S A P T S G G H N E Q F G P G T R L T V L E
      score: 0.36983471074380164
    • sentence1: D T G V S Q N P R H K I T K R G Q N V T F R C D P I S E H N R L Y W Y R Q T L G Q G P E F L T Y F Q N E A Q L E K S R L L S D R F S A E R P K G S F S T L E I Q R T E Q G D S A M Y L C A S S L I Q G A S W G Y T F G S G T R L T V V E
      sentence2: N A G V T Q T P K F Q V L K T G Q S M T L Q C A Q D M N H E Y M S W Y R Q D P G M G L R L I H Y S V G A G I T D Q G E V P N G Y N V S R S T T E D F P L R L L S A A P S Q T S V Y F C A S S S L D G N Y G Y T F G S G T R L T V V E
      score: 0.8450413223140495
    • sentence1: D V K V T Q S S R Y L V K R T G E K V F L E C V Q D M D H E N M F W Y R Q D P G L G L R L I Y F S Y D V K M K E K G D I P E G Y S V S R E K K E R F S L I L E S A S T N Q T S M Y L C C A S R V R D R G R L D Y G Y T F F G S G T R L T V V E
      sentence2: D G G I T Q S P K Y L F R K E G Q N V T L S C E Q N L N H D A M Y W Y R Q V P G Q G L R L I Y Y S H I V N D F Q K G D I A E G Y S V S R E K K E S F P L T V T S A Q K N P T A F Y L C C A S S S R S G N E K L F F F G S G T Q L S V L E
      score: 0.8347107438016529
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
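
A minimal sketch of how a CoSENTLoss with these parameters would be constructed with the sentence-transformers API, using the base ESM-2 checkpoint as a stand-in for the model being trained:

from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("facebook/esm2_t6_8M_UR50D")

# CoSENTLoss with the parameters listed above: scale 20.0, pairwise cosine similarity
loss = losses.CoSENTLoss(model=model, scale=20.0, similarity_fct=util.pairwise_cos_sim)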
    

Evaluation Dataset

Unnamed Dataset

  • Size: 56,008 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min 110 tokens, mean 117.92 tokens, max 125 tokens
    • sentence2: string; min 111 tokens, mean 117.93 tokens, max 127 tokens
    • score: float; min 0.0, mean 0.38, max 1.0
  • Samples:
    • sentence1: N A G V T Q T P K F Q V L K T G Q S M T L Q C A Q D M N H E Y M S W Y R Q D P G M G L R L I H Y S V G A G I T D Q G E V P N G Y N V S R S T T E D F P L R L L S A A P S Q T S V Y F C C A S S P I T G T G I Y G Y T F F G S G T R L T V V E
      sentence2: D G G I T Q S P K Y L F R K E G Q N V T L S C E Q N L N H D A M Y W Y R Q D P G Q G L R L I Y Y S Q I V N D F Q K G D I A E G Y S V S R E K K E S F P L T V T S A Q K N P T A F Y L C C A S S M I P D M N T E A F F F G Q G T R L T V V E
      score: 0.03925619834710743
    • sentence1: G A V V S Q H P S W V I C K S G T S V K I E C R S L D F Q A T T M F W Y R Q F P K Q S L M L M A T S N E G S K A T Y E Q G V E K D K F L I N H A S L T L S T L T V T S A H P E D S S F Y I C S A R D S T G N G Y T F G S G T R L T V V E
      sentence2: S A V I S Q K P S R D I C Q R G T S L T I Q C Q V D S Q V T M M F W Y R Q Q P G Q S L T L I A T A N Q G S E A T Y E S G F V I D K F P I S R P N L T F S T L T V S N M S P E D S S I Y L C S V G T G G T N E K L F F G Q G T R L T V V E
      score: 0.8347107438016529
    • sentence1: D A R V T Q T P R H K V T E M G Q E V T M R C Q P I L G H N T V F W Y R Q T M M Q G L E L L A Y F R N R A P L D D S G M P K D R F S A E M P D A T L A T L K I Q P S E P R D S A V Y F C A S G T G E G S Y N E Q F F G P G T R L T V L E
      sentence2: D A R V T Q T P R H K V T E M G Q E V T M R C Q P I L G H N T V F W Y R Q T M M Q G L E L L A Y F R N R A P L D D S G M P K D R F S A E M P D A T L A T L K I Q P S E P R D S A V Y F C A S G D Y G N R G P Y S N Q P Q H F G D G T R L S I L E
      score: 0.07024793388429751
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • learning_rate: 0.001
  • weight_decay: 0.0001
  • num_train_epochs: 8
  • fp16: True
  • load_best_model_at_end: True
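
For reference, a minimal sketch of a training run with these non-default hyperparameters using the sentence-transformers Trainer API. The dataset contents, the output directory, and the added save_strategy (so that load_best_model_at_end can restore the best epoch) are illustrative assumptions, not the exact training script.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

# Placeholder pairs with the columns described above; use the real 504k/56k splits
train_dataset = Dataset.from_dict({
    "sentence1": ["D A G V T Q S P T H L I K", "E T G V T Q S P T H L I K"],
    "sentence2": ["G A G V S Q S L R H K V A K", "N A G V T Q T P K F Q V L K"],
    "score": [0.37, 0.84],
})
eval_dataset = train_dataset  # stand-in; use the held-out pairs here

model = SentenceTransformer("facebook/esm2_t6_8M_UR50D")
loss = losses.CoSENTLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="tcrb-esm2-cosent",        # hypothetical output path
    eval_strategy="epoch",
    save_strategy="epoch",                # assumed, so the best epoch can be reloaded
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=1e-3,
    weight_decay=1e-4,
    num_train_epochs=8,
    fp16=True,
    load_best_model_at_end=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()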

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 0.001
  • weight_decay: 0.0001
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 8
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss all-dev_spearman_cosine
0.0508 100 10.3431 - -
0.1015 200 10.3239 - -
0.1523 300 10.3059 - -
0.2030 400 10.2992 - -
0.2538 500 10.2805 - -
0.3046 600 10.2669 - -
0.3553 700 10.2524 - -
0.4061 800 10.2405 - -
0.4569 900 10.2277 - -
0.5076 1000 10.2183 - -
0.5584 1100 10.1955 - -
0.6091 1200 10.1802 - -
0.6599 1300 10.1639 - -
0.7107 1400 10.1569 - -
0.7614 1500 10.142 - -
0.8122 1600 10.1199 - -
0.8629 1700 10.1018 - -
0.9137 1800 10.0895 - -
0.9645 1900 10.0613 - -
1.0 1970 - 10.0420 0.7603
1.0152 2000 9.9671 - -
1.0660 2100 9.9951 - -
1.1168 2200 9.984 - -
1.1675 2300 9.9659 - -
1.2183 2400 9.9412 - -
1.2690 2500 9.924 - -
1.3198 2600 9.9016 - -
1.3706 2700 9.8786 - -
1.4213 2800 9.8664 - -
1.4721 2900 9.8448 - -
1.5228 3000 9.8323 - -
1.5736 3100 9.8085 - -
1.6244 3200 9.7986 - -
1.6751 3300 9.7862 - -
1.7259 3400 9.7621 - -
1.7766 3500 9.75 - -
1.8274 3600 9.7384 - -
1.8782 3700 9.721 - -
1.9289 3800 9.7194 - -
1.9797 3900 9.7179 - -
2.0 3940 - 9.7322 0.8905
2.0305 4000 9.7386 - -
2.0812 4100 9.7514 - -
2.1320 4200 9.7336 - -
2.1827 4300 9.7278 - -
2.2335 4400 9.7203 - -
2.2843 4500 9.6991 - -
2.3350 4600 9.6859 - -
2.3858 4700 9.6665 - -
2.4365 4800 9.6652 - -
2.4873 4900 9.6321 - -
2.5381 5000 9.6195 - -
2.5888 5100 9.6006 - -
2.6396 5200 9.5913 - -
2.6904 5300 9.5792 - -
2.7411 5400 9.5701 - -
2.7919 5500 9.562 - -
2.8426 5600 9.5474 - -
2.8934 5700 9.5147 - -
2.9442 5800 9.5161 - -
2.9949 5900 9.5074 - -
3.0 5910 - 9.5105 0.9338
3.0457 6000 9.3679 - -
3.0964 6100 9.4141 - -
3.1472 6200 9.3998 - -
3.1980 6300 9.3777 - -
3.2487 6400 9.3718 - -
3.2995 6500 9.3744 - -
3.3503 6600 9.3661 - -
3.4010 6700 9.3472 - -
3.4518 6800 9.3239 - -
3.5025 6900 9.3358 - -
3.5533 7000 9.3072 - -
3.6041 7100 9.3102 - -
3.6548 7200 9.29 - -
3.7056 7300 9.3095 - -
3.7563 7400 9.2874 - -
3.8071 7500 9.2643 - -
3.8579 7600 9.259 - -
3.9086 7700 9.2706 - -
3.9594 7800 9.2403 - -
4.0 7880 - 9.3542 0.9535
4.0102 7900 9.2373 - -
4.0609 8000 9.3228 - -
4.1117 8100 9.3337 - -
4.1624 8200 9.3371 - -
4.2132 8300 9.3342 - -
4.2640 8400 9.3354 - -
4.3147 8500 9.32 - -
4.3655 8600 9.3151 - -
4.4162 8700 9.3038 - -
4.4670 8800 9.2938 - -
4.5178 8900 9.281 - -
4.5685 9000 9.285 - -
4.6193 9100 9.2787 - -
4.6701 9200 9.2665 - -
4.7208 9300 9.2467 - -
4.7716 9400 9.2345 - -
4.8223 9500 9.2412 - -
4.8731 9600 9.2245 - -
4.9239 9700 9.2366 - -
4.9746 9800 9.2023 - -
5.0 9850 - 9.3057 0.9580
5.0254 9900 9.0698 - -
5.0761 10000 9.1091 - -
5.1269 10100 9.1081 - -
5.1777 10200 9.1166 - -
5.2284 10300 9.1056 - -
5.2792 10400 9.098 - -
5.3299 10500 9.1041 - -
5.3807 10600 9.0786 - -
5.4315 10700 9.0669 - -
5.4822 10800 9.0534 - -
5.5330 10900 9.0634 - -
5.5838 11000 9.0708 - -
5.6345 11100 9.048 - -
5.6853 11200 9.0551 - -
5.7360 11300 9.0384 - -
5.7868 11400 9.0244 - -
5.8376 11500 9.0145 - -
5.8883 11600 9.0065 - -
5.9391 11700 9.0116 - -
5.9898 11800 8.9942 - -
6.0 11820 - 9.2116 0.9678
6.0406 11900 9.0665 - -
6.0914 12000 9.0975 - -
6.1421 12100 9.115 - -
6.1929 12200 9.1014 - -
6.2437 12300 9.1229 - -
6.2944 12400 9.1035 - -
6.3452 12500 9.0862 - -
6.3959 12600 9.0889 - -
6.4467 12700 9.0783 - -
6.4975 12800 9.0815 - -
6.5482 12900 9.066 - -
6.5990 13000 9.0758 - -
6.6497 13100 9.0685 - -
6.7005 13200 9.056 - -
6.7513 13300 9.058 - -
6.8020 13400 9.0499 - -
6.8528 13500 9.0246 - -
6.9036 13600 9.0354 - -
6.9543 13700 9.0156 - -
7.0 13790 - 9.1765 0.9688
7.0051 13800 8.9387 - -
7.0558 13900 8.9268 - -
7.1066 14000 8.9435 - -
7.1574 14100 8.9202 - -
7.2081 14200 8.9302 - -
7.2589 14300 8.9157 - -
7.3096 14400 8.9093 - -
7.3604 14500 8.9195 - -
7.4112 14600 8.9024 - -
7.4619 14700 8.8853 - -
7.5127 14800 8.8902 - -
7.5635 14900 8.8683 - -
7.6142 15000 8.8682 - -
7.6650 15100 8.871 - -
7.7157 15200 8.8676 - -
7.7665 15300 8.8705 - -
7.8173 15400 8.881 - -
7.8680 15500 8.8622 - -
7.9188 15600 8.8379 - -
7.9695 15700 8.8466 - -
8.0 15760 - 9.1395 0.9742
  • The final row (epoch 8.0, step 15760, all-dev_spearman_cosine 0.9742) denotes the saved checkpoint.

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.53.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.9.0
  • Datasets: 4.4.1
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}