SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2 on pairs of terminal commands and their natural-language descriptions. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Training Data: 78,768 terminal command / description pairs

Model Sources

  • Documentation: https://sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
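
For reference, these modules mean that a sentence embedding is produced by mean pooling the BERT token embeddings (384-dimensional, up to 256 tokens) and L2-normalizing the result. The following is a minimal sketch of the same computation using plain transformers; the model id is taken from this card and the example sentences are purely illustrative:

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "Mitchins/minilm-l6-v2-terminal-describer-embeddings"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

sentences = ["ls -la", "List all files, including hidden ones, in long format."]
batch = tokenizer(sentences, padding=True, truncation=True, max_length=256, return_tensors="pt")

with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state        # (batch, seq_len, 384)

# Mean pooling over non-padding tokens, then L2 normalization (modules (1) and (2) above)
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([2, 384])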

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Mitchins/minilm-l6-v2-terminal-describer-embeddings")
# Run inference
sentences = [
    'plocate */filename',
    'Look for a file by its exact filename (a pattern containing no globbing characters is interpreted as `*pattern*`)',
    'View documentation for the current command:',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.5111, -0.1450],
#         [ 0.5111,  1.0000,  0.0595],
#         [-0.1450,  0.0595,  1.0000]])
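
Because the embeddings are unit-normalized, they can also be used directly for semantic search, for example to retrieve the best-matching description for a command. A small sketch; the candidate descriptions below are made up for illustration:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Mitchins/minilm-l6-v2-terminal-describer-embeddings")

# Hypothetical corpus of candidate descriptions
descriptions = [
    "List all files and directories, including hidden ones, in long format.",
    "Display disk space usage of file systems in a human-readable format.",
    "Create a new Git branch named 'new-feature' and switch to it.",
]
corpus_embeddings = model.encode(descriptions, convert_to_tensor=True)

# Embed a command as the query and retrieve the closest description
query_embedding = model.encode("ls -la", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=1)[0]
print(descriptions[hits[0]["corpus_id"]], round(hits[0]["score"], 4))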

Evaluation and Comparison

To demonstrate the effectiveness of fine-tuning, we compare the cosine similarity scores of the fine-tuned model against the stock sentence-transformers/all-MiniLM-L6-v2 model on a set of example command-description pairs. Higher positive pair similarity and lower negative pair similarity indicate better performance.

| Command | Description | Fine-tuned Model (Positive Pair Similarity) | Stock MiniLM (Positive Pair Similarity) | Fine-tuned Model (Avg. Negative Pair Similarity) | Stock MiniLM (Avg. Negative Pair Similarity) |
|---|---|---|---|---|---|
| ls -la | List all files and directories, including hidden ones, in long format. | 0.5747 | 0.1288 | -0.086 | 0.052 |
| grep -r "TODO" ./src | Recursively search for the string 'TODO' in all files within the './src' directory. | 0.7457 | 0.7489 | 0.019 | 0.119 |
| find . -name "*.py" -mtime -7 | Find all Python files modified in the last 7 days in the current directory and its subdirectories. | 0.8176 | 0.6050 | 0.050 | 0.129 |
| ps aux \| grep python | List all running processes and filter for those containing the word 'python'. | 0.8011 | 0.6616 | 0.009 | 0.099 |
| tar -czvf archive.tar.gz /path/to/directory | Create a gzipped tar archive named 'archive.tar.gz' from the specified directory. | 0.7239 | 0.6875 | -0.004 | 0.049 |
| docker run -it --rm -p 8080:80 nginx:latest | Run a Docker container from the 'nginx:latest' image, mapping port 8080 on the host to port 80 in the container, in interactive mode, and remove it when done. | 0.8018 | 0.7462 | -0.006 | 0.059 |
| git checkout -b new-feature | Create a new Git branch named 'new-feature' and switch to it. | 0.6108 | 0.6355 | -0.059 | 0.039 |
| curl -X POST -H "Content-Type: application/json" -d '{"key": "value"}' https://api.example.com/submit | Send a POST request with JSON data to a specified URL using curl. | 0.6739 | 0.7476 | -0.004 | 0.019 |
| chmod 755 script.sh | Change the permissions of 'script.sh' to allow the owner to read, write, and execute, and others to read and execute. | 0.7668 | 0.6824 | -0.019 | 0.059 |
| df -h | Display disk space usage of file systems in a human-readable format. | 0.7445 | 0.2452 | -0.060 | 0.029 |
| echo "Hello, World!" > output.txt | Write the string 'Hello, World!' to a file named 'output.txt', overwriting it if it exists. | 0.7569 | 0.8295 | -0.029 | 0.079 |
| sudo apt-get update && sudo apt-get upgrade -y | Update the package lists and then upgrade all installed packages on a Debian-based system without prompting for confirmation. | 0.6937 | 0.4985 | 0.010 | 0.069 |

Key Observations:

  • Improved Positive Pair Similarity (Overall): For 8 of the 12 command-description pairs, the fine-tuned model shows a higher positive pair similarity than the stock MiniLM model, with the largest gains on terse commands such as ls -la and df -h.
  • Lower Negative Pair Similarity (Overall): The fine-tuned model consistently produces lower (often negative) average cosine similarities for negative pairs, indicating better discrimination.
  • Specialization: The fine-tuned model demonstrates clear specialization for the domain of terminal commands and their descriptions, adapting its embedding space to better capture the nuances of this specific domain.
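
The numbers above can be reproduced with a comparison along the following lines; the helper function and the two example pairs are illustrative rather than the exact evaluation script:

from sentence_transformers import SentenceTransformer

finetuned = SentenceTransformer("Mitchins/minilm-l6-v2-terminal-describer-embeddings")
stock = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Matching command/description pairs share an index; every other pairing is a negative
commands = ["ls -la", "df -h"]
descriptions = [
    "List all files and directories, including hidden ones, in long format.",
    "Display disk space usage of file systems in a human-readable format.",
]

def pair_similarities(model):
    sims = model.similarity(model.encode(commands), model.encode(descriptions))
    positives = sims.diag()                                          # matching pairs
    negatives = (sims.sum(dim=1) - positives) / (sims.shape[1] - 1)  # average over non-matching pairs
    return positives, negatives

for name, model in [("fine-tuned", finetuned), ("stock", stock)]:
    positives, negatives = pair_similarities(model)
    print(name, positives.tolist(), negatives.tolist())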

Training Details

Training Dataset

Unnamed Dataset

  • Size: 78,768 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    • sentence_0: string; min: 3 tokens, mean: 21.83 tokens, max: 121 tokens
    • sentence_1: string; min: 4 tokens, mean: 18.41 tokens, max: 103 tokens
  • Samples (sentence_0 → sentence_1):
    • wajig daily-upgrade → Perform an update and then a distupgrade:
    • readlink -f /path/here/.. → Print canonical filename of "/path/here/.."
    • rustup doc {{std::fs usize
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
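
MultipleNegativesRankingLoss treats each (sentence_0, sentence_1) pair in a batch as a positive and every other sentence_1 in the same batch as a negative, so no explicit negative mining is required. A minimal sketch of constructing the loss with the parameters listed above:

from sentence_transformers import SentenceTransformer, util
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# scale=20.0 and cosine similarity, matching the parameters reported above
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)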
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 5
  • multi_dataset_batch_sampler: round_robin
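
Assuming the Sentence Transformers v3+ trainer API (consistent with the framework versions listed below), these non-default hyperparameters could be wired up roughly as follows; the output directory and the toy training pairs are placeholders:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import MultiDatasetBatchSamplers

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Placeholder for the 78,768 command/description pairs described above
train_dataset = Dataset.from_dict({
    "sentence_0": ["ls -la", "df -h"],
    "sentence_1": [
        "List all files and directories, including hidden ones, in long format.",
        "Display disk space usage of file systems in a human-readable format.",
    ],
})

args = SentenceTransformerTrainingArguments(
    output_dir="minilm-l6-v2-terminal-describer-embeddings",  # placeholder
    num_train_epochs=5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()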

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.1016 500 0.391
0.2031 1000 0.2648
0.3047 1500 0.2107
0.4063 2000 0.1675
0.5078 2500 0.1571
0.6094 3000 0.1467
0.7109 3500 0.1284
0.8125 4000 0.1272
0.9141 4500 0.1156
1.0156 5000 0.0983
1.1172 5500 0.074
1.2188 6000 0.0799
1.3203 6500 0.0752
1.4219 7000 0.0686
1.5235 7500 0.0716
1.6250 8000 0.0672
1.7266 8500 0.0652
1.8282 9000 0.0563
1.9297 9500 0.0527
2.0313 10000 0.0519
2.1328 10500 0.0461
2.2344 11000 0.0405
2.3360 11500 0.0447
2.4375 12000 0.0454
2.5391 12500 0.0409
2.6407 13000 0.0408
2.7422 13500 0.0416
2.8438 14000 0.0397
2.9454 14500 0.0365
3.0469 15000 0.0372
3.1485 15500 0.0313
3.2501 16000 0.0317
3.3516 16500 0.0282
3.4532 17000 0.0293
3.5547 17500 0.0294
3.6563 18000 0.0278
3.7579 18500 0.0267
3.8594 19000 0.0281
3.9610 19500 0.0269
4.0626 20000 0.0264
4.1641 20500 0.0257
4.2657 21000 0.0272
4.3673 21500 0.0232
4.4688 22000 0.025
4.5704 22500 0.0258
4.6719 23000 0.026
4.7735 23500 0.0257
4.8751 24000 0.0244
4.9766 24500 0.0232

Framework Versions

  • Python: 3.10.18
  • Sentence Transformers: 5.0.0
  • Transformers: 4.53.1
  • PyTorch: 2.7.0+cu128
  • Accelerate: 1.8.1
  • Datasets: 3.6.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}