metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:5700
  - loss:TripletLoss
base_model: thenlper/gte-small
widget:
  - source_sentence: Statutes are often called ________ law.
    sentences:
      - >-
        Calculate spin density on the central carbon atom of malonic acid
        radical (•CH(COOH)2) if the hyperfine value for the α-hydrogen atom is
        21.9 G.
      - >-
        Which of the following quotations best describes the central thesis of
        difference feminism?
      - >-
        If a relevant variable is omitted from a regression equation, the
        consequences would be that:


        i) The standard errors would be biased



        ii) If the excluded variable is uncorrelated with all of the included
        variables, all of


        the slope coefficients will be inconsistent.



        iii) If the excluded variable is uncorrelated with all of the included
        variables, the


        intercept coefficient will be inconsistent.



        iv) If the excluded variable is uncorrelated with all of the included
        variables, all of


        the slope and intercept coefficients will be consistent and unbiased but
        inefficient.
  - source_sentence: >-
      Let M be a 5 x 5 real matrix. Exactly four of the following five
      conditions on M are equivalent to each other. Which of the five conditions
      is equivalent to NONE of the other four?
    sentences:
      - >-
        The royal graves of the Shang Dynasty consisted of enormous
        cruciform-shaped tombs, where the deceased kings were buried with:
      - >-
        The region bounded by the curves y = x and y = x^2 in the first quadrant
        of the xy-plane is rotated about the y-axis. The volume of the resulting
        solid of revolution is
      - >-
        The energy released from the breakdown of the high-energy phosphates,
        ATP and phosphocreatine, can sustain maximal exertion exercise for
        about:
  - source_sentence: Which sequence describes the systemic circulation?
    sentences:
      - >-
        Which of the following best describes the process whereby the stomach
        muscles contract to propel food through the digestive tract?
      - The fallacy of guilt by association is a specific type of
      - 'Baier argues that genuine moral rules:'
  - source_sentence: >-
      This question refers to the following information.

      Although in Protestant Europe, [Peter the Great] was surrounded by
      evidence of the new civil and political rights of individual men embodied
      in constitutions, bills of rights and parliaments, he did not return to
      Russia determined to share power with his people. On the contrary, he
      returned not only determined to change his country but also convinced that
      if Russia was to be transformed, it was he who must provide both the
      direction and the motive force. He would try to lead; but where education
      and persuasion were not enough, he could drive—and if necessary flog—the
      backward nation forward.

      —Robert K. Massie, Peter the Great: His Life and World

      Based on the above passage, what kinds of reforms did Peter the Great
      embrace?
    sentences:
      - >-
        Identify the antecedent of the following conditional proposition: When
        the university raises tuition, then either the governor approves of it
        or the board of trustees doesn't prevent it.
      - >-
        Which of the following disorders is not suitable for population carrier
        screening?
      - >-
        This question refers to the following information.

        "To slacken the tempo would mean falling behind. And those who fall
        behind get beaten. But we do not want to be beaten. No, we refuse to be
        beaten! One feature of the history of old Russia was the continual
        beatings she suffered because of her backwardness. She was beaten by the
        Mongol khans. She was beaten by the Turkish beys. She was beaten by the
        Swedish feudal lords. She was beaten by the Polish and Lithuanian
        gentry. She was beaten by the British and French capitalists. She was
        beaten by the Japanese barons. All beat her––because of her
        backwardness, because of her military backwardness, cultural
        backwardness, political backwardness, industrial backwardness,
        agricultural backwardness. They beat her because it was profitable and
        could be done with impunity. You remember the words of the
        pre-revolutionary poet: "You are poor and abundant, mighty and impotent,
        Mother Russia." Those gentlemen were quite familiar with the verses of
        the old poet. They beat her, saying: "You are abundant," so one can
        enrich oneself at your expense. They beat her, saying: "You are poor and
        impotent," so you can be beaten and plundered with impunity. Such is the
        law of the exploiters––to beat the backward and the weak. It is the
        jungle law of capitalism. You are backward, you are weak––therefore you
        are wrong; hence you can be beaten and enslaved. You are
        mighty––therefore you are right; hence we must be wary of you.

        That is why we must no longer lag behind."

        Joseph Stalin, speech delivered at the first All-Union Conference of
        Leading Personnel of Socialist Industry, February 4, 1931

        Stalin's efforts to advance Russia as justified by his mention of the
        "continual beatings" were vindicated by which of the following
        historical events?
  - source_sentence: >-
      Gulde’s tax basis in Chyme Partnership was $26,000 at the time Gulde
      received a liquidating distribution of $12,000 cash and land with an
      adjusted basis to Chyme of $10,000 and a fair market value of $30,000.
      Chyme did not have unrealized receivables, appreciated inventory, or
      properties that had been contributed by its partners. What was the amount
      of Gulde’s basis in the land?
    sentences:
      - What is direct diplomacy?
      - >-
        The percentage of children in Ethiopia (age 8) who reported physical
        punishment by teachers in the past week in 2009 was about what?
      - >-
        A company exchanged land with an appraised value of $50,000 and an
        original cost of $20,000 for machinery with a fair value of $55,000.
        Assuming that the transaction has commercial substance, what is the gain
        on the exchange?
pipeline_tag: sentence-similarity
library_name: sentence-transformers

SentenceTransformer based on thenlper/gte-small

This is a sentence-transformers model fine-tuned from thenlper/gte-small. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: thenlper/gte-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
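
The Pooling module above has only pooling_mode_mean_tokens enabled, so a sentence vector is the average of its token vectors. The following is a minimal pure-Python sketch of that idea, assuming padded positions are excluded through an attention mask; the function name mean_pool and the toy values are illustrative and are not the library's implementation.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of one sequence, skipping padded positions."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in summed]


# Toy input: three token vectors of width 4 (the real model uses width 384).
# The last position is padding and is ignored by the mask.
tokens = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0, 6.0],
    [9.0, 9.0, 9.0, 9.0],
]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

Only the two unmasked vectors contribute to the average, so the padding values never leak into the sentence embedding.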

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Alexhuou/embedder_model")
# Run inference
sentences = [
    'Gulde’s tax basis in Chyme Partnership was $26,000 at the time Gulde received a liquidating distribution of $12,000 cash and land with an adjusted basis to Chyme of $10,000 and a fair market value of $30,000. Chyme did not have unrealized receivables, appreciated inventory, or properties that had been contributed by its partners. What was the amount of Gulde’s basis in the land?',
    'A company exchanged land with an appraised value of $50,000 and an original cost of $20,000 for machinery with a fair value of $55,000. Assuming that the transaction has commercial substance, what is the gain on the exchange?',
    'The percentage of children in Ethiopia (age 8) who reported physical punishment by teachers in the past week in 2009 was about what?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
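
Since the model card lists cosine similarity as the similarity function, the scores returned above can be used to rank candidates against a query. The sketch below shows that ranking step in pure Python on toy 3-dimensional vectors, so it runs without downloading the model; cosine_similarity and rank_by_similarity are hypothetical helper names, and real embeddings from this model would have 384 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return candidate indices ordered from most to least similar."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for model.encode(...) outputs.
query_vec = [1.0, 0.0, 1.0]
candidate_vecs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partially aligned
]
ranking = rank_by_similarity(query_vec, candidate_vecs)
print(ranking)
```

Cosine similarity ignores vector length, which is why the second candidate ranks first even though it is twice as long as the query.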

Training Details

Training Dataset

Unnamed Dataset

  • Size: 5,700 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (type: string): min 5 tokens, mean 49.22 tokens, max 512 tokens
    • sentence_1 (type: string): min 5 tokens, mean 48.59 tokens, max 440 tokens
    • sentence_2 (type: string): min 5 tokens, mean 41.92 tokens, max 512 tokens
  • Samples:
    Sample 1
      sentence_0:
        This question refers to the following information.
        "The spontaneous forces of capitalism have been steadily growing in the countryside in recent years, with new rich peasants springing up everywhere and many well-to-do middle peasants striving to become rich peasants. On the other hand, many poor peasants are still living in poverty for lack of sufficient means of production, with some in debt and others selling or renting out their land. If this tendency goes unchecked, the polarization in the countryside will inevitably be aggravated day by day. Those peasants who lose their land and those who remain in poverty will complain that we are doing nothing to save them from ruin or to help them overcome their difficulties. Nor will the well-to-do middle peasants who are heading in the capitalist direction be pleased with us, for we shall never be able to satisfy their demands unless we intend to take the capitalist road. Can the worker-peasant alliance continue to stand in these circumstan...
      sentence_1:
        This question refers to the following information.
        Woman, wake up; the bell of reason is being heard throughout the whole universe; discover your rights. Enslaved man has multiplied his strength, [but] having become free, he has become unjust to his companion. Oh, women, women! When will you cease to be blind? What advantage have you received from the Revolution? A more pronounced scorn, a more marked disdain. If our leaders persist, courageously oppose the force of reason to their empty pretentions of superiority. Regardless of what barriers confront you, it is in your power to free yourselves!
        Olympe de Gouges, "Declaration of the Rights of Woman and the Female Citizen," 1791
        The independence? Nothing of what I hoped for was achieved. I had expected that my children would be able to have an education, but they did not get it. We were poor peasants then, we are poor peasants now. Nothing has changed. Everything is the same. The only thing is that we are free, the war is over, we work ...
      sentence_2:
        Which of the following most likely explains why Venus does not have a strong magnetic field?
    Sample 2
      sentence_0:
        In conducting international market research, there are three types of equivalence. Which of the following is NOT one of the equivalences?
      sentence_1:
        Economic—marketing should encourage long-term economic development as opposed to short-term economic development.
      sentence_2:
        The domain of the function $h(x) = \sqrt{25-x^2}+\sqrt{-(x-2)}$ is an interval of what width?
    Sample 3
      sentence_0:
        Which value is the most reasonable estimate of the volume of air an adult breathes in one day?
      sentence_1:
        By what nickname is the Federal National Mortgage Association known?
      sentence_2:
        If technology makes production less expensive and at the same time exports decrease which of the following will result with certainty?
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
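
With these parameters, a triplet contributes loss only when the anchor is not at least 5 units closer (in Euclidean distance) to the positive than to the negative. The sketch below works through that formula, max(d(a, p) - d(a, n) + margin, 0), for a single triplet in pure Python. The helper names and the 2-dimensional toy vectors are illustrative; the library operates on batches of tensors and, as far as I know, averages the per-triplet values.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=5.0):
    """Hinge on the gap between anchor-positive and anchor-negative distances."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)


anchor = [0.0, 0.0]
positive = [3.0, 4.0]        # distance 5 from the anchor
negative_near = [0.0, 6.0]   # distance 6: only 1 unit farther than the positive
negative_far = [0.0, 12.0]   # distance 12: 7 units farther than the positive

loss_near = triplet_loss(anchor, positive, negative_near)
loss_far = triplet_loss(anchor, positive, negative_far)
print(loss_near, loss_far)
```

The near negative violates the margin and is penalized, while the far negative already satisfies it and contributes nothing.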
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 30
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 30
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
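
The settings learning_rate: 5e-05, lr_scheduler_type: linear, and warmup_steps: 0 describe a learning rate that decays linearly from its initial value to zero over training. The sketch below illustrates that schedule in pure Python. It assumes a single device, so the total step count is taken as ceil(5700 / 16) * 30; the function linear_lr is a hypothetical helper, not the trainer's own scheduler.

```python
import math

BASE_LR = 5e-5
# Assumption: one device and gradient_accumulation_steps = 1, so each epoch
# takes ceil(5700 / 16) optimizer steps, repeated for 30 epochs.
TOTAL_STEPS = math.ceil(5700 / 16) * 30


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Linear warmup (none here) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, linear_lr(step))
```

Under these assumptions the rate is halved at the midpoint of training and reaches zero at the final step.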

Training Logs

Epoch Step Training Loss
1.4006 500 1.7342
2.8011 1000 0.8812
1.4006 500 0.5667
2.8011 1000 0.3886
4.2017 1500 0.2434
5.6022 2000 0.1532
7.0028 2500 0.1159
8.4034 3000 0.079
9.8039 3500 0.0524
11.2045 4000 0.0442
12.6050 4500 0.03
14.0056 5000 0.0246
15.4062 5500 0.0196
16.8067 6000 0.0137
18.2073 6500 0.0161
19.6078 7000 0.0093
21.0084 7500 0.0109
22.4090 8000 0.0055
23.8095 8500 0.0047
25.2101 9000 0.0044
26.6106 9500 0.0033
28.0112 10000 0.0043
29.4118 10500 0.0027

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.4
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.7.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}