CATIE-AQ/distilcamembert-base-embedding

Description

This is a sentence-transformers model fine-tuned from cmarkea/distilcamembert-base (68.1M parameters). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Score on the MTEB leaderboard:

| Model | Average | Classification | Clustering | PairClassification | Reranking | Retrieval | STS | Summarization |
|---|---|---|---|---|---|---|---|---|
| CATIE-AQ/camembert-base-embedding (111M) | 60.057 | 66.117 | 45.41 | 79.675 | 71.303 | 45.769 | 82.049 | 30.074 |
| CATIE-AQ/distilcamembert-base-embedding (68M) | 58.297 | 63.904 | 44.549 | 79.102 | 67.961 | 42.222 | 80.204 | 30.138 |
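The Average column appears to be the unweighted mean of the seven task-category scores. The short check below recomputes it from the per-category values in the table (decimal commas written as points); treating the average as a plain mean is our inference from the numbers, not something the leaderboard states here.

```python
# Per-category scores: Classification, Clustering, PairClassification,
# Reranking, Retrieval, STS, Summarization (copied from the table).
scores = {
    "CATIE-AQ/camembert-base-embedding": [66.117, 45.41, 79.675, 71.303, 45.769, 82.049, 30.074],
    "CATIE-AQ/distilcamembert-base-embedding": [63.904, 44.549, 79.102, 67.961, 42.222, 80.204, 30.138],
}

# Unweighted mean over the seven categories, rounded like the table.
averages = {name: round(sum(vals) / len(vals), 3) for name, vals in scores.items()}

for name, avg in averages.items():
    print(f"{name}: {avg}")
```

Both recomputed values match the Average column, so the distilled model trails the larger one by about 1.76 points on this average.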

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: cmarkea/distilcamembert-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: CamembertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': True, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
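The Pooling layer above enables only `pooling_mode_weightedmean_tokens`, so the sentence vector is a position-weighted mean of the token embeddings rather than a plain mean. Based on our reading of the sentence-transformers Pooling module, token i (counting from 1) receives weight i, and padding tokens are zeroed out by the attention mask. The sketch below reproduces that idea for a single unbatched sequence in plain Python; the function name `weighted_mean_pooling` is hypothetical and the library itself does this with PyTorch tensors.

```python
def weighted_mean_pooling(token_embeddings, attention_mask):
    """Position-weighted mean pooling for one sequence (hypothetical helper).

    token_embeddings: list of per-token vectors (seq_len x hidden_dim)
    attention_mask:   list of 0/1 flags, 1 for real tokens, 0 for padding
    Token i (1-based) gets weight i; padding tokens get weight 0.
    """
    hidden_dim = len(token_embeddings[0])
    weights = [(i + 1) * m for i, m in enumerate(attention_mask)]
    total = max(sum(weights), 1e-9)  # guard against an all-padding input
    return [
        sum(w * tok[d] for w, tok in zip(weights, token_embeddings)) / total
        for d in range(hidden_dim)
    ]


# Toy 2-dimensional "token embeddings"; the last position is padding.
tokens = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
mask = [1, 1, 0]
pooled = weighted_mean_pooling(tokens, mask)
print(pooled)
```

A plain mean of the two real tokens would give [0.5, 0.5]; the weighted version leans toward the later token, and the padded position contributes nothing.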

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

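from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("CATIE-AQ/distilcamembert-base-embedding")
# Run inference on French sentences (English glosses in the comments)
sentences = [
    # "Tenet has been under scrutiny since November, when former CEO Jeffrey Barbakow said the company used aggressive pricing to trigger higher payments for the sickest health-insurance patients."
    "Tenet est sous surveillance depuis novembre, lorsque l'ancien directeur général Jeffrey Barbakow a déclaré que la société a utilisé des prix agressifs pour déclencher des paiements plus élevés pour les patients les plus malades de l'assurance maladie.",
    # "In November, Jeffrey Brabakow, the CEO at the time, said the company was using aggressive pricing to obtain higher payments for the sickest health-insurance patients."
    "En novembre, Jeffrey Brabakow, le directeur général de l'époque, a déclaré que la société utilisait des prix agressifs pour obtenir des paiements plus élevés pour les patients les plus malades de l'assurance maladie.",
    # "The woman is on her way to an appointment."
    'La femme est en route pour un rendez-vous.',
]
embeddings = model.encode(sentences)  # NumPy array, one row per sentence
print(embeddings.shape)
# (3, 768)

# Get the pairwise similarity scores (cosine similarity for this model)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])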
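`model.similarity` applies the similarity function configured for the model, which is cosine similarity here. The plain-Python sketch below shows what that computes for every pair of rows; the helper `cosine_similarity_matrix` is hypothetical, and short toy vectors stand in for the real 768-dimensional embeddings.

```python
import math


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between rows of a and rows of b (hypothetical helper)."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    # Entry [i][j] is dot(a[i], b[j]) / (|a[i]| * |b[j]|)
    return [
        [sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v)) for v in b]
        for u in a
    ]


# Toy 3-dimensional vectors standing in for sentence embeddings.
emb = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
sims = cosine_similarity_matrix(emb, emb)
for row in sims:
    print([round(s, 3) for s in row])
```

As with the real output, the result is a square matrix whose diagonal is 1 (each vector compared with itself); in the example above, the two paraphrased sentences should score much higher with each other than with the unrelated third sentence.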

Citation

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}