score mteb french
Hello,
Thank you for open-sourcing this model. However, there seems to be a discrepancy in the model's score on MTEB French. When I ran the evaluation locally, the average score was 59.92, whereas the leaderboard reports 66.6.
Could you please share your evaluation results, or at least the results for some of the datasets? We would like to compare them dataset by dataset. Please note that this model was trained with instructions (instruct training), so when encoding text you need to concatenate the instruction on the query side.
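As a minimal sketch of what "concatenating the instruction on the query side" can look like: the helper below prefixes each query with a task description and leaves documents untouched. The `Instruct: ... Query: ...` template, the task description, and the French example texts are assumptions for illustration; the model card should be checked for the exact template and per-task instructions.

```python
def get_detailed_instruct(task_description: str, query: str) -> str:
    """Prefix a query with its task instruction (assumed template)."""
    return f"Instruct: {task_description}\nQuery: {query}"


# Hypothetical task description and texts, for illustration only.
task = "Given a web search query, retrieve relevant passages that answer the query"
queries = ["Quelle est la capitale de la France ?"]
documents = ["Paris est la capitale de la France."]

# Only the query side carries the instruction; documents are encoded as-is.
instructed_queries = [get_detailed_instruct(task, q) for q in queries]
texts_to_encode = instructed_queries + documents

print(texts_to_encode[0])
```

If the queries are passed to the model without such a prefix, retrieval and reranking scores in particular may come out lower than the reported ones.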
"""Example script for benchmarking all datasets constituting the MTEB French leaderboard & average scores"""
from __future__ import annotations
import os
import logging
import torch
import gc
from sentence_transformers import SentenceTransformer
device = torch.device('cuda:0')
torch.cuda.set_device(device)
from mteb import MTEB
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("main")
TASK_LIST_CLASSIFICATION = [
    "AmazonReviewsClassification",
    "MasakhaNEWSClassification",
    "MassiveIntentClassification",
    "MassiveScenarioClassification",
    "MTOPDomainClassification",
    "MTOPIntentClassification",
]
TASK_LIST_CLUSTERING = [
    "AlloProfClusteringP2P",
    "AlloProfClusteringS2S",
    "HALClusteringS2S",
    "MasakhaNEWSClusteringP2P",
    "MasakhaNEWSClusteringS2S",
    "MLSUMClusteringP2P",
    "MLSUMClusteringS2S",
]
TASK_LIST_PAIR_CLASSIFICATION = [
    "OpusparcusPC",
    "PawsX",
]
TASK_LIST_RERANKING = ["SyntecReranking", "AlloprofReranking"]
TASK_LIST_RETRIEVAL = [
    "AlloprofRetrieval",
    "BSARDRetrieval",
    "SyntecRetrieval",
    "XPQARetrieval",
    "MintakaRetrieval",
]
TASK_LIST_STS = ["SummEvalFr", "STSBenchmarkMultilingualSTS", "STS22", "SICKFr"]
TASK_LIST = (
    TASK_LIST_CLASSIFICATION
    + TASK_LIST_CLUSTERING
    + TASK_LIST_PAIR_CLASSIFICATION
    + TASK_LIST_RERANKING
    + TASK_LIST_RETRIEVAL
    + TASK_LIST_STS
)
model_name = "Alibaba-NLP/gte-Qwen2-1.5B-instruct"
model = SentenceTransformer(model_name, trust_remote_code=True)
logger.info(f"Task list : {TASK_LIST}")
for task in TASK_LIST:
    logger.info(f"Running task: {task}")
    evaluation = MTEB(
        tasks=[task], task_langs=["fr"]
    )  # Remove "fr" for running all languages
    evaluation.run(model, batch_size=1, output_folder=f"results/{model_name}")
This is the result after I ran the above code with 26 resulting json files:
https://www.dropbox.com/scl/fi/7is59edlapzdnhacp2ysf/Alibaba-NLP__gte-Qwen2-1.5B-instruct.zip?rlkey=pv0hppw7dvdbb25e7rftybd2c&st=867jjbh0&dl=0
@abhamadi
In TASK_LIST_PAIR_CLASSIFICATION, "PawsX" is not FRA (it's CMN). See MTEB Tasks.
Also, how did you get the average score of 59.92? Was it computed this way?
evaluation = MTEB(tasks=[task], task_langs=["fr"])
results = evaluation.run(model, batch_size=1, output_folder=f"results/{model_name}")
# Calculate the average score across all tasks
average_score = sum(results.values()) / len(results)
print(f"Average Score: {average_score}")
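For comparison, here is a rough sketch of one way to average the scores once each task's main score has been read out of its result JSON file. It assumes the overall figure is a plain mean over tasks and that the scores have already been collected into a dictionary grouped by category; the layout of the JSON files differs between mteb versions, so the extraction step is left out. The function name and all numbers below are placeholders, not real results.

```python
from statistics import mean


def average_scores(scores_by_category: dict[str, dict[str, float]]):
    """Return (per-category means, overall mean over all tasks)."""
    per_category = {
        category: mean(task_scores.values())
        for category, task_scores in scores_by_category.items()
        if task_scores  # skip categories with no results
    }
    all_scores = [
        score
        for task_scores in scores_by_category.values()
        for score in task_scores.values()
    ]
    return per_category, mean(all_scores)


# Placeholder numbers for illustration only (not actual evaluation results).
example = {
    "Classification": {"TaskA": 60.0, "TaskB": 70.0},
    "STS": {"TaskC": 80.0},
}
per_category, overall = average_scores(example)
print(per_category)
print(overall)
```

Printing the per-category means alongside the overall mean makes it easier to see which task family accounts for a gap against the leaderboard.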