Text Ranking
Transformers
Safetensors
sentence-transformers
mistral
text-generation
cross-encoder
reranker
text-embeddings-inference
Instructions to use ContextualAI/ctxl-rerank-v2-instruct-multilingual-6b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ContextualAI/ctxl-rerank-v2-instruct-multilingual-6b with Transformers:
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ContextualAI/ctxl-rerank-v2-instruct-multilingual-6b")
model = AutoModelForCausalLM.from_pretrained("ContextualAI/ctxl-rerank-v2-instruct-multilingual-6b")
- sentence-transformers
How to use ContextualAI/ctxl-rerank-v2-instruct-multilingual-6b with sentence-transformers:
from sentence_transformers import CrossEncoder

model = CrossEncoder("ContextualAI/ctxl-rerank-v2-instruct-multilingual-6b")

query = "Which planet is known as the Red Planet?"
passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet."
]

scores = model.predict([(query, passage) for passage in passages])
print(scores)
- Notebooks
- Google Colab
- Kaggle
Integrate with Sentence Transformers v5.4
#1
by tomaarsen HF Staff - opened
- 1_LogitScore/config.json +5 -0
- README.md +47 -17
- chat_template.jinja +7 -0
- config_sentence_transformers.json +11 -0
- modules.json +14 -0
- sentence_bert_config.json +15 -0
1_LogitScore/config.json
ADDED
@@ -0,0 +1,5 @@
+{
+  "true_token_id": 0,
+  "false_token_id": null,
+  "module_input_name": "causal_logits"
+}
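The README text added in this PR states that scores are the raw logits at token id 0 at the final position, which matches `"true_token_id": 0` with `"false_token_id": null` in the config above. A minimal sketch of that selection step in plain Python follows. The function name `logit_score` and the nested-list input are assumptions made for illustration; this is not the sentence-transformers implementation, and it assumes the last position of each sequence is a real token rather than padding.

```python
from typing import Sequence


def logit_score(
    causal_logits: Sequence[Sequence[Sequence[float]]],
    true_token_id: int = 0,
) -> list[float]:
    """Return one relevance score per sequence.

    causal_logits is indexed as [batch][position][vocab_id].
    The score is the logit of `true_token_id` at the final position.
    """
    scores = []
    for sequence_logits in causal_logits:
        final_position = sequence_logits[-1]  # logits for the next token
        scores.append(final_position[true_token_id])
    return scores


# Toy input: 2 sequences, 2 positions each, vocabulary of size 2.
toy_logits = [
    [[0.1, 0.9], [-2.5, 3.0]],
    [[1.0, 0.0], [-4.75, 2.0]],
]
print(logit_score(toy_logits))  # [-2.5, -4.75]
```

Higher values mean more relevant, so the first toy sequence would rank above the second.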
README.md
CHANGED
@@ -2,6 +2,10 @@
 library_name: transformers
 license: cc-by-nc-sa-4.0
 pipeline_tag: text-ranking
+tags:
+- sentence-transformers
+- cross-encoder
+- reranker
 ---
 
 <div align="center">
@@ -51,37 +55,56 @@ Use this reranker when you need to:
 
 ## Quickstart
 
-
+Each path below uses the same example inputs:
+
+```
+Query: What are the health benefits of exercise?
+Instruction: Prioritize recent medical research
+Documents:
+- Regular exercise reduces risk of heart disease and improves mental health.
+- A 2024 study shows exercise enhances cognitive function in older adults.
+- Ancient Greeks valued physical fitness for military training.
+```
+
+**Expected Output:**
+```
+Score: -2.2969 | Doc: A 2024 study shows exercise enhances cognitive function in older adults.
+Score: -4.6875 | Doc: Regular exercise reduces risk of heart disease and improves mental health.
+Score: -12.3750 | Doc: Ancient Greeks valued physical fitness for military training.
+```
+
+### Using Sentence Transformers
+
+Install Sentence Transformers:
+```bash
+pip install sentence_transformers
+```
 
 ```python
-
-
+import torch
+from sentence_transformers import CrossEncoder
 
-
+model = CrossEncoder("ContextualAI/ctxl-rerank-v2-instruct-multilingual-6b", model_kwargs={"dtype": torch.bfloat16})
 
 query = "What are the health benefits of exercise?"
 instruction = "Prioritize recent medical research"
 documents = [
     "Regular exercise reduces risk of heart disease and improves mental health.",
     "A 2024 study shows exercise enhances cognitive function in older adults.",
-    "Ancient Greeks valued physical fitness for military training."
+    "Ancient Greeks valued physical fitness for military training.",
 ]
 
-
-
+pairs = [(query, doc) for doc in documents]
+scores = model.predict(pairs, prompt=instruction)
+print(scores)
+# [ -4.6875 -2.171875 -12.4375 ]
 
-
-
+rankings = model.rank(query, documents, prompt=instruction)
+print(rankings)
+# [{'corpus_id': 1, 'score': np.float32(-2.171875)}, {'corpus_id': 0, 'score': np.float32(-4.6875)}, {'corpus_id': 2, 'score': np.float32(-12.4375)}]
 ```
 
-
-```
-Query: What are the health benefits of exercise?
-Instruction: Prioritize recent medical research
-Score: -2.2969 | Doc: A 2024 study shows exercise enhances cognitive function in older adults.
-Score: -4.6875 | Doc: Regular exercise reduces risk of heart disease and improves mental health.
-Score: -12.3750 | Doc: Ancient Greeks valued physical fitness for military training.
-```
+The `prompt` argument is optional, you can omit it to score pairs without any custom instruction. Scores are the raw bfloat16 logits at token id 0 at the final position (matching the `Transformers` path below), so higher means more relevant.
 
 ### vLLM Usage (Recommended for Production)
 
@@ -223,6 +246,13 @@ def infer_w_hf(model_path: str, query: str, instruction: str, documents: list[st
     print(f"Instruction: {instruction}")
     for score, doc_id, doc in results:
         print(f"Score: {score:.4f} | Doc: {doc}")
+    """
+    Query: What are the health benefits of exercise?
+    Instruction: Prioritize recent medical research
+    Score: -2.1719 | Doc: A 2024 study shows exercise enhances cognitive function in older adults.
+    Score: -4.6875 | Doc: Regular exercise reduces risk of heart disease and improves mental health.
+    Score: -12.4375 | Doc: Ancient Greeks valued physical fitness for military training.
+    """
 ```
 
 ## Citation
 
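The `rank` output quoted in the README snippet is consistent with sorting the `predict` scores in descending order, since higher means more relevant. The sketch below reproduces that ordering in plain Python from the scores printed in the snippet. `rank_by_score` is a hypothetical helper for illustration, not part of the sentence-transformers API.

```python
def rank_by_score(scores):
    """Sort document indices by score, highest (most relevant) first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [{"corpus_id": i, "score": scores[i]} for i in order]


# Scores as printed in the README snippet, in document order.
scores = [-4.6875, -2.171875, -12.4375]
rankings = rank_by_score(scores)
print([entry["corpus_id"] for entry in rankings])  # [1, 0, 2]
```

Because the scores are raw logits rather than probabilities, only their relative order is used here.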
chat_template.jinja
ADDED
@@ -0,0 +1,7 @@
+{%- set instruction_text = messages | selectattr("role", "eq", "system") | map(attribute="content") | first | default("") -%}
+{%- set query_text = messages | selectattr("role", "eq", "query") | map(attribute="content") | first -%}
+{%- set document_text = messages | selectattr("role", "eq", "document") | map(attribute="content") | first -%}
+{{- bos_token -}}
+Check whether a given document contains information helpful to answer the query.
+<Document> {{ document_text }}
+<Query> {{ query_text }}{% if instruction_text %} {{ instruction_text }}{% endif %} ??
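To make the template easier to follow, here is a plain-Python sketch of the string it appears to produce for one query/document pair. `build_prompt` is a hypothetical helper, the BOS string is passed in by the caller because its value is not shown in this PR, and the exact whitespace (line breaks, any trailing newline) is inferred from the diff as displayed and may differ from what the tokenizer renders.

```python
def build_prompt(query: str, document: str, instruction: str = "", bos_token: str = "") -> str:
    """Approximate the rendered chat template for one (query, document) pair.

    The instruction comes from the "system" message in the template and is
    appended after the query only when it is non-empty.
    """
    instruction_suffix = f" {instruction}" if instruction else ""
    return (
        f"{bos_token}"
        "Check whether a given document contains information helpful to answer the query.\n"
        f"<Document> {document}\n"
        f"<Query> {query}{instruction_suffix} ??"
    )


print(
    build_prompt(
        query="What are the health benefits of exercise?",
        document="A 2024 study shows exercise enhances cognitive function in older adults.",
        instruction="Prioritize recent medical research",
    )
)
```

Note that the document is placed before the query, and the instruction is concatenated to the query rather than given its own section.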
config_sentence_transformers.json
ADDED
@@ -0,0 +1,11 @@
+{
+  "__version__": {
+    "pytorch": "2.10.0+cu128",
+    "sentence_transformers": "5.4.0",
+    "transformers": "5.5.0.dev0"
+  },
+  "activation_fn": "torch.nn.modules.linear.Identity",
+  "default_prompt_name": null,
+  "model_type": "CrossEncoder",
+  "prompts": {}
+}
modules.json
ADDED
@@ -0,0 +1,14 @@
+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.base.modules.transformer.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_LogitScore",
+    "type": "sentence_transformers.cross_encoder.modules.logit_score.LogitScore"
+  }
+]
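The module list above describes a two-stage pipeline: a `Transformer` module followed by a `LogitScore` module. The sketch below parses the same JSON with the standard library and lists the stages in `idx` order. `describe_modules` is a hypothetical helper, and reading the empty `path` as "the repository root" is an assumption about the convention, not something this PR states.

```python
import json

MODULES_JSON = """
[
  {"idx": 0, "name": "0", "path": "",
   "type": "sentence_transformers.base.modules.transformer.Transformer"},
  {"idx": 1, "name": "1", "path": "1_LogitScore",
   "type": "sentence_transformers.cross_encoder.modules.logit_score.LogitScore"}
]
"""


def describe_modules(modules_json: str) -> list[str]:
    """Return one human-readable line per module, ordered by `idx`."""
    modules = sorted(json.loads(modules_json), key=lambda module: module["idx"])
    lines = []
    for module in modules:
        class_name = module["type"].rsplit(".", 1)[-1]
        # Assumption: an empty path refers to the repository root.
        location = module["path"] or "<repository root>"
        lines.append(f"{module['idx']}: {class_name} (config in {location})")
    return lines


for line in describe_modules(MODULES_JSON):
    print(line)
```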
sentence_bert_config.json
ADDED
@@ -0,0 +1,15 @@
+{
+  "transformer_task": "text-generation",
+  "modality_config": {
+    "text": {
+      "method": "forward",
+      "method_output_name": "logits"
+    },
+    "message": {
+      "method": "forward",
+      "method_output_name": "logits",
+      "format": "flat"
+    }
+  },
+  "module_output_name": "causal_logits"
+}