As observed, the cross-language context does not significantly impact the behavior of our models. If the model is used in a reranking context along with filtering of the top-K results from a search, a threshold of 0.8 could be applied to filter the contexts returned by the retriever, thereby reducing the noise present in the contexts for RAG-type applications.

How to Use Bloomz-3b-reranking
------------------------------

The following example uses the pipeline API of the Transformers library.

```python
import numpy as np
from transformers import pipeline
from scipy.spatial.distance import cdist

retriever = pipeline('feature-extraction', 'cmarkea/bloomz-3b-retriever')

# Important: keep only the last token's embedding, sliced so each entry stays
# 2-D and the concatenation below yields an (n_texts, hidden_dim) matrix.
infer = lambda x: [ii[0][-1:] for ii in retriever(x)]

list_of_contexts = [...]
emb_contexts = np.concatenate(infer(list_of_contexts), axis=0)
list_of_queries = [...]
emb_queries = np.concatenate(infer(list_of_queries), axis=0)

# Important: use the L2 (Euclidean) distance!
dist = cdist(emb_queries, emb_contexts, 'euclidean')

# For each query, return the x closest contexts (smallest distance first).
top_k = lambda x: [
    [list_of_contexts[qq] for qq in ii]
    for ii in dist.argsort(axis=-1)[:, :x]
]

# Top-5 nearest contexts for each query
top_contexts = top_k(5)
```

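To connect this with the reranking-and-filtering strategy described above, here is a minimal sketch of how the 0.8 threshold could be applied to the retrieved contexts. It assumes the reranker is exposed as a `text-classification` pipeline returning a relevance score per (query, context) pair, with higher scores meaning more relevant contexts; the helper `rerank_and_filter` and the reuse of `list_of_queries` and `top_contexts` from the example above are illustrative choices, not part of the original card.

```python
from transformers import pipeline

# Assumption: the reranker is served as a text-classification pipeline whose
# score reflects the relevance of a (query, context) pair.
reranker = pipeline('text-classification', 'cmarkea/bloomz-3b-reranking')

def rerank_and_filter(query, contexts, threshold=0.8):
    # Dicts with `text`/`text_pair` are the standard pair-input format of the
    # text-classification pipeline; one result dict is returned per pair.
    results = reranker([{'text': query, 'text_pair': ctx} for ctx in contexts])
    scored = zip((res['score'] for res in results), contexts)
    # Keep only the contexts whose score clears the threshold, best first.
    kept = [(score, ctx) for score, ctx in scored if score >= threshold]
    return [ctx for _, ctx in sorted(kept, key=lambda sc: sc[0], reverse=True)]

# Rerank and filter the top-5 contexts retrieved above for each query.
filtered_contexts = [
    rerank_and_filter(query, contexts)
    for query, contexts in zip(list_of_queries, top_contexts)
]
```

Depending on how the classification head is labeled, the score of the positive (relevant) class may need to be selected explicitly rather than the score of the predicted label.
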
Citation
--------

```bibtex
@online{DeBloomzReranking,
  AUTHOR = {Cyrile Delestre},
  ORGANIZATION = {Cr{\'e}dit Mutuel Ark{\'e}a},
  URL = {https://huggingface.co/cmarkea/bloomz-3b-reranking},
  YEAR = {2024},
  KEYWORDS = {NLP ; Transformers ; LLM ; Bloomz},
}
```