As observed, the cross-language context does not significantly impact the behavior of our models. If the model is used in a reranking context along with filtering of the top-K results from a search, a threshold of 0.8 could be applied to filter the contexts returned by the retriever, thereby reducing the noise present in the contexts for RAG-type applications.

How to Use Bloomz-3b-reranking
------------------------------

The following example uses the pipeline API of the Transformers library.

```python
import numpy as np
from transformers import pipeline
from scipy.spatial.distance import cdist

retriever = pipeline('feature-extraction', 'cmarkea/bloomz-3b-retriever')

# Important: keep only the last token's embedding, sliced so each entry stays
# 2-D and the concatenation below yields an (n_texts, hidden_dim) matrix.
infer = lambda x: [ii[0][-1:] for ii in retriever(x)]

list_of_contexts = [...]
emb_contexts = np.concatenate(infer(list_of_contexts), axis=0)
list_of_queries = [...]
emb_queries = np.concatenate(infer(list_of_queries), axis=0)

# Important: use the L2 (Euclidean) distance!
dist = cdist(emb_queries, emb_contexts, 'euclidean')

# For each query, return the x closest contexts (smallest distance first).
top_k = lambda x: [
    [list_of_contexts[qq] for qq in ii]
    for ii in dist.argsort(axis=-1)[:, :x]
]

# Top-5 nearest contexts for each query
top_contexts = top_k(5)
```

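To connect this with the reranking-and-filtering strategy described above, here is a minimal sketch of how the 0.8 threshold could be applied to the retrieved contexts. It assumes the reranker is exposed as a `text-classification` pipeline returning a relevance score per (query, context) pair, with higher scores meaning more relevant contexts; the helper `rerank_and_filter` and the reuse of `list_of_queries` and `top_contexts` from the example above are illustrative choices, not part of the original card.

```python
from transformers import pipeline

# Assumption: the reranker is served as a text-classification pipeline whose
# score reflects the relevance of a (query, context) pair.
reranker = pipeline('text-classification', 'cmarkea/bloomz-3b-reranking')

def rerank_and_filter(query, contexts, threshold=0.8):
    # Dicts with `text`/`text_pair` are the standard pair-input format of the
    # text-classification pipeline; one result dict is returned per pair.
    results = reranker([{'text': query, 'text_pair': ctx} for ctx in contexts])
    scored = zip((res['score'] for res in results), contexts)
    # Keep only the contexts whose score clears the threshold, best first.
    kept = [(score, ctx) for score, ctx in scored if score >= threshold]
    return [ctx for _, ctx in sorted(kept, key=lambda sc: sc[0], reverse=True)]

# Rerank and filter the top-5 contexts retrieved above for each query.
filtered_contexts = [
    rerank_and_filter(query, contexts)
    for query, contexts in zip(list_of_queries, top_contexts)
]
```

Depending on how the classification head is labeled, the score of the positive (relevant) class may need to be selected explicitly rather than the score of the predicted label.
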
Citation
--------

```bibtex
@online{DeBloomzReranking,
  AUTHOR = {Cyrile Delestre},
  ORGANIZATION = {Cr{\'e}dit Mutuel Ark{\'e}a},
  URL = {https://huggingface.co/cmarkea/bloomz-3b-reranking},
  YEAR = {2024},
  KEYWORDS = {NLP ; Transformers ; LLM ; Bloomz},
}
```