---
language: en
library_name: FlexRAG
tags:
- FlexRAG
- retrieval
- search
- lexical
- RAG
---
					
						
# The BM25SRetriever for the wiki2021 corpus

The corpus was created by the [Atlas](https://github.com/facebookresearch/atlas) project and the index was built using the [FlexRAG](https://github.com/ictnlp/flexrag) library.
					
						
| Corpus Attribute | Value                                                           |
| ---------------- | --------------------------------------------------------------- |
| Language         | English                                                         |
| Domain           | Wikipedia                                                       |
| Size             | 37.5M (33.1M text, 4.3M infobox)                                |
| Dump Date        | Dec 2021                                                        |
| Provider         | [Atlas](https://github.com/facebookresearch/atlas)              |
| License          | [CC-BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/) |
					
						
| Index Attribute | Value                                                           |
| --------------- | --------------------------------------------------------------- |
| Index Type      | BM25S                                                           |
| Index Method    | Lucene                                                          |
| Preprocessing   | LengthFilter(min_char=10, max_char=4096)                        |
| Provider        | [FlexRAG](https://github.com/ictnlp/flexrag)                    |
| License         | [CC-BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/) |
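
To make the index attributes above more concrete, here is a minimal sketch of what a character-length filter and a Lucene-style BM25 score can look like. It is not the FlexRAG or BM25S implementation: the function names, the whitespace tokenizer, the inclusive bounds, and the `k1`/`b` values are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def length_filter(texts, min_char=10, max_char=4096):
    """Keep texts whose character length is in [min_char, max_char] (inclusive bounds assumed)."""
    return [t for t in texts if min_char <= len(t) <= max_char]


def bm25_lucene_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a Lucene-style BM25 variant."""
    tokenized = [d.lower().split() for d in docs]  # naive whitespace tokenizer (assumption)
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        # Length normalization shared by all terms of this document.
        norm = k1 * (1 - b + b * len(doc) / avg_len)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] / (tf[term] + norm)
        scores.append(score)
    return scores


passages = length_filter([
    "Batman",  # 6 characters: dropped by the filter
    "Bruce Wayne is the secret identity of Batman.",
    "Gotham City is a fictional city in DC Comics.",
])
scores = bm25_lucene_scores("bruce wayne", passages)
best = passages[scores.index(max(scores))]
print(best)
```

The filter removes passages that are too short or too long to be useful, and the scoring function ranks the remaining passages by term overlap weighted by rarity and document length.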
					
						
## Installation

You can install the `FlexRAG` library with `pip`:

```bash
pip install flexrag
```
					
						
## Loading a `FlexRAG` retriever

You can use this retriever for information retrieval tasks. Here is an example:

```python
from flexrag.retriever import LocalRetriever

# Load the retriever from the HuggingFace Hub
retriever = LocalRetriever.load_from_hub("FlexRAG/wiki2021_atlas_bm25s")

# Query the retriever
results = retriever.search("Who is Bruce Wayne?")
```
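
Once you have results, you will usually want to inspect them. The sketch below shows one way to print a ranked list of hits. The shape of the returned objects is an assumption here: `Hit` is a hypothetical stand-in with a `data` mapping (holding the `title` and `text` fields) and a numeric `score`, and `format_hits` is an illustrative helper, not part of FlexRAG. Check the FlexRAG documentation for the actual return type of `search`.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    """Hypothetical stand-in for one retrieved passage (not a FlexRAG class)."""
    data: dict = field(default_factory=dict)
    score: float = 0.0


def format_hits(hits, max_chars=80):
    """Render hits as 'rank. [score] title: snippet' lines, best score first."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    lines = []
    for rank, hit in enumerate(ranked, start=1):
        title = hit.data.get("title", "<untitled>")
        snippet = hit.data.get("text", "")[:max_chars]
        lines.append(f"{rank}. [{hit.score:.2f}] {title}: {snippet}")
    return lines


# Toy hits standing in for real retrieval output.
demo = [
    Hit({"title": "Gotham City", "text": "Gotham City is a fictional city."}, 1.25),
    Hit({"title": "Bruce Wayne", "text": "Bruce Wayne is the secret identity of Batman."}, 7.5),
]
report = format_hits(demo)
print("\n".join(report))
```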
					
						
## Running the RAG application with the retriever

You can run the **GUI application** of the RAG assistant with this retriever. Here is an example:

```bash
python -m flexrag.entrypoints.run_interactive \
assistant_type=modular \
modular_config.used_fields=[title,text] \
modular_config.retriever_type="FlexRAG/wiki2021_atlas_bm25s" \
modular_config.response_type=original \
modular_config.generator_type=openai \
modular_config.openai_config.model_name='gpt-4o-mini' \
modular_config.openai_config.api_key=$OPENAI_KEY \
modular_config.do_sample=False
```
					
						
You can also run **FlexRAG's RAG evaluation pipeline** with this retriever. Here is an example that evaluates the **ModularAssistant** with the retriever on the *Natural Questions* test split:

```bash
# Replace the placeholders below with your own values
OUTPUT_PATH=<path_to_output>
DB_PATH=<path_to_database>
OPENAI_KEY=<your_openai_key>

python -m flexrag.entrypoints.run_assistant \
name=nq \
split=test \
output_path=${OUTPUT_PATH} \
assistant_type=modular \
modular_config.used_fields=[title,text] \
modular_config.retriever_type="FlexRAG/wiki2021_atlas_bm25s" \
modular_config.generator_type=openai \
modular_config.openai_config.model_name='gpt-4o-mini' \
modular_config.openai_config.api_key=$OPENAI_KEY \
modular_config.do_sample=False \
eval_config.metrics_type=[retrieval_success_rate,generation_f1,generation_em] \
eval_config.retrieval_success_rate_config.context_preprocess.processor_type=[simplify_answer] \
eval_config.retrieval_success_rate_config.eval_field=text \
eval_config.response_preprocess.processor_type=[simplify_answer]
```
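
For readers unfamiliar with the three metrics requested above, the following sketch shows how exact match, token-level F1, and retrieval success are commonly computed in open-domain QA. It follows the widely used SQuAD-style definitions and is not FlexRAG's code: `normalize_answer` only approximates what a processor such as `simplify_answer` might do, and all function names are assumptions.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, golds):
    """1.0 if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(g) for g in golds))


def token_f1(prediction, golds):
    """Best token-overlap F1 between the prediction and any gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    best = 0.0
    for gold in golds:
        gold_tokens = normalize_answer(gold).split()
        overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
        if overlap == 0:
            continue
        precision = overlap / len(pred_tokens)
        recall = overlap / len(gold_tokens)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


def retrieval_success(contexts, golds):
    """1.0 if any normalized gold answer appears in any normalized context."""
    simplified = [normalize_answer(c) for c in contexts]
    return float(any(normalize_answer(g) in c for g in golds for c in simplified))


golds = ["Bruce Wayne"]
em = exact_match("The Bruce Wayne.", golds)
f1 = token_f1("Bruce Wayne of Gotham", golds)
success = retrieval_success(["Batman is the alias of Bruce Wayne."], golds)
print(em, round(f1, 3), success)
```

Averaging these per-question values over a test split yields the corpus-level scores that an evaluation run reports.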
					
						
## License

Because the corpus is released under the [CC-BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/) license, the retriever is distributed under the same license.

## Related Links

FlexRAG Related Links:
* 📖[Documentation](https://flexrag.readthedocs.io/en/latest/)
* 💻[GitHub Repository](https://github.com/ictnlp/flexrag)