zrguo committed · Commit ba5302b · 1 Parent(s): fe5faf4
Update README.md

README.md CHANGED
@@ -20,8 +20,8 @@ This repository hosts the code of LightRAG. The structure of this code is based
 </div>
 
 ## 🎉 News
-- [x] [2024.10.16]🎯🎯📢📢LightRAG now supports [Ollama models](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#
-- [x] [2024.10.15]🎯🎯📢📢LightRAG now supports [Hugging Face models](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#
+- [x] [2024.10.16]🎯🎯📢📢LightRAG now supports [Ollama models](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#quick-start)!
+- [x] [2024.10.15]🎯🎯📢📢LightRAG now supports [Hugging Face models](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#quick-start)!
 
 ## Install
 
@@ -76,7 +76,9 @@ print(rag.query("What are the top themes in this story?", param=QueryParam(mode=
 print(rag.query("What are the top themes in this story?", param=QueryParam(mode="hybrid")))
 ```
 
-
+<details>
+<summary> Using Open AI-like APIs </summary>
+
 LightRAG also supports Open AI-like chat/embeddings APIs:
 ```python
 async def llm_model_func(
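The hunk shows only the first line of `llm_model_func`; the rest of the block lies outside the diff context. For orientation, a minimal sketch of such an OpenAI-compatible binding, assuming the `openai_complete_if_cache` / `openai_embedding` helpers that `lightrag.llm` exports at this revision; the endpoint, model names, and env var are hypothetical:

```python
import os

from lightrag import LightRAG
from lightrag.llm import openai_complete_if_cache, openai_embedding
from lightrag.utils import EmbeddingFunc

async def llm_model_func(prompt, system_prompt=None, history_messages=[], **kwargs) -> str:
    # Forward chat completions to any OpenAI-compatible endpoint.
    return await openai_complete_if_cache(
        "your-model-name",                      # hypothetical model id
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        api_key=os.getenv("LLM_API_KEY"),       # hypothetical env var
        base_url="https://api.example.com/v1",  # hypothetical endpoint
        **kwargs,
    )

async def embedding_func(texts: list[str]):
    # Embeddings go through the same endpoint.
    return await openai_embedding(
        texts,
        model="your-embedding-model",           # hypothetical model id
        api_key=os.getenv("LLM_API_KEY"),
        base_url="https://api.example.com/v1",
    )

rag = LightRAG(
    working_dir="./your_working_dir",
    llm_model_func=llm_model_func,
    embedding_func=EmbeddingFunc(
        embedding_dim=1024,   # must match the embedding model's output size
        max_token_size=8192,
        func=embedding_func,
    ),
)
```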
@@ -110,8 +112,11 @@ rag = LightRAG(
     )
 )
 ```
+</details>
 
-
+<details>
+<summary> Using Hugging Face Models </summary>
+
 If you want to use Hugging Face models, you only need to set LightRAG as follows:
 ```python
 from lightrag.llm import hf_model_complete, hf_embedding
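The middle of the Hugging Face block also falls outside the diff context. A sketch of the full configuration under that elision; the specific model choices are illustrative, not prescribed by this commit:

```python
from lightrag import LightRAG
from lightrag.llm import hf_model_complete, hf_embedding
from lightrag.utils import EmbeddingFunc
from transformers import AutoModel, AutoTokenizer

rag = LightRAG(
    working_dir="./your_working_dir",
    llm_model_func=hf_model_complete,                   # Hugging Face generation wrapper
    llm_model_name="meta-llama/Llama-3.1-8B-Instruct",  # illustrative chat model
    embedding_func=EmbeddingFunc(
        embedding_dim=384,    # all-MiniLM-L6-v2 produces 384-dim vectors
        max_token_size=5000,
        func=lambda texts: hf_embedding(
            texts,
            tokenizer=AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2"),
            embed_model=AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2"),
        ),
    ),
)
```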
@@ -134,9 +139,12 @@ rag = LightRAG(
     ),
 )
 ```
+</details>
 
-
+<details>
+<summary> Using Ollama Models </summary>
 If you want to use Ollama models, you only need to set LightRAG as follows:
+
 ```python
 from lightrag.llm import ollama_model_complete, ollama_embedding
 
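Likewise for Ollama, the elided middle of the block boils down to pointing both the completion and embedding functions at local Ollama models. A sketch, with `nomic-embed-text` as an assumed embedding model:

```python
from lightrag import LightRAG
from lightrag.llm import ollama_model_complete, ollama_embedding
from lightrag.utils import EmbeddingFunc

rag = LightRAG(
    working_dir="./your_working_dir",
    llm_model_func=ollama_model_complete,   # chat via a local Ollama server
    llm_model_name="your_model_name",       # any model pulled with `ollama pull`
    embedding_func=EmbeddingFunc(
        embedding_dim=768,                  # nomic-embed-text produces 768-dim vectors
        max_token_size=8192,
        func=lambda texts: ollama_embedding(texts, embed_model="nomic-embed-text"),
    ),
)
```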
@@ -156,6 +164,7 @@ rag = LightRAG(
     ),
 )
 ```
+</details>
 
 ### Batch Insert
 ```python
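The hunk stops at the opening fence of the Batch Insert snippet; the elided body amounts to one call, since `insert()` also accepts a list of documents:

```python
# Insert several documents in one call.
rag.insert(["TEXT1", "TEXT2", "TEXT3"])
```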
@@ -178,6 +187,10 @@ The dataset used in LightRAG can be downloaded from [TommyChien/UltraDomain](https
 
 ### Generate Query
 LightRAG uses the following prompt to generate high-level queries, with the corresponding code located in `example/generate_query.py`.
+
+<details>
+<summary> Prompt </summary>
+
 ```python
 Given the following description of a dataset:
 
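The prompt itself continues in the next hunk. For orientation, a hypothetical driver in the spirit of `example/generate_query.py`; the template variable, placeholder name, and model choice are assumptions, not taken from this commit:

```python
from openai import OpenAI

def generate_queries(prompt_template: str, description: str) -> str:
    # prompt_template: the prompt above, with a {description} placeholder (assumed).
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[{"role": "user", "content": prompt_template.format(description=description)}],
    )
    # Returns the "- User N / - Task N / - Question N" listing requested by the prompt.
    return response.choices[0].message.content
```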
@@ -201,9 +214,14 @@ Output the results in the following structure:
 - User 5: [user description]
 ...
 ```
+</details>
 
 ### Batch Eval
 To evaluate the performance of two RAG systems on high-level queries, LightRAG uses the following prompt, with the specific code available in `example/batch_eval.py`.
+
+<details>
+<summary> Prompt </summary>
+
 ```python
 ---Role---
 You are an expert tasked with evaluating two answers to the same question based on three criteria: **Comprehensiveness**, **Diversity**, and **Empowerment**.
@@ -246,6 +264,7 @@ Output your evaluation in the following JSON format:
 }}
 }}
 ```
+</details>
 
 ### Overall Performance Table
 | | **Agriculture** | | **CS** | | **Legal** | | **Mix** | |
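Returning to the Batch Eval prompt that this hunk closes: a hypothetical sketch of how `example/batch_eval.py` might submit one comparison to an LLM judge; the placeholder names and model are assumptions:

```python
from openai import OpenAI

def judge(prompt_template: str, query: str, answer1: str, answer2: str) -> str:
    # prompt_template: the evaluation prompt above, with placeholders for the
    # query and the two answers (assumed placeholder names).
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical judge model
        messages=[{"role": "user", "content": prompt_template.format(
            query=query, answer1=answer1, answer2=answer2)}],
    )
    # The reply should be the JSON verdict in the format shown above.
    return response.choices[0].message.content
```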
@@ -276,6 +295,10 @@ All the code can be found in the `./reproduce` directory.
 
 ### Step-0 Extract Unique Contexts
 First, we need to extract unique contexts in the datasets.
+
+<details>
+<summary> Code </summary>
+
 ```python
 def extract_unique_contexts(input_directory, output_directory):
 
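Only the signature of `extract_unique_contexts` survives in the diff context. Per the surrounding text, the job is to deduplicate the contexts of each dataset file; a minimal sketch under the assumption of UltraDomain-style `.jsonl` input with a `context` field and an assumed output naming scheme:

```python
import json
import os

def extract_unique_contexts(input_directory, output_directory):
    os.makedirs(output_directory, exist_ok=True)
    for filename in os.listdir(input_directory):
        if not filename.endswith('.jsonl'):
            continue
        seen, unique = set(), []
        with open(os.path.join(input_directory, filename), 'r', encoding='utf-8') as f:
            for line in f:
                context = json.loads(line).get('context')
                if context and context not in seen:   # keep each context once
                    seen.add(context)
                    unique.append(context)
        out_name = os.path.splitext(filename)[0] + '_unique_contexts.json'
        with open(os.path.join(output_directory, out_name), 'w', encoding='utf-8') as out:
            json.dump(unique, out, ensure_ascii=False, indent=4)
    print("All files have been processed.")
```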
@@ -327,10 +350,14 @@ def extract_unique_contexts(input_directory, output_directory):
     print("All files have been processed.")
 
 ```
+</details>
 
 ### Step-1 Insert Contexts
 For the extracted contexts, we insert them into the LightRAG system.
 
+<details>
+<summary> Code </summary>
+
 ```python
 def insert_text(rag, file_path):
     with open(file_path, mode='r') as f:
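Between this hunk and the next, the body of `insert_text` is elided; the visible fragments (the `open` call and the retry-exhaustion message) imply a retry loop around `rag.insert`, roughly as follows, with the retry budget and backoff as assumptions:

```python
import json
import time

def insert_text(rag, file_path):
    with open(file_path, mode='r') as f:
        unique_contexts = json.load(f)

    retries, max_retries = 0, 3               # retry budget is an assumption
    while retries < max_retries:
        try:
            rag.insert(unique_contexts)
            break
        except Exception as e:
            retries += 1
            print(f"Insertion failed, retrying ({retries}/{max_retries}): {e}")
            time.sleep(10)
    if retries == max_retries:
        print("Insertion failed after exceeding the maximum number of retries")
```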
@@ -349,10 +376,15 @@ def insert_text(rag, file_path):
     if retries == max_retries:
         print("Insertion failed after exceeding the maximum number of retries")
 ```
+</details>
 
 ### Step-2 Generate Queries
 
 We extract tokens from both the first half and the second half of each context in the dataset, then combine them as the dataset description to generate queries.
+
+<details>
+<summary> Code </summary>
+
 ```python
 tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
 
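The body of `get_summary` falls between the hunks. A sketch consistent with the description (half the token budget from each half of the context, stitched into a single description); the 1000-token skip offsets are an assumption:

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

def get_summary(context, tot_tokens=2000):
    # Take tot_tokens/2 GPT-2 tokens from each half of the context and join
    # them into one "dataset description" string.
    tokens = tokenizer.tokenize(context)
    half = tot_tokens // 2
    start_tokens = tokens[1000:1000 + half]    # skip an assumed 1000-token prefix
    end_tokens = tokens[-(1000 + half):-1000]  # and an assumed 1000-token suffix
    summary = tokenizer.convert_tokens_to_string(start_tokens + end_tokens)
    return summary
```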
@@ -368,9 +400,14 @@ def get_summary(context, tot_tokens=2000):
 
     return summary
 ```
+</details>
 
 ### Step-3 Query
 For the queries generated in Step-2, we will extract them and query LightRAG.
+
+<details>
+<summary> Code </summary>
+
 ```python
 def extract_queries(file_path):
     with open(file_path, 'r') as f:
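The middle of `extract_queries` is elided. Given the "- Question N: ..." structure requested in Generate Query, a plausible sketch (the regex and markdown cleanup are assumptions), followed by the query loop the text describes:

```python
import re

def extract_queries(file_path):
    # Pull the generated "- Question N: ..." lines back out of the LLM output.
    with open(file_path, 'r') as f:
        data = f.read().replace('**', '')  # strip markdown bold (assumed cleanup)
    queries = re.findall(r'- Question \d+: (.+)', data)
    return queries

# Then query LightRAG with each generated question, e.g.:
# from lightrag import QueryParam
# for q in extract_queries("queries.txt"):   # hypothetical file name
#     print(rag.query(q, param=QueryParam(mode="hybrid")))
```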
@@ -382,6 +419,7 @@ def extract_queries(file_path):
 
     return queries
 ```
+</details>
 
 ## Code Structure
 