zrguo committed
Commit ba5302b · 1 Parent(s): fe5faf4

Update README.md

Files changed (1):
  1. README.md +43 -5

README.md CHANGED
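The change is the same throughout the diff below: each section that used to sit under a plain `###` heading is wrapped in a collapsible HTML `<details>` block, following this pattern (GitHub renders it as a click-to-expand section; the title text here is a placeholder):

```markdown
<details>
<summary> Section title </summary>

Section body: prose and fenced code blocks. The blank line after
the summary tag is generally needed for Markdown to render inside
the HTML block.

</details>
```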
@@ -20,8 +20,8 @@ This repository hosts the code of LightRAG. The structure of this code is based
  </div>
 
  ## 🎉 News
- - [x] [2024.10.16]🎯🎯📢📢LightRAG now supports [Ollama models](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#using-ollama-models)!
- - [x] [2024.10.15]🎯🎯📢📢LightRAG now supports [Hugging Face models](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#using-hugging-face-models)!
 
  ## Install
 
@@ -76,7 +76,9 @@ print(rag.query("What are the top themes in this story?", param=QueryParam(mode=
  print(rag.query("What are the top themes in this story?", param=QueryParam(mode="hybrid")))
  ```
 
- ### Open AI-like APIs
  LightRAG also supports Open AI-like chat/embeddings APIs:
  ```python
  async def llm_model_func(
@@ -110,8 +112,11 @@ rag = LightRAG(
  )
  )
  ```
 
- ### Using Hugging Face Models
  If you want to use Hugging Face models, you only need to set LightRAG as follows:
  ```python
  from lightrag.llm import hf_model_complete, hf_embedding
@@ -134,9 +139,12 @@ rag = LightRAG(
  ),
  )
  ```
 
- ### Using Ollama Models
  If you want to use Ollama models, you only need to set LightRAG as follows:
  ```python
  from lightrag.llm import ollama_model_complete, ollama_embedding
 
@@ -156,6 +164,7 @@ rag = LightRAG(
  ),
  )
  ```
 
  ### Batch Insert
  ```python
@@ -178,6 +187,10 @@ The dataset used in LightRAG can be downloaded from [TommyChien/UltraDomain](https
 
  ### Generate Query
  LightRAG uses the following prompt to generate high-level queries, with the corresponding code located in `example/generate_query.py`.
  ```python
  Given the following description of a dataset:
 
@@ -201,9 +214,14 @@ Output the results in the following structure:
  - User 5: [user description]
  ...
  ```
 
  ### Batch Eval
  To evaluate the performance of two RAG systems on high-level queries, LightRAG uses the following prompt, with the specific code available in `example/batch_eval.py`.
  ```python
  ---Role---
  You are an expert tasked with evaluating two answers to the same question based on three criteria: **Comprehensiveness**, **Diversity**, and **Empowerment**.
@@ -246,6 +264,7 @@ Output your evaluation in the following JSON format:
  }}
  }}
  ```
 
  ### Overall Performance Table
  | | **Agriculture** | | **CS** | | **Legal** | | **Mix** | |
@@ -276,6 +295,10 @@ All the code can be found in the `./reproduce` directory.
 
  ### Step-0 Extract Unique Contexts
  First, we need to extract unique contexts in the datasets.
  ```python
  def extract_unique_contexts(input_directory, output_directory):
 
@@ -327,10 +350,14 @@ def extract_unique_contexts(input_directory, output_directory):
  print("All files have been processed.")
 
  ```
 
  ### Step-1 Insert Contexts
  For the extracted contexts, we insert them into the LightRAG system.
 
  ```python
  def insert_text(rag, file_path):
      with open(file_path, mode='r') as f:
@@ -349,10 +376,15 @@ def insert_text(rag, file_path):
      if retries == max_retries:
          print("Insertion failed after exceeding the maximum number of retries")
  ```
 
  ### Step-2 Generate Queries
 
  We extract tokens from both the first half and the second half of each context in the dataset, then combine them as the dataset description to generate queries.
  ```python
  tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
 
@@ -368,9 +400,14 @@ def get_summary(context, tot_tokens=2000):
 
  return summary
  ```
 
  ### Step-3 Query
  For the queries generated in Step-2, we will extract them and query LightRAG.
  ```python
  def extract_queries(file_path):
      with open(file_path, 'r') as f:
@@ -382,6 +419,7 @@ def extract_queries(file_path):
 
  return queries
  ```
 
  ## Code Structure
 
 
  </div>
 
  ## 🎉 News
+ - [x] [2024.10.16]🎯🎯📢📢LightRAG now supports [Ollama models](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#quick-start)!
+ - [x] [2024.10.15]🎯🎯📢📢LightRAG now supports [Hugging Face models](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#quick-start)!
 
  ## Install
 
 
  print(rag.query("What are the top themes in this story?", param=QueryParam(mode="hybrid")))
  ```
 
+ <details>
+ <summary> Using Open AI-like APIs </summary>
+
  LightRAG also supports Open AI-like chat/embeddings APIs:
  ```python
  async def llm_model_func(
 
  )
  )
  ```
+ </details>
 
+ <details>
+ <summary> Using Hugging Face Models </summary>
+
  If you want to use Hugging Face models, you only need to set LightRAG as follows:
  ```python
  from lightrag.llm import hf_model_complete, hf_embedding
 
  ),
  )
  ```
+ </details>
 
+ <details>
+ <summary> Using Ollama Models </summary>
  If you want to use Ollama models, you only need to set LightRAG as follows:
+
  ```python
  from lightrag.llm import ollama_model_complete, ollama_embedding
 
 
  ),
  )
  ```
+ </details>
 
  ### Batch Insert
  ```python
 
 
  ### Generate Query
  LightRAG uses the following prompt to generate high-level queries, with the corresponding code located in `example/generate_query.py`.
+
+ <details>
+ <summary> Prompt </summary>
+
  ```python
  Given the following description of a dataset:
 
 
  - User 5: [user description]
  ...
  ```
+ </details>
 
  ### Batch Eval
  To evaluate the performance of two RAG systems on high-level queries, LightRAG uses the following prompt, with the specific code available in `example/batch_eval.py`.
+
+ <details>
+ <summary> Prompt </summary>
+
  ```python
  ---Role---
  You are an expert tasked with evaluating two answers to the same question based on three criteria: **Comprehensiveness**, **Diversity**, and **Empowerment**.
 
  }}
  }}
  ```
+ </details>
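A note on the doubled braces (`{{`, `}}`) visible in the Batch Eval hunk above: they suggest the prompt is stored as a Python `str.format` template, where `{{` escapes a literal `{`. A minimal sketch of that mechanism; the template text and variable names here are illustrative, not the repo's actual prompt:

```python
import json

# Doubled braces escape literal JSON braces in a str.format template.
template = (
    "Question: {query}\n"
    "Answer 1: {answer1}\n"
    "Answer 2: {answer2}\n"
    'Output JSON: {{"Overall Winner": {{"Winner": "[Answer 1 or Answer 2]"}}}}'
)

# Filling the template collapses {{ and }} to single braces.
prompt = template.format(query="Q?", answer1="A", answer2="B")

# An LLM reply in that shape can then be parsed directly:
reply = '{"Overall Winner": {"Winner": "Answer 1"}}'  # example reply
winner = json.loads(reply)["Overall Winner"]["Winner"]
```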
 
  ### Overall Performance Table
  | | **Agriculture** | | **CS** | | **Legal** | | **Mix** | |
 
 
  ### Step-0 Extract Unique Contexts
  First, we need to extract unique contexts in the datasets.
+
+ <details>
+ <summary> Code </summary>
+
  ```python
  def extract_unique_contexts(input_directory, output_directory):
 
 
  print("All files have been processed.")
 
  ```
+ </details>
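The body of `extract_unique_contexts` is truncated across the hunks above. As a reference for what Step-0 does, here is a self-contained sketch of the same idea; the `.jsonl` layout and the `context` field name are assumptions for illustration, not the repo's exact code:

```python
import json
import os


def extract_unique_contexts(input_directory, output_directory):
    # For every .jsonl file, keep only the first occurrence of each
    # "context" value and write the deduplicated records out.
    os.makedirs(output_directory, exist_ok=True)
    for name in os.listdir(input_directory):
        if not name.endswith(".jsonl"):
            continue
        seen, unique = set(), []
        with open(os.path.join(input_directory, name), encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                context = record.get("context", "")
                if context and context not in seen:
                    seen.add(context)
                    unique.append(record)
        out_name = name.replace(".jsonl", "_unique.json")
        with open(os.path.join(output_directory, out_name), "w", encoding="utf-8") as f:
            json.dump(unique, f)
    print("All files have been processed.")
```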
 
  ### Step-1 Insert Contexts
  For the extracted contexts, we insert them into the LightRAG system.
 
+ <details>
+ <summary> Code </summary>
+
  ```python
  def insert_text(rag, file_path):
      with open(file_path, mode='r') as f:
 
      if retries == max_retries:
          print("Insertion failed after exceeding the maximum number of retries")
  ```
+ </details>
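The retry loop of `insert_text` is split across the hunks above; the pattern can be sketched as follows. `rag` only needs an `insert` method here, and the retry count and pause are illustrative defaults:

```python
import time


def insert_text(rag, file_path, max_retries=3):
    # Read the prepared contexts and call rag.insert(), retrying a few
    # times with a short pause before giving up.
    with open(file_path, mode='r') as f:
        unique_contexts = f.read()
    retries = 0
    while retries < max_retries:
        try:
            rag.insert(unique_contexts)
            return True
        except Exception as e:
            retries += 1
            print(f"Insertion failed, retry {retries}/{max_retries}: {e}")
            time.sleep(1)
    print("Insertion failed after exceeding the maximum number of retries")
    return False
```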
 
  ### Step-2 Generate Queries
 
  We extract tokens from both the first half and the second half of each context in the dataset, then combine them as the dataset description to generate queries.
+
+ <details>
+ <summary> Code </summary>
+
  ```python
  tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
 
 
 
  return summary
  ```
+ </details>
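`get_summary` above truncates each context with `GPT2Tokenizer`; the same first-half/last-half idea, sketched with plain whitespace tokenization so it runs without `transformers` (a deliberate stand-in, not the repo's code):

```python
def get_summary(context, tot_tokens=2000):
    # Keep tot_tokens/2 tokens from the start and tot_tokens/2 from the
    # end of the context; whitespace split stands in for GPT2Tokenizer.
    tokens = context.split()
    half = tot_tokens // 2
    start_tokens = tokens[:half]
    end_tokens = tokens[-half:] if len(tokens) > half else []
    return " ".join(start_tokens) + "\n" + " ".join(end_tokens)
```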
 
  ### Step-3 Query
  For the queries generated in Step-2, we will extract them and query LightRAG.
+
+ <details>
+ <summary> Code </summary>
+
  ```python
  def extract_queries(file_path):
      with open(file_path, 'r') as f:
 
 
  return queries
  ```
+ </details>
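`extract_queries` is truncated above; a self-contained sketch of the step: strip markdown bold from the raw model output and pull out query lines with a regex. The `- Question n:` line format is an assumed convention for illustration:

```python
import re


def extract_queries(file_path):
    # Read the raw model output, drop any markdown bold markers, and
    # collect every line shaped like "- Question n: ...".
    with open(file_path, 'r') as f:
        data = f.read()
    data = data.replace('**', '')
    queries = re.findall(r'- Question \d+: (.+)', data)
    return queries
```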
 
  ## Code Structure
 