Magicyuan committed
Commit d6ef863 · 1 Parent(s): 4a5787d

docs(readme): Add batch size configuration documentation



- Add documentation for the insert_batch_size parameter in addon_params
- Explain the default batch size value and its usage
- Add an example configuration for batch processing

Files changed (1): README.md (+17 -2)

README.md CHANGED
@@ -278,10 +278,25 @@ class QueryParam:
 ### Batch Insert
 
 ```python
-# Batch Insert: Insert multiple texts at once
+# Basic Batch Insert: Insert multiple texts at once
 rag.insert(["TEXT1", "TEXT2",...])
+
+# Batch Insert with custom batch size configuration
+rag = LightRAG(
+    working_dir=WORKING_DIR,
+    addon_params={
+        "insert_batch_size": 20  # Process 20 documents per batch
+    }
+)
+rag.insert(["TEXT1", "TEXT2", "TEXT3", ...])  # Documents will be processed in batches of 20
 ```
 
+The `insert_batch_size` parameter in `addon_params` controls how many documents are processed in each batch during insertion. This is useful for:
+- Managing memory usage with large document collections
+- Optimizing processing speed
+- Providing better progress tracking
+- Default value is 10 if not specified
+
 ### Incremental Insert
 
 ```python
@@ -594,7 +609,7 @@ if __name__ == "__main__":
 | **llm\_model\_kwargs** | `dict` | Additional parameters for LLM generation | |
 | **vector\_db\_storage\_cls\_kwargs** | `dict` | Additional parameters for vector database (currently not used) | |
 | **enable\_llm\_cache** | `bool` | If `TRUE`, stores LLM results in cache; repeated prompts return cached responses | `TRUE` |
-| **addon\_params** | `dict` | Additional parameters, e.g., `{"example_number": 1, "language": "Simplified Chinese", "entity_types": ["organization", "person", "geo", "event"]}`: sets example limit and output language | `example_number: all examples, language: English` |
+| **addon\_params** | `dict` | Additional parameters, e.g., `{"example_number": 1, "language": "Simplified Chinese", "entity_types": ["organization", "person", "geo", "event"], "insert_batch_size": 10}`: sets example limit, output language, and batch size for document processing | `example_number: all examples, language: English, insert_batch_size: 10` |
 | **convert\_response\_to\_json\_func** | `callable` | Not used | `convert_response_to_json` |
 | **embedding\_cache\_config** | `dict` | Configuration for question-answer caching. Contains three parameters:<br>- `enabled`: Boolean value to enable/disable cache lookup functionality. When enabled, the system will check cached responses before generating new answers.<br>- `similarity_threshold`: Float value (0-1), similarity threshold. When a new question's similarity with a cached question exceeds this threshold, the cached answer will be returned directly without calling the LLM.<br>- `use_llm_check`: Boolean value to enable/disable LLM similarity verification. When enabled, LLM will be used as a secondary check to verify the similarity between questions before returning cached answers. | Default: `{"enabled": False, "similarity_threshold": 0.95, "use_llm_check": False}` |
 
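The batching behavior this commit documents can be sketched as follows. `split_into_batches` is a hypothetical helper for illustration only, not part of LightRAG's API; it shows how a document list would be partitioned under a given `insert_batch_size` before insertion.

```python
def split_into_batches(documents, insert_batch_size=10):
    """Partition a list of documents into consecutive batches of at most
    insert_batch_size items (default 10, matching the documented default)."""
    if insert_batch_size < 1:
        raise ValueError("insert_batch_size must be a positive integer")
    return [
        documents[i:i + insert_batch_size]
        for i in range(0, len(documents), insert_batch_size)
    ]

# 45 documents with insert_batch_size=20 yield batches of 20, 20, and 5
batches = split_into_batches([f"TEXT{n}" for n in range(1, 46)], insert_batch_size=20)
print([len(b) for b in batches])  # → [20, 20, 5]
```

Smaller batches bound peak memory per insertion step and give finer-grained progress, at the cost of more batch round-trips.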
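The `embedding_cache_config` row in the table above describes threshold-based answer caching. A minimal sketch of that lookup logic, assuming cosine similarity over question embeddings; the function names here are illustrative, not LightRAG internals:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def lookup_cached_answer(query_emb, cache, similarity_threshold=0.95):
    """Return the cached answer whose question embedding is most similar to
    query_emb, but only if that similarity exceeds the threshold; else None
    (meaning the LLM must be called to generate a fresh answer)."""
    best_answer, best_sim = None, similarity_threshold
    for cached_emb, answer in cache:
        sim = cosine_similarity(query_emb, cached_emb)
        if sim > best_sim:
            best_sim, best_answer = sim, answer
    return best_answer
```

With the default `similarity_threshold` of 0.95, only near-duplicate questions short-circuit to a cached answer; lowering it trades answer freshness for fewer LLM calls.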