docs(readme): Add batch size configuration documentation
- Add documentation for insert_batch_size parameter in addon_params
- Explain default batch size value and its usage
- Add example configuration for batch processing
README.md
CHANGED
@@ -278,10 +278,24 @@ class QueryParam:
 ### Batch Insert
 
 ```python
-# Batch Insert: Insert multiple texts at once
+# Basic Batch Insert: Insert multiple texts at once
 rag.insert(["TEXT1", "TEXT2",...])
+
+# Batch Insert with custom batch size configuration
+rag = LightRAG(
+    working_dir=WORKING_DIR,
+    addon_params={
+        "insert_batch_size": 20  # Process 20 documents per batch
+    }
+)
+rag.insert(["TEXT1", "TEXT2", "TEXT3", ...])  # Documents will be processed in batches of 20
 ```
 
+The `insert_batch_size` parameter in `addon_params` controls how many documents are processed in each batch during insertion (default: 10 if not specified). This is useful for:
+- Managing memory usage with large document collections
+- Optimizing processing speed
+- Providing better progress tracking
+
 ### Incremental Insert
 
 ```python
@@ -594,7 +609,7 @@ if __name__ == "__main__":
 | **llm\_model\_kwargs** | `dict` | Additional parameters for LLM generation | |
 | **vector\_db\_storage\_cls\_kwargs** | `dict` | Additional parameters for vector database (currently not used) | |
 | **enable\_llm\_cache** | `bool` | If `TRUE`, stores LLM results in cache; repeated prompts return cached responses | `TRUE` |
-| **addon\_params** | `dict` | Additional parameters, e.g., `{"example_number": 1, "language": "Simplified Chinese", "entity_types": ["organization", "person", "geo", "event"]}`: sets example limit and output language | `example_number: all examples, language: English` |
+| **addon\_params** | `dict` | Additional parameters, e.g., `{"example_number": 1, "language": "Simplified Chinese", "entity_types": ["organization", "person", "geo", "event"], "insert_batch_size": 10}`: sets example limit, output language, and batch size for document processing | `example_number: all examples, language: English, insert_batch_size: 10` |
 | **convert\_response\_to\_json\_func** | `callable` | Not used | `convert_response_to_json` |
 | **embedding\_cache\_config** | `dict` | Configuration for question-answer caching. Contains three parameters:<br>- `enabled`: Boolean value to enable/disable cache lookup functionality. When enabled, the system will check cached responses before generating new answers.<br>- `similarity_threshold`: Float value (0-1), similarity threshold. When a new question's similarity with a cached question exceeds this threshold, the cached answer will be returned directly without calling the LLM.<br>- `use_llm_check`: Boolean value to enable/disable LLM similarity verification. When enabled, LLM will be used as a secondary check to verify the similarity between questions before returning cached answers. | Default: `{"enabled": False, "similarity_threshold": 0.95, "use_llm_check": False}` |
 
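For context on the first hunk, here is a minimal, self-contained sketch of the batching behavior the new documentation describes: splitting the input list into chunks of `insert_batch_size` before insertion. The `chunked` helper and the progress loop are illustrative only, not LightRAG's actual internals.

```python
# Minimal sketch (not LightRAG internals): split documents into
# batches of `insert_batch_size` before inserting them.

def chunked(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for i in range(0, len(items), batch_size):
        yield items[i : i + batch_size]

docs = [f"TEXT{i}" for i in range(1, 46)]  # 45 hypothetical documents
insert_batch_size = 20                     # as in addon_params above

for n, batch in enumerate(chunked(docs, insert_batch_size), start=1):
    # A real pipeline would call rag.insert(batch) here; we only report progress.
    print(f"batch {n}: {len(batch)} documents")
# batch 1: 20 documents
# batch 2: 20 documents
# batch 3: 5 documents
```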
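Likewise, for the `addon_params` table row in the second hunk, the documented default of 10 amounts to a dictionary lookup with a fallback. The `addon_params` dict below mirrors the README example; the lookup itself is ordinary Python shown for illustration, not a LightRAG API.

```python
# Illustrative only: how a default of 10 applies when the caller
# omits "insert_batch_size" from addon_params.
addon_params = {
    "example_number": 1,
    "language": "Simplified Chinese",
    "entity_types": ["organization", "person", "geo", "event"],
    # "insert_batch_size" deliberately omitted
}

insert_batch_size = addon_params.get("insert_batch_size", 10)
print(insert_batch_size)  # 10 -- the documented default
```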