yangdx committed
Commit 45e3112 · 1 Parent(s): 5a7b64e

Adjust concurrency limits to more LLM-friendly settings for newcomers

- Lowered max async LLM processes to 4
- Enabled LLM cache for entity extraction
- Reduced max parallel insert to 2
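
The two numeric limits above are dataclass defaults read from environment variables, as the `lightrag/lightrag.py` hunks in this commit show. The sketch below reproduces that pattern on a stand-in class; `Limits` and `make_limits_class` are hypothetical names used only for illustration and are not part of LightRAG. It assumes nothing beyond the `int(os.getenv(...))` expressions visible in the diff. One consequence of the pattern is that the environment is read once, when the class body is evaluated, so an override has to be set before that point or passed explicitly to the constructor.

```python
import os
from dataclasses import dataclass, field


def make_limits_class():
    """Define the stand-in dataclass; env vars are read when this runs."""

    @dataclass
    class Limits:  # hypothetical stand-in, not the real LightRAG class
        # Same default expressions as the "+" lines in this commit.
        llm_model_max_async: int = field(default=int(os.getenv("MAX_ASYNC", 4)))
        max_parallel_insert: int = field(
            default=int(os.getenv("MAX_PARALLEL_INSERT", 2))
        )

    return Limits


if __name__ == "__main__":
    Limits = make_limits_class()
    print(Limits())                        # defaults taken from the environment
    print(Limits(llm_model_max_async=16))  # explicit argument overrides the default
```

Passing the value to the constructor is the safer way to restore a previous limit, since it does not depend on when the environment variable was set.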

README.md CHANGED
@@ -1061,7 +1061,7 @@ Valid modes are:
  | **llm\_model\_func** | `callable` | Function for LLM generation | `gpt_4o_mini_complete` |
  | **llm\_model\_name** | `str` | LLM model name for generation | `meta-llama/Llama-3.2-1B-Instruct` |
  | **llm\_model\_max\_token\_size** | `int` | Maximum token size for LLM generation (affects entity relation summaries) | `32768`(default value changed by env var MAX_TOKENS) |
- | **llm\_model\_max\_async** | `int` | Maximum number of concurrent asynchronous LLM processes | `16`(default value changed by env var MAX_ASYNC) |
+ | **llm\_model\_max\_async** | `int` | Maximum number of concurrent asynchronous LLM processes | `4`(default value changed by env var MAX_ASYNC) |
  | **llm\_model\_kwargs** | `dict` | Additional parameters for LLM generation | |
  | **vector\_db\_storage\_cls\_kwargs** | `dict` | Additional parameters for vector database, like setting the threshold for nodes and relations retrieval. | cosine_better_than_threshold: 0.2(default value changed by env var COSINE_THRESHOLD) |
  | **enable\_llm\_cache** | `bool` | If `TRUE`, stores LLM results in cache; repeated prompts return cached responses | `TRUE` |
env.example CHANGED
@@ -50,7 +50,8 @@
  # MAX_TOKEN_SUMMARY=500 # Max tokens for entity or relations summary
  # SUMMARY_LANGUAGE=English
  # MAX_EMBED_TOKENS=8192
- # ENABLE_LLM_CACHE_FOR_EXTRACT=false # Enable LLM cache for entity extraction, defaults to false
+ # ENABLE_LLM_CACHE_FOR_EXTRACT=true # Enable LLM cache for entity extraction
+ # MAX_PARALLEL_INSERT=2 # Maximum number of parallel processing documents in pipeline

  ### LLM Configuration (Use valid host. For local services installed with docker, you can use host.docker.internal)
  LLM_BINDING=ollama
lightrag/api/README.md CHANGED
@@ -224,7 +224,7 @@ LightRAG supports binding to various LLM/Embedding backends:
  Use environment variables `LLM_BINDING` or CLI argument `--llm-binding` to select LLM backend type. Use environment variables `EMBEDDING_BINDING` or CLI argument `--embedding-binding` to select LLM backend type.

  ### Entity Extraction Configuration
- * ENABLE_LLM_CACHE_FOR_EXTRACT: Enable LLM cache for entity extraction (default: false)
+ * ENABLE_LLM_CACHE_FOR_EXTRACT: Enable LLM cache for entity extraction (default: true)

  It's very common to set `ENABLE_LLM_CACHE_FOR_EXTRACT` to true for test environment to reduce the cost of LLM calls.

lightrag/api/utils_api.py CHANGED
@@ -364,7 +364,7 @@ def parse_args(is_uvicorn_mode: bool = False) -> argparse.Namespace:
  # Inject LLM cache configuration
  args.enable_llm_cache_for_extract = get_env_value(
- "ENABLE_LLM_CACHE_FOR_EXTRACT", False, bool
+ "ENABLE_LLM_CACHE_FOR_EXTRACT", True, bool
  )

  # Select Document loading tool (DOCLING, DEFAULT)
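
The hunk above passes `bool` as the target type to `get_env_value`. How that helper converts the string is not shown in this diff, so the sketch below is only a hypothetical stand-in (`env_flag` and `TRUE_STRINGS` are invented names). It illustrates why boolean environment flags are usually parsed explicitly: in Python any non-empty string is truthy, so a naive `bool("false")` evaluates to `True`.

```python
import os

# Spellings treated as "enabled"; this set is an assumption for the sketch.
TRUE_STRINGS = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool) -> bool:
    """Read a boolean flag from the environment, falling back to a default."""
    raw = os.getenv(name)
    if raw is None:
        # Variable not set: use the caller's default (True after this commit).
        return default
    return raw.strip().lower() in TRUE_STRINGS


if __name__ == "__main__":
    print(env_flag("ENABLE_LLM_CACHE_FOR_EXTRACT", True))
```

With a parser of this shape, setting `ENABLE_LLM_CACHE_FOR_EXTRACT=false` would turn the extraction cache back off, while leaving the variable unset would pick up the new default.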
lightrag/lightrag.py CHANGED
@@ -214,7 +214,7 @@ class LightRAG:
  llm_model_max_token_size: int = field(default=int(os.getenv("MAX_TOKENS", 32768)))
  """Maximum number of tokens allowed per LLM response."""

- llm_model_max_async: int = field(default=int(os.getenv("MAX_ASYNC", 16)))
+ llm_model_max_async: int = field(default=int(os.getenv("MAX_ASYNC", 4)))
  """Maximum number of concurrent LLM calls."""

  llm_model_kwargs: dict[str, Any] = field(default_factory=dict)
@@ -238,7 +238,7 @@ class LightRAG:
  # Extensions
  # ---

- max_parallel_insert: int = field(default=int(os.getenv("MAX_PARALLEL_INSERT", 20)))
+ max_parallel_insert: int = field(default=int(os.getenv("MAX_PARALLEL_INSERT", 2)))
  """Maximum number of parallel insert operations."""

  addon_params: dict[str, Any] = field(
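
The docstring in the hunk above describes `llm_model_max_async` as the maximum number of concurrent LLM calls. The sketch below shows one common way such a cap can be enforced, using `asyncio.Semaphore`. It is not taken from LightRAG, whose limiting code is not part of this diff; `run_with_limit`, `demo`, and `fake_llm_call` are hypothetical names, and the sleep stands in for a real model request.

```python
import asyncio
from typing import Awaitable, Callable


async def run_with_limit(
    calls: list[Callable[[], Awaitable[str]]], max_async: int = 4
) -> list[str]:
    """Run all calls, allowing at most `max_async` of them in flight at once."""
    semaphore = asyncio.Semaphore(max_async)

    async def guarded(call: Callable[[], Awaitable[str]]) -> str:
        # Wait for a free slot before starting the call.
        async with semaphore:
            return await call()

    return await asyncio.gather(*(guarded(c) for c in calls))


async def demo(n_calls: int = 10, max_async: int = 4) -> tuple[list[str], int]:
    """Issue fake calls and record the highest number running concurrently."""
    active = 0
    peak = 0

    async def fake_llm_call() -> str:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # placeholder for a real LLM request
        active -= 1
        return "ok"

    results = await run_with_limit([fake_llm_call] * n_calls, max_async)
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(demo())
    print(len(results), "calls finished; peak concurrency:", peak)
```

A lower cap trades throughput for fewer simultaneous requests, which fits the commit's stated aim of friendlier defaults for newcomers whose LLM backend may not handle many parallel calls.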