# LightRAG Multi-Document Processing: Concurrent Control Strategy Analysis

LightRAG employs a multi-layered concurrency control strategy when processing multiple documents. This article provides an in-depth analysis of the control mechanisms at the document level, chunk level, and LLM request level, explaining why the system exhibits the concurrent behavior that it does.

|
## Overview

LightRAG's concurrency control is divided into three layers:

1. **Document-level concurrency**: Controls the number of documents processed simultaneously
2. **Chunk-level concurrency**: Controls the number of chunks processed simultaneously within a single document
3. **LLM request-level concurrency**: Controls the global number of concurrent LLM requests

|
## 1. Document-Level Concurrent Control

**Control Parameter**: `max_parallel_insert`

Document-level concurrency is controlled by the `max_parallel_insert` parameter, with a default value of 2.

|
```python
# lightrag/lightrag.py
max_parallel_insert: int = field(default=int(os.getenv("MAX_PARALLEL_INSERT", 2)))
```

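Note that the default is computed by `os.getenv` when the dataclass field is defined, i.e. at import time, so the environment variable must be set before `lightrag` is imported; alternatively the field can be passed explicitly. A minimal configuration sketch (values and the working directory are illustrative, and other required arguments such as the LLM/embedding functions are omitted):

```python
import os

# Must happen before the lightrag import, because the dataclass default
# is evaluated at import time via os.getenv("MAX_PARALLEL_INSERT", 2).
os.environ["MAX_PARALLEL_INSERT"] = "4"

from lightrag import LightRAG

# Or override the field explicitly when constructing the instance:
rag = LightRAG(
    working_dir="./rag_storage",  # illustrative path
    max_parallel_insert=4,
    # ... LLM / embedding functions and other required arguments omitted
)
```
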
|
|
### Implementation Mechanism

In the `apipeline_process_enqueue_documents` method, a semaphore is used to control document-level concurrency:

|
```python
# lightrag/lightrag.py - apipeline_process_enqueue_documents method
async def process_document(
    doc_id: str,
    status_doc: DocProcessingStatus,
    split_by_character: str | None,
    split_by_character_only: bool,
    pipeline_status: dict,
    pipeline_status_lock: asyncio.Lock,
    semaphore: asyncio.Semaphore,  # Document-level semaphore
) -> None:
    """Process a single document"""
    async with semaphore:  # 🔥 Document-level concurrent control
        ...  # Process all chunks of this document

# Create the document-level semaphore
semaphore = asyncio.Semaphore(self.max_parallel_insert)  # Default 2

# Create a processing task for each document
doc_tasks = []
for doc_id, status_doc in to_process_docs.items():
    doc_tasks.append(
        process_document(
            doc_id, status_doc, split_by_character, split_by_character_only,
            pipeline_status, pipeline_status_lock, semaphore
        )
    )

# Wait for all documents to finish processing
await asyncio.gather(*doc_tasks)
```

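The effect of this gating is easy to see in a standalone toy script (not LightRAG code): five simulated documents guarded by a semaphore of 2 never run more than two at a time.

```python
import asyncio

async def process_document(doc_id: int, semaphore: asyncio.Semaphore) -> None:
    async with semaphore:  # same gating pattern as above
        print(f"doc {doc_id} started")
        await asyncio.sleep(1)  # stand-in for chunking and extraction work
        print(f"doc {doc_id} finished")

async def main() -> None:
    semaphore = asyncio.Semaphore(2)  # max_parallel_insert = 2
    await asyncio.gather(*(process_document(i, semaphore) for i in range(5)))

asyncio.run(main())
# Documents start in waves of two: 0 and 1, then 2 and 3, then 4.
```
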
|
|
## 2. Chunk-Level Concurrent Control

**Control Parameter**: `llm_model_max_async`

**Key Point**: Each document independently creates its own chunk semaphore!

```python
# lightrag/lightrag.py
llm_model_max_async: int = field(default=int(os.getenv("MAX_ASYNC", 4)))
```

### Implementation Mechanism

In the `extract_entities` function, **each document independently creates** its own chunk semaphore:

|
```python
# lightrag/operate.py - extract_entities function
async def extract_entities(chunks: dict[str, TextChunkSchema], global_config: dict[str, str], ...):
    # 🔥 Key: each document independently creates this semaphore!
    llm_model_max_async = global_config.get("llm_model_max_async", 4)
    semaphore = asyncio.Semaphore(llm_model_max_async)  # Chunk semaphore for this document

    async def _process_with_semaphore(chunk):
        async with semaphore:  # 🔥 Chunk-level concurrent control within the document
            return await _process_single_content(chunk)

    # Create a task for each chunk
    tasks = []
    for c in ordered_chunks:
        task = asyncio.create_task(_process_with_semaphore(c))
        tasks.append(task)

    # Wait for all chunks to finish processing
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
    chunk_results = [task.result() for task in tasks]
    return chunk_results
```

`return_when=asyncio.FIRST_EXCEPTION` makes the wait return as soon as any chunk task fails, so a failure surfaces immediately instead of only after every chunk has finished.

|
### Important Inference: System-Wide Chunk Concurrency

Since each document creates its own chunk semaphore, the theoretical chunk concurrency of the system is:

**Theoretical chunk concurrency = max_parallel_insert × llm_model_max_async**

For example:

- `max_parallel_insert = 2` (2 documents processed simultaneously)
- `llm_model_max_async = 4` (at most 4 concurrent chunks per document)
- **Theoretical result**: at most 2 × 4 = 8 chunks simultaneously in the "processing" state

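This multiplication can be verified with a toy simulation (standalone code, not part of LightRAG): three simulated documents are gated by a document semaphore of 2, each document creates its own chunk semaphore of 4, and the peak number of chunks simultaneously "in processing" reaches 2 × 4 = 8.

```python
import asyncio

MAX_PARALLEL_INSERT = 2   # document-level limit
LLM_MODEL_MAX_ASYNC = 4   # per-document chunk limit

in_processing = 0
peak = 0

async def process_chunk(chunk_sem: asyncio.Semaphore) -> None:
    global in_processing, peak
    async with chunk_sem:
        in_processing += 1
        peak = max(peak, in_processing)
        await asyncio.sleep(0.1)  # stand-in for the chunk's LLM round-trips
        in_processing -= 1

async def process_document(doc_sem: asyncio.Semaphore) -> None:
    async with doc_sem:
        # Each document creates its own chunk semaphore, as in extract_entities
        chunk_sem = asyncio.Semaphore(LLM_MODEL_MAX_ASYNC)
        await asyncio.gather(*(process_chunk(chunk_sem) for _ in range(10)))

async def main() -> None:
    doc_sem = asyncio.Semaphore(MAX_PARALLEL_INSERT)
    await asyncio.gather(*(process_document(doc_sem) for _ in range(3)))
    print("peak chunks in processing:", peak)  # 8

asyncio.run(main())
```
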
|
|
## 3. LLM Request-Level Concurrent Control (The Real Bottleneck)

**Control Parameter**: `llm_model_max_async` (globally shared)

**Key**: Although there may be 8 chunks "in processing", all LLM requests share the same global priority queue!

|
```python
# lightrag/lightrag.py - __post_init__ method
self.llm_model_func = priority_limit_async_func_call(self.llm_model_max_async)(
    partial(
        self.llm_model_func,
        hashing_kv=hashing_kv,
        **self.llm_model_kwargs,
    )
)
# 🔥 Global LLM concurrency (number of worker tasks) = llm_model_max_async = 4
```

Because this wrapped function is created once per `LightRAG` instance, every chunk of every document funnels its LLM calls through the same pool of workers.

|
### Priority Queue Implementation

```python
# lightrag/utils.py - priority_limit_async_func_call function
def priority_limit_async_func_call(max_size: int, max_queue_size: int = 1000):
    def final_decro(func):
        queue = asyncio.PriorityQueue(maxsize=max_queue_size)
        tasks = set()

        async def worker():
            """Worker that processes tasks from the priority queue"""
            while not shutdown_event.is_set():  # shutdown_event is defined in the full implementation
                try:
                    priority, count, future, args, kwargs = await asyncio.wait_for(
                        queue.get(), timeout=1.0
                    )
                    result = await func(*args, **kwargs)  # 🔥 The actual LLM call
                    if not future.done():
                        future.set_result(result)
                except Exception as e:
                    ...  # Error handling...
                finally:
                    queue.task_done()

        # 🔥 Create a fixed number of workers (max_size); this is the real concurrency limit
        for _ in range(max_size):
            task = asyncio.create_task(worker())
            tasks.add(task)
```

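Stripped of priorities, caching, and shutdown handling, this is a classic fixed-size worker pool in front of a queue. The simplified sketch below (an illustration of the pattern, not the actual LightRAG implementation) shows why at most `max_size` calls can be executing at once, no matter how many callers are waiting.

```python
import asyncio
from typing import Any, Awaitable, Callable

def limit_async_calls(func: Callable[..., Awaitable[Any]], max_size: int):
    """Funnel every call to `func` through a pool of max_size worker tasks."""
    queue: asyncio.Queue = asyncio.Queue()
    workers: list[asyncio.Task] = []

    async def worker() -> None:
        while True:
            future, args, kwargs = await queue.get()
            try:
                result = await func(*args, **kwargs)  # the actual (LLM) call
                if not future.done():
                    future.set_result(result)
            except Exception as exc:
                if not future.done():
                    future.set_exception(exc)
            finally:
                queue.task_done()

    async def wrapped(*args: Any, **kwargs: Any) -> Any:
        if not workers:  # lazily start the fixed-size worker pool
            workers.extend(asyncio.create_task(worker()) for _ in range(max_size))
        future: asyncio.Future = asyncio.get_running_loop().create_future()
        await queue.put((future, args, kwargs))
        return await future  # resolved only once a worker has run the call

    return wrapped
```

Every caller simply awaits a future that only a worker can resolve, so the number of `func` calls in flight equals the number of workers, just as in the LightRAG version above.
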
|
|
## 4. Chunk Internal Processing Mechanism (Serial)

### Why Serial?

Internal processing of each chunk strictly follows the serial order below: each later LLM call is sent with the conversation history of the earlier ones (`history_messages=history`), so the steps cannot overlap.

|
```python
# lightrag/operate.py - _process_single_content function
async def _process_single_content(chunk_key_dp: tuple[str, TextChunkSchema]):
    # Step 1: Initial entity extraction
    hint_prompt = entity_extract_prompt.format(**{**context_base, "input_text": content})
    final_result = await use_llm_func_with_cache(hint_prompt, use_llm_func, ...)

    # Process the initial extraction result
    maybe_nodes, maybe_edges = await _process_extraction_result(final_result, chunk_key, file_path)

    # Step 2: Gleaning phase
    for now_glean_index in range(entity_extract_max_gleaning):
        # 🔥 Serial wait for the gleaning result
        glean_result = await use_llm_func_with_cache(
            continue_prompt, use_llm_func,
            llm_response_cache=llm_response_cache,
            history_messages=history, cache_type="extract"
        )

        # Process the gleaning result
        glean_nodes, glean_edges = await _process_extraction_result(glean_result, chunk_key, file_path)

        # Merge results...

        # Step 3: Decide whether to continue the loop
        if now_glean_index == entity_extract_max_gleaning - 1:
            break

        # 🔥 Serial wait for the loop-decision result
        if_loop_result = await use_llm_func_with_cache(
            if_loop_prompt, use_llm_func,
            llm_response_cache=llm_response_cache,
            history_messages=history, cache_type="extract"
        )

        if if_loop_result.strip().strip('"').strip("'").lower() != "yes":
            break

    return maybe_nodes, maybe_edges
```

In other words, a single chunk issues one initial extraction call plus, per gleaning round, one gleaning call and (except on the last round) one loop-decision call, all awaited one after another.

|
## 5. Complete Concurrency Hierarchy Diagram



### Chunk Internal Processing (Serial)

```
Initial Extraction → Gleaning → Loop Decision → Complete
```

|
## 6. Real-World Scenario Analysis

### Scenario 1: Single Document with Multiple Chunks

Assume 1 document with 6 chunks:

- **Document level**: Only 1 document, so `max_parallel_insert` is not a limiting factor
- **Chunk level**: At most 4 chunks processed simultaneously (limited by `llm_model_max_async=4`)
- **LLM level**: At most 4 concurrent LLM requests globally

**Expected behavior**: 4 chunks are processed concurrently while the remaining 2 wait.

|
### Scenario 2: Multiple Documents with Multiple Chunks

Assume 3 documents, each with 10 chunks:

- **Document level**: At most 2 documents processed simultaneously
- **Chunk level**: At most 4 chunks per document processed simultaneously
- **Theoretical chunk concurrency**: 2 × 4 = 8 chunks processed simultaneously
- **Actual LLM concurrency**: Only 4 LLM requests actually execute

**Actual state distribution**:

```
# A possible system state:
Document 1: 4 chunks "processing" (2 executing LLM calls, 2 waiting for a free LLM worker)
Document 2: 4 chunks "processing" (2 executing LLM calls, 2 waiting for a free LLM worker)
Document 3: waiting for the document-level semaphore

Total:
- 8 chunks in the "processing" state
- 4 LLM requests actually executing
- 4 chunks waiting for a free LLM worker
```

|
## 7. Performance Optimization Recommendations

### Understanding the Bottleneck

The real bottleneck is the global LLM request queue, not the chunk semaphores!

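A rough calculation makes this concrete. If a single extraction call takes about 5 seconds (an illustrative figure), 4 workers can complete at most about 4 / 5 ≈ 0.8 calls per second, i.e. roughly 48 calls per minute, regardless of whether 8 or 80 chunks are nominally "in processing". Doubling `llm_model_max_async` doubles that ceiling; doubling `max_parallel_insert` does not.
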
|
|
### Adjustment Strategies

**Strategy 1: Increase LLM Concurrency**

```bash
# Environment variable configuration
export MAX_PARALLEL_INSERT=2   # Keep document concurrency unchanged
export MAX_ASYNC=8             # 🔥 Increase LLM request concurrency
```

Because `llm_model_max_async` drives both the per-document chunk semaphore and the global worker pool, raising `MAX_ASYNC` to 8 lifts the actual LLM concurrency to 8 and the theoretical chunk concurrency to 2 × 8 = 16.

|
**Strategy 2: Balance Document and LLM Concurrency**

```python
rag = LightRAG(
    max_parallel_insert=3,          # Moderately increase document concurrency
    llm_model_max_async=12,         # Significantly increase LLM concurrency
    entity_extract_max_gleaning=0,  # Remove the serial gleaning steps within each chunk
)
```

With `entity_extract_max_gleaning=0`, the gleaning and loop-decision calls described in Section 4 are skipped entirely, so each chunk needs only a single extraction call.

|
## 8. Summary

Key characteristics of LightRAG's multi-document concurrent processing mechanism:

### Concurrency Layers

1. **Inter-document competition**: Controlled by `max_parallel_insert`; 2 documents are processed concurrently by default
2. **Theoretical chunk concurrency**: Each document creates its own chunk semaphore, so the total is max_parallel_insert × llm_model_max_async
3. **Actual LLM concurrency**: All chunks share the global LLM queue, controlled by `llm_model_max_async`
4. **Intra-chunk serial execution**: The LLM requests within a single chunk execute strictly one after another

### Key Insights

- **Theoretical vs. actual**: The system may have many chunks "in processing", but only a few of them are actually executing LLM requests
- **Real bottleneck**: The global LLM request queue is the performance bottleneck, not the chunk semaphores
- **Optimization focus**: Increasing `llm_model_max_async` is more effective than increasing `max_parallel_insert`