jin committed on
Commit 7cf01b2
· 1 Parent(s): ac222b0

add LightRAG init parameters in readme
README.md CHANGED
@@ -511,6 +511,35 @@ if __name__ == "__main__":
 
 </details>
 
+### LightRAG init parameters
+
+| **Parameter** | **Type** | **Explanation** | **Default** |
+| --- | --- | --- | --- |
+| **working\_dir** | `str` | Directory where the cache will be stored | `lightrag_cache+timestamp` |
+| **kv\_storage** | `str` | Storage type for documents and text chunks. Supported types: `JsonKVStorage`, `OracleKVStorage` | `JsonKVStorage` |
+| **vector\_storage** | `str` | Storage type for embedding vectors. Supported types: `NanoVectorDBStorage`, `OracleVectorDBStorage` | `NanoVectorDBStorage` |
+| **graph\_storage** | `str` | Storage type for graph edges and nodes. Supported types: `NetworkXStorage`, `Neo4JStorage`, `OracleGraphStorage` | `NetworkXStorage` |
+| **log\_level** | | Log level for application runtime | `logging.DEBUG` |
+| **chunk\_token\_size** | `int` | Maximum token size per chunk when splitting documents | `1200` |
+| **chunk\_overlap\_token\_size** | `int` | Overlap token size between two consecutive chunks when splitting documents | `100` |
+| **tiktoken\_model\_name** | `str` | Model name for the Tiktoken encoder used to count tokens | `gpt-4o-mini` |
+| **entity\_extract\_max\_gleaning** | `int` | Number of extra passes in the entity extraction process, appending history messages | `1` |
+| **entity\_summary\_to\_max\_tokens** | `int` | Maximum token size for each entity summary | `500` |
+| **node\_embedding\_algorithm** | `str` | Algorithm for node embedding (currently not used) | `node2vec` |
+| **node2vec\_params** | `dict` | Parameters for node embedding | `{"dimensions": 1536, "num_walks": 10, "walk_length": 40, "window_size": 2, "iterations": 3, "random_seed": 3}` |
+| **embedding\_func** | `EmbeddingFunc` | Function to generate embedding vectors from text | `openai_embedding` |
+| **embedding\_batch\_num** | `int` | Maximum batch size for embedding processes (multiple texts sent per batch) | `32` |
+| **embedding\_func\_max\_async** | `int` | Maximum number of concurrent asynchronous embedding processes | `16` |
+| **llm\_model\_func** | `callable` | Function for LLM generation | `gpt_4o_mini_complete` |
+| **llm\_model\_name** | `str` | LLM model name for generation | `meta-llama/Llama-3.2-1B-Instruct` |
+| **llm\_model\_max\_token\_size** | `int` | Maximum token size for LLM generation (affects entity relation summaries) | `32768` |
+| **llm\_model\_max\_async** | `int` | Maximum number of concurrent asynchronous LLM processes | `16` |
+| **llm\_model\_kwargs** | `dict` | Additional parameters for LLM generation | |
+| **vector\_db\_storage\_cls\_kwargs** | `dict` | Additional parameters for the vector database (currently not used) | |
+| **enable\_llm\_cache** | `bool` | If `True`, stores LLM results in cache; repeated prompts return cached responses | `True` |
+| **addon\_params** | `dict` | Additional parameters, e.g. `{"example_number": 1, "language": "Simplified Chinese"}`: sets the example limit and output language | `example_number: all examples, language: English` |
+| **convert\_response\_to\_json\_func** | `callable` | Not used | `convert_response_to_json` |
+
 ## API Server Implementation
 
 LightRAG also provides a FastAPI-based server implementation for RESTful API access to RAG operations. This allows you to run LightRAG as a service and interact with it through HTTP requests.
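The chunking parameters above (`chunk_token_size`, `chunk_overlap_token_size`) split a document into overlapping windows of tokens. A minimal sketch of that windowing in plain Python — a hypothetical helper operating on a pre-tokenized list of token ids, not LightRAG's actual implementation:

```python
def chunk_tokens(tokens, chunk_token_size=1200, chunk_overlap_token_size=100):
    """Split a token list into overlapping windows.

    Consecutive chunks share `chunk_overlap_token_size` tokens, so the
    window start advances by (size - overlap) each step.
    """
    step = chunk_token_size - chunk_overlap_token_size
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_token_size])
        if start + chunk_token_size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks

# Small numbers for illustration: 10 tokens, window of 4, overlap of 1
demo = chunk_tokens(list(range(10)), chunk_token_size=4, chunk_overlap_token_size=1)
print(demo)  # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

With the real defaults, each 1200-token chunk shares its last 100 tokens with the next chunk, so sentences spanning a boundary appear whole in at least one chunk.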
examples/lightrag_api_oracle_demo..py CHANGED
@@ -81,7 +81,7 @@ async def get_embedding_dim():
 
 async def init():
     # Detect embedding dimension
-    embedding_dimension = 1024  # await get_embedding_dim()
+    embedding_dimension = await get_embedding_dim()
     print(f"Detected embedding dimension: {embedding_dimension}")
     # Create Oracle DB connection
     # The `config` parameter is the connection configuration of Oracle DB
@@ -105,6 +105,7 @@ async def init():
     await oracle_db.check_tables()
     # Initialize LightRAG
     # We use Oracle DB as the KV/vector/graph storage
+    # You can add `addon_params={"example_number": 1, "language": "Simplified Chinese"}` to control the prompt
     rag = LightRAG(
         enable_llm_cache=False,
         working_dir=WORKING_DIR,
examples/lightrag_oracle_demo.py CHANGED
@@ -84,6 +84,7 @@ async def main():
 
     # Initialize LightRAG
     # We use Oracle DB as the KV/vector/graph storage
+    # You can add `addon_params={"example_number": 1, "language": "Simplified Chinese"}` to control the prompt
     rag = LightRAG(
         enable_llm_cache=False,
         working_dir=WORKING_DIR,
@@ -96,8 +97,7 @@ async def main():
         ),
         graph_storage="OracleGraphStorage",
         kv_storage="OracleKVStorage",
-        vector_storage="OracleVectorDBStorage",
-        addon_params={"example_number": 1, "language": "Simplified Chinese"},
+        vector_storage="OracleVectorDBStorage"
     )
 
     # Set the KV/vector/graph storage's `db` property, so all operations use the same connection pool
lightrag/llm.py CHANGED
@@ -72,7 +72,7 @@ async def openai_complete_if_cache(
     content = response.choices[0].message.content
     if r"\u" in content:
         content = content.encode("utf-8").decode("unicode_escape")
-    print(content)
+    # print(content)
     if hashing_kv is not None:
         await hashing_kv.upsert(
             {args_hash: {"return": response.choices[0].message.content, "model": model}}
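The context around the commented-out `print` is the unicode-unescaping step: responses containing literal `\u` escape sequences are decoded into real characters. A standalone demonstration of that decode, using the same two stdlib calls outside LightRAG:

```python
# A response containing literal backslash-u sequences rather than real characters
content = r"Hello \u4e16\u754c"

if r"\u" in content:
    # `unicode_escape` interprets the literal \uXXXX sequences as code points
    content = content.encode("utf-8").decode("unicode_escape")

print(content)  # Hello 世界
```

One caveat: `unicode_escape` decodes bytes as Latin-1, so this round-trip can mangle strings that already contain non-ASCII characters; it is only safe when the original content is pure ASCII plus escape sequences.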
lightrag/operate.py CHANGED
@@ -571,19 +571,19 @@ async def _build_query_context(
         hl_text_units_context,
     )
     return f"""
-# -----Entities-----
-# ```csv
-# {entities_context}
-# ```
-# -----Relationships-----
-# ```csv
-# {relations_context}
-# ```
-# -----Sources-----
-# ```csv
-# {text_units_context}
-# ```
-# """
+-----Entities-----
+```csv
+{entities_context}
+```
+-----Relationships-----
+```csv
+{relations_context}
+```
+-----Sources-----
+```csv
+{text_units_context}
+```
+"""
 
 
 async def _get_node_data(
@@ -593,18 +593,18 @@ async def _get_node_data(
     text_chunks_db: BaseKVStorage[TextChunkSchema],
     query_param: QueryParam,
 ):
-    # 获取相似的实体
+    # get similar entities
     results = await entities_vdb.query(query, top_k=query_param.top_k)
     if not len(results):
         return None
-    # 获取实体信息
+    # get entity information
     node_datas = await asyncio.gather(
         *[knowledge_graph_inst.get_node(r["entity_name"]) for r in results]
     )
     if not all([n is not None for n in node_datas]):
         logger.warning("Some nodes are missing, maybe the storage is damaged")
 
-    # 获取实体的度
+    # get entity degrees
     node_degrees = await asyncio.gather(
         *[knowledge_graph_inst.node_degree(r["entity_name"]) for r in results]
     )
@@ -613,11 +613,11 @@ async def _get_node_data(
         for k, n, d in zip(results, node_datas, node_degrees)
         if n is not None
     ]  # what is this text_chunks_db doing. dont remember it in airvx. check the diagram.
-    # 根据实体获取文本片段
+    # get text chunks for the entities
     use_text_units = await _find_most_related_text_unit_from_entities(
         node_datas, query_param, text_chunks_db, knowledge_graph_inst
     )
-    # 获取关联的边
+    # get related edges
     use_relations = await _find_most_related_edges_from_entities(
         node_datas, query_param, knowledge_graph_inst
     )
@@ -625,7 +625,7 @@ async def _get_node_data(
         f"Local query uses {len(node_datas)} entites, {len(use_relations)} relations, {len(use_text_units)} text units"
     )
 
-    # 构建提示词
+    # build the prompt
     entites_section_list = [["id", "entity", "type", "description", "rank"]]
     for i, n in enumerate(node_datas):
         entites_section_list.append(
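The `entites_section_list` built above is a header row plus one row per retrieved entity, later serialized to CSV for the `{entities_context}` slot in the prompt. A minimal sketch of that serialization using the stdlib `csv` module (LightRAG has a helper in this role; the stand-in below may differ from it in quoting details):

```python
import csv
import io

def list_of_list_to_csv(rows):
    """Serialize a list of rows into a single CSV string."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

# Header row matching the diff, plus one illustrative (made-up) entity row
entites_section_list = [
    ["id", "entity", "type", "description", "rank"],
    [0, "Alice", "person", "an example entity description", 3],
]
entities_context = list_of_list_to_csv(entites_section_list)
print(entities_context)
```

The resulting string drops straight into the `-----Entities-----` block of the f-string returned by `_build_query_context`.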