YanSte committed on
Commit 3c3f1b5 · 1 Parent(s): 33eca92

updated doc

Files changed (1)
  1. README.md +61 -58
README.md CHANGED
@@ -566,7 +566,7 @@ rag.insert(text_content.decode('utf-8'))
566
  ```
567
  </details>
568
 
569
- ### Storage
570
 
571
  <details>
572
  <summary> <b>Using Neo4J for Storage</b> </summary>
@@ -682,8 +682,8 @@ async def embedding_func(texts: list[str]) -> np.ndarray:
682
 
683
  </details>
684
 
 
685
 
686
- ### Delete
687
  ```python
688
 
689
  rag = LightRAG(
@@ -703,11 +703,63 @@ rag.delete_by_entity("Project Gutenberg")
703
  rag.delete_by_doc_id("doc_id")
704
  ```
705
 
 
706
 
707
- ### Graph Visualization
708
 
709
  <details>
710
- <summary> Graph visualization with html </summary>
711
 
712
  * The following code can be found in `examples/graph_visual_with_html.py`
713
 
@@ -731,7 +783,8 @@ net.show('knowledge_graph.html')
731
  </details>
732
 
733
  <details>
734
- <summary> Graph visualization with Neo4j </summary>
 
735
 
736
  * The following code can be found in `examples/graph_visual_with_neo4j.py`
737
 
@@ -858,52 +911,13 @@ if __name__ == "__main__":
858
 
859
  </details>
860
 
861
- ### LightRAG init parameters
862
-
863
  <details>
864
- <summary> Parameters </summary>
865
 
866
- | **Parameter** | **Type** | **Explanation** | **Default** |
867
- |----------------------------------------------| --- |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|
868
- | **working\_dir** | `str` | Directory where the cache will be stored | `lightrag_cache+timestamp` |
869
- | **kv\_storage** | `str` | Storage type for documents and text chunks. Supported types: `JsonKVStorage`, `OracleKVStorage` | `JsonKVStorage` |
870
- | **vector\_storage** | `str` | Storage type for embedding vectors. Supported types: `NanoVectorDBStorage`, `OracleVectorDBStorage` | `NanoVectorDBStorage` |
871
- | **graph\_storage** | `str` | Storage type for graph edges and nodes. Supported types: `NetworkXStorage`, `Neo4JStorage`, `OracleGraphStorage` | `NetworkXStorage` |
872
- | **log\_level** | | Log level for application runtime | `logging.DEBUG` |
873
- | **chunk\_token\_size** | `int` | Maximum token size per chunk when splitting documents | `1200` |
874
- | **chunk\_overlap\_token\_size** | `int` | Overlap token size between two chunks when splitting documents | `100` |
875
- | **tiktoken\_model\_name** | `str` | Model name for the Tiktoken encoder used to calculate token numbers | `gpt-4o-mini` |
876
- | **entity\_extract\_max\_gleaning** | `int` | Number of loops in the entity extraction process, appending history messages | `1` |
877
- | **entity\_summary\_to\_max\_tokens** | `int` | Maximum token size for each entity summary | `500` |
878
- | **node\_embedding\_algorithm** | `str` | Algorithm for node embedding (currently not used) | `node2vec` |
879
- | **node2vec\_params** | `dict` | Parameters for node embedding | `{"dimensions": 1536,"num_walks": 10,"walk_length": 40,"window_size": 2,"iterations": 3,"random_seed": 3,}` |
880
- | **embedding\_func** | `EmbeddingFunc` | Function to generate embedding vectors from text | `openai_embed` |
881
- | **embedding\_batch\_num** | `int` | Maximum batch size for embedding processes (multiple texts sent per batch) | `32` |
882
- | **embedding\_func\_max\_async** | `int` | Maximum number of concurrent asynchronous embedding processes | `16` |
883
- | **llm\_model\_func** | `callable` | Function for LLM generation | `gpt_4o_mini_complete` |
884
- | **llm\_model\_name** | `str` | LLM model name for generation | `meta-llama/Llama-3.2-1B-Instruct` |
885
- | **llm\_model\_max\_token\_size** | `int` | Maximum token size for LLM generation (affects entity relation summaries) | `32768`(default value changed by env var MAX_TOKENS) |
886
- | **llm\_model\_max\_async** | `int` | Maximum number of concurrent asynchronous LLM processes | `16`(default value changed by env var MAX_ASYNC) |
887
- | **llm\_model\_kwargs** | `dict` | Additional parameters for LLM generation | |
888
- | **vector\_db\_storage\_cls\_kwargs** | `dict` | Additional parameters for vector database, like setting the threshold for nodes and relations retrieval. | cosine_better_than_threshold: 0.2(default value changed by env var COSINE_THRESHOLD) |
889
- | **enable\_llm\_cache** | `bool` | If `TRUE`, stores LLM results in cache; repeated prompts return cached responses | `TRUE` |
890
- | **enable\_llm\_cache\_for\_entity\_extract** | `bool` | If `TRUE`, stores LLM results in cache for entity extraction; Good for beginners to debug your application | `TRUE` |
891
- | **addon\_params** | `dict` | Additional parameters, e.g., `{"example_number": 1, "language": "Simplified Chinese", "entity_types": ["organization", "person", "geo", "event"], "insert_batch_size": 10}`: sets example limit, output language, and batch size for document processing | `example_number: all examples, language: English, insert_batch_size: 10` |
892
- | **convert\_response\_to\_json\_func** | `callable` | Not used | `convert_response_to_json` |
893
- | **embedding\_cache\_config** | `dict` | Configuration for question-answer caching. Contains three parameters:<br>- `enabled`: Boolean value to enable/disable cache lookup functionality. When enabled, the system will check cached responses before generating new answers.<br>- `similarity_threshold`: Float value (0-1), similarity threshold. When a new question's similarity with a cached question exceeds this threshold, the cached answer will be returned directly without calling the LLM.<br>- `use_llm_check`: Boolean value to enable/disable LLM similarity verification. When enabled, LLM will be used as a secondary check to verify the similarity between questions before returning cached answers. | Default: `{"enabled": False, "similarity_threshold": 0.95, "use_llm_check": False}` |
894
- |**log\_dir** | `str` | Directory to store logs. | `./` |
895
-
896
- </details>
897
-
898
- ### Error Handling
899
 
900
- <details>
901
- <summary>Click to view error handling details</summary>
902
 
903
- The API includes comprehensive error handling:
904
- - File not found errors (404)
905
- - Processing errors (500)
906
- - Supports multiple file encodings (UTF-8 and GBK)
907
  </details>
908
 
909
  ## Evaluation
@@ -1147,17 +1161,6 @@ def extract_queries(file_path):
1147
  ```
1148
  </details>
1149
 
1150
- ## API
1151
- LightRag can be installed with API support to serve a Fast api interface to perform data upload and indexing/Rag operations/Rescan of the input folder etc..
1152
-
1153
- The documentation can be found [here](lightrag/api/README.md)
1154
-
1155
- ## Graph viewer
1156
- LightRag can be installed with Tools support to add extra tools like the graphml 3d visualizer.
1157
-
1158
- The documentation can be found [here](lightrag/tools/lightrag_visualizer/README.md)
1159
-
1160
-
1161
  ## Star History
1162
 
1163
  <a href="https://star-history.com/#HKUDS/LightRAG&Date">
 
566
  ```
567
  </details>
568
 
569
+ ## Storage
570
 
571
  <details>
572
  <summary> <b>Using Neo4J for Storage</b> </summary>
 
682
 
683
  </details>
684
 
685
+ ## Delete
686
 
 
687
  ```python
688
 
689
  rag = LightRAG(
 
703
  rag.delete_by_doc_id("doc_id")
704
  ```
705
 
706
+ ## LightRAG init parameters
707
 
708
+ <details>
709
+ <summary> Parameters </summary>
710
+
711
+ | **Parameter** | **Type** | **Explanation** | **Default** |
712
+ |----------------------------------------------| --- |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|
713
+ | **working\_dir** | `str` | Directory where the cache will be stored | `lightrag_cache+timestamp` |
714
+ | **kv\_storage** | `str` | Storage type for documents and text chunks. Supported types: `JsonKVStorage`, `OracleKVStorage` | `JsonKVStorage` |
715
+ | **vector\_storage** | `str` | Storage type for embedding vectors. Supported types: `NanoVectorDBStorage`, `OracleVectorDBStorage` | `NanoVectorDBStorage` |
716
+ | **graph\_storage** | `str` | Storage type for graph edges and nodes. Supported types: `NetworkXStorage`, `Neo4JStorage`, `OracleGraphStorage` | `NetworkXStorage` |
717
+ | **log\_level** | | Log level for application runtime | `logging.DEBUG` |
718
+ | **chunk\_token\_size** | `int` | Maximum token size per chunk when splitting documents | `1200` |
719
+ | **chunk\_overlap\_token\_size** | `int` | Overlap token size between two chunks when splitting documents | `100` |
720
+ | **tiktoken\_model\_name** | `str` | Model name for the Tiktoken encoder used to calculate token numbers | `gpt-4o-mini` |
721
+ | **entity\_extract\_max\_gleaning** | `int` | Number of loops in the entity extraction process, appending history messages | `1` |
722
+ | **entity\_summary\_to\_max\_tokens** | `int` | Maximum token size for each entity summary | `500` |
723
+ | **node\_embedding\_algorithm** | `str` | Algorithm for node embedding (currently not used) | `node2vec` |
724
+ | **node2vec\_params** | `dict` | Parameters for node embedding | `{"dimensions": 1536,"num_walks": 10,"walk_length": 40,"window_size": 2,"iterations": 3,"random_seed": 3,}` |
725
+ | **embedding\_func** | `EmbeddingFunc` | Function to generate embedding vectors from text | `openai_embed` |
726
+ | **embedding\_batch\_num** | `int` | Maximum batch size for embedding processes (multiple texts sent per batch) | `32` |
727
+ | **embedding\_func\_max\_async** | `int` | Maximum number of concurrent asynchronous embedding processes | `16` |
728
+ | **llm\_model\_func** | `callable` | Function for LLM generation | `gpt_4o_mini_complete` |
729
+ | **llm\_model\_name** | `str` | LLM model name for generation | `meta-llama/Llama-3.2-1B-Instruct` |
730
+ | **llm\_model\_max\_token\_size** | `int` | Maximum token size for LLM generation (affects entity relation summaries) | `32768` (default; can be overridden by env var `MAX_TOKENS`) |
731
+ | **llm\_model\_max\_async** | `int` | Maximum number of concurrent asynchronous LLM processes | `16` (default; can be overridden by env var `MAX_ASYNC`) |
732
+ | **llm\_model\_kwargs** | `dict` | Additional parameters for LLM generation | |
733
+ | **vector\_db\_storage\_cls\_kwargs** | `dict` | Additional parameters for the vector database, such as setting the threshold for node and relation retrieval. | `cosine_better_than_threshold: 0.2` (default; can be overridden by env var `COSINE_THRESHOLD`) |
734
+ | **enable\_llm\_cache** | `bool` | If `True`, stores LLM results in cache; repeated prompts return cached responses | `True` |
735
+ | **enable\_llm\_cache\_for\_entity\_extract** | `bool` | If `True`, stores LLM results in cache for entity extraction; useful for beginners when debugging an application | `True` |
736
+ | **addon\_params** | `dict` | Additional parameters, e.g., `{"example_number": 1, "language": "Simplified Chinese", "entity_types": ["organization", "person", "geo", "event"], "insert_batch_size": 10}`: sets example limit, output language, and batch size for document processing | `example_number: all examples, language: English, insert_batch_size: 10` |
737
+ | **convert\_response\_to\_json\_func** | `callable` | Not used | `convert_response_to_json` |
738
+ | **embedding\_cache\_config** | `dict` | Configuration for question-answer caching. Contains three parameters:<br>- `enabled`: Boolean value to enable/disable cache lookup functionality. When enabled, the system will check cached responses before generating new answers.<br>- `similarity_threshold`: Float value (0-1), similarity threshold. When a new question's similarity with a cached question exceeds this threshold, the cached answer will be returned directly without calling the LLM.<br>- `use_llm_check`: Boolean value to enable/disable LLM similarity verification. When enabled, LLM will be used as a secondary check to verify the similarity between questions before returning cached answers. | Default: `{"enabled": False, "similarity_threshold": 0.95, "use_llm_check": False}` |
739
+ |**log\_dir** | `str` | Directory to store logs. | `./` |
740
+
741
+ </details>
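The chunking parameters above (`chunk_token_size`, `chunk_overlap_token_size`) can be illustrated with a minimal sketch. A plain token list stands in for the Tiktoken encoder, and `chunk_tokens` is a hypothetical helper for illustration, not part of the LightRAG API:

```python
def chunk_tokens(tokens, chunk_token_size=1200, chunk_overlap_token_size=100):
    """Split a token sequence into chunks of at most chunk_token_size tokens,
    with consecutive chunks sharing chunk_overlap_token_size tokens."""
    if chunk_overlap_token_size >= chunk_token_size:
        raise ValueError("overlap must be smaller than the chunk size")
    step = chunk_token_size - chunk_overlap_token_size
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_token_size])
        if start + chunk_token_size >= len(tokens):
            break
    return chunks

# Toy example: 10 tokens, chunk size 4, overlap 1.
toy = [f"t{i}" for i in range(10)]
toy_chunks = chunk_tokens(toy, chunk_token_size=4, chunk_overlap_token_size=1)
```

With the real defaults (`1200` and `100`), consecutive chunks advance by 1100 tokens and share a 100-token overlap.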
742
+
743
+ ## Error Handling
744
 
745
  <details>
746
+ <summary>Click to view error handling details</summary>
747
+
748
+ The API includes comprehensive error handling:
749
+ - File not found errors (404)
750
+ - Processing errors (500)
751
+ - Supports multiple file encodings (UTF-8 and GBK)
752
+ </details>
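The multi-encoding support listed above (UTF-8 and GBK) amounts to a decode-with-fallback loop, sketched below; `read_text_any` is a hypothetical helper for illustration, not part of the LightRAG API:

```python
import os
import tempfile

def read_text_any(path, encodings=("utf-8", "gbk")):
    """Try each encoding in order; return the first successful decode."""
    with open(path, "rb") as f:
        raw = f.read()
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise UnicodeDecodeError("unknown", raw, 0, len(raw), "no encoding matched")

# Demo: GBK-encoded Chinese text that is not valid UTF-8 falls through
# to the GBK decoder on the second attempt.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write("知识图谱".encode("gbk"))
decoded = read_text_any(tmp.name)
os.unlink(tmp.name)
```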
753
+
754
+ ## API
755
+ LightRAG can be installed with API support to serve a FastAPI interface for data upload, indexing, RAG operations, rescanning the input folder, and more.
756
+
757
+ [LightRAG API](lightrag/api/README.md)
758
+
759
+ ## Graph Visualization
760
+
761
+ <details>
762
+ <summary> <b>Graph visualization with html</b> </summary>
763
 
764
  * The following code can be found in `examples/graph_visual_with_html.py`
765
 
 
783
  </details>
784
 
785
  <details>
786
+ <summary> <b>Graph visualization with Neo4j</b> </summary>
787
+
788
 
789
  * The following code can be found in `examples/graph_visual_with_neo4j.py`
790
 
 
911
 
912
  </details>
913
 
 
 
914
  <details>
915
+ <summary> <b>Graphml 3d visualizer</b> </summary>
916
 
917
+ LightRAG can be installed with Tools support to add extra tools such as the GraphML 3D visualizer.
918
 
919
+ [LightRAG Visualizer](lightrag/tools/lightrag_visualizer/README.md)
 
920

921
  </details>
922
 
923
  ## Evaluation
 
1161
  ```
1162
  </details>
1163

1164
  ## Star History
1165
 
1166
  <a href="https://star-history.com/#HKUDS/LightRAG&Date">