updated doc
README.md
CHANGED
@@ -566,7 +566,7 @@ rag.insert(text_content.decode('utf-8'))
 ```
 </details>
 
-
+## Storage
 
 <details>
 <summary> <b>Using Neo4J for Storage</b> </summary>
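The new `## Storage` heading introduces the backend sections that follow, such as Neo4J. For orientation, a minimal sketch of selecting the Neo4J graph backend — the environment-variable names here are assumptions for illustration, not taken from this diff:

```python
import os
from lightrag import LightRAG

# Assumption: the Neo4J backend reads its connection settings from the
# environment; the exact variable names below are illustrative only.
os.environ["NEO4J_URI"] = "neo4j://localhost:7687"
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "your-password"

rag = LightRAG(
    working_dir="./lightrag_cache",
    graph_storage="Neo4JStorage",  # overrides the NetworkXStorage default
)
```

Only `graph_storage` changes here; documents, chunks, and embedding vectors keep their default stores.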
@@ -682,8 +682,8 @@ async def embedding_func(texts: list[str]) -> np.ndarray:
 
 </details>
 
-
-### Delete
+## Delete
+
 ```python
 
 rag = LightRAG(
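The renamed `## Delete` section is built around the two calls that appear in the next hunk's context lines; a minimal sketch of their use, reusing exactly the calls shown in the diff:

```python
# Delete an entity by name
rag.delete_by_entity("Project Gutenberg")

# Delete everything that was extracted from a single ingested document
rag.delete_by_doc_id("doc_id")
```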
@@ -703,11 +703,63 @@ rag.delete_by_entity("Project Gutenberg")
 rag.delete_by_doc_id("doc_id")
 ```
 
+## LightRAG init parameters
 
-
+<details>
+<summary> Parameters </summary>
+
+| **Parameter** | **Type** | **Explanation** | **Default** |
+|----|----|----|----|
+| **working\_dir** | `str` | Directory where the cache will be stored | `lightrag_cache+timestamp` |
+| **kv\_storage** | `str` | Storage type for documents and text chunks. Supported types: `JsonKVStorage`, `OracleKVStorage` | `JsonKVStorage` |
+| **vector\_storage** | `str` | Storage type for embedding vectors. Supported types: `NanoVectorDBStorage`, `OracleVectorDBStorage` | `NanoVectorDBStorage` |
+| **graph\_storage** | `str` | Storage type for graph edges and nodes. Supported types: `NetworkXStorage`, `Neo4JStorage`, `OracleGraphStorage` | `NetworkXStorage` |
+| **log\_level** | | Log level for application runtime | `logging.DEBUG` |
+| **chunk\_token\_size** | `int` | Maximum token size per chunk when splitting documents | `1200` |
+| **chunk\_overlap\_token\_size** | `int` | Overlap token size between two chunks when splitting documents | `100` |
+| **tiktoken\_model\_name** | `str` | Model name for the Tiktoken encoder used to calculate token numbers | `gpt-4o-mini` |
+| **entity\_extract\_max\_gleaning** | `int` | Number of loops in the entity extraction process, appending history messages | `1` |
+| **entity\_summary\_to\_max\_tokens** | `int` | Maximum token size for each entity summary | `500` |
+| **node\_embedding\_algorithm** | `str` | Algorithm for node embedding (currently not used) | `node2vec` |
+| **node2vec\_params** | `dict` | Parameters for node embedding | `{"dimensions": 1536,"num_walks": 10,"walk_length": 40,"window_size": 2,"iterations": 3,"random_seed": 3,}` |
+| **embedding\_func** | `EmbeddingFunc` | Function to generate embedding vectors from text | `openai_embed` |
+| **embedding\_batch\_num** | `int` | Maximum batch size for embedding processes (multiple texts sent per batch) | `32` |
+| **embedding\_func\_max\_async** | `int` | Maximum number of concurrent asynchronous embedding processes | `16` |
+| **llm\_model\_func** | `callable` | Function for LLM generation | `gpt_4o_mini_complete` |
+| **llm\_model\_name** | `str` | LLM model name for generation | `meta-llama/Llama-3.2-1B-Instruct` |
+| **llm\_model\_max\_token\_size** | `int` | Maximum token size for LLM generation (affects entity relation summaries) | `32768` (default value changed by env var MAX_TOKENS) |
+| **llm\_model\_max\_async** | `int` | Maximum number of concurrent asynchronous LLM processes | `16` (default value changed by env var MAX_ASYNC) |
+| **llm\_model\_kwargs** | `dict` | Additional parameters for LLM generation | |
+| **vector\_db\_storage\_cls\_kwargs** | `dict` | Additional parameters for the vector database, like setting the threshold for node and relation retrieval | `cosine_better_than_threshold: 0.2` (default value changed by env var COSINE_THRESHOLD) |
+| **enable\_llm\_cache** | `bool` | If `TRUE`, stores LLM results in cache; repeated prompts return cached responses | `TRUE` |
+| **enable\_llm\_cache\_for\_entity\_extract** | `bool` | If `TRUE`, stores LLM results in cache for entity extraction; good for beginners to debug an application | `TRUE` |
+| **addon\_params** | `dict` | Additional parameters, e.g., `{"example_number": 1, "language": "Simplified Chinese", "entity_types": ["organization", "person", "geo", "event"], "insert_batch_size": 10}`: sets example limit, output language, and batch size for document processing | `example_number: all examples, language: English, insert_batch_size: 10` |
+| **convert\_response\_to\_json\_func** | `callable` | Not used | `convert_response_to_json` |
+| **embedding\_cache\_config** | `dict` | Configuration for question-answer caching. Contains three parameters:<br>- `enabled`: Boolean value to enable/disable cache lookup. When enabled, the system checks cached responses before generating new answers.<br>- `similarity_threshold`: Float value (0-1). When a new question's similarity with a cached question exceeds this threshold, the cached answer is returned directly without calling the LLM.<br>- `use_llm_check`: Boolean value to enable/disable LLM similarity verification. When enabled, the LLM is used as a secondary check to verify the similarity between questions before returning cached answers. | `{"enabled": False, "similarity_threshold": 0.95, "use_llm_check": False}` |
+| **log\_dir** | `str` | Directory to store logs | `./` |
+
+</details>
+
+## Error Handling
 
 <details>
-<summary>
+<summary>Click to view error handling details</summary>
+
+The API includes comprehensive error handling:
+- File not found errors (404)
+- Processing errors (500)
+- Supports multiple file encodings (UTF-8 and GBK)
+</details>
+
+## API
+LightRAG can be installed with API support to serve a FastAPI interface for data upload, indexing, RAG operations, rescanning of the input folder, and more.
+
+[LightRAG API](lightrag/api/README.md)
+
+## Graph Visualization
+
+<details>
+<summary> <b>Graph visualization with HTML</b> </summary>
 
 * The following code can be found in `examples/graph_visual_with_html.py`
 
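To make the relocated parameter table concrete, a minimal construction sketch; the values are illustrative, and only the parameter names and defaults come from the table above:

```python
from lightrag import LightRAG

rag = LightRAG(
    working_dir="./my_rag_cache",        # where the cache is stored
    chunk_token_size=1200,               # max tokens per chunk (the default)
    chunk_overlap_token_size=100,        # overlap between adjacent chunks
    enable_llm_cache=True,               # reuse cached LLM responses
    embedding_cache_config={
        "enabled": True,                 # look up cached answers first
        "similarity_threshold": 0.95,    # reuse answers above this similarity
        "use_llm_check": False,          # skip the secondary LLM verification
    },
    addon_params={"language": "English", "insert_batch_size": 10},
)
```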
@@ -731,7 +783,8 @@ net.show('knowledge_graph.html')
 </details>
 
 <details>
-<summary> Graph visualization with
+<summary> <b>Graph visualization with Neo4j</b> </summary>
+
 
 * The following code can be found in `examples/graph_visual_with_neo4j.py`
 
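The hunk header above shows `net.show('knowledge_graph.html')`, i.e. a pyvis-based flow. A minimal sketch of that idea — the GraphML path is an assumption for illustration; the full script is `examples/graph_visual_with_html.py`:

```python
import networkx as nx
from pyvis.network import Network

# Load the GraphML file that the default NetworkX storage writes out.
# The file name below is an assumption for illustration.
G = nx.read_graphml("./lightrag_cache/graph_chunk_entity_relation.graphml")

net = Network(notebook=False)
net.from_nx(G)                    # copy nodes/edges from NetworkX into pyvis
net.show("knowledge_graph.html")  # emit an interactive HTML page
```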
@@ -858,52 +911,13 @@ if __name__ == "__main__":
 
 </details>
 
-### LightRAG init parameters
-
 <details>
-<summary>
+<summary> <b>GraphML 3D visualizer</b> </summary>
 
-| **Parameter** | **Type** | **Explanation** | **Default** |
-|----|----|----|----|
-| **working\_dir** | `str` | Directory where the cache will be stored | `lightrag_cache+timestamp` |
-| **kv\_storage** | `str` | Storage type for documents and text chunks. Supported types: `JsonKVStorage`, `OracleKVStorage` | `JsonKVStorage` |
-| **vector\_storage** | `str` | Storage type for embedding vectors. Supported types: `NanoVectorDBStorage`, `OracleVectorDBStorage` | `NanoVectorDBStorage` |
-| **graph\_storage** | `str` | Storage type for graph edges and nodes. Supported types: `NetworkXStorage`, `Neo4JStorage`, `OracleGraphStorage` | `NetworkXStorage` |
-| **log\_level** | | Log level for application runtime | `logging.DEBUG` |
-| **chunk\_token\_size** | `int` | Maximum token size per chunk when splitting documents | `1200` |
-| **chunk\_overlap\_token\_size** | `int` | Overlap token size between two chunks when splitting documents | `100` |
-| **tiktoken\_model\_name** | `str` | Model name for the Tiktoken encoder used to calculate token numbers | `gpt-4o-mini` |
-| **entity\_extract\_max\_gleaning** | `int` | Number of loops in the entity extraction process, appending history messages | `1` |
-| **entity\_summary\_to\_max\_tokens** | `int` | Maximum token size for each entity summary | `500` |
-| **node\_embedding\_algorithm** | `str` | Algorithm for node embedding (currently not used) | `node2vec` |
-| **node2vec\_params** | `dict` | Parameters for node embedding | `{"dimensions": 1536,"num_walks": 10,"walk_length": 40,"window_size": 2,"iterations": 3,"random_seed": 3,}` |
-| **embedding\_func** | `EmbeddingFunc` | Function to generate embedding vectors from text | `openai_embed` |
-| **embedding\_batch\_num** | `int` | Maximum batch size for embedding processes (multiple texts sent per batch) | `32` |
-| **embedding\_func\_max\_async** | `int` | Maximum number of concurrent asynchronous embedding processes | `16` |
-| **llm\_model\_func** | `callable` | Function for LLM generation | `gpt_4o_mini_complete` |
-| **llm\_model\_name** | `str` | LLM model name for generation | `meta-llama/Llama-3.2-1B-Instruct` |
-| **llm\_model\_max\_token\_size** | `int` | Maximum token size for LLM generation (affects entity relation summaries) | `32768` (default value changed by env var MAX_TOKENS) |
-| **llm\_model\_max\_async** | `int` | Maximum number of concurrent asynchronous LLM processes | `16` (default value changed by env var MAX_ASYNC) |
-| **llm\_model\_kwargs** | `dict` | Additional parameters for LLM generation | |
-| **vector\_db\_storage\_cls\_kwargs** | `dict` | Additional parameters for the vector database, like setting the threshold for node and relation retrieval | `cosine_better_than_threshold: 0.2` (default value changed by env var COSINE_THRESHOLD) |
-| **enable\_llm\_cache** | `bool` | If `TRUE`, stores LLM results in cache; repeated prompts return cached responses | `TRUE` |
-| **enable\_llm\_cache\_for\_entity\_extract** | `bool` | If `TRUE`, stores LLM results in cache for entity extraction; good for beginners to debug an application | `TRUE` |
-| **addon\_params** | `dict` | Additional parameters, e.g., `{"example_number": 1, "language": "Simplified Chinese", "entity_types": ["organization", "person", "geo", "event"], "insert_batch_size": 10}`: sets example limit, output language, and batch size for document processing | `example_number: all examples, language: English, insert_batch_size: 10` |
-| **convert\_response\_to\_json\_func** | `callable` | Not used | `convert_response_to_json` |
-| **embedding\_cache\_config** | `dict` | Configuration for question-answer caching. Contains three parameters:<br>- `enabled`: Boolean value to enable/disable cache lookup. When enabled, the system checks cached responses before generating new answers.<br>- `similarity_threshold`: Float value (0-1). When a new question's similarity with a cached question exceeds this threshold, the cached answer is returned directly without calling the LLM.<br>- `use_llm_check`: Boolean value to enable/disable LLM similarity verification. When enabled, the LLM is used as a secondary check to verify the similarity between questions before returning cached answers. | `{"enabled": False, "similarity_threshold": 0.95, "use_llm_check": False}` |
-| **log\_dir** | `str` | Directory to store logs | `./` |
-
-</details>
-
-### Error Handling
+LightRAG can be installed with Tools support to add extra tools like the GraphML 3D visualizer.
 
-
-<summary>Click to view error handling details</summary>
+[LightRAG Visualizer](lightrag/tools/lightrag_visualizer/README.md)
 
-The API includes comprehensive error handling:
-- File not found errors (404)
-- Processing errors (500)
-- Supports multiple file encodings (UTF-8 and GBK)
 </details>
 
 ## Evaluation
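The error-handling bullets being moved in this diff translate directly into client-side checks against the API server. A sketch; the route and port are hypothetical placeholders, so consult `lightrag/api/README.md` for the real endpoints:

```python
import requests

# Hypothetical route and port, for illustration only; the actual
# endpoints are documented in lightrag/api/README.md.
with open("book.txt", "rb") as f:
    resp = requests.post(
        "http://localhost:9621/documents/upload",
        files={"file": f},
    )

if resp.status_code == 404:
    print("file not found")                 # the 404 case listed above
elif resp.status_code == 500:
    print("server-side processing error")   # the 500 case listed above
else:
    resp.raise_for_status()
    print(resp.json())
```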
@@ -1147,17 +1161,6 @@ def extract_queries(file_path):
 ```
 </details>
 
-## API
-LightRag can be installed with API support to serve a Fast api interface to perform data upload and indexing/Rag operations/Rescan of the input folder etc..
-
-The documentation can be found [here](lightrag/api/README.md)
-
-## Graph viewer
-LightRag can be installed with Tools support to add extra tools like the graphml 3d visualizer.
-
-The documentation can be found [here](lightrag/tools/lightrag_visualizer/README.md)
-
-
 ## Star History
 
 <a href="https://star-history.com/#HKUDS/LightRAG&Date">