Update README.md
- README-zh.md +152 -186
- README.md +150 -200
README-zh.md
CHANGED
@@ -260,6 +260,11 @@ class QueryParam:
|
|
260 |
If provided, this will be used instead of the global model function.
|
261 |
This allows using different models for different query modes.
|
262 |
"""
|
263 |
```
|
264 |
|
265 |
> top_k的默认值可以通过环境变量TOP_K更改。
|
@@ -527,128 +532,23 @@ response = rag.query(
|
|
527 |
)
|
528 |
```
|
529 |
|
530 |
-
###
|
531 |
|
532 |
-
|
533 |
|
534 |
```python
|
535 |
# 创建查询参数
|
536 |
query_param = QueryParam(
|
537 |
-
mode="hybrid", # 或其他模式:"local"、"global"、"hybrid"、"mix"和"naive"
|
|
|
538 |
)
|
539 |
|
540 |
-
#
|
541 |
response_default = rag.query(
|
542 |
-
"
|
543 |
param=query_param
|
544 |
)
|
545 |
print(response_default)
|
546 |
-
|
547 |
-
# 示例2:使用自定义提示
|
548 |
-
custom_prompt = """
|
549 |
-
您是环境科学领域的专家助手。请提供详细且结构化的答案,并附带示例。
|
550 |
-
---对话历史---
|
551 |
-
{history}
|
552 |
-
|
553 |
-
---知识库---
|
554 |
-
{context_data}
|
555 |
-
|
556 |
-
---响应规则---
|
557 |
-
|
558 |
-
- 目标格式和长度:{response_type}
|
559 |
-
"""
|
560 |
-
response_custom = rag.query(
|
561 |
-
"可再生能源的主要好处是什么?",
|
562 |
-
param=query_param,
|
563 |
-
system_prompt=custom_prompt # 传递自定义提示
|
564 |
-
)
|
565 |
-
print(response_custom)
|
566 |
-
```
|
567 |
-
|
568 |
-
### 关键词提取
|
569 |
-
|
570 |
-
我们引入了新函数`query_with_separate_keyword_extraction`来增强关键词提取功能。该函数将关键词提取过程与用户提示分开,专注于查询以提高提取关键词的相关性。
|
571 |
-
|
572 |
-
* 工作原理
|
573 |
-
|
574 |
-
该函数将输入分为两部分:
|
575 |
-
|
576 |
-
- `用户查询`
|
577 |
-
- `提示`
|
578 |
-
|
579 |
-
然后仅对`用户查询`执行关键词提取。这种分离确保提取过程是集中和相关的,不受`提示`中任何额外语言的影响。它还允许`提示`纯粹用于响应格式化,保持用户原始问题的意图和清晰度。
|
580 |
-
|
581 |
-
* 使用示例
|
582 |
-
|
583 |
-
这个`示例`展示了如何为教育内容定制函数,专注于为高年级学生提供详细解释。
|
584 |
-
|
585 |
-
```python
|
586 |
-
rag.query_with_separate_keyword_extraction(
|
587 |
-
query="解释重力定律",
|
588 |
-
prompt="提供适合学习物理的高中生的详细解释。",
|
589 |
-
param=QueryParam(mode="hybrid")
|
590 |
-
)
|
591 |
-
```
|
592 |
-
|
593 |
-
### 插入自定义知识
|
594 |
-
|
595 |
-
```python
|
596 |
-
custom_kg = {
|
597 |
-
"chunks": [
|
598 |
-
{
|
599 |
-
"content": "Alice和Bob正在合作进行量子计算研究。",
|
600 |
-
"source_id": "doc-1"
|
601 |
-
}
|
602 |
-
],
|
603 |
-
"entities": [
|
604 |
-
{
|
605 |
-
"entity_name": "Alice",
|
606 |
-
"entity_type": "person",
|
607 |
-
"description": "Alice是一位专门研究量子物理的研究员。",
|
608 |
-
"source_id": "doc-1"
|
609 |
-
},
|
610 |
-
{
|
611 |
-
"entity_name": "Bob",
|
612 |
-
"entity_type": "person",
|
613 |
-
"description": "Bob是一位数学家。",
|
614 |
-
"source_id": "doc-1"
|
615 |
-
},
|
616 |
-
{
|
617 |
-
"entity_name": "量子计算",
|
618 |
-
"entity_type": "technology",
|
619 |
-
"description": "量子计算利用量子力学现象进行计算。",
|
620 |
-
"source_id": "doc-1"
|
621 |
-
}
|
622 |
-
],
|
623 |
-
"relationships": [
|
624 |
-
{
|
625 |
-
"src_id": "Alice",
|
626 |
-
"tgt_id": "Bob",
|
627 |
-
"description": "Alice和Bob是研究伙伴。",
|
628 |
-
"keywords": "合作 研究",
|
629 |
-
"weight": 1.0,
|
630 |
-
"source_id": "doc-1"
|
631 |
-
},
|
632 |
-
{
|
633 |
-
"src_id": "Alice",
|
634 |
-
"tgt_id": "量子计算",
|
635 |
-
"description": "Alice进行量子计算研究。",
|
636 |
-
"keywords": "研究 专业",
|
637 |
-
"weight": 1.0,
|
638 |
-
"source_id": "doc-1"
|
639 |
-
},
|
640 |
-
{
|
641 |
-
"src_id": "Bob",
|
642 |
-
"tgt_id": "量子计算",
|
643 |
-
"description": "Bob研究量子计算。",
|
644 |
-
"keywords": "研究 应用",
|
645 |
-
"weight": 1.0,
|
646 |
-
"source_id": "doc-1"
|
647 |
-
}
|
648 |
-
]
|
649 |
-
}
|
650 |
-
|
651 |
-
rag.insert_custom_kg(custom_kg)
|
652 |
```
|
653 |
|
654 |
### 插入
|
@@ -934,23 +834,160 @@ updated_relation = rag.edit_relation("Google", "Google Mail", {
|
|
934 |
})
|
935 |
```
|
936 |
|
937 |
</details>
|
938 |
|
939 |
-
|
|
|
940 |
|
941 |
-
|
942 |
|
943 |
- **create_entity**:创建具有指定属性的新实体
|
944 |
- **edit_entity**:更新现有实体的属性或重命名它
|
945 |
|
946 |
-
#### 关系操作
|
947 |
-
|
948 |
- **create_relation**:在现有实体之间创建新关系
|
949 |
- **edit_relation**:更新现有关系的属性
|
950 |
|
951 |
这些操作在图数据库和向量数据库组件之间保持数据一致性,确保您的知识图谱保持连贯。
|
952 |
|
953 |
## Token统计功能
|
|
|
954 |
<details>
|
955 |
<summary> <b>概述和使用</b> </summary>
|
956 |
|
@@ -1048,77 +1085,6 @@ rag.export_data("complete_data.csv", include_vector_data=True)
|
|
1048 |
* 关系数据(实体之间的连接)
|
1049 |
* 来自向量数据库的关系信息
|
1050 |
|
1051 |
-
## 实体合并
|
1052 |
-
|
1053 |
-
<details>
|
1054 |
-
<summary> <b>合并实体及其关系</b> </summary>
|
1055 |
-
|
1056 |
-
LightRAG现在支持将多个实体合并为单个实体,自动处理所有关系:
|
1057 |
-
|
1058 |
-
```python
|
1059 |
-
# 基本实体合并
|
1060 |
-
rag.merge_entities(
|
1061 |
-
source_entities=["人工智能", "AI", "机器智能"],
|
1062 |
-
target_entity="AI技术"
|
1063 |
-
)
|
1064 |
-
```
|
1065 |
-
|
1066 |
-
使用自定义合并策略:
|
1067 |
-
|
1068 |
-
```python
|
1069 |
-
# 为不同字段定义自定义合并策略
|
1070 |
-
rag.merge_entities(
|
1071 |
-
source_entities=["约翰·史密斯", "史密斯博士", "J·史密斯"],
|
1072 |
-
target_entity="约翰·史密斯",
|
1073 |
-
merge_strategy={
|
1074 |
-
"description": "concatenate", # 组合所有描述
|
1075 |
-
"entity_type": "keep_first", # 保留第一个实体的类型
|
1076 |
-
"source_id": "join_unique" # 组合所有唯一的源ID
|
1077 |
-
}
|
1078 |
-
)
|
1079 |
-
```
|
1080 |
-
|
1081 |
-
使用自定义目标实体数据:
|
1082 |
-
|
1083 |
-
```python
|
1084 |
-
# 为合并后的实体指定确切值
|
1085 |
-
rag.merge_entities(
|
1086 |
-
source_entities=["纽约", "NYC", "大苹果"],
|
1087 |
-
target_entity="纽约市",
|
1088 |
-
target_entity_data={
|
1089 |
-
"entity_type": "LOCATION",
|
1090 |
-
"description": "纽约市是美国人口最多的城市。",
|
1091 |
-
}
|
1092 |
-
)
|
1093 |
-
```
|
1094 |
-
|
1095 |
-
结合两种方法的高级用法:
|
1096 |
-
|
1097 |
-
```python
|
1098 |
-
# 使用策略和自定义数据合并公司实体
|
1099 |
-
rag.merge_entities(
|
1100 |
-
source_entities=["微软公司", "Microsoft Corporation", "MSFT"],
|
1101 |
-
target_entity="微软",
|
1102 |
-
merge_strategy={
|
1103 |
-
"description": "concatenate", # 组合所有描述
|
1104 |
-
"source_id": "join_unique" # 组合源ID
|
1105 |
-
},
|
1106 |
-
target_entity_data={
|
1107 |
-
"entity_type": "ORGANIZATION",
|
1108 |
-
}
|
1109 |
-
)
|
1110 |
-
```
|
1111 |
-
|
1112 |
-
合并实体时:
|
1113 |
-
|
1114 |
-
* 所有来自源实体的关系都会重定向到目标实体
|
1115 |
-
* 重复的关系会被智能合并
|
1116 |
-
* 防止自我关系(循环)
|
1117 |
-
* 合并后删除源实体
|
1118 |
-
* 保留关系权重和属性
|
1119 |
-
|
1120 |
-
</details>
|
1121 |
-
|
1122 |
## 缓存
|
1123 |
|
1124 |
<details>
|
|
|
260 |
If provided, this will be used instead of the global model function.
|
261 |
This allows using different models for different query modes.
|
262 |
"""
|
263 |
+
|
264 |
+
user_prompt: str | None = None
|
265 |
+
"""User-provided prompt for the query.
|
266 |
+
If provided, this will be used instead of the default value from the prompt template.
|
267 |
+
"""
|
268 |
```
|
269 |
|
270 |
> top_k的默认值可以通过环境变量TOP_K更改。
|
|
|
532 |
)
|
533 |
```
|
534 |
|
535 |
+
### 自定义用户提示词
|
536 |
|
537 |
+
自定义用户提示词不影响查询内容,仅仅用于向LLM指示如何处理查询结果。以下是使用方法:
|
538 |
|
539 |
```python
|
540 |
# 创建查询参数
|
541 |
query_param = QueryParam(
|
542 |
+
mode = "hybrid", # 或其他模式:"local"、"global"、"hybrid"、"mix"和"naive"
|
543 |
+
user_prompt = "Please create the diagram using the Mermaid syntax"
|
544 |
)
|
545 |
|
546 |
+
# 查询和处理
|
547 |
response_default = rag.query(
|
548 |
+
"Please draw a character relationship diagram for Scrooge",
|
549 |
param=query_param
|
550 |
)
|
551 |
print(response_default)
|
552 |
```
|
553 |
|
554 |
### 插入
|
|
|
834 |
})
|
835 |
```
|
836 |
|
837 |
+
所有操作都有同步和异步版本。异步版本带有前缀"a"(例如,`acreate_entity`,`aedit_relation`)。
|
838 |
+
|
839 |
</details>
|
840 |
|
841 |
+
<details>
|
842 |
+
<summary> <b>插入自定义知识</b> </summary>
|
843 |
|
844 |
+
```python
|
845 |
+
custom_kg = {
|
846 |
+
"chunks": [
|
847 |
+
{
|
848 |
+
"content": "Alice和Bob正在合作进行量子计算研究。",
|
849 |
+
"source_id": "doc-1"
|
850 |
+
}
|
851 |
+
],
|
852 |
+
"entities": [
|
853 |
+
{
|
854 |
+
"entity_name": "Alice",
|
855 |
+
"entity_type": "person",
|
856 |
+
"description": "Alice是一位专门研究量子物理的研究员。",
|
857 |
+
"source_id": "doc-1"
|
858 |
+
},
|
859 |
+
{
|
860 |
+
"entity_name": "Bob",
|
861 |
+
"entity_type": "person",
|
862 |
+
"description": "Bob是一位数学家。",
|
863 |
+
"source_id": "doc-1"
|
864 |
+
},
|
865 |
+
{
|
866 |
+
"entity_name": "量子计算",
|
867 |
+
"entity_type": "technology",
|
868 |
+
"description": "量子计算利用量子力学现象进行计算。",
|
869 |
+
"source_id": "doc-1"
|
870 |
+
}
|
871 |
+
],
|
872 |
+
"relationships": [
|
873 |
+
{
|
874 |
+
"src_id": "Alice",
|
875 |
+
"tgt_id": "Bob",
|
876 |
+
"description": "Alice和Bob是研究伙伴。",
|
877 |
+
"keywords": "合作 研究",
|
878 |
+
"weight": 1.0,
|
879 |
+
"source_id": "doc-1"
|
880 |
+
},
|
881 |
+
{
|
882 |
+
"src_id": "Alice",
|
883 |
+
"tgt_id": "量子计算",
|
884 |
+
"description": "Alice进行量子计算研究。",
|
885 |
+
"keywords": "研究 专业",
|
886 |
+
"weight": 1.0,
|
887 |
+
"source_id": "doc-1"
|
888 |
+
},
|
889 |
+
{
|
890 |
+
"src_id": "Bob",
|
891 |
+
"tgt_id": "量子计算",
|
892 |
+
"description": "Bob研究量子计算。",
|
893 |
+
"keywords": "研究 应用",
|
894 |
+
"weight": 1.0,
|
895 |
+
"source_id": "doc-1"
|
896 |
+
}
|
897 |
+
]
|
898 |
+
}
|
899 |
+
|
900 |
+
rag.insert_custom_kg(custom_kg)
|
901 |
+
```
|
902 |
+
|
903 |
+
</details>
|
904 |
+
|
905 |
+
<details>
|
906 |
+
<summary> <b>其它实体与关系操作</b> </summary>
|
907 |
|
908 |
- **create_entity**:创建具有指定属性的新实体
|
909 |
- **edit_entity**:更新现有实体的属性或重命名它
|
910 |
|
911 |
- **create_relation**:在现有实体之间创建新关系
|
912 |
- **edit_relation**:更新现有关系的属性
|
913 |
|
914 |
这些操作在图数据库和向量数据库组件之间保持数据一致性,确保您的知识图谱保持连贯。
|
915 |
|
916 |
+
</details>
|
917 |
+
|
918 |
+
## 实体合并
|
919 |
+
|
920 |
+
<details>
|
921 |
+
<summary> <b>合并实体及其关系</b> </summary>
|
922 |
+
|
923 |
+
LightRAG现在支持将多个实体合并为单个实体,自动处理所有关系:
|
924 |
+
|
925 |
+
```python
|
926 |
+
# 基本实体合并
|
927 |
+
rag.merge_entities(
|
928 |
+
source_entities=["人工智能", "AI", "机器智能"],
|
929 |
+
target_entity="AI技术"
|
930 |
+
)
|
931 |
+
```
|
932 |
+
|
933 |
+
使用自定义合并策略:
|
934 |
+
|
935 |
+
```python
|
936 |
+
# 为不同字段定义自定义合并策略
|
937 |
+
rag.merge_entities(
|
938 |
+
source_entities=["约翰·史密斯", "史密斯博士", "J·史密斯"],
|
939 |
+
target_entity="约翰·史密斯",
|
940 |
+
merge_strategy={
|
941 |
+
"description": "concatenate", # 组合所有描述
|
942 |
+
"entity_type": "keep_first", # 保留第一个实体的类型
|
943 |
+
"source_id": "join_unique" # 组合所有唯一的源ID
|
944 |
+
}
|
945 |
+
)
|
946 |
+
```
|
947 |
+
|
948 |
+
使用自定义目标实体数据:
|
949 |
+
|
950 |
+
```python
|
951 |
+
# 为合并后的实体指定确切值
|
952 |
+
rag.merge_entities(
|
953 |
+
source_entities=["纽约", "NYC", "大苹果"],
|
954 |
+
target_entity="纽约市",
|
955 |
+
target_entity_data={
|
956 |
+
"entity_type": "LOCATION",
|
957 |
+
"description": "纽约市是美国人口最多的城市。",
|
958 |
+
}
|
959 |
+
)
|
960 |
+
```
|
961 |
+
|
962 |
+
结合两种方法的高级用法:
|
963 |
+
|
964 |
+
```python
|
965 |
+
# 使用策略和自定义数据合并公司实体
|
966 |
+
rag.merge_entities(
|
967 |
+
source_entities=["微软公司", "Microsoft Corporation", "MSFT"],
|
968 |
+
target_entity="微软",
|
969 |
+
merge_strategy={
|
970 |
+
"description": "concatenate", # 组合所有描述
|
971 |
+
"source_id": "join_unique" # 组合源ID
|
972 |
+
},
|
973 |
+
target_entity_data={
|
974 |
+
"entity_type": "ORGANIZATION",
|
975 |
+
}
|
976 |
+
)
|
977 |
+
```
|
978 |
+
|
979 |
+
合并实体时:
|
980 |
+
|
981 |
+
* 所有来自源实体的关系都会重定向到目标实体
|
982 |
+
* 重复的关系会被智能合并
|
983 |
+
* 防止自我关系(循环)
|
984 |
+
* 合并后删除源实体
|
985 |
+
* 保留关系权重和属性
|
986 |
+
|
987 |
+
</details>
|
988 |
+
|
989 |
## Token统计功能
|
990 |
+
|
991 |
<details>
|
992 |
<summary> <b>概述和使用</b> </summary>
|
993 |
|
|
|
1085 |
* 关系数据(实体之间的连接)
|
1086 |
* 来自向量数据库的关系信息
|
1087 |
|
1088 |
## 缓存
|
1089 |
|
1090 |
<details>
|
README.md
CHANGED
@@ -274,12 +274,6 @@ class QueryParam:
|
|
274 |
max_token_for_local_context: int = int(os.getenv("MAX_TOKEN_ENTITY_DESC", "4000"))
|
275 |
"""Maximum number of tokens allocated for entity descriptions in local retrieval."""
|
276 |
|
277 |
-
hl_keywords: list[str] = field(default_factory=list)
|
278 |
-
"""List of high-level keywords to prioritize in retrieval."""
|
279 |
-
|
280 |
-
ll_keywords: list[str] = field(default_factory=list)
|
281 |
-
"""List of low-level keywords to refine retrieval focus."""
|
282 |
-
|
283 |
conversation_history: list[dict[str, str]] = field(default_factory=list)
|
284 |
"""Stores past conversation history to maintain context.
|
285 |
Format: [{"role": "user/assistant", "content": "message"}].
|
@@ -296,6 +290,11 @@ class QueryParam:
|
|
296 |
If provided, this will be used instead of the global model function.
|
297 |
This allows using different models for different query modes.
|
298 |
"""
|
299 |
```
|
300 |
|
301 |
> The default value of `top_k` can be changed through the `TOP_K` environment variable.
|
@@ -571,76 +570,26 @@ response = rag.query(
|
|
571 |
|
572 |
</details>
|
573 |
|
574 |
-
### Custom Prompt Support
|
575 |
-
|
576 |
-
LightRAG now supports custom prompts for fine-tuned control over the system's behavior. Here's how to use it:
|
577 |
|
578 |
-
|
579 |
-
<summary> <b> Usage Example </b></summary>
|
580 |
|
581 |
```python
|
582 |
# Create query parameters
|
583 |
query_param = QueryParam(
|
584 |
-
mode="hybrid", #
|
|
|
585 |
)
|
586 |
|
587 |
-
#
|
588 |
response_default = rag.query(
|
589 |
-
"
|
590 |
param=query_param
|
591 |
)
|
592 |
print(response_default)
|
593 |
-
|
594 |
-
# Example 2: Using a custom prompt
|
595 |
-
custom_prompt = """
|
596 |
-
You are an expert assistant in environmental science. Provide detailed and structured answers with examples.
|
597 |
-
---Conversation History---
|
598 |
-
{history}
|
599 |
-
|
600 |
-
---Knowledge Base---
|
601 |
-
{context_data}
|
602 |
-
|
603 |
-
---Response Rules---
|
604 |
-
|
605 |
-
- Target format and length: {response_type}
|
606 |
-
"""
|
607 |
-
response_custom = rag.query(
|
608 |
-
"What are the primary benefits of renewable energy?",
|
609 |
-
param=query_param,
|
610 |
-
system_prompt=custom_prompt # Pass the custom prompt
|
611 |
-
)
|
612 |
-
print(response_custom)
|
613 |
```
|
614 |
|
615 |
-
</details>
|
616 |
-
|
617 |
-
### Separate Keyword Extraction
|
618 |
-
|
619 |
-
We've introduced a new function `query_with_separate_keyword_extraction` to enhance the keyword extraction capabilities. This function separates the keyword extraction process from the user's prompt, focusing solely on the query to improve the relevance of extracted keywords.
|
620 |
-
|
621 |
-
**How It Works?**
|
622 |
-
|
623 |
-
The function operates by dividing the input into two parts:
|
624 |
-
|
625 |
-
- `User Query`
|
626 |
-
- `Prompt`
|
627 |
-
|
628 |
-
It then performs keyword extraction exclusively on the `user query`. This separation ensures that the extraction process is focused and relevant, unaffected by any additional language in the `prompt`. It also allows the `prompt` to serve purely for response formatting, maintaining the intent and clarity of the user's original question.
|
629 |
|
630 |
-
<details>
|
631 |
-
<summary> <b> Usage Example </b></summary>
|
632 |
-
|
633 |
-
This `example` shows how to tailor the function for educational content, focusing on detailed explanations for older students.
|
634 |
-
|
635 |
-
```python
|
636 |
-
rag.query_with_separate_keyword_extraction(
|
637 |
-
query="Explain the law of gravity",
|
638 |
-
prompt="Provide a detailed explanation suitable for high school students studying physics.",
|
639 |
-
param=QueryParam(mode="hybrid")
|
640 |
-
)
|
641 |
-
```
|
642 |
-
|
643 |
-
</details>
|
644 |
|
645 |
### Insert
|
646 |
|
@@ -725,70 +674,6 @@ rag.insert(text_content.decode('utf-8'))
|
|
725 |
|
726 |
</details>
|
727 |
|
728 |
-
<details>
|
729 |
-
<summary> <b> Insert Custom KG </b></summary>
|
730 |
-
|
731 |
-
```python
|
732 |
-
custom_kg = {
|
733 |
-
"chunks": [
|
734 |
-
{
|
735 |
-
"content": "Alice and Bob are collaborating on quantum computing research.",
|
736 |
-
"source_id": "doc-1"
|
737 |
-
}
|
738 |
-
],
|
739 |
-
"entities": [
|
740 |
-
{
|
741 |
-
"entity_name": "Alice",
|
742 |
-
"entity_type": "person",
|
743 |
-
"description": "Alice is a researcher specializing in quantum physics.",
|
744 |
-
"source_id": "doc-1"
|
745 |
-
},
|
746 |
-
{
|
747 |
-
"entity_name": "Bob",
|
748 |
-
"entity_type": "person",
|
749 |
-
"description": "Bob is a mathematician.",
|
750 |
-
"source_id": "doc-1"
|
751 |
-
},
|
752 |
-
{
|
753 |
-
"entity_name": "Quantum Computing",
|
754 |
-
"entity_type": "technology",
|
755 |
-
"description": "Quantum computing utilizes quantum mechanical phenomena for computation.",
|
756 |
-
"source_id": "doc-1"
|
757 |
-
}
|
758 |
-
],
|
759 |
-
"relationships": [
|
760 |
-
{
|
761 |
-
"src_id": "Alice",
|
762 |
-
"tgt_id": "Bob",
|
763 |
-
"description": "Alice and Bob are research partners.",
|
764 |
-
"keywords": "collaboration research",
|
765 |
-
"weight": 1.0,
|
766 |
-
"source_id": "doc-1"
|
767 |
-
},
|
768 |
-
{
|
769 |
-
"src_id": "Alice",
|
770 |
-
"tgt_id": "Quantum Computing",
|
771 |
-
"description": "Alice conducts research on quantum computing.",
|
772 |
-
"keywords": "research expertise",
|
773 |
-
"weight": 1.0,
|
774 |
-
"source_id": "doc-1"
|
775 |
-
},
|
776 |
-
{
|
777 |
-
"src_id": "Bob",
|
778 |
-
"tgt_id": "Quantum Computing",
|
779 |
-
"description": "Bob researches quantum computing.",
|
780 |
-
"keywords": "research application",
|
781 |
-
"weight": 1.0,
|
782 |
-
"source_id": "doc-1"
|
783 |
-
}
|
784 |
-
]
|
785 |
-
}
|
786 |
-
|
787 |
-
rag.insert_custom_kg(custom_kg)
|
788 |
-
```
|
789 |
-
|
790 |
-
</details>
|
791 |
-
|
792 |
<details>
|
793 |
<summary><b>Citation Functionality</b></summary>
|
794 |
|
@@ -992,12 +877,78 @@ updated_relation = rag.edit_relation("Google", "Google Mail", {
|
|
992 |
|
993 |
All operations are available in both synchronous and asynchronous versions. The asynchronous versions have the prefix "a" (e.g., `acreate_entity`, `aedit_relation`).
|
994 |
|
995 |
-
|
996 |
|
997 |
- **create_entity**: Creates a new entity with specified attributes
|
998 |
- **edit_entity**: Updates an existing entity's attributes or renames it
|
999 |
|
1000 |
-
#### Relation Operations
|
1001 |
|
1002 |
- **create_relation**: Creates a new relation between existing entities
|
1003 |
- **edit_relation**: Updates an existing relation's attributes
|
@@ -1006,6 +957,77 @@ These operations maintain data consistency across both the graph database and ve
|
|
1006 |
|
1007 |
</details>
|
1008 |
|
1009 |
## Token Usage Tracking
|
1010 |
|
1011 |
<details>
|
@@ -1112,78 +1134,6 @@ All exports include:
|
|
1112 |
* Relation data (connections between entities)
|
1113 |
* Relationship information from vector database
|
1114 |
|
1115 |
-
|
1116 |
-
## Entity Merging
|
1117 |
-
|
1118 |
-
<details>
|
1119 |
-
<summary> <b>Merge Entities and Their Relationships</b> </summary>
|
1120 |
-
|
1121 |
-
LightRAG now supports merging multiple entities into a single entity, automatically handling all relationships:
|
1122 |
-
|
1123 |
-
```python
|
1124 |
-
# Basic entity merging
|
1125 |
-
rag.merge_entities(
|
1126 |
-
source_entities=["Artificial Intelligence", "AI", "Machine Intelligence"],
|
1127 |
-
target_entity="AI Technology"
|
1128 |
-
)
|
1129 |
-
```
|
1130 |
-
|
1131 |
-
With custom merge strategy:
|
1132 |
-
|
1133 |
-
```python
|
1134 |
-
# Define custom merge strategy for different fields
|
1135 |
-
rag.merge_entities(
|
1136 |
-
source_entities=["John Smith", "Dr. Smith", "J. Smith"],
|
1137 |
-
target_entity="John Smith",
|
1138 |
-
merge_strategy={
|
1139 |
-
"description": "concatenate", # Combine all descriptions
|
1140 |
-
"entity_type": "keep_first", # Keep the entity type from the first entity
|
1141 |
-
"source_id": "join_unique" # Combine all unique source IDs
|
1142 |
-
}
|
1143 |
-
)
|
1144 |
-
```
|
1145 |
-
|
1146 |
-
With custom target entity data:
|
1147 |
-
|
1148 |
-
```python
|
1149 |
-
# Specify exact values for the merged entity
|
1150 |
-
rag.merge_entities(
|
1151 |
-
source_entities=["New York", "NYC", "Big Apple"],
|
1152 |
-
target_entity="New York City",
|
1153 |
-
target_entity_data={
|
1154 |
-
"entity_type": "LOCATION",
|
1155 |
-
"description": "New York City is the most populous city in the United States.",
|
1156 |
-
}
|
1157 |
-
)
|
1158 |
-
```
|
1159 |
-
|
1160 |
-
Advanced usage combining both approaches:
|
1161 |
-
|
1162 |
-
```python
|
1163 |
-
# Merge company entities with both strategy and custom data
|
1164 |
-
rag.merge_entities(
|
1165 |
-
source_entities=["Microsoft Corp", "Microsoft Corporation", "MSFT"],
|
1166 |
-
target_entity="Microsoft",
|
1167 |
-
merge_strategy={
|
1168 |
-
"description": "concatenate", # Combine all descriptions
|
1169 |
-
"source_id": "join_unique" # Combine source IDs
|
1170 |
-
},
|
1171 |
-
target_entity_data={
|
1172 |
-
"entity_type": "ORGANIZATION",
|
1173 |
-
}
|
1174 |
-
)
|
1175 |
-
```
|
1176 |
-
|
1177 |
-
When merging entities:
|
1178 |
-
|
1179 |
-
* All relationships from source entities are redirected to the target entity
|
1180 |
-
* Duplicate relationships are intelligently merged
|
1181 |
-
* Self-relationships (loops) are prevented
|
1182 |
-
* Source entities are removed after merging
|
1183 |
-
* Relationship weights and attributes are preserved
|
1184 |
-
|
1185 |
-
</details>
|
1186 |
-
|
1187 |
## Cache
|
1188 |
|
1189 |
<details>
|
|
|
274 |
max_token_for_local_context: int = int(os.getenv("MAX_TOKEN_ENTITY_DESC", "4000"))
|
275 |
"""Maximum number of tokens allocated for entity descriptions in local retrieval."""
|
276 |
|
277 |
conversation_history: list[dict[str, str]] = field(default_factory=list)
|
278 |
"""Stores past conversation history to maintain context.
|
279 |
Format: [{"role": "user/assistant", "content": "message"}].
|
|
|
290 |
If provided, this will be used instead of the global model function.
|
291 |
This allows using different models for different query modes.
|
292 |
"""
|
293 |
+
|
294 |
+
user_prompt: str | None = None
|
295 |
+
"""User-provided prompt for the query.
|
296 |
+
If provided, this will be used instead of the default value from the prompt template.
|
297 |
+
"""
|
298 |
```
|
299 |
|
300 |
> The default value of `top_k` can be changed through the `TOP_K` environment variable.
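The `QueryParam` fields shown above read their defaults from the environment with `os.getenv`. The sketch below applies the same pattern to `TOP_K`. The helper name `default_top_k` and the fallback of 60 are assumptions made for illustration; neither is taken from this README.

```python
import os


def default_top_k(fallback: int = 60) -> int:
    """Return TOP_K from the environment as an int, or `fallback` if unset.

    The fallback of 60 is an assumption made for this sketch.
    """
    return int(os.getenv("TOP_K", str(fallback)))


# The variable must be set before the default is read.
os.environ["TOP_K"] = "20"
print(default_top_k())
```

A default written as `int(os.getenv(...))` in a class body is evaluated once, when the class is defined, so the variable presumably needs to be set before the module that defines `QueryParam` is imported.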
|
|
|
570 |
|
571 |
</details>
|
572 |
|
573 |
+
### Custom User Prompt Support
|
574 |
|
575 |
+
A custom user prompt does not change the query content; it only tells the LLM how to process the query results. Here's how to use it:
|
|
|
576 |
|
577 |
```python
|
578 |
# Create query parameters
|
579 |
query_param = QueryParam(
|
580 |
+
mode="hybrid",  # or other modes: "local", "global", "hybrid", "mix", "naive"
|
581 |
+
user_prompt="Please create the diagram using the Mermaid syntax"
|
582 |
)
|
583 |
|
584 |
+
# Query and process
|
585 |
response_default = rag.query(
|
586 |
+
"Please draw a character relationship diagram for Scrooge",
|
587 |
param=query_param
|
588 |
)
|
589 |
print(response_default)
|
590 |
```
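To illustrate why a user prompt leaves the query content untouched, here is a toy sketch in plain Python. It is not LightRAG's implementation: `retrieve` and `build_prompt` are hypothetical helpers, and the word-overlap retrieval stands in for the real retrieval step. What it shows is that retrieval sees only the query, while `user_prompt` is appended afterwards as an output instruction.

```python
from typing import Optional


def retrieve(query: str, corpus: list[str]) -> list[str]:
    """Toy retrieval: keep passages that share at least one word with the query."""
    query_words = set(query.lower().split())
    return [p for p in corpus if query_words & set(p.lower().split())]


def build_prompt(query: str, passages: list[str], user_prompt: Optional[str] = None) -> str:
    """Assemble the text sent to the LLM; user_prompt only adds output instructions."""
    parts = ["Context:", *passages, f"Question: {query}"]
    if user_prompt:
        parts.append(f"Additional instructions: {user_prompt}")
    return "\n".join(parts)


corpus = [
    "Scrooge is the business partner of Marley",
    "Bob Cratchit works for Scrooge",
    "Mermaid is a diagram syntax",
]
query = "Who works for Scrooge"
user_prompt = "Please create the diagram using the Mermaid syntax"

# Retrieval sees only the query, so the Mermaid passage is not pulled in.
passages = retrieve(query, corpus)
print(build_prompt(query, passages, user_prompt))
```

If the formatting instruction were folded into the query string instead, words such as "Mermaid" and "diagram" would take part in retrieval and could pull in unrelated passages.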
|
591 |
|
592 |
|
593 |
|
594 |
### Insert
|
595 |
|
|
|
674 |
|
675 |
</details>
|
676 |
|
677 |
<details>
|
678 |
<summary><b>Citation Functionality</b></summary>
|
679 |
|
|
|
877 |
|
878 |
All operations are available in both synchronous and asynchronous versions. The asynchronous versions have the prefix "a" (e.g., `acreate_entity`, `aedit_relation`).
|
879 |
|
880 |
+
</details>
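The "a"-prefix naming convention for asynchronous versions can be sketched with a small stand-in class. `EntityStore` is hypothetical and is not LightRAG's implementation; only the method names `create_entity` and `acreate_entity` come from the text above, and the signatures used here are assumptions.

```python
import asyncio


class EntityStore:
    """Hypothetical stand-in for a RAG object; not LightRAG's implementation."""

    def __init__(self) -> None:
        self.entities: dict[str, dict] = {}

    async def acreate_entity(self, name: str, data: dict) -> dict:
        """Asynchronous version: note the 'a' prefix."""
        await asyncio.sleep(0)  # placeholder for real async I/O
        self.entities[name] = dict(data)
        return {"entity_name": name, **data}

    def create_entity(self, name: str, data: dict) -> dict:
        """Synchronous version: runs the async method to completion."""
        return asyncio.run(self.acreate_entity(name, data))


store = EntityStore()

# Synchronous call from ordinary code.
print(store.create_entity("Alice", {"entity_type": "person"}))


async def main() -> None:
    # Inside a coroutine, await the "a"-prefixed method directly.
    print(await store.acreate_entity("Bob", {"entity_type": "person"}))


asyncio.run(main())
```

In this sketch the synchronous wrapper relies on `asyncio.run`, which cannot be called from a running event loop, so code that is already asynchronous should await the prefixed method instead.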
|
881 |
+
|
882 |
+
<details>
|
883 |
+
<summary> <b> Insert Custom KG </b></summary>
|
884 |
+
|
885 |
+
```python
|
886 |
+
custom_kg = {
|
887 |
+
"chunks": [
|
888 |
+
{
|
889 |
+
"content": "Alice and Bob are collaborating on quantum computing research.",
|
890 |
+
"source_id": "doc-1"
|
891 |
+
}
|
892 |
+
],
|
893 |
+
"entities": [
|
894 |
+
{
|
895 |
+
"entity_name": "Alice",
|
896 |
+
"entity_type": "person",
|
897 |
+
"description": "Alice is a researcher specializing in quantum physics.",
|
898 |
+
"source_id": "doc-1"
|
899 |
+
},
|
900 |
+
{
|
901 |
+
"entity_name": "Bob",
|
902 |
+
"entity_type": "person",
|
903 |
+
"description": "Bob is a mathematician.",
|
904 |
+
"source_id": "doc-1"
|
905 |
+
},
|
906 |
+
{
|
907 |
+
"entity_name": "Quantum Computing",
|
908 |
+
"entity_type": "technology",
|
909 |
+
"description": "Quantum computing utilizes quantum mechanical phenomena for computation.",
|
910 |
+
"source_id": "doc-1"
|
911 |
+
}
|
912 |
+
],
|
913 |
+
"relationships": [
|
914 |
+
{
|
915 |
+
"src_id": "Alice",
|
916 |
+
"tgt_id": "Bob",
|
917 |
+
"description": "Alice and Bob are research partners.",
|
918 |
+
"keywords": "collaboration research",
|
919 |
+
"weight": 1.0,
|
920 |
+
"source_id": "doc-1"
|
921 |
+
},
|
922 |
+
{
|
923 |
+
"src_id": "Alice",
|
924 |
+
"tgt_id": "Quantum Computing",
|
925 |
+
"description": "Alice conducts research on quantum computing.",
|
926 |
+
"keywords": "research expertise",
|
927 |
+
"weight": 1.0,
|
928 |
+
"source_id": "doc-1"
|
929 |
+
},
|
930 |
+
{
|
931 |
+
"src_id": "Bob",
|
932 |
+
"tgt_id": "Quantum Computing",
|
933 |
+
"description": "Bob researches quantum computing.",
|
934 |
+
"keywords": "research application",
|
935 |
+
"weight": 1.0,
|
936 |
+
"source_id": "doc-1"
|
937 |
+
}
|
938 |
+
]
|
939 |
+
}
|
940 |
+
|
941 |
+
rag.insert_custom_kg(custom_kg)
|
942 |
+
```
|
943 |
+
|
944 |
+
</details>
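Before calling `insert_custom_kg`, it can help to check that the dictionary is internally consistent. The sketch below uses two hypothetical helpers (not part of LightRAG) to flag relationships that point at undeclared entities and `source_id` values with no matching chunk. Whether `insert_custom_kg` performs such checks itself is not stated here.

```python
def find_dangling_relationships(custom_kg: dict) -> list[tuple[str, str]]:
    """Return (src_id, tgt_id) pairs that reference an undeclared entity."""
    known = {e["entity_name"] for e in custom_kg.get("entities", [])}
    return [
        (r["src_id"], r["tgt_id"])
        for r in custom_kg.get("relationships", [])
        if r["src_id"] not in known or r["tgt_id"] not in known
    ]


def find_unknown_sources(custom_kg: dict) -> set[str]:
    """Return source_id values on entities/relationships with no matching chunk."""
    chunk_ids = {c["source_id"] for c in custom_kg.get("chunks", [])}
    items = custom_kg.get("entities", []) + custom_kg.get("relationships", [])
    return {i["source_id"] for i in items} - chunk_ids


# Deliberately inconsistent example: "Bob" is never declared, "doc-2" has no chunk.
kg = {
    "chunks": [{"content": "Alice and Bob are collaborating.", "source_id": "doc-1"}],
    "entities": [
        {"entity_name": "Alice", "entity_type": "person",
         "description": "A researcher.", "source_id": "doc-1"},
    ],
    "relationships": [
        {"src_id": "Alice", "tgt_id": "Bob", "description": "Research partners.",
         "keywords": "collaboration", "weight": 1.0, "source_id": "doc-2"},
    ],
}

print(find_dangling_relationships(kg))  # [('Alice', 'Bob')]
print(find_unknown_sources(kg))         # {'doc-2'}
```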
|
945 |
+
|
946 |
+
<details>
|
947 |
+
<summary> <b>Other Entity and Relation Operations</b></summary>
|
948 |
|
949 |
- **create_entity**: Creates a new entity with specified attributes
|
950 |
- **edit_entity**: Updates an existing entity's attributes or renames it
|
951 |
|
|
|
952 |
|
953 |
- **create_relation**: Creates a new relation between existing entities
|
954 |
- **edit_relation**: Updates an existing relation's attributes
|
|
|
957 |
|
958 |
</details>
|
959 |
|
960 |
+
## Entity Merging
|
961 |
+
|
962 |
+
<details>
|
963 |
+
<summary> <b>Merge Entities and Their Relationships</b> </summary>
|
964 |
+
|
965 |
+
LightRAG now supports merging multiple entities into a single entity, automatically handling all relationships:
|
966 |
+
|
967 |
+
```python
|
968 |
+
# Basic entity merging
|
969 |
+
rag.merge_entities(
|
970 |
+
source_entities=["Artificial Intelligence", "AI", "Machine Intelligence"],
|
971 |
+
target_entity="AI Technology"
|
972 |
+
)
|
973 |
+
```
|
974 |
+
|
975 |
+
With custom merge strategy:
|
976 |
+
|
977 |
+
```python
|
978 |
+
# Define custom merge strategy for different fields
|
979 |
+
rag.merge_entities(
|
980 |
+
source_entities=["John Smith", "Dr. Smith", "J. Smith"],
|
981 |
+
target_entity="John Smith",
|
982 |
+
merge_strategy={
|
983 |
+
"description": "concatenate", # Combine all descriptions
|
984 |
+
"entity_type": "keep_first", # Keep the entity type from the first entity
|
985 |
+
"source_id": "join_unique" # Combine all unique source IDs
|
986 |
+
}
|
987 |
+
)
|
988 |
+
```
|
989 |
+
|
990 |
+
With custom target entity data:
|
991 |
+
|
992 |
+
```python
|
993 |
+
# Specify exact values for the merged entity
|
994 |
+
rag.merge_entities(
|
995 |
+
source_entities=["New York", "NYC", "Big Apple"],
|
996 |
+
target_entity="New York City",
|
997 |
+
target_entity_data={
|
998 |
+
"entity_type": "LOCATION",
|
999 |
+
"description": "New York City is the most populous city in the United States.",
|
1000 |
+
}
|
1001 |
+
)
|
1002 |
+
```
|
1003 |
+
|
1004 |
+
Advanced usage combining both approaches:
|
1005 |
+
|
1006 |
+
```python
|
1007 |
+
# Merge company entities with both strategy and custom data
|
1008 |
+
rag.merge_entities(
|
1009 |
+
source_entities=["Microsoft Corp", "Microsoft Corporation", "MSFT"],
|
1010 |
+
target_entity="Microsoft",
|
1011 |
+
merge_strategy={
|
1012 |
+
"description": "concatenate", # Combine all descriptions
|
1013 |
+
"source_id": "join_unique" # Combine source IDs
|
1014 |
+
},
|
1015 |
+
target_entity_data={
|
1016 |
+
"entity_type": "ORGANIZATION",
|
1017 |
+
}
|
1018 |
+
)
|
1019 |
+
```
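The strategy names used in the examples above (`concatenate`, `keep_first`, `join_unique`) can be read as per-field reduction rules. The following is a minimal sketch of one plausible reading, not the library's code: the `<SEP>` separator, the helper names, and the exact handling of empty values are all assumptions.

```python
SEP = "<SEP>"  # assumed field separator for this sketch


def merge_field(values: list[str], strategy: str) -> str:
    """Combine one field's values from several entities according to `strategy`."""
    values = [v for v in values if v]  # ignore missing or empty values
    if not values:
        return ""
    if strategy == "concatenate":
        return SEP.join(values)
    if strategy == "keep_first":
        return values[0]
    if strategy == "join_unique":
        parts = [p for v in values for p in v.split(SEP)]
        return SEP.join(dict.fromkeys(parts))  # dict keeps first-seen order
    raise ValueError(f"unknown merge strategy: {strategy}")


def merge_entity_data(entities: list[dict], merge_strategy: dict[str, str]) -> dict:
    """Apply the per-field strategy to every field named in `merge_strategy`."""
    return {
        field: merge_field([e.get(field, "") for e in entities], strategy)
        for field, strategy in merge_strategy.items()
    }


entities = [
    {"description": "Founder of Acme", "entity_type": "PERSON", "source_id": "doc-1"},
    {"description": "Holds a PhD", "entity_type": "DOCTOR", "source_id": "doc-1<SEP>doc-2"},
]
merged = merge_entity_data(
    entities,
    {"description": "concatenate", "entity_type": "keep_first", "source_id": "join_unique"},
)
print(merged)
```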
|
1020 |
+
|
1021 |
+
When merging entities:
|
1022 |
+
|
1023 |
+
* All relationships from source entities are redirected to the target entity
|
1024 |
+
* Duplicate relationships are intelligently merged
|
1025 |
+
* Self-relationships (loops) are prevented
|
1026 |
+
* Source entities are removed after merging
|
1027 |
+
* Relationship weights and attributes are preserved
|
1028 |
+
|
1029 |
+
</details>
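The relationship handling listed above (redirecting to the target, merging duplicates, preventing self-loops) can be sketched on a plain edge list. This is an illustration under stated assumptions, not LightRAG's algorithm: `merge_edges` is a hypothetical helper, edges are treated as directed, and summing the weights of duplicate edges is just one possible way to merge them.

```python
def merge_edges(edges: list[dict], sources: list[str], target: str) -> list[dict]:
    """Redirect edges from `sources` to `target`, merging duplicates and dropping loops."""
    source_set = set(sources)
    merged: dict[tuple[str, str], dict] = {}
    for edge in edges:
        src = target if edge["src"] in source_set else edge["src"]
        tgt = target if edge["tgt"] in source_set else edge["tgt"]
        if src == tgt:
            continue  # an edge between two merged entities would become a self-loop
        key = (src, tgt)
        if key in merged:
            merged[key]["weight"] += edge["weight"]  # assumption: sum duplicate weights
        else:
            merged[key] = {"src": src, "tgt": tgt, "weight": edge["weight"]}
    return list(merged.values())


edges = [
    {"src": "AI", "tgt": "Machine Learning", "weight": 1.0},
    {"src": "Artificial Intelligence", "tgt": "Machine Learning", "weight": 2.0},
    {"src": "AI", "tgt": "Artificial Intelligence", "weight": 1.0},
    {"src": "Robotics", "tgt": "Machine Intelligence", "weight": 1.0},
]
result = merge_edges(
    edges,
    sources=["Artificial Intelligence", "AI", "Machine Intelligence"],
    target="AI Technology",
)
for edge in result:
    print(edge)
```

The first two edges collapse into a single edge to "Machine Learning", the third disappears because both endpoints map to the target, and the fourth is redirected to "AI Technology".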
|
1030 |
+
|
1031 |
## Token Usage Tracking
|
1032 |
|
1033 |
<details>
|
|
|
1134 |
* Relation data (connections between entities)
|
1135 |
* Relationship information from vector database
|
1136 |
|
1137 |
## Cache
|
1138 |
|
1139 |
<details>
|