zrguo committed · Commit 10cec3e · 1 Parent(s): ed486c9

Update

Browse files:
- README-zh.md +88 -0
- README.md +154 -0
- lightrag/base.py +11 -11
- lightrag/kg/json_kv_impl.py +46 -46
- lightrag/kg/postgres_impl.py +44 -5
- lightrag/lightrag.py +11 -11
README-zh.md
CHANGED

@@ -932,6 +932,94 @@ rag.insert_custom_kg(custom_kg)

</details>

## Delete Functions

LightRAG provides comprehensive deletion features, allowing you to delete documents, entities, and relationships.

<details>
<summary> <b>Delete Entities</b> </summary>

You can delete an entity and all of its associated relationships by entity name:

```python
# Delete an entity and all its relationships (synchronous version)
rag.delete_by_entity("Google")

# Asynchronous version
await rag.adelete_by_entity("Google")
```

Deleting an entity will:
- Remove the entity node from the knowledge graph
- Delete all relationships associated with the entity
- Remove the related embedding vectors from the vector database
- Maintain the integrity of the knowledge graph

</details>

<details>
<summary> <b>Delete Relations</b> </summary>

You can delete the relationship between two specific entities:

```python
# Delete the relationship between two entities (synchronous version)
rag.delete_by_relation("Google", "Gmail")

# Asynchronous version
await rag.adelete_by_relation("Google", "Gmail")
```

Deleting a relationship will:
- Remove the specified relationship edge
- Delete the relationship's embedding vector from the vector database
- Preserve both entity nodes and their other relationships

</details>

<details>
<summary> <b>Delete by Document ID</b> </summary>

You can delete an entire document and all knowledge derived from it by document ID:

```python
# Delete by document ID (asynchronous version)
await rag.adelete_by_doc_id("doc-12345")
```

Optimized handling when deleting by document ID:
- **Smart cleanup**: automatically identifies and deletes entities and relationships that belong only to this document
- **Preserve shared knowledge**: entities and relationships that also appear in other documents are kept, and their descriptions are rebuilt
- **Cache optimization**: clears the related LLM cache to reduce storage overhead
- **Incremental rebuild**: rebuilds affected entity and relationship descriptions from the remaining documents

The deletion process includes:
1. Delete all text chunks related to the document
2. Identify and delete entities and relationships that belong only to this document
3. Rebuild entities and relationships that still exist in other documents
4. Update all related vector indexes
5. Clean up document status records

Note: deletion by document ID is an asynchronous operation, as it involves a complex knowledge-graph reconstruction process.

</details>

<details>
<summary> <b>Deletion Notes</b> </summary>

**Important reminders:**

1. **Irreversible operations**: all delete operations are irreversible; use them with caution
2. **Performance considerations**: deleting large amounts of data may take some time, especially deletion by document ID
3. **Data consistency**: delete operations automatically maintain consistency between the knowledge graph and the vector database
4. **Backup recommendation**: back up your data before performing important delete operations

**Batch deletion tips:**
- For batch deletions, use the asynchronous methods for better performance
- For large-scale deletions, consider processing in batches to avoid overloading the system

</details>

## Entity Merging

<details>
README.md
CHANGED

@@ -988,6 +988,160 @@ These operations maintain data consistency across both the graph database and ve

</details>

## Delete Functions

LightRAG provides comprehensive deletion capabilities, allowing you to delete documents, entities, and relationships.

<details>
<summary> <b>Delete Entities</b> </summary>

You can delete entities by their name along with all associated relationships:

```python
# Delete entity and all its relationships (synchronous version)
rag.delete_by_entity("Google")

# Asynchronous version
await rag.adelete_by_entity("Google")
```

When deleting an entity:
- Removes the entity node from the knowledge graph
- Deletes all associated relationships
- Removes related embedding vectors from the vector database
- Maintains knowledge graph integrity

</details>

<details>
<summary> <b>Delete Relations</b> </summary>

You can delete relationships between two specific entities:

```python
# Delete relationship between two entities (synchronous version)
rag.delete_by_relation("Google", "Gmail")

# Asynchronous version
await rag.adelete_by_relation("Google", "Gmail")
```

When deleting a relationship:
- Removes the specified relationship edge
- Deletes the relationship's embedding vector from the vector database
- Preserves both entity nodes and their other relationships

</details>

<details>
<summary> <b>Delete by Document ID</b> </summary>

You can delete an entire document and all its related knowledge through its document ID:

```python
# Delete by document ID (asynchronous version)
await rag.adelete_by_doc_id("doc-12345")
```

Optimized processing when deleting by document ID:
- **Smart Cleanup**: Automatically identifies and removes entities and relationships that belong only to this document
- **Preserve Shared Knowledge**: If entities or relationships exist in other documents, they are preserved and their descriptions are rebuilt
- **Cache Optimization**: Clears related LLM cache to reduce storage overhead
- **Incremental Rebuilding**: Reconstructs affected entity and relationship descriptions from remaining documents

The deletion process includes:
1. Delete all text chunks related to the document
2. Identify and delete entities and relationships that belong only to this document
3. Rebuild entities and relationships that still exist in other documents
4. Update all related vector indexes
5. Clean up document status records

Note: deletion by document ID is an asynchronous operation, as it involves complex knowledge graph reconstruction.

</details>

**Important Reminders:**

1. **Irreversible Operations**: All deletion operations are irreversible; use them with caution
2. **Performance Considerations**: Deleting large amounts of data may take some time, especially deletion by document ID
3. **Data Consistency**: Deletion operations automatically maintain consistency between the knowledge graph and the vector database
4. **Backup Recommendations**: Consider backing up your data before performing important deletion operations

**Batch Deletion Recommendations:**
- For batch deletion operations, use the asynchronous methods for better performance
- For large-scale deletions, process in batches to avoid excessive system load
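The batch-deletion advice above can be sketched as a small concurrency-capped helper. This is an illustrative sketch, not part of LightRAG's API: `delete_doc` is a hypothetical stand-in for a per-document coroutine such as `rag.adelete_by_doc_id`, and the concurrency bound of 4 is an arbitrary assumption.

```python
import asyncio

async def delete_docs_in_batches(doc_ids, delete_doc, max_concurrent=4):
    """Run deletions concurrently, capping in-flight work to limit system load."""
    sem = asyncio.Semaphore(max_concurrent)

    async def _one(doc_id):
        async with sem:
            await delete_doc(doc_id)

    await asyncio.gather(*(_one(d) for d in doc_ids))

# Demo with a stand-in for rag.adelete_by_doc_id:
deleted = []

async def fake_delete(doc_id):
    deleted.append(doc_id)

asyncio.run(delete_docs_in_batches([f"doc-{i}" for i in range(10)], fake_delete))
```

The semaphore keeps at most `max_concurrent` deletions in flight, which is one way to honor the "process in batches" recommendation without serializing everything.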
## Entity Merging

<details>
<summary> <b>Merge Entities and Their Relationships</b> </summary>

LightRAG now supports merging multiple entities into a single entity, automatically handling all relationships:

```python
# Basic entity merging
rag.merge_entities(
    source_entities=["Artificial Intelligence", "AI", "Machine Intelligence"],
    target_entity="AI Technology"
)
```

With custom merge strategy:

```python
# Define custom merge strategy for different fields
rag.merge_entities(
    source_entities=["John Smith", "Dr. Smith", "J. Smith"],
    target_entity="John Smith",
    merge_strategy={
        "description": "concatenate",  # Combine all descriptions
        "entity_type": "keep_first",   # Keep the entity type from the first entity
        "source_id": "join_unique"     # Combine all unique source IDs
    }
)
```

With custom target entity data:

```python
# Specify exact values for the merged entity
rag.merge_entities(
    source_entities=["New York", "NYC", "Big Apple"],
    target_entity="New York City",
    target_entity_data={
        "entity_type": "LOCATION",
        "description": "New York City is the most populous city in the United States.",
    }
)
```

Advanced usage combining both approaches:

```python
# Merge company entities with both strategy and custom data
rag.merge_entities(
    source_entities=["Microsoft Corp", "Microsoft Corporation", "MSFT"],
    target_entity="Microsoft",
    merge_strategy={
        "description": "concatenate",  # Combine all descriptions
        "source_id": "join_unique"     # Combine source IDs
    },
    target_entity_data={
        "entity_type": "ORGANIZATION",
    }
)
```

When merging entities:

* All relationships from source entities are redirected to the target entity
* Duplicate relationships are intelligently merged
* Self-relationships (loops) are prevented
* Source entities are removed after merging
* Relationship weights and attributes are preserved

</details>

## Entity Merging

<details>
lightrag/base.py
CHANGED

@@ -278,20 +278,20 @@ class BaseKVStorage(StorageNameSpace, ABC):
             False: if the cache drop failed, or the cache mode is not supported
         """
 
-    async def drop_cache_by_chunk_ids(self, chunk_ids: list[str] | None = None) -> bool:
-        """Delete specific cache records from storage by chunk IDs
-
-        Importance notes for in-memory storage:
-        1. Changes will be persisted to disk during the next index_done_callback
-        2. update flags to notify other processes that data persistence is needed
-
-        Args:
-            chunk_ids (list[str]): List of chunk IDs to be dropped from storage
-
-        Returns:
-            True: if the cache drop successfully
-            False: if the cache drop failed, or the operation is not supported
-        """
+    # async def drop_cache_by_chunk_ids(self, chunk_ids: list[str] | None = None) -> bool:
+    #     """Delete specific cache records from storage by chunk IDs
+
+    #     Importance notes for in-memory storage:
+    #     1. Changes will be persisted to disk during the next index_done_callback
+    #     2. update flags to notify other processes that data persistence is needed
+
+    #     Args:
+    #         chunk_ids (list[str]): List of chunk IDs to be dropped from storage
+
+    #     Returns:
+    #         True: if the cache drop successfully
+    #         False: if the cache drop failed, or the operation is not supported
+    #     """
 
 @dataclass
lightrag/kg/json_kv_impl.py
CHANGED

@@ -172,52 +172,52 @@ class JsonKVStorage(BaseKVStorage):
         except Exception:
             return False
 
-    async def drop_cache_by_chunk_ids(self, chunk_ids: list[str] | None = None) -> bool:
-        """Delete specific cache records from storage by chunk IDs
-
-        Importance notes for in-memory storage:
-        1. Changes will be persisted to disk during the next index_done_callback
-        2. update flags to notify other processes that data persistence is needed
-
-        Args:
-            chunk_ids (list[str]): List of chunk IDs to be dropped from storage
-
-        Returns:
-            True: if the cache drop successfully
-            False: if the cache drop failed
-        """
-        if not chunk_ids:
-            return False
-
-        try:
-            async with self._storage_lock:
-                # Iterate through all cache modes to find entries with matching chunk_ids
-                for mode_key, mode_data in list(self._data.items()):
-                    if isinstance(mode_data, dict):
-                        # Check each cached entry in this mode
-                        for cache_key, cache_entry in list(mode_data.items()):
-                            if (
-                                isinstance(cache_entry, dict)
-                                and cache_entry.get("chunk_id") in chunk_ids
-                            ):
-                                # Remove this cache entry
-                                del mode_data[cache_key]
-                                logger.debug(
-                                    f"Removed cache entry {cache_key} for chunk {cache_entry.get('chunk_id')}"
-                                )
-
-                # If the mode is now empty, remove it entirely
-                if not mode_data:
-                    del self._data[mode_key]
-
-            # Set update flags to notify persistence is needed
-            await set_all_update_flags(self.namespace)
-
-            logger.info(f"Cleared cache for {len(chunk_ids)} chunk IDs")
-            return True
-        except Exception as e:
-            logger.error(f"Error clearing cache by chunk IDs: {e}")
-            return False
+    # async def drop_cache_by_chunk_ids(self, chunk_ids: list[str] | None = None) -> bool:
+    #     """Delete specific cache records from storage by chunk IDs
+
+    #     Importance notes for in-memory storage:
+    #     1. Changes will be persisted to disk during the next index_done_callback
+    #     2. update flags to notify other processes that data persistence is needed
+
+    #     Args:
+    #         chunk_ids (list[str]): List of chunk IDs to be dropped from storage
+
+    #     Returns:
+    #         True: if the cache drop successfully
+    #         False: if the cache drop failed
+    #     """
+    #     if not chunk_ids:
+    #         return False
+
+    #     try:
+    #         async with self._storage_lock:
+    #             # Iterate through all cache modes to find entries with matching chunk_ids
+    #             for mode_key, mode_data in list(self._data.items()):
+    #                 if isinstance(mode_data, dict):
+    #                     # Check each cached entry in this mode
+    #                     for cache_key, cache_entry in list(mode_data.items()):
+    #                         if (
+    #                             isinstance(cache_entry, dict)
+    #                             and cache_entry.get("chunk_id") in chunk_ids
+    #                         ):
+    #                             # Remove this cache entry
+    #                             del mode_data[cache_key]
+    #                             logger.debug(
+    #                                 f"Removed cache entry {cache_key} for chunk {cache_entry.get('chunk_id')}"
+    #                             )
+
+    #             # If the mode is now empty, remove it entirely
+    #             if not mode_data:
+    #                 del self._data[mode_key]
+
+    #         # Set update flags to notify persistence is needed
+    #         await set_all_update_flags(self.namespace)
+
+    #         logger.info(f"Cleared cache for {len(chunk_ids)} chunk IDs")
+    #         return True
+    #     except Exception as e:
+    #         logger.error(f"Error clearing cache by chunk IDs: {e}")
+    #         return False
 
     async def drop(self) -> dict[str, str]:
         """Drop all data from storage and clean up resources
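The commented-out implementation above scans a two-level dict (mode → cache key → entry) and drops entries whose `chunk_id` matches. The core scan can be sketched in isolation — simplified, with no locks, update flags, or logging, and with the empty-mode pruning applied per mode:

```python
def drop_cache_by_chunk_ids(data: dict, chunk_ids: list[str]) -> bool:
    """Remove cache entries whose 'chunk_id' is in chunk_ids; prune emptied modes."""
    if not chunk_ids:
        return False
    for mode_key, mode_data in list(data.items()):
        if isinstance(mode_data, dict):
            for cache_key, entry in list(mode_data.items()):
                if isinstance(entry, dict) and entry.get("chunk_id") in chunk_ids:
                    del mode_data[cache_key]
            # If this mode is now empty, remove it entirely
            if not mode_data:
                del data[mode_key]
    return True

cache = {
    "local": {"k1": {"chunk_id": "c1"}, "k2": {"chunk_id": "c2"}},
    "global": {"k3": {"chunk_id": "c1"}},
}
ok = drop_cache_by_chunk_ids(cache, ["c1"])
# "k1" and "k3" are removed; the emptied "global" mode is pruned; "k2" survives
```

Iterating over `list(...)` copies of the items is what makes deleting during the scan safe, the same trick the commented-out code uses.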
lightrag/kg/postgres_impl.py
CHANGED

@@ -106,6 +106,35 @@ class PostgreSQLDB:
     ):
         pass
 
+    async def _migrate_llm_cache_add_chunk_id(self):
+        """Add chunk_id column to LIGHTRAG_LLM_CACHE table if it doesn't exist"""
+        try:
+            # Check if chunk_id column exists
+            check_column_sql = """
+                SELECT column_name
+                FROM information_schema.columns
+                WHERE table_name = 'lightrag_llm_cache'
+                AND column_name = 'chunk_id'
+            """
+
+            column_info = await self.query(check_column_sql)
+            if not column_info:
+                logger.info("Adding chunk_id column to LIGHTRAG_LLM_CACHE table")
+                add_column_sql = """
+                    ALTER TABLE LIGHTRAG_LLM_CACHE
+                    ADD COLUMN chunk_id VARCHAR(255) NULL
+                """
+                await self.execute(add_column_sql)
+                logger.info(
+                    "Successfully added chunk_id column to LIGHTRAG_LLM_CACHE table"
+                )
+            else:
+                logger.info(
+                    "chunk_id column already exists in LIGHTRAG_LLM_CACHE table"
+                )
+        except Exception as e:
+            logger.warning(f"Failed to add chunk_id column to LIGHTRAG_LLM_CACHE: {e}")
+
     async def _migrate_timestamp_columns(self):
         """Migrate timestamp columns in tables to timezone-aware types, assuming original data is in UTC time"""
         # Tables and columns that need migration

@@ -203,6 +232,13 @@ class PostgreSQLDB:
             logger.error(f"PostgreSQL, Failed to migrate timestamp columns: {e}")
             # Don't throw an exception, allow the initialization process to continue
 
+        # Migrate LLM cache table to add chunk_id field if needed
+        try:
+            await self._migrate_llm_cache_add_chunk_id()
+        except Exception as e:
+            logger.error(f"PostgreSQL, Failed to migrate LLM cache chunk_id field: {e}")
+            # Don't throw an exception, allow the initialization process to continue
+
     async def query(
         self,
         sql: str,

@@ -497,6 +533,7 @@ class PGKVStorage(BaseKVStorage):
                 "original_prompt": v["original_prompt"],
                 "return_value": v["return"],
                 "mode": mode,
+                "chunk_id": v.get("chunk_id"),
             }
 
             await self.db.execute(upsert_sql, _data)

@@ -2357,6 +2394,7 @@ TABLES = {
     mode varchar(32) NOT NULL,
     original_prompt TEXT,
     return_value TEXT,
+    chunk_id VARCHAR(255) NULL,
     create_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
     update_time TIMESTAMP,
     CONSTRAINT LIGHTRAG_LLM_CACHE_PK PRIMARY KEY (workspace, mode, id)

@@ -2389,10 +2427,10 @@ SQL_TEMPLATES = {
         chunk_order_index, full_doc_id, file_path
         FROM LIGHTRAG_DOC_CHUNKS WHERE workspace=$1 AND id=$2
     """,
-    "get_by_id_llm_response_cache": """SELECT id, original_prompt, COALESCE(return_value, '') as "return", mode
+    "get_by_id_llm_response_cache": """SELECT id, original_prompt, COALESCE(return_value, '') as "return", mode, chunk_id
         FROM LIGHTRAG_LLM_CACHE WHERE workspace=$1 AND mode=$2
     """,
-    "get_by_mode_id_llm_response_cache": """SELECT id, original_prompt, COALESCE(return_value, '') as "return", mode
+    "get_by_mode_id_llm_response_cache": """SELECT id, original_prompt, COALESCE(return_value, '') as "return", mode, chunk_id
         FROM LIGHTRAG_LLM_CACHE WHERE workspace=$1 AND mode=$2 AND id=$3
     """,
     "get_by_ids_full_docs": """SELECT id, COALESCE(content, '') as content

@@ -2402,7 +2440,7 @@ SQL_TEMPLATES = {
         chunk_order_index, full_doc_id, file_path
         FROM LIGHTRAG_DOC_CHUNKS WHERE workspace=$1 AND id IN ({ids})
     """,
-    "get_by_ids_llm_response_cache": """SELECT id, original_prompt, COALESCE(return_value, '') as "return", mode
+    "get_by_ids_llm_response_cache": """SELECT id, original_prompt, COALESCE(return_value, '') as "return", mode, chunk_id
         FROM LIGHTRAG_LLM_CACHE WHERE workspace=$1 AND mode= IN ({ids})
     """,
     "filter_keys": "SELECT id FROM {table_name} WHERE workspace=$1 AND id IN ({ids})",

@@ -2411,12 +2449,13 @@ SQL_TEMPLATES = {
         ON CONFLICT (workspace,id) DO UPDATE
         SET content = $2, update_time = CURRENT_TIMESTAMP
     """,
-    "upsert_llm_response_cache": """INSERT INTO LIGHTRAG_LLM_CACHE(workspace,id,original_prompt,return_value,mode)
-        VALUES ($1, $2, $3, $4, $5)
+    "upsert_llm_response_cache": """INSERT INTO LIGHTRAG_LLM_CACHE(workspace,id,original_prompt,return_value,mode,chunk_id)
+        VALUES ($1, $2, $3, $4, $5, $6)
         ON CONFLICT (workspace,mode,id) DO UPDATE
         SET original_prompt = EXCLUDED.original_prompt,
             return_value=EXCLUDED.return_value,
             mode=EXCLUDED.mode,
+            chunk_id=EXCLUDED.chunk_id,
             update_time = CURRENT_TIMESTAMP
     """,
     "upsert_chunk": """INSERT INTO LIGHTRAG_DOC_CHUNKS (workspace, id, tokens,
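The migration above follows a common idempotent pattern: probe the catalog for the column, and only `ALTER TABLE` when it is missing, so initialization can run repeatedly against old and new schemas. The same pattern can be shown self-contained against SQLite, whose `PRAGMA table_info` plays the role of Postgres's `information_schema.columns` (the helper name `add_column_if_missing` is illustrative, not LightRAG code):

```python
import sqlite3

def add_column_if_missing(conn, table, column, decl):
    """Idempotently add a column: probe the catalog first, ALTER only if absent."""
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if column not in cols:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
        return True   # column was added
    return False      # column already present, nothing to do

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE llm_cache (id TEXT, mode TEXT)")
first = add_column_if_missing(conn, "llm_cache", "chunk_id", "TEXT")
second = add_column_if_missing(conn, "llm_cache", "chunk_id", "TEXT")  # no-op
```

As in the commit, a failed probe or ALTER should be logged rather than raised, so a migration hiccup does not block startup.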
lightrag/lightrag.py
CHANGED

@@ -1710,17 +1710,17 @@ class LightRAG:
             chunk_ids = set(related_chunks.keys())
             logger.info(f"Found {len(chunk_ids)} chunks to delete")
 
-            # 3. **OPTIMIZATION 1**: Clear LLM cache for related chunks
-            logger.info("Clearing LLM cache for related chunks...")
-            cache_cleared = await self.llm_response_cache.drop_cache_by_chunk_ids(
-                list(chunk_ids)
-            )
-            if cache_cleared:
-                logger.info(f"Successfully cleared cache for {len(chunk_ids)} chunks")
-            else:
-                logger.warning(
-                    "Failed to clear chunk cache or cache clearing not supported"
-                )
+            # # 3. **OPTIMIZATION 1**: Clear LLM cache for related chunks
+            # logger.info("Clearing LLM cache for related chunks...")
+            # cache_cleared = await self.llm_response_cache.drop_cache_by_chunk_ids(
+            #     list(chunk_ids)
+            # )
+            # if cache_cleared:
+            #     logger.info(f"Successfully cleared cache for {len(chunk_ids)} chunks")
+            # else:
+            #     logger.warning(
+            #         "Failed to clear chunk cache or cache clearing not supported"
+            #     )
 
             # 4. Analyze entities and relationships that will be affected
             entities_to_delete = set()
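Step 4 in the hunk above goes on to decide which entities and relationships are affected by the deletion. The core set logic — an item can be deleted outright only if *all* of its source chunks are being removed, otherwise it must be rebuilt from the surviving chunks — can be sketched in isolation (illustrative names, not LightRAG's actual code):

```python
def split_exclusive(items_to_chunks, chunks_to_delete):
    """Partition items into fully-deletable vs. shared (needs description rebuild)."""
    to_delete, to_rebuild = set(), set()
    for item, chunks in items_to_chunks.items():
        if set(chunks) <= chunks_to_delete:
            to_delete.add(item)    # item is sourced only from the deleted document
        else:
            to_rebuild.add(item)   # item also sourced elsewhere: keep and rebuild
    return to_delete, to_rebuild

# "Google" appears only in the doomed chunks; "Gmail" is also sourced from c9
entity_chunks = {"Google": {"c1", "c2"}, "Gmail": {"c2", "c9"}}
dead, rebuild = split_exclusive(entity_chunks, {"c1", "c2"})
```

The subset test `set(chunks) <= chunks_to_delete` is what implements the "belongs only to this document" rule described in the new README section.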