yangdx committed
Commit 62a507f · 1 Parent(s): 99ce99e

Remove Oracle storage implementation
README-zh.md CHANGED
@@ -11,7 +11,6 @@
 - [X] [2024.12.31]🎯📢LightRAG现在支持[通过文档ID删除](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#delete)。
 - [X] [2024.11.25]🎯📢LightRAG现在支持无缝集成[自定义知识图谱](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#insert-custom-kg),使用户能够用自己的领域专业知识增强系统。
 - [X] [2024.11.19]🎯📢LightRAG的综合指南现已在[LearnOpenCV](https://learnopencv.com/lightrag)上发布。非常感谢博客作者。
-- [X] [2024.11.12]🎯📢LightRAG现在支持[Oracle Database 23ai的所有存储类型(KV、向量和图)](https://github.com/HKUDS/LightRAG/blob/main/examples/lightrag_oracle_demo.py)。
 - [X] [2024.11.11]🎯📢LightRAG现在支持[通过实体名称删除实体](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#delete)。
 - [X] [2024.11.09]🎯📢推出[LightRAG Gui](https://lightrag-gui.streamlit.app),允许您插入、查询、可视化和下载LightRAG知识。
 - [X] [2024.11.04]🎯📢现在您可以[使用Neo4J进行存储](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#using-neo4j-for-storage)。
@@ -1037,9 +1036,10 @@ rag.clear_cache(modes=["local"])
 | **参数** | **类型** | **说明** | **默认值** |
 |--------------|----------|-----------------|-------------|
 | **working_dir** | `str` | 存储缓存的目录 | `lightrag_cache+timestamp` |
-| **kv_storage** | `str` | 文档和文本块的存储类型。支持的类型:`JsonKVStorage`、`OracleKVStorage` | `JsonKVStorage` |
-| **vector_storage** | `str` | 嵌入向量的存储类型。支持的类型:`NanoVectorDBStorage`、`OracleVectorDBStorage` | `NanoVectorDBStorage` |
-| **graph_storage** | `str` | 图边和节点的存储类型。支持的类型:`NetworkXStorage`、`Neo4JStorage`、`OracleGraphStorage` | `NetworkXStorage` |
+| **kv_storage** | `str` | Storage type for documents and text chunks. Supported types: `JsonKVStorage`, `PGKVStorage`, `RedisKVStorage`, `MongoKVStorage`, `TiDBKVStorage` | `JsonKVStorage` |
+| **vector_storage** | `str` | Storage type for embedding vectors. Supported types: `NanoVectorDBStorage`, `PGVectorStorage`, `MilvusVectorDBStorage`, `ChromaVectorDBStorage`, `FaissVectorDBStorage`, `TiDBVectorDBStorage`, `MongoVectorDBStorage`, `QdrantVectorDBStorage` | `NanoVectorDBStorage` |
+| **graph_storage** | `str` | Storage type for graph edges and nodes. Supported types: `NetworkXStorage`, `Neo4JStorage`, `PGGraphStorage`, `AGEStorage`, `GremlinStorage` | `NetworkXStorage` |
+| **doc_status_storage** | `str` | Storage type for document processing status. Supported types: `JsonDocStatusStorage`, `PGDocStatusStorage`, `MongoDocStatusStorage` | `JsonDocStatusStorage` |
 | **chunk_token_size** | `int` | 拆分文档时每个块的最大令牌大小 | `1200` |
 | **chunk_overlap_token_size** | `int` | 拆分文档时两个块之间的重叠令牌大小 | `100` |
 | **tiktoken_model_name** | `str` | 用于计算令牌数的Tiktoken编码器的模型名称 | `gpt-4o-mini` |
README.md CHANGED
@@ -41,7 +41,6 @@
 - [X] [2024.12.31]🎯📢LightRAG now supports [deletion by document ID](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#delete).
 - [X] [2024.11.25]🎯📢LightRAG now supports seamless integration of [custom knowledge graphs](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#insert-custom-kg), empowering users to enhance the system with their own domain expertise.
 - [X] [2024.11.19]🎯📢A comprehensive guide to LightRAG is now available on [LearnOpenCV](https://learnopencv.com/lightrag). Many thanks to the blog author.
-- [X] [2024.11.12]🎯📢LightRAG now supports [Oracle Database 23ai for all storage types (KV, vector, and graph)](https://github.com/HKUDS/LightRAG/blob/main/examples/lightrag_oracle_demo.py).
 - [X] [2024.11.11]🎯📢LightRAG now supports [deleting entities by their names](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#delete).
 - [X] [2024.11.09]🎯📢Introducing the [LightRAG Gui](https://lightrag-gui.streamlit.app), which allows you to insert, query, visualize, and download LightRAG knowledge.
 - [X] [2024.11.04]🎯📢You can now [use Neo4J for Storage](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#using-neo4j-for-storage).
@@ -1065,9 +1064,10 @@ Valid modes are:
 | **Parameter** | **Type** | **Explanation** | **Default** |
 |--------------|----------|-----------------|-------------|
 | **working_dir** | `str` | Directory where the cache will be stored | `lightrag_cache+timestamp` |
-| **kv_storage** | `str` | Storage type for documents and text chunks. Supported types: `JsonKVStorage`, `OracleKVStorage` | `JsonKVStorage` |
-| **vector_storage** | `str` | Storage type for embedding vectors. Supported types: `NanoVectorDBStorage`, `OracleVectorDBStorage` | `NanoVectorDBStorage` |
-| **graph_storage** | `str` | Storage type for graph edges and nodes. Supported types: `NetworkXStorage`, `Neo4JStorage`, `OracleGraphStorage` | `NetworkXStorage` |
+| **kv_storage** | `str` | Storage type for documents and text chunks. Supported types: `JsonKVStorage`, `PGKVStorage`, `RedisKVStorage`, `MongoKVStorage`, `TiDBKVStorage` | `JsonKVStorage` |
+| **vector_storage** | `str` | Storage type for embedding vectors. Supported types: `NanoVectorDBStorage`, `PGVectorStorage`, `MilvusVectorDBStorage`, `ChromaVectorDBStorage`, `FaissVectorDBStorage`, `TiDBVectorDBStorage`, `MongoVectorDBStorage`, `QdrantVectorDBStorage` | `NanoVectorDBStorage` |
+| **graph_storage** | `str` | Storage type for graph edges and nodes. Supported types: `NetworkXStorage`, `Neo4JStorage`, `PGGraphStorage`, `AGEStorage`, `GremlinStorage` | `NetworkXStorage` |
+| **doc_status_storage** | `str` | Storage type for document processing status. Supported types: `JsonDocStatusStorage`, `PGDocStatusStorage`, `MongoDocStatusStorage` | `JsonDocStatusStorage` |
 | **chunk_token_size** | `int` | Maximum token size per chunk when splitting documents | `1200` |
 | **chunk_overlap_token_size** | `int` | Overlap token size between two chunks when splitting documents | `100` |
 | **tiktoken_model_name** | `str` | Model name for the Tiktoken encoder used to calculate token numbers | `gpt-4o-mini` |
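After this commit, any configuration still naming an Oracle backend will fail. The supported-type lists from the revised table can be checked mechanically before constructing a `LightRAG` instance; the sketch below transcribes the table into a lookup (`SUPPORTED_STORAGES` and `check_storage` are our names, not part of the library):

```python
# Supported storage implementations per config field, transcribed from the
# revised README table above (all Oracle entries are gone).
SUPPORTED_STORAGES = {
    "kv_storage": {"JsonKVStorage", "PGKVStorage", "RedisKVStorage",
                   "MongoKVStorage", "TiDBKVStorage"},
    "vector_storage": {"NanoVectorDBStorage", "PGVectorStorage",
                       "MilvusVectorDBStorage", "ChromaVectorDBStorage",
                       "FaissVectorDBStorage", "TiDBVectorDBStorage",
                       "MongoVectorDBStorage", "QdrantVectorDBStorage"},
    "graph_storage": {"NetworkXStorage", "Neo4JStorage", "PGGraphStorage",
                      "AGEStorage", "GremlinStorage"},
    "doc_status_storage": {"JsonDocStatusStorage", "PGDocStatusStorage",
                           "MongoDocStatusStorage"},
}


def check_storage(field: str, name: str) -> None:
    """Raise early if a config names a storage that is no longer shipped."""
    if name not in SUPPORTED_STORAGES[field]:
        raise ValueError(f"{name!r} is not a supported {field}")


check_storage("kv_storage", "JsonKVStorage")  # passes silently
```

A config migrated off Oracle would, for example, swap `OracleKVStorage` for `PGKVStorage`; `check_storage("kv_storage", "OracleKVStorage")` now raises.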
config.ini.example CHANGED
@@ -13,15 +13,6 @@ uri=redis://localhost:6379/1
 [qdrant]
 uri = http://localhost:16333
 
-[oracle]
-dsn = localhost:1521/XEPDB1
-user = your_username
-password = your_password
-config_dir = /path/to/oracle/config
-wallet_location = /path/to/wallet  # optional
-wallet_password = your_wallet_password  # optional
-workspace = default  # optional, defaults to "default"
-
 [tidb]
 host = localhost
 port = 4000
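With the `[oracle]` section removed, the remaining sections of `config.ini.example` parse as ordinary INI. A minimal sanity check with the standard library (the inline `ini_text` is a stand-in for the trimmed file, using the values shown in the diff):

```python
import configparser

# Stand-in for the trimmed config.ini.example after this commit.
ini_text = """\
[qdrant]
uri = http://localhost:16333

[tidb]
host = localhost
port = 4000
"""

cfg = configparser.ConfigParser()
cfg.read_string(ini_text)
print(cfg.sections())  # prints ['qdrant', 'tidb'] — no [oracle] section remains
```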
env.example CHANGED
@@ -109,16 +109,6 @@ LIGHTRAG_VECTOR_STORAGE=NanoVectorDBStorage
 LIGHTRAG_GRAPH_STORAGE=NetworkXStorage
 LIGHTRAG_DOC_STATUS_STORAGE=JsonDocStatusStorage
 
-### Oracle Database Configuration
-ORACLE_DSN=localhost:1521/XEPDB1
-ORACLE_USER=your_username
-ORACLE_PASSWORD='your_password'
-ORACLE_CONFIG_DIR=/path/to/oracle/config
-#ORACLE_WALLET_LOCATION=/path/to/wallet
-#ORACLE_WALLET_PASSWORD='your_password'
-### Separates data from different LightRAG instances (deprecated)
-#ORACLE_WORKSPACE=default
-
 ### TiDB Configuration
 TIDB_HOST=localhost
 TIDB_PORT=4000
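Storage selection is now driven entirely by the `LIGHTRAG_*` variables that survive in `env.example`. A loader might resolve them as below; the helper name `storage_from_env` is ours, and the defaults mirror the values shown in the file above:

```python
import os


def storage_from_env() -> dict:
    """Resolve storage backends from LIGHTRAG_* vars, with env.example defaults."""
    return {
        "kv_storage": os.environ.get("LIGHTRAG_KV_STORAGE", "JsonKVStorage"),
        "vector_storage": os.environ.get("LIGHTRAG_VECTOR_STORAGE", "NanoVectorDBStorage"),
        "graph_storage": os.environ.get("LIGHTRAG_GRAPH_STORAGE", "NetworkXStorage"),
        "doc_status_storage": os.environ.get("LIGHTRAG_DOC_STATUS_STORAGE", "JsonDocStatusStorage"),
    }


os.environ["LIGHTRAG_GRAPH_STORAGE"] = "Neo4JStorage"
print(storage_from_env()["graph_storage"])  # prints Neo4JStorage
```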
examples/lightrag_api_oracle_demo.py DELETED
@@ -1,267 +0,0 @@
-from fastapi import FastAPI, HTTPException, File, UploadFile
-from fastapi import Query
-from contextlib import asynccontextmanager
-from pydantic import BaseModel
-from typing import Optional, Any
-
-import sys
-import os
-
-
-from pathlib import Path
-
-import asyncio
-import nest_asyncio
-from lightrag import LightRAG, QueryParam
-from lightrag.llm.openai import openai_complete_if_cache, openai_embed
-from lightrag.utils import EmbeddingFunc
-import numpy as np
-from lightrag.kg.shared_storage import initialize_pipeline_status
-
-
-print(os.getcwd())
-script_directory = Path(__file__).resolve().parent.parent
-sys.path.append(os.path.abspath(script_directory))
-
-
-# Apply nest_asyncio to solve event loop issues
-nest_asyncio.apply()
-
-DEFAULT_RAG_DIR = "index_default"
-
-
-# We use OpenAI compatible API to call LLM on Oracle Cloud
-# More docs here https://github.com/jin38324/OCI_GenAI_access_gateway
-BASE_URL = "http://xxx.xxx.xxx.xxx:8088/v1/"
-APIKEY = "ocigenerativeai"
-
-# Configure working directory
-WORKING_DIR = os.environ.get("RAG_DIR", f"{DEFAULT_RAG_DIR}")
-print(f"WORKING_DIR: {WORKING_DIR}")
-LLM_MODEL = os.environ.get("LLM_MODEL", "cohere.command-r-plus-08-2024")
-print(f"LLM_MODEL: {LLM_MODEL}")
-EMBEDDING_MODEL = os.environ.get("EMBEDDING_MODEL", "cohere.embed-multilingual-v3.0")
-print(f"EMBEDDING_MODEL: {EMBEDDING_MODEL}")
-EMBEDDING_MAX_TOKEN_SIZE = int(os.environ.get("EMBEDDING_MAX_TOKEN_SIZE", 512))
-print(f"EMBEDDING_MAX_TOKEN_SIZE: {EMBEDDING_MAX_TOKEN_SIZE}")
-
-if not os.path.exists(WORKING_DIR):
-    os.mkdir(WORKING_DIR)
-
-os.environ["ORACLE_USER"] = ""
-os.environ["ORACLE_PASSWORD"] = ""
-os.environ["ORACLE_DSN"] = ""
-os.environ["ORACLE_CONFIG_DIR"] = "path_to_config_dir"
-os.environ["ORACLE_WALLET_LOCATION"] = "path_to_wallet_location"
-os.environ["ORACLE_WALLET_PASSWORD"] = "wallet_password"
-os.environ["ORACLE_WORKSPACE"] = "company"
-
-
-async def llm_model_func(
-    prompt, system_prompt=None, history_messages=[], keyword_extraction=False, **kwargs
-) -> str:
-    return await openai_complete_if_cache(
-        LLM_MODEL,
-        prompt,
-        system_prompt=system_prompt,
-        history_messages=history_messages,
-        api_key=APIKEY,
-        base_url=BASE_URL,
-        **kwargs,
-    )
-
-
-async def embedding_func(texts: list[str]) -> np.ndarray:
-    return await openai_embed(
-        texts,
-        model=EMBEDDING_MODEL,
-        api_key=APIKEY,
-        base_url=BASE_URL,
-    )
-
-
-async def get_embedding_dim():
-    test_text = ["This is a test sentence."]
-    embedding = await embedding_func(test_text)
-    embedding_dim = embedding.shape[1]
-    return embedding_dim
-
-
-async def init():
-    # Detect embedding dimension
-    embedding_dimension = await get_embedding_dim()
-    print(f"Detected embedding dimension: {embedding_dimension}")
-    # Create Oracle DB connection
-    # The `config` parameter is the connection configuration of Oracle DB
-    # More docs here https://python-oracledb.readthedocs.io/en/latest/user_guide/connection_handling.html
-    # We store data in unified tables, so we need to set a `workspace` parameter to specify which docs we want to store and query
-    # Below is an example of how to connect to Oracle Autonomous Database on Oracle Cloud
-
-    # Initialize LightRAG
-    # We use Oracle DB as the KV/vector/graph storage
-    rag = LightRAG(
-        enable_llm_cache=False,
-        working_dir=WORKING_DIR,
-        chunk_token_size=512,
-        llm_model_func=llm_model_func,
-        embedding_func=EmbeddingFunc(
-            embedding_dim=embedding_dimension,
-            max_token_size=512,
-            func=embedding_func,
-        ),
-        graph_storage="OracleGraphStorage",
-        kv_storage="OracleKVStorage",
-        vector_storage="OracleVectorDBStorage",
-    )
-
-    await rag.initialize_storages()
-    await initialize_pipeline_status()
-
-    return rag
-
-
-# Extract and Insert into LightRAG storage
-# with open("./dickens/book.txt", "r", encoding="utf-8") as f:
-#     await rag.ainsert(f.read())
-
-# # Perform search in different modes
-# modes = ["naive", "local", "global", "hybrid"]
-# for mode in modes:
-#     print("=" * 20, mode, "=" * 20)
-#     print(await rag.aquery("这篇文档是关于什么内容的?", param=QueryParam(mode=mode)))
-#     print("-" * 100, "\n")
-
-# Data models
-
-
-class QueryRequest(BaseModel):
-    query: str
-    mode: str = "hybrid"
-    only_need_context: bool = False
-    only_need_prompt: bool = False
-
-
-class DataRequest(BaseModel):
-    limit: int = 100
-
-
-class InsertRequest(BaseModel):
-    text: str
-
-
-class Response(BaseModel):
-    status: str
-    data: Optional[Any] = None
-    message: Optional[str] = None
-
-
-# API routes
-
-rag = None
-
-
-@asynccontextmanager
-async def lifespan(app: FastAPI):
-    global rag
-    rag = await init()
-    print("done!")
-    yield
-
-
-app = FastAPI(
-    title="LightRAG API", description="API for RAG operations", lifespan=lifespan
-)
-
-
-@app.post("/query", response_model=Response)
-async def query_endpoint(request: QueryRequest):
-    # try:
-    #     loop = asyncio.get_event_loop()
-    if request.mode == "naive":
-        top_k = 3
-    else:
-        top_k = 60
-    result = await rag.aquery(
-        request.query,
-        param=QueryParam(
-            mode=request.mode,
-            only_need_context=request.only_need_context,
-            only_need_prompt=request.only_need_prompt,
-            top_k=top_k,
-        ),
-    )
-    return Response(status="success", data=result)
-    # except Exception as e:
-    #     raise HTTPException(status_code=500, detail=str(e))
-
-
-@app.get("/data", response_model=Response)
-async def query_all_nodes(type: str = Query("nodes"), limit: int = Query(100)):
-    if type == "nodes":
-        result = await rag.chunk_entity_relation_graph.get_all_nodes(limit=limit)
-    elif type == "edges":
-        result = await rag.chunk_entity_relation_graph.get_all_edges(limit=limit)
-    elif type == "statistics":
-        result = await rag.chunk_entity_relation_graph.get_statistics()
-    return Response(status="success", data=result)
-
-
-@app.post("/insert", response_model=Response)
-async def insert_endpoint(request: InsertRequest):
-    try:
-        loop = asyncio.get_event_loop()
-        await loop.run_in_executor(None, lambda: rag.insert(request.text))
-        return Response(status="success", message="Text inserted successfully")
-    except Exception as e:
-        raise HTTPException(status_code=500, detail=str(e))
-
-
-@app.post("/insert_file", response_model=Response)
-async def insert_file(file: UploadFile = File(...)):
-    try:
-        file_content = await file.read()
-        # Read file content
-        try:
-            content = file_content.decode("utf-8")
-        except UnicodeDecodeError:
-            # If UTF-8 decoding fails, try other encodings
-            content = file_content.decode("gbk")
-        # Insert file content
-        loop = asyncio.get_event_loop()
-        await loop.run_in_executor(None, lambda: rag.insert(content))
-
-        return Response(
-            status="success",
-            message=f"File content from {file.filename} inserted successfully",
-        )
-    except Exception as e:
-        raise HTTPException(status_code=500, detail=str(e))
-
-
-@app.get("/health")
-async def health_check():
-    return {"status": "healthy"}
-
-
-if __name__ == "__main__":
-    import uvicorn
-
-    uvicorn.run(app, host="127.0.0.1", port=8020)
-
-# Usage example
-# To run the server, use the following command in your terminal:
-# python lightrag_api_openai_compatible_demo.py
-
-# Example requests:
-# 1. Query:
-# curl -X POST "http://127.0.0.1:8020/query" -H "Content-Type: application/json" -d '{"query": "your query here", "mode": "hybrid"}'
-
-# 2. Insert text:
-# curl -X POST "http://127.0.0.1:8020/insert" -H "Content-Type: application/json" -d '{"text": "your text here"}'
-
-# 3. Insert file:
-# curl -X POST "http://127.0.0.1:8020/insert_file" -H "Content-Type: multipart/form-data" -F "file=@path/to/your/file.txt"
-
-
-# 4. Health check:
-# curl -X GET "http://127.0.0.1:8020/health"
examples/lightrag_oracle_demo.py DELETED
@@ -1,141 +0,0 @@
-import sys
-import os
-from pathlib import Path
-import asyncio
-from lightrag import LightRAG, QueryParam
-from lightrag.llm.openai import openai_complete_if_cache, openai_embed
-from lightrag.utils import EmbeddingFunc
-import numpy as np
-from lightrag.kg.shared_storage import initialize_pipeline_status
-
-print(os.getcwd())
-script_directory = Path(__file__).resolve().parent.parent
-sys.path.append(os.path.abspath(script_directory))
-
-WORKING_DIR = "./dickens"
-
-# We use OpenAI compatible API to call LLM on Oracle Cloud
-# More docs here https://github.com/jin38324/OCI_GenAI_access_gateway
-BASE_URL = "http://xxx.xxx.xxx.xxx:8088/v1/"
-APIKEY = "ocigenerativeai"
-CHATMODEL = "cohere.command-r-plus"
-EMBEDMODEL = "cohere.embed-multilingual-v3.0"
-CHUNK_TOKEN_SIZE = 1024
-MAX_TOKENS = 4000
-
-if not os.path.exists(WORKING_DIR):
-    os.mkdir(WORKING_DIR)
-
-os.environ["ORACLE_USER"] = "username"
-os.environ["ORACLE_PASSWORD"] = "xxxxxxxxx"
-os.environ["ORACLE_DSN"] = "xxxxxxx_medium"
-os.environ["ORACLE_CONFIG_DIR"] = "path_to_config_dir"
-os.environ["ORACLE_WALLET_LOCATION"] = "path_to_wallet_location"
-os.environ["ORACLE_WALLET_PASSWORD"] = "wallet_password"
-os.environ["ORACLE_WORKSPACE"] = "company"
-
-
-async def llm_model_func(
-    prompt, system_prompt=None, history_messages=[], keyword_extraction=False, **kwargs
-) -> str:
-    return await openai_complete_if_cache(
-        CHATMODEL,
-        prompt,
-        system_prompt=system_prompt,
-        history_messages=history_messages,
-        api_key=APIKEY,
-        base_url=BASE_URL,
-        **kwargs,
-    )
-
-
-async def embedding_func(texts: list[str]) -> np.ndarray:
-    return await openai_embed(
-        texts,
-        model=EMBEDMODEL,
-        api_key=APIKEY,
-        base_url=BASE_URL,
-    )
-
-
-async def get_embedding_dim():
-    test_text = ["This is a test sentence."]
-    embedding = await embedding_func(test_text)
-    embedding_dim = embedding.shape[1]
-    return embedding_dim
-
-
-async def initialize_rag():
-    # Detect embedding dimension
-    embedding_dimension = await get_embedding_dim()
-    print(f"Detected embedding dimension: {embedding_dimension}")
-
-    # Initialize LightRAG
-    # We use Oracle DB as the KV/vector/graph storage
-    # You can add `addon_params={"example_number": 1, "language": "Simplified Chinese"}` to control the prompt
-    rag = LightRAG(
-        # log_level="DEBUG",
-        working_dir=WORKING_DIR,
-        entity_extract_max_gleaning=1,
-        enable_llm_cache=True,
-        enable_llm_cache_for_entity_extract=True,
-        embedding_cache_config=None,  # {"enabled": True, "similarity_threshold": 0.90},
-        chunk_token_size=CHUNK_TOKEN_SIZE,
-        llm_model_max_token_size=MAX_TOKENS,
-        llm_model_func=llm_model_func,
-        embedding_func=EmbeddingFunc(
-            embedding_dim=embedding_dimension,
-            max_token_size=500,
-            func=embedding_func,
-        ),
-        graph_storage="OracleGraphStorage",
-        kv_storage="OracleKVStorage",
-        vector_storage="OracleVectorDBStorage",
-        addon_params={
-            "example_number": 1,
-            "language": "Simplified Chinese",
-            "entity_types": ["organization", "person", "geo", "event"],
-            "insert_batch_size": 2,
-        },
-    )
-    await rag.initialize_storages()
-    await initialize_pipeline_status()
-
-    return rag
-
-
-async def main():
-    try:
-        # Initialize RAG instance
-        rag = await initialize_rag()
-
-        # Extract and Insert into LightRAG storage
-        with open(WORKING_DIR + "/docs.txt", "r", encoding="utf-8") as f:
-            all_text = f.read()
-            texts = [x for x in all_text.split("\n") if x]
-
-        # New mode use pipeline
-        await rag.apipeline_enqueue_documents(texts)
-        await rag.apipeline_process_enqueue_documents()
-
-        # Old method use ainsert
-        # await rag.ainsert(texts)
-
-        # Perform search in different modes
-        modes = ["naive", "local", "global", "hybrid"]
-        for mode in modes:
-            print("=" * 20, mode, "=" * 20)
-            print(
-                await rag.aquery(
-                    "What are the top themes in this story?",
-                    param=QueryParam(mode=mode),
-                )
-            )
-            print("-" * 100, "\n")
-
-    except Exception as e:
-        print(f"An error occurred: {e}")
-
-
-if __name__ == "__main__":
-    asyncio.run(main())
lightrag/api/README-zh.md CHANGED
@@ -291,11 +291,10 @@ LightRAG 使用 4 种类型的存储用于不同目的:
 
 ```
 JsonKVStorage    JsonFile(默认)
-MongoKVStorage   MongoDB
+PGKVStorage      Postgres
 RedisKVStorage   Redis
+MongoKVStorage   MongoDB
 TiDBKVStorage    TiDB
-PGKVStorage      Postgres
-OracleKVStorage  Oracle
 ```
 
 * GRAPH_STORAGE 支持的实现名称
@@ -303,25 +302,21 @@ OracleKVStorage Oracle
 ```
 NetworkXStorage    NetworkX(默认)
 Neo4JStorage       Neo4J
-MongoGraphStorage  MongoDB
-TiDBGraphStorage   TiDB
+PGGraphStorage     Postgres
 AGEStorage         AGE
 GremlinStorage     Gremlin
-PGGraphStorage     Postgres
-OracleGraphStorage Oracle
 ```
 
 * VECTOR_STORAGE 支持的实现名称
 
 ```
 NanoVectorDBStorage   NanoVector(默认)
+PGVectorStorage       Postgres
 MilvusVectorDBStorage Milvus
 ChromaVectorDBStorage Chroma
-TiDBVectorDBStorage   TiDB
-PGVectorStorage       Postgres
 FaissVectorDBStorage  Faiss
 QdrantVectorDBStorage Qdrant
-OracleVectorDBStorage Oracle
+TiDBVectorDBStorage   TiDB
 MongoVectorDBStorage  MongoDB
 ```
 
lightrag/api/README.md CHANGED
@@ -302,11 +302,10 @@ Each storage type has several implementations:
 
 ```
 JsonKVStorage    JsonFile(default)
-MongoKVStorage   MongoDB
+PGKVStorage      Postgres
 RedisKVStorage   Redis
+MongoKVStorage   MongoDB
 TiDBKVStorage    TiDB
-PGKVStorage      Postgres
-OracleKVStorage  Oracle
 ```
 
 * GRAPH_STORAGE supported implementation names
@@ -314,25 +313,21 @@ OracleKVStorage Oracle
 ```
 NetworkXStorage    NetworkX(default)
 Neo4JStorage       Neo4J
-MongoGraphStorage  MongoDB
-TiDBGraphStorage   TiDB
+PGGraphStorage     Postgres
 AGEStorage         AGE
 GremlinStorage     Gremlin
-PGGraphStorage     Postgres
-OracleGraphStorage Oracle
 ```
 
 * VECTOR_STORAGE supported implementation names
 
 ```
 NanoVectorDBStorage   NanoVector(default)
-MilvusVectorDBStorage Milvus
-ChromaVectorDBStorage Chroma
-TiDBVectorDBStorage   TiDB
 PGVectorStorage       Postgres
+MilvusVectorDBStorage Milvus
+ChromaVectorDBStorage Chroma
 FaissVectorDBStorage  Faiss
 QdrantVectorDBStorage Qdrant
-OracleVectorDBStorage Oracle
+TiDBVectorDBStorage   TiDB
 MongoVectorDBStorage  MongoDB
 ```
 
lightrag/kg/__init__.py CHANGED
@@ -6,7 +6,6 @@ STORAGE_IMPLEMENTATIONS = {
6
  "RedisKVStorage",
7
  "TiDBKVStorage",
8
  "PGKVStorage",
9
- "OracleKVStorage",
10
  ],
11
  "required_methods": ["get_by_id", "upsert"],
12
  },
@@ -19,7 +18,6 @@ STORAGE_IMPLEMENTATIONS = {
19
  "AGEStorage",
20
  "GremlinStorage",
21
  "PGGraphStorage",
22
- # "OracleGraphStorage",
23
  ],
24
  "required_methods": ["upsert_node", "upsert_edge"],
25
  },
@@ -32,7 +30,6 @@ STORAGE_IMPLEMENTATIONS = {
32
  "PGVectorStorage",
33
  "FaissVectorDBStorage",
34
  "QdrantVectorDBStorage",
35
- "OracleVectorDBStorage",
36
  "MongoVectorDBStorage",
37
  ],
38
  "required_methods": ["query", "upsert"],
@@ -41,7 +38,6 @@ STORAGE_IMPLEMENTATIONS = {
41
  "implementations": [
42
  "JsonDocStatusStorage",
43
  "PGDocStatusStorage",
44
- "PGDocStatusStorage",
45
  "MongoDocStatusStorage",
46
  ],
47
  "required_methods": ["get_docs_by_status"],
@@ -56,12 +52,6 @@ STORAGE_ENV_REQUIREMENTS: dict[str, list[str]] = {
56
  "RedisKVStorage": ["REDIS_URI"],
57
  "TiDBKVStorage": ["TIDB_USER", "TIDB_PASSWORD", "TIDB_DATABASE"],
58
  "PGKVStorage": ["POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DATABASE"],
59
- "OracleKVStorage": [
60
- "ORACLE_DSN",
61
- "ORACLE_USER",
62
- "ORACLE_PASSWORD",
63
- "ORACLE_CONFIG_DIR",
64
- ],
65
  # Graph Storage Implementations
66
  "NetworkXStorage": [],
67
  "Neo4JStorage": ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"],
@@ -78,12 +68,6 @@ STORAGE_ENV_REQUIREMENTS: dict[str, list[str]] = {
78
  "POSTGRES_PASSWORD",
79
  "POSTGRES_DATABASE",
80
  ],
81
- "OracleGraphStorage": [
82
- "ORACLE_DSN",
83
- "ORACLE_USER",
84
- "ORACLE_PASSWORD",
85
- "ORACLE_CONFIG_DIR",
86
- ],
87
  # Vector Storage Implementations
88
  "NanoVectorDBStorage": [],
89
  "MilvusVectorDBStorage": [],
@@ -92,12 +76,6 @@ STORAGE_ENV_REQUIREMENTS: dict[str, list[str]] = {
92
  "PGVectorStorage": ["POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DATABASE"],
93
  "FaissVectorDBStorage": [],
94
  "QdrantVectorDBStorage": ["QDRANT_URL"], # QDRANT_API_KEY has default value None
95
- "OracleVectorDBStorage": [
96
- "ORACLE_DSN",
97
- "ORACLE_USER",
98
- "ORACLE_PASSWORD",
99
- "ORACLE_CONFIG_DIR",
100
- ],
101
  "MongoVectorDBStorage": [],
102
  # Document Status Storage Implementations
103
  "JsonDocStatusStorage": [],
@@ -112,9 +90,6 @@ STORAGES = {
112
  "NanoVectorDBStorage": ".kg.nano_vector_db_impl",
113
  "JsonDocStatusStorage": ".kg.json_doc_status_impl",
114
  "Neo4JStorage": ".kg.neo4j_impl",
115
- "OracleKVStorage": ".kg.oracle_impl",
116
- "OracleGraphStorage": ".kg.oracle_impl",
117
- "OracleVectorDBStorage": ".kg.oracle_impl",
118
  "MilvusVectorDBStorage": ".kg.milvus_impl",
119
  "MongoKVStorage": ".kg.mongo_impl",
120
  "MongoDocStatusStorage": ".kg.mongo_impl",
 
  "RedisKVStorage",
  "TiDBKVStorage",
  "PGKVStorage",
  ],
  "required_methods": ["get_by_id", "upsert"],
  },

  "AGEStorage",
  "GremlinStorage",
  "PGGraphStorage",
  ],
  "required_methods": ["upsert_node", "upsert_edge"],
  },

  "PGVectorStorage",
  "FaissVectorDBStorage",
  "QdrantVectorDBStorage",
  "MongoVectorDBStorage",
  ],
  "required_methods": ["query", "upsert"],

  "implementations": [
  "JsonDocStatusStorage",
  "PGDocStatusStorage",
  "MongoDocStatusStorage",
  ],
  "required_methods": ["get_docs_by_status"],

  "RedisKVStorage": ["REDIS_URI"],
  "TiDBKVStorage": ["TIDB_USER", "TIDB_PASSWORD", "TIDB_DATABASE"],
  "PGKVStorage": ["POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DATABASE"],

  # Graph Storage Implementations
  "NetworkXStorage": [],
  "Neo4JStorage": ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"],

  "POSTGRES_PASSWORD",
  "POSTGRES_DATABASE",
  ],

  # Vector Storage Implementations
  "NanoVectorDBStorage": [],
  "MilvusVectorDBStorage": [],
  "PGVectorStorage": ["POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DATABASE"],
  "FaissVectorDBStorage": [],
  "QdrantVectorDBStorage": ["QDRANT_URL"],  # QDRANT_API_KEY has default value None

  "MongoVectorDBStorage": [],
  # Document Status Storage Implementations
  "JsonDocStatusStorage": [],

  "NanoVectorDBStorage": ".kg.nano_vector_db_impl",
  "JsonDocStatusStorage": ".kg.json_doc_status_impl",
  "Neo4JStorage": ".kg.neo4j_impl",
  "MilvusVectorDBStorage": ".kg.milvus_impl",
  "MongoKVStorage": ".kg.mongo_impl",
  "MongoDocStatusStorage": ".kg.mongo_impl",
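The hunks above trim the Oracle entries out of the `STORAGES` mapping of implementation names to module paths. For orientation, here is a minimal sketch of that lazy-import registry pattern; the module names below are illustrative stdlib stand-ins, not LightRAG's real `.kg.*` paths:

```python
import importlib

# Illustrative registry: implementation name -> module path.
# LightRAG's real mapping uses relative paths like ".kg.neo4j_impl";
# stdlib modules stand in here so the sketch runs on its own.
STORAGES = {
    "JsonDocStatusStorage": "json",
    "NanoVectorDBStorage": "sqlite3",
}

_loaded: dict[str, object] = {}


def get_storage_module(name: str):
    """Import the backing module on first use and cache it."""
    if name not in STORAGES:
        raise KeyError(f"unknown storage implementation: {name}")
    if name not in _loaded:
        _loaded[name] = importlib.import_module(STORAGES[name])
    return _loaded[name]
```

With a registry like this, removing a backend only requires deleting its registry entries, which is exactly what this commit does for the three Oracle classes.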
lightrag/kg/oracle_impl.py DELETED
@@ -1,1463 +0,0 @@
- import array
- import asyncio
-
- # import html
- import os
- from dataclasses import dataclass, field
- from typing import Any, Union, final
- import numpy as np
- import configparser
-
- from lightrag.types import KnowledgeGraph, KnowledgeGraphNode, KnowledgeGraphEdge
-
- from ..base import (
-     BaseGraphStorage,
-     BaseKVStorage,
-     BaseVectorStorage,
- )
- from ..namespace import NameSpace, is_namespace
- from ..utils import logger
-
- import pipmaster as pm
-
- if not pm.is_installed("graspologic"):
-     pm.install("graspologic")
-
- if not pm.is_installed("oracledb"):
-     pm.install("oracledb")
-
- from graspologic import embed
- import oracledb  # type: ignore
-
-
- class OracleDB:
-     def __init__(self, config, **kwargs):
-         self.host = config.get("host", None)
-         self.port = config.get("port", None)
-         self.user = config.get("user", None)
-         self.password = config.get("password", None)
-         self.dsn = config.get("dsn", None)
-         self.config_dir = config.get("config_dir", None)
-         self.wallet_location = config.get("wallet_location", None)
-         self.wallet_password = config.get("wallet_password", None)
-         self.workspace = config.get("workspace", None)
-         self.max = 12
-         self.increment = 1
-         logger.info(f"Using the label {self.workspace} for Oracle Graph as identifier")
-         if self.user is None or self.password is None:
-             raise ValueError("Missing database user or password")
-
-         try:
-             oracledb.defaults.fetch_lobs = False
-
-             self.pool = oracledb.create_pool_async(
-                 user=self.user,
-                 password=self.password,
-                 dsn=self.dsn,
-                 config_dir=self.config_dir,
-                 wallet_location=self.wallet_location,
-                 wallet_password=self.wallet_password,
-                 min=1,
-                 max=self.max,
-                 increment=self.increment,
-             )
-             logger.info(f"Connected to Oracle database at {self.dsn}")
-         except Exception as e:
-             logger.error(f"Failed to connect to Oracle database at {self.dsn}")
-             logger.error(f"Oracle database error: {e}")
-             raise
-
-     def numpy_converter_in(self, value):
-         """Convert numpy array to array.array"""
-         if value.dtype == np.float64:
-             dtype = "d"
-         elif value.dtype == np.float32:
-             dtype = "f"
-         else:
-             dtype = "b"
-         return array.array(dtype, value)
-
-     def input_type_handler(self, cursor, value, arraysize):
-         """Set the type handler for the input data"""
-         if isinstance(value, np.ndarray):
-             return cursor.var(
-                 oracledb.DB_TYPE_VECTOR,
-                 arraysize=arraysize,
-                 inconverter=self.numpy_converter_in,
-             )
-
-     def numpy_converter_out(self, value):
-         """Convert array.array to numpy array"""
-         if value.typecode == "b":
-             dtype = np.int8
-         elif value.typecode == "f":
-             dtype = np.float32
-         else:
-             dtype = np.float64
-         return np.array(value, copy=False, dtype=dtype)
-
-     def output_type_handler(self, cursor, metadata):
-         """Set the type handler for the output data"""
-         if metadata.type_code is oracledb.DB_TYPE_VECTOR:
-             return cursor.var(
-                 metadata.type_code,
-                 arraysize=cursor.arraysize,
-                 outconverter=self.numpy_converter_out,
-             )
-
-     async def check_tables(self):
-         for k, v in TABLES.items():
-             try:
-                 if k.lower() == "lightrag_graph":
-                     await self.query(
-                         "SELECT id FROM GRAPH_TABLE (lightrag_graph MATCH (a) COLUMNS (a.id)) fetch first row only"
-                     )
-                 else:
-                     await self.query(f"SELECT 1 FROM {k}")
-             except Exception as e:
-                 logger.error(f"Failed to check table {k} in Oracle database")
-                 logger.error(f"Oracle database error: {e}")
-                 try:
-                     # print(v["ddl"])
-                     await self.execute(v["ddl"])
-                     logger.info(f"Created table {k} in Oracle database")
-                 except Exception as e:
-                     logger.error(f"Failed to create table {k} in Oracle database")
-                     logger.error(f"Oracle database error: {e}")
-
-         logger.info("Finished check all tables in Oracle database")
-
-     async def query(
-         self, sql: str, params: dict = None, multirows: bool = False
-     ) -> Union[dict, None]:
-         async with self.pool.acquire() as connection:
-             connection.inputtypehandler = self.input_type_handler
-             connection.outputtypehandler = self.output_type_handler
-             with connection.cursor() as cursor:
-                 try:
-                     await cursor.execute(sql, params)
-                 except Exception as e:
-                     logger.error(f"Oracle database error: {e}")
-                     raise
-                 columns = [column[0].lower() for column in cursor.description]
-                 if multirows:
-                     rows = await cursor.fetchall()
-                     if rows:
-                         data = [dict(zip(columns, row)) for row in rows]
-                     else:
-                         data = []
-                 else:
-                     row = await cursor.fetchone()
-                     if row:
-                         data = dict(zip(columns, row))
-                     else:
-                         data = None
-                 return data
-
-     async def execute(self, sql: str, data: Union[list, dict] = None):
-         # logger.info("go into OracleDB execute method")
-         try:
-             async with self.pool.acquire() as connection:
-                 connection.inputtypehandler = self.input_type_handler
-                 connection.outputtypehandler = self.output_type_handler
-                 with connection.cursor() as cursor:
-                     if data is None:
-                         await cursor.execute(sql)
-                     else:
-                         await cursor.execute(sql, data)
-                 await connection.commit()
-         except Exception as e:
-             logger.error(f"Oracle database error: {e}")
-             raise
-
-
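The `query()` helper above shapes every fetched row into a dict by zipping lowercased column names from the cursor description with the row tuple. That shaping isolated as a standalone sketch (`description` mimics the DB-API cursor attribute, a sequence whose first element per column is the name):

```python
def rows_to_dicts(description, rows):
    """Zip lowercased column names from a DB-API cursor description
    with each fetched row tuple, as OracleDB.query() does above."""
    columns = [col[0].lower() for col in description]
    return [dict(zip(columns, row)) for row in rows]
```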
- class ClientManager:
-     _instances: dict[str, Any] = {"db": None, "ref_count": 0}
-     _lock = asyncio.Lock()
-
-     @staticmethod
-     def get_config() -> dict[str, Any]:
-         config = configparser.ConfigParser()
-         config.read("config.ini", "utf-8")
-
-         return {
-             "user": os.environ.get(
-                 "ORACLE_USER",
-                 config.get("oracle", "user", fallback=None),
-             ),
-             "password": os.environ.get(
-                 "ORACLE_PASSWORD",
-                 config.get("oracle", "password", fallback=None),
-             ),
-             "dsn": os.environ.get(
-                 "ORACLE_DSN",
-                 config.get("oracle", "dsn", fallback=None),
-             ),
-             "config_dir": os.environ.get(
-                 "ORACLE_CONFIG_DIR",
-                 config.get("oracle", "config_dir", fallback=None),
-             ),
-             "wallet_location": os.environ.get(
-                 "ORACLE_WALLET_LOCATION",
-                 config.get("oracle", "wallet_location", fallback=None),
-             ),
-             "wallet_password": os.environ.get(
-                 "ORACLE_WALLET_PASSWORD",
-                 config.get("oracle", "wallet_password", fallback=None),
-             ),
-             "workspace": os.environ.get(
-                 "ORACLE_WORKSPACE",
-                 config.get("oracle", "workspace", fallback="default"),
-             ),
-         }
-
-     @classmethod
-     async def get_client(cls) -> OracleDB:
-         async with cls._lock:
-             if cls._instances["db"] is None:
-                 config = ClientManager.get_config()
-                 db = OracleDB(config)
-                 await db.check_tables()
-                 cls._instances["db"] = db
-                 cls._instances["ref_count"] = 0
-             cls._instances["ref_count"] += 1
-             return cls._instances["db"]
-
-     @classmethod
-     async def release_client(cls, db: OracleDB):
-         async with cls._lock:
-             if db is not None:
-                 if db is cls._instances["db"]:
-                     cls._instances["ref_count"] -= 1
-                     if cls._instances["ref_count"] == 0:
-                         await db.pool.close()
-                         logger.info("Closed OracleDB database connection pool")
-                         cls._instances["db"] = None
-                 else:
-                     await db.pool.close()
-
-
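The `ClientManager` removed here is an async reference-counted singleton: every storage instance shares one connection pool, and the pool is closed only when the last holder releases it. A runnable sketch of just that pattern, with a plain `object()` standing in for the real `OracleDB`:

```python
import asyncio


class ClientManager:
    """Reference-counted shared client, mirroring the removed class."""

    _instances = {"db": None, "ref_count": 0}
    _lock = asyncio.Lock()

    @classmethod
    async def get_client(cls):
        async with cls._lock:
            if cls._instances["db"] is None:
                cls._instances["db"] = object()  # stand-in for OracleDB(config)
                cls._instances["ref_count"] = 0
            cls._instances["ref_count"] += 1
            return cls._instances["db"]

    @classmethod
    async def release_client(cls, db):
        async with cls._lock:
            if db is cls._instances["db"]:
                cls._instances["ref_count"] -= 1
                if cls._instances["ref_count"] == 0:
                    cls._instances["db"] = None  # real code closes the pool here


async def demo():
    a = await ClientManager.get_client()
    b = await ClientManager.get_client()
    assert a is b  # both callers share the same client
    await ClientManager.release_client(a)
    await ClientManager.release_client(b)
    return ClientManager._instances["ref_count"]
```

The lock makes acquire/release atomic, so concurrent `initialize()`/`finalize()` calls from several storage instances cannot double-create or double-close the pool.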
- @final
- @dataclass
- class OracleKVStorage(BaseKVStorage):
-     db: OracleDB = field(default=None)
-     meta_fields = None
-
-     def __post_init__(self):
-         self._data = {}
-         self._max_batch_size = self.global_config.get("embedding_batch_num", 10)
-
-     async def initialize(self):
-         if self.db is None:
-             self.db = await ClientManager.get_client()
-
-     async def finalize(self):
-         if self.db is not None:
-             await ClientManager.release_client(self.db)
-             self.db = None
-
-     ################ QUERY METHODS ################
-
-     async def get_by_id(self, id: str) -> dict[str, Any] | None:
-         """Get doc_full data based on id."""
-         SQL = SQL_TEMPLATES["get_by_id_" + self.namespace]
-         params = {"workspace": self.db.workspace, "id": id}
-         # print("get_by_id:"+SQL)
-         if is_namespace(self.namespace, NameSpace.KV_STORE_LLM_RESPONSE_CACHE):
-             array_res = await self.db.query(SQL, params, multirows=True)
-             res = {}
-             for row in array_res:
-                 res[row["id"]] = row
-             if res:
-                 return res
-             else:
-                 return None
-         else:
-             return await self.db.query(SQL, params)
-
-     async def get_by_mode_and_id(self, mode: str, id: str) -> Union[dict, None]:
-         """Specifically for llm_response_cache."""
-         SQL = SQL_TEMPLATES["get_by_mode_id_" + self.namespace]
-         params = {"workspace": self.db.workspace, "cache_mode": mode, "id": id}
-         if is_namespace(self.namespace, NameSpace.KV_STORE_LLM_RESPONSE_CACHE):
-             array_res = await self.db.query(SQL, params, multirows=True)
-             res = {}
-             for row in array_res:
-                 res[row["id"]] = row
-             return res
-         else:
-             return None
-
-     async def get_by_ids(self, ids: list[str]) -> list[dict[str, Any]]:
-         """Get doc_chunks data based on id"""
-         SQL = SQL_TEMPLATES["get_by_ids_" + self.namespace].format(
-             ids=",".join([f"'{id}'" for id in ids])
-         )
-         params = {"workspace": self.db.workspace}
-         # print("get_by_ids:"+SQL)
-         res = await self.db.query(SQL, params, multirows=True)
-         if is_namespace(self.namespace, NameSpace.KV_STORE_LLM_RESPONSE_CACHE):
-             modes = set()
-             dict_res: dict[str, dict] = {}
-             for row in res:
-                 modes.add(row["mode"])
-             for mode in modes:
-                 if mode not in dict_res:
-                     dict_res[mode] = {}
-             for row in res:
-                 dict_res[row["mode"]][row["id"]] = row
-             res = [{k: v} for k, v in dict_res.items()]
-         return res
-
-     async def filter_keys(self, keys: set[str]) -> set[str]:
-         """Return keys that don't exist in storage"""
-         SQL = SQL_TEMPLATES["filter_keys"].format(
-             table_name=namespace_to_table_name(self.namespace),
-             ids=",".join([f"'{id}'" for id in keys]),
-         )
-         params = {"workspace": self.db.workspace}
-         res = await self.db.query(SQL, params, multirows=True)
-         if res:
-             exist_keys = [key["id"] for key in res]
-             data = set([s for s in keys if s not in exist_keys])
-             return data
-         else:
-             return set(keys)
-
-     ################ INSERT METHODS ################
-     async def upsert(self, data: dict[str, dict[str, Any]]) -> None:
-         logger.info(f"Inserting {len(data)} to {self.namespace}")
-         if not data:
-             return
-
-         if is_namespace(self.namespace, NameSpace.KV_STORE_TEXT_CHUNKS):
-             list_data = [
-                 {
-                     "id": k,
-                     **{k1: v1 for k1, v1 in v.items()},
-                 }
-                 for k, v in data.items()
-             ]
-             contents = [v["content"] for v in data.values()]
-             batches = [
-                 contents[i : i + self._max_batch_size]
-                 for i in range(0, len(contents), self._max_batch_size)
-             ]
-             embeddings_list = await asyncio.gather(
-                 *[self.embedding_func(batch) for batch in batches]
-             )
-             embeddings = np.concatenate(embeddings_list)
-             for i, d in enumerate(list_data):
-                 d["__vector__"] = embeddings[i]
-
-             merge_sql = SQL_TEMPLATES["merge_chunk"]
-             for item in list_data:
-                 _data = {
-                     "id": item["id"],
-                     "content": item["content"],
-                     "workspace": self.db.workspace,
-                     "tokens": item["tokens"],
-                     "chunk_order_index": item["chunk_order_index"],
-                     "full_doc_id": item["full_doc_id"],
-                     "content_vector": item["__vector__"],
-                     "status": item["status"],
-                 }
-                 await self.db.execute(merge_sql, _data)
-         if is_namespace(self.namespace, NameSpace.KV_STORE_FULL_DOCS):
-             for k, v in data.items():
-                 # values.clear()
-                 merge_sql = SQL_TEMPLATES["merge_doc_full"]
-                 _data = {
-                     "id": k,
-                     "content": v["content"],
-                     "workspace": self.db.workspace,
-                 }
-                 await self.db.execute(merge_sql, _data)
-
-         if is_namespace(self.namespace, NameSpace.KV_STORE_LLM_RESPONSE_CACHE):
-             for mode, items in data.items():
-                 for k, v in items.items():
-                     upsert_sql = SQL_TEMPLATES["upsert_llm_response_cache"]
-                     _data = {
-                         "workspace": self.db.workspace,
-                         "id": k,
-                         "original_prompt": v["original_prompt"],
-                         "return_value": v["return"],
-                         "cache_mode": mode,
-                     }
-
-                     await self.db.execute(upsert_sql, _data)
-
-     async def index_done_callback(self) -> None:
-         # Oracle handles persistence automatically
-         pass
-
-     async def delete(self, ids: list[str]) -> None:
-         """Delete records with specified IDs from the storage.
-
-         Args:
-             ids: List of record IDs to be deleted
-         """
-         if not ids:
-             return
-
-         try:
-             table_name = namespace_to_table_name(self.namespace)
-             if not table_name:
-                 logger.error(f"Unknown namespace for deletion: {self.namespace}")
-                 return
-
-             ids_list = ",".join([f"'{id}'" for id in ids])
-             delete_sql = f"DELETE FROM {table_name} WHERE workspace=:workspace AND id IN ({ids_list})"
-
-             await self.db.execute(delete_sql, {"workspace": self.db.workspace})
-             logger.info(
-                 f"Successfully deleted {len(ids)} records from {self.namespace}"
-             )
-         except Exception as e:
-             logger.error(f"Error deleting records from {self.namespace}: {e}")
-
-     async def drop_cache_by_modes(self, modes: list[str] | None = None) -> bool:
-         """Delete specific records from storage by cache mode
-
-         Args:
-             modes (list[str]): List of cache modes to be dropped from storage
-
-         Returns:
-             bool: True if successful, False otherwise
-         """
-         if not modes:
-             return False
-
-         try:
-             table_name = namespace_to_table_name(self.namespace)
-             if not table_name:
-                 return False
-
-             if table_name != "LIGHTRAG_LLM_CACHE":
-                 return False
-
-             # Build an Oracle-style IN clause
-             modes_list = ", ".join([f"'{mode}'" for mode in modes])
-             sql = f"""
-             DELETE FROM {table_name}
-             WHERE workspace = :workspace
-             AND cache_mode IN ({modes_list})
-             """
-
-             logger.info(f"Deleting cache by modes: {modes}")
-             await self.db.execute(sql, {"workspace": self.db.workspace})
-             return True
-         except Exception as e:
-             logger.error(f"Error deleting cache by modes {modes}: {e}")
-             return False
-
-     async def drop(self) -> dict[str, str]:
-         """Drop the storage"""
-         try:
-             table_name = namespace_to_table_name(self.namespace)
-             if not table_name:
-                 return {
-                     "status": "error",
-                     "message": f"Unknown namespace: {self.namespace}",
-                 }
-
-             drop_sql = SQL_TEMPLATES["drop_specifiy_table_workspace"].format(
-                 table_name=table_name
-             )
-             await self.db.execute(drop_sql, {"workspace": self.db.workspace})
-             return {"status": "success", "message": "data dropped"}
-         except Exception as e:
-             return {"status": "error", "message": str(e)}
-
-
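The `upsert()` path above slices chunk contents into batches of `embedding_batch_num` before handing them to the embedding function concurrently via `asyncio.gather`. The slicing itself is worth isolating as a sketch (a standalone helper, not a LightRAG API):

```python
def make_batches(items: list, batch_size: int) -> list[list]:
    """Split items into consecutive batches of at most batch_size,
    mirroring the `contents[i : i + self._max_batch_size]` slicing above."""
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]
```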
- @final
- @dataclass
- class OracleVectorDBStorage(BaseVectorStorage):
-     db: OracleDB | None = field(default=None)
-
-     def __post_init__(self):
-         config = self.global_config.get("vector_db_storage_cls_kwargs", {})
-         cosine_threshold = config.get("cosine_better_than_threshold")
-         if cosine_threshold is None:
-             raise ValueError(
-                 "cosine_better_than_threshold must be specified in vector_db_storage_cls_kwargs"
-             )
-         self.cosine_better_than_threshold = cosine_threshold
-
-     async def initialize(self):
-         if self.db is None:
-             self.db = await ClientManager.get_client()
-
-     async def finalize(self):
-         if self.db is not None:
-             await ClientManager.release_client(self.db)
-             self.db = None
-
-     #################### query method ###############
-     async def query(
-         self, query: str, top_k: int, ids: list[str] | None = None
-     ) -> list[dict[str, Any]]:
-         embeddings = await self.embedding_func([query])
-         embedding = embeddings[0]
-         # Convert precision
-         dtype = str(embedding.dtype).upper()
-         dimension = embedding.shape[0]
-         embedding_string = "[" + ", ".join(map(str, embedding.tolist())) + "]"
-
-         SQL = SQL_TEMPLATES[self.namespace].format(dimension=dimension, dtype=dtype)
-         params = {
-             "embedding_string": embedding_string,
-             "workspace": self.db.workspace,
-             "top_k": top_k,
-             "better_than_threshold": self.cosine_better_than_threshold,
-         }
-         results = await self.db.query(SQL, params=params, multirows=True)
-         return results
-
-     async def upsert(self, data: dict[str, dict[str, Any]]) -> None:
-         raise NotImplementedError
-
-     async def index_done_callback(self) -> None:
-         # Oracle handles persistence automatically
-         pass
-
-     async def delete(self, ids: list[str]) -> None:
-         """Delete vectors with specified IDs
-
-         Args:
-             ids: List of vector IDs to be deleted
-         """
-         if not ids:
-             return
-
-         try:
-             SQL = SQL_TEMPLATES["delete_vectors"].format(
-                 ids=",".join([f"'{id}'" for id in ids])
-             )
-             params = {"workspace": self.db.workspace}
-             await self.db.execute(SQL, params)
-             logger.info(
-                 f"Successfully deleted {len(ids)} vectors from {self.namespace}"
-             )
-         except Exception as e:
-             logger.error(f"Error while deleting vectors from {self.namespace}: {e}")
-             raise
-
-     async def delete_entity(self, entity_name: str) -> None:
-         """Delete entity by name
-
-         Args:
-             entity_name: Name of the entity to delete
-         """
-         try:
-             SQL = SQL_TEMPLATES["delete_entity"]
-             params = {"workspace": self.db.workspace, "entity_name": entity_name}
-             await self.db.execute(SQL, params)
-             logger.info(f"Successfully deleted entity {entity_name}")
-         except Exception as e:
-             logger.error(f"Error deleting entity {entity_name}: {e}")
-             raise
-
-     async def delete_entity_relation(self, entity_name: str) -> None:
-         """Delete all relations connected to an entity
-
-         Args:
-             entity_name: Name of the entity whose relations should be deleted
-         """
-         try:
-             SQL = SQL_TEMPLATES["delete_entity_relations"]
-             params = {"workspace": self.db.workspace, "entity_name": entity_name}
-             await self.db.execute(SQL, params)
-             logger.info(f"Successfully deleted relations for entity {entity_name}")
-         except Exception as e:
-             logger.error(f"Error deleting relations for entity {entity_name}: {e}")
-             raise
-
-     async def search_by_prefix(self, prefix: str) -> list[dict[str, Any]]:
-         """Search for records with IDs starting with a specific prefix.
-
-         Args:
-             prefix: The prefix to search for in record IDs
-
-         Returns:
-             List of records with matching ID prefixes
-         """
-         try:
-             # Determine the appropriate table based on namespace
-             table_name = namespace_to_table_name(self.namespace)
-
-             # Create SQL query to find records with IDs starting with prefix
-             search_sql = f"""
-             SELECT * FROM {table_name}
-             WHERE workspace = :workspace
-             AND id LIKE :prefix_pattern
-             ORDER BY id
-             """
-
-             params = {"workspace": self.db.workspace, "prefix_pattern": f"{prefix}%"}
-
-             # Execute query and get results
-             results = await self.db.query(search_sql, params, multirows=True)
-
-             logger.debug(
-                 f"Found {len(results) if results else 0} records with prefix '{prefix}'"
-             )
-             return results or []
-
-         except Exception as e:
-             logger.error(f"Error searching records with prefix '{prefix}': {e}")
-             return []
-
-     async def get_by_id(self, id: str) -> dict[str, Any] | None:
-         """Get vector data by its ID
-
-         Args:
-             id: The unique identifier of the vector
-
-         Returns:
-             The vector data if found, or None if not found
-         """
-         try:
-             # Determine the table name based on namespace
-             table_name = namespace_to_table_name(self.namespace)
-             if not table_name:
-                 logger.error(f"Unknown namespace for ID lookup: {self.namespace}")
-                 return None
-
-             # Create the appropriate ID field name based on namespace
-             id_field = "entity_id" if "NODES" in table_name else "relation_id"
-             if "CHUNKS" in table_name:
-                 id_field = "chunk_id"
-
-             # Prepare and execute the query
-             query = f"""
-             SELECT * FROM {table_name}
-             WHERE {id_field} = :id AND workspace = :workspace
-             """
-             params = {"id": id, "workspace": self.db.workspace}
-
-             result = await self.db.query(query, params)
-             return result
-         except Exception as e:
-             logger.error(f"Error retrieving vector data for ID {id}: {e}")
-             return None
-
-     async def get_by_ids(self, ids: list[str]) -> list[dict[str, Any]]:
-         """Get multiple vector data by their IDs
-
-         Args:
-             ids: List of unique identifiers
-
-         Returns:
-             List of vector data objects that were found
-         """
-         if not ids:
-             return []
-
-         try:
-             # Determine the table name based on namespace
-             table_name = namespace_to_table_name(self.namespace)
-             if not table_name:
-                 logger.error(f"Unknown namespace for IDs lookup: {self.namespace}")
-                 return []
-
-             # Create the appropriate ID field name based on namespace
-             id_field = "entity_id" if "NODES" in table_name else "relation_id"
-             if "CHUNKS" in table_name:
-                 id_field = "chunk_id"
-
-             # Format the list of IDs for SQL IN clause
-             ids_list = ", ".join([f"'{id}'" for id in ids])
-
-             # Prepare and execute the query
-             query = f"""
-             SELECT * FROM {table_name}
-             WHERE {id_field} IN ({ids_list}) AND workspace = :workspace
-             """
-             params = {"workspace": self.db.workspace}
-
-             results = await self.db.query(query, params, multirows=True)
-             return results or []
-         except Exception as e:
-             logger.error(f"Error retrieving vector data for IDs {ids}: {e}")
-             return []
-
-     async def drop(self) -> dict[str, str]:
-         """Drop the storage"""
-         try:
-             table_name = namespace_to_table_name(self.namespace)
-             if not table_name:
-                 return {
-                     "status": "error",
-                     "message": f"Unknown namespace: {self.namespace}",
-                 }
-
-             drop_sql = SQL_TEMPLATES["drop_specifiy_table_workspace"].format(
-                 table_name=table_name
-             )
-             await self.db.execute(drop_sql, {"workspace": self.db.workspace})
-             return {"status": "success", "message": "data dropped"}
-         except Exception as e:
-             return {"status": "error", "message": str(e)}
-
-
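`OracleVectorDBStorage.query()` above delegates ranking to SQL: the database returns at most `top_k` rows whose cosine similarity to the query embedding exceeds `cosine_better_than_threshold`. For readers swapping in a different backend, a pure-Python sketch of that same contract (function and variable names here are illustrative, not part of any LightRAG interface):

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def top_k_better_than(query_vec, rows, top_k, threshold):
    """rows: (id, vector) pairs. Return up to top_k ids whose cosine
    similarity to query_vec exceeds threshold, best first."""
    scored = sorted(
        ((cosine(query_vec, vec), rid) for rid, vec in rows), reverse=True
    )
    return [rid for sim, rid in scored[:top_k] if sim > threshold]
```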
- @final
706
- @dataclass
707
- class OracleGraphStorage(BaseGraphStorage):
708
- db: OracleDB = field(default=None)
709
-
710
- def __post_init__(self):
711
- self._max_batch_size = self.global_config.get("embedding_batch_num", 10)
712
-
713
- async def initialize(self):
714
- if self.db is None:
715
- self.db = await ClientManager.get_client()
716
-
717
- async def finalize(self):
718
- if self.db is not None:
719
- await ClientManager.release_client(self.db)
720
- self.db = None
721
-
722
- #################### insert method ################
723
-
724
- async def upsert_node(self, node_id: str, node_data: dict[str, str]) -> None:
725
- entity_name = node_id
726
- entity_type = node_data["entity_type"]
727
- description = node_data["description"]
728
- source_id = node_data["source_id"]
729
- logger.debug(f"entity_name:{entity_name}, entity_type:{entity_type}")
730
-
731
- content = entity_name + description
732
- contents = [content]
733
- batches = [
734
- contents[i : i + self._max_batch_size]
735
- for i in range(0, len(contents), self._max_batch_size)
736
- ]
737
- embeddings_list = await asyncio.gather(
738
- *[self.embedding_func(batch) for batch in batches]
739
- )
740
- embeddings = np.concatenate(embeddings_list)
741
- content_vector = embeddings[0]
742
- merge_sql = SQL_TEMPLATES["merge_node"]
743
- data = {
744
- "workspace": self.db.workspace,
745
- "name": entity_name,
746
- "entity_type": entity_type,
747
- "description": description,
748
- "source_chunk_id": source_id,
749
- "content": content,
750
- "content_vector": content_vector,
751
- }
752
- await self.db.execute(merge_sql, data)
753
- # self._graph.add_node(node_id, **node_data)
754
-
755
- async def upsert_edge(
756
- self, source_node_id: str, target_node_id: str, edge_data: dict[str, str]
757
- ) -> None:
758
- """插入或更新边"""
759
- # print("go into upsert edge method")
760
- source_name = source_node_id
761
- target_name = target_node_id
762
- weight = edge_data["weight"]
763
- keywords = edge_data["keywords"]
764
- description = edge_data["description"]
765
- source_chunk_id = edge_data["source_id"]
766
- logger.debug(
767
- f"source_name:{source_name}, target_name:{target_name}, keywords: {keywords}"
768
- )
769
-
770
- content = keywords + source_name + target_name + description
771
- contents = [content]
772
- batches = [
773
- contents[i : i + self._max_batch_size]
774
- for i in range(0, len(contents), self._max_batch_size)
775
- ]
776
- embeddings_list = await asyncio.gather(
777
- *[self.embedding_func(batch) for batch in batches]
778
- )
779
- embeddings = np.concatenate(embeddings_list)
780
- content_vector = embeddings[0]
781
- merge_sql = SQL_TEMPLATES["merge_edge"]
782
- data = {
783
- "workspace": self.db.workspace,
784
- "source_name": source_name,
785
- "target_name": target_name,
786
- "weight": weight,
787
- "keywords": keywords,
788
- "description": description,
789
- "source_chunk_id": source_chunk_id,
790
- "content": content,
791
- "content_vector": content_vector,
792
- }
793
- # print(merge_sql)
794
- await self.db.execute(merge_sql, data)
795
- # self._graph.add_edge(source_node_id, target_node_id, **edge_data)
796
-
797
- async def embed_nodes(
798
- self, algorithm: str
799
- ) -> tuple[np.ndarray[Any, Any], list[str]]:
800
- if algorithm not in self._node_embed_algorithms:
801
- raise ValueError(f"Node embedding algorithm {algorithm} not supported")
802
- return await self._node_embed_algorithms[algorithm]()
803
-
804
- async def _node2vec_embed(self):
805
- """为节点生成向量"""
806
- embeddings, nodes = embed.node2vec_embed(
807
- self._graph,
808
- **self.config["node2vec_params"],
809
- )
810
-
811
- nodes_ids = [self._graph.nodes[node_id]["id"] for node_id in nodes]
812
- return embeddings, nodes_ids
813
-
814
- async def index_done_callback(self) -> None:
815
- # Oracles handles persistence automatically
816
- pass
817
-
818
- #################### query method #################
819
- async def has_node(self, node_id: str) -> bool:
820
- """根据节点id检查节点是否存在"""
821
- SQL = SQL_TEMPLATES["has_node"]
822
- params = {"workspace": self.db.workspace, "node_id": node_id}
823
- res = await self.db.query(SQL, params)
824
- if res:
825
- # print("Node exist!",res)
826
- return True
827
- else:
828
- # print("Node not exist!")
829
- return False
830
-
831
- async def has_edge(self, source_node_id: str, target_node_id: str) -> bool:
832
- SQL = SQL_TEMPLATES["has_edge"]
833
- params = {
834
- "workspace": self.db.workspace,
835
- "source_node_id": source_node_id,
836
- "target_node_id": target_node_id,
837
- }
838
- res = await self.db.query(SQL, params)
839
- if res:
840
- # print("Edge exist!",res)
841
- return True
842
- else:
843
- # print("Edge not exist!")
844
- return False
845
-
846
- async def node_degree(self, node_id: str) -> int:
847
- SQL = SQL_TEMPLATES["node_degree"]
848
- params = {"workspace": self.db.workspace, "node_id": node_id}
849
- res = await self.db.query(SQL, params)
850
- if res:
851
- return res["degree"]
852
- else:
853
- return 0
854
-
855
- async def edge_degree(self, src_id: str, tgt_id: str) -> int:
856
- """根据源和目标节点id获取边的度"""
857
- degree = await self.node_degree(src_id) + await self.node_degree(tgt_id)
858
- return degree
859
-
860
- async def get_node(self, node_id: str) -> dict[str, str] | None:
861
- """根据节点id获取节点数据"""
862
- SQL = SQL_TEMPLATES["get_node"]
863
- params = {"workspace": self.db.workspace, "node_id": node_id}
864
- res = await self.db.query(SQL, params)
865
- if res:
866
- return res
867
- else:
868
- return None
869
-
870
- async def get_edge(
871
- self, source_node_id: str, target_node_id: str
872
- ) -> dict[str, str] | None:
873
- SQL = SQL_TEMPLATES["get_edge"]
874
- params = {
875
- "workspace": self.db.workspace,
876
- "source_node_id": source_node_id,
877
- "target_node_id": target_node_id,
878
- }
879
- res = await self.db.query(SQL, params)
880
- if res:
881
- # print("Get edge!",self.db.workspace, source_node_id, target_node_id,res[0])
882
- return res
883
- else:
884
- # print("Edge not exist!",self.db.workspace, source_node_id, target_node_id)
885
- return None
886
-
887
- async def get_node_edges(self, source_node_id: str) -> list[tuple[str, str]] | None:
888
- if await self.has_node(source_node_id):
889
- SQL = SQL_TEMPLATES["get_node_edges"]
890
- params = {"workspace": self.db.workspace, "source_node_id": source_node_id}
891
- res = await self.db.query(sql=SQL, params=params, multirows=True)
892
- if res:
893
- data = [(i["source_name"], i["target_name"]) for i in res]
894
- # print("Get node edge!",self.db.workspace, source_node_id,data)
895
- return data
896
- else:
897
- # print("Node Edge not exist!",self.db.workspace, source_node_id)
898
- return []
899
-
900
- async def get_all_nodes(self, limit: int):
901
- """查询所有节点"""
902
- SQL = SQL_TEMPLATES["get_all_nodes"]
903
- params = {"workspace": self.db.workspace, "limit": str(limit)}
904
- res = await self.db.query(sql=SQL, params=params, multirows=True)
905
- if res:
906
- return res
907
-
908
- async def get_all_edges(self, limit: int):
909
- """查询所有边"""
910
- SQL = SQL_TEMPLATES["get_all_edges"]
911
- params = {"workspace": self.db.workspace, "limit": str(limit)}
912
- res = await self.db.query(sql=SQL, params=params, multirows=True)
913
- if res:
914
- return res
915
-
916
- async def get_statistics(self):
917
- SQL = SQL_TEMPLATES["get_statistics"]
918
- params = {"workspace": self.db.workspace}
919
- res = await self.db.query(sql=SQL, params=params, multirows=True)
920
- if res:
921
- return res
922
-
923
- async def delete_node(self, node_id: str) -> None:
924
- """Delete a node from the graph
925
-
926
- Args:
927
- node_id: ID of the node to delete
928
- """
929
- try:
930
- # First delete all relations connected to this node
931
- delete_relations_sql = SQL_TEMPLATES["delete_entity_relations"]
932
- params_relations = {"workspace": self.db.workspace, "entity_name": node_id}
933
- await self.db.execute(delete_relations_sql, params_relations)
934
-
935
- # Then delete the node itself
936
- delete_node_sql = SQL_TEMPLATES["delete_entity"]
937
- params_node = {"workspace": self.db.workspace, "entity_name": node_id}
938
- await self.db.execute(delete_node_sql, params_node)
939
-
940
- logger.info(
941
- f"Successfully deleted node {node_id} and all its relationships"
942
- )
943
- except Exception as e:
944
- logger.error(f"Error deleting node {node_id}: {e}")
945
- raise
946
-
947
-     async def remove_nodes(self, nodes: list[str]) -> None:
-         """Delete multiple nodes from the graph
-
-         Args:
-             nodes: List of node IDs to be deleted
-         """
-         if not nodes:
-             return
-
-         try:
-             for node in nodes:
-                 # For each node, first delete all its relationships
-                 delete_relations_sql = SQL_TEMPLATES["delete_entity_relations"]
-                 params_relations = {"workspace": self.db.workspace, "entity_name": node}
-                 await self.db.execute(delete_relations_sql, params_relations)
-
-                 # Then delete the node itself
-                 delete_node_sql = SQL_TEMPLATES["delete_entity"]
-                 params_node = {"workspace": self.db.workspace, "entity_name": node}
-                 await self.db.execute(delete_node_sql, params_node)
-
-             logger.info(
-                 f"Successfully deleted {len(nodes)} nodes and their relationships"
-             )
-         except Exception as e:
-             logger.error(f"Error during batch node deletion: {e}")
-             raise
-
-     async def remove_edges(self, edges: list[tuple[str, str]]) -> None:
-         """Delete multiple edges from the graph
-
-         Args:
-             edges: List of edges to be deleted, each edge is a (source, target) tuple
-         """
-         if not edges:
-             return
-
-         try:
-             for source, target in edges:
-                 # Check if the edge exists before attempting to delete
-                 if await self.has_edge(source, target):
-                     # Delete the edge using a SQL query that matches both source and target
-                     delete_edge_sql = """
-                         DELETE FROM LIGHTRAG_GRAPH_EDGES
-                         WHERE workspace = :workspace
-                         AND source_name = :source_name
-                         AND target_name = :target_name
-                     """
-                     params = {
-                         "workspace": self.db.workspace,
-                         "source_name": source,
-                         "target_name": target,
-                     }
-                     await self.db.execute(delete_edge_sql, params)
-
-             logger.info(f"Successfully deleted {len(edges)} edges from the graph")
-         except Exception as e:
-             logger.error(f"Error during batch edge deletion: {e}")
-             raise
-
-     async def get_all_labels(self) -> list[str]:
-         """Get all unique entity types (labels) in the graph
-
-         Returns:
-             List of unique entity types/labels
-         """
-         try:
-             SQL = """
-                 SELECT DISTINCT entity_type
-                 FROM LIGHTRAG_GRAPH_NODES
-                 WHERE workspace = :workspace
-                 ORDER BY entity_type
-             """
-             params = {"workspace": self.db.workspace}
-             results = await self.db.query(SQL, params, multirows=True)
-
-             if results:
-                 labels = [row["entity_type"] for row in results]
-                 return labels
-             else:
-                 return []
-         except Exception as e:
-             logger.error(f"Error retrieving entity types: {e}")
-             return []
-
-     async def drop(self) -> dict[str, str]:
-         """Drop the storage"""
-         try:
-             # Delete all nodes and relationships via graph queries
-             delete_edges_sql = (
-                 """DELETE FROM LIGHTRAG_GRAPH_EDGES WHERE workspace=:workspace"""
-             )
-             await self.db.execute(delete_edges_sql, {"workspace": self.db.workspace})
-
-             delete_nodes_sql = (
-                 """DELETE FROM LIGHTRAG_GRAPH_NODES WHERE workspace=:workspace"""
-             )
-             await self.db.execute(delete_nodes_sql, {"workspace": self.db.workspace})
-
-             return {"status": "success", "message": "graph data dropped"}
-         except Exception as e:
-             logger.error(f"Error dropping graph: {e}")
-             return {"status": "error", "message": str(e)}
-
-     async def get_knowledge_graph(
-         self, node_label: str, max_depth: int = 5
-     ) -> KnowledgeGraph:
-         """Retrieve a connected subgraph starting from nodes matching the given label
-
-         Maximum number of nodes is constrained by MAX_GRAPH_NODES environment variable.
-         Prioritizes nodes by:
-         1. Nodes matching the specified label
-         2. Nodes directly connected to matching nodes
-         3. Node degree (number of connections)
-
-         Args:
-             node_label: Label to match for starting nodes (use "*" for all nodes)
-             max_depth: Maximum depth of traversal from starting nodes
-
-         Returns:
-             KnowledgeGraph object containing nodes and edges
-         """
-         result = KnowledgeGraph()
-
-         try:
-             # Define maximum number of nodes to return
-             max_graph_nodes = int(os.environ.get("MAX_GRAPH_NODES", 1000))
-
-             if node_label == "*":
-                 # For "*" label, get all nodes up to the limit
-                 nodes_sql = """
-                     SELECT name, entity_type, description, source_chunk_id
-                     FROM LIGHTRAG_GRAPH_NODES
-                     WHERE workspace = :workspace
-                     ORDER BY id
-                     FETCH FIRST :limit ROWS ONLY
-                 """
-                 nodes_params = {
-                     "workspace": self.db.workspace,
-                     "limit": max_graph_nodes,
-                 }
-                 nodes = await self.db.query(nodes_sql, nodes_params, multirows=True)
-             else:
-                 # For specific label, find matching nodes and related nodes
-                 nodes_sql = """
-                     WITH matching_nodes AS (
-                         SELECT name
-                         FROM LIGHTRAG_GRAPH_NODES
-                         WHERE workspace = :workspace
-                         AND (name LIKE '%' || :node_label || '%' OR entity_type LIKE '%' || :node_label || '%')
-                     )
-                     SELECT n.name, n.entity_type, n.description, n.source_chunk_id,
-                         CASE
-                             WHEN n.name IN (SELECT name FROM matching_nodes) THEN 2
-                             WHEN EXISTS (
-                                 SELECT 1 FROM LIGHTRAG_GRAPH_EDGES e
-                                 WHERE workspace = :workspace
-                                 AND ((e.source_name = n.name AND e.target_name IN (SELECT name FROM matching_nodes))
-                                 OR (e.target_name = n.name AND e.source_name IN (SELECT name FROM matching_nodes)))
-                             ) THEN 1
-                             ELSE 0
-                         END AS priority,
-                         (SELECT COUNT(*) FROM LIGHTRAG_GRAPH_EDGES e
-                             WHERE workspace = :workspace
-                             AND (e.source_name = n.name OR e.target_name = n.name)) AS degree
-                     FROM LIGHTRAG_GRAPH_NODES n
-                     WHERE workspace = :workspace
-                     ORDER BY priority DESC, degree DESC
-                     FETCH FIRST :limit ROWS ONLY
-                 """
-                 nodes_params = {
-                     "workspace": self.db.workspace,
-                     "node_label": node_label,
-                     "limit": max_graph_nodes,
-                 }
-                 nodes = await self.db.query(nodes_sql, nodes_params, multirows=True)
-
-             if not nodes:
-                 logger.warning(f"No nodes found matching '{node_label}'")
-                 return result
-
-             # Create mapping of node IDs to be used to filter edges
-             node_names = [node["name"] for node in nodes]
-
-             # Add nodes to result
-             seen_nodes = set()
-             for node in nodes:
-                 node_id = node["name"]
-                 if node_id in seen_nodes:
-                     continue
-
-                 # Create node properties dictionary
-                 properties = {
-                     "entity_type": node["entity_type"],
-                     "description": node["description"] or "",
-                     "source_id": node["source_chunk_id"] or "",
-                 }
-
-                 # Add node to result
-                 result.nodes.append(
-                     KnowledgeGraphNode(
-                         id=node_id, labels=[node["entity_type"]], properties=properties
-                     )
-                 )
-                 seen_nodes.add(node_id)
-
-             # Get edges between these nodes
-             edges_sql = """
-                 SELECT source_name, target_name, weight, keywords, description, source_chunk_id
-                 FROM LIGHTRAG_GRAPH_EDGES
-                 WHERE workspace = :workspace
-                 AND source_name IN (SELECT COLUMN_VALUE FROM TABLE(CAST(:node_names AS SYS.ODCIVARCHAR2LIST)))
-                 AND target_name IN (SELECT COLUMN_VALUE FROM TABLE(CAST(:node_names AS SYS.ODCIVARCHAR2LIST)))
-                 ORDER BY id
-             """
-             edges_params = {"workspace": self.db.workspace, "node_names": node_names}
-             edges = await self.db.query(edges_sql, edges_params, multirows=True)
-
-             # Add edges to result
-             seen_edges = set()
-             for edge in edges:
-                 source = edge["source_name"]
-                 target = edge["target_name"]
-                 edge_id = f"{source}-{target}"
-
-                 if edge_id in seen_edges:
-                     continue
-
-                 # Create edge properties dictionary
-                 properties = {
-                     "weight": edge["weight"] or 0.0,
-                     "keywords": edge["keywords"] or "",
-                     "description": edge["description"] or "",
-                     "source_id": edge["source_chunk_id"] or "",
-                 }
-
-                 # Add edge to result
-                 result.edges.append(
-                     KnowledgeGraphEdge(
-                         id=edge_id,
-                         type="RELATED",
-                         source=source,
-                         target=target,
-                         properties=properties,
-                     )
-                 )
-                 seen_edges.add(edge_id)
-
-             logger.info(
-                 f"Subgraph query successful | Node count: {len(result.nodes)} | Edge count: {len(result.edges)}"
-             )
-
-         except Exception as e:
-             logger.error(f"Error retrieving knowledge graph: {e}")
-
-         return result
-
-
- N_T = {
-     NameSpace.KV_STORE_FULL_DOCS: "LIGHTRAG_DOC_FULL",
-     NameSpace.KV_STORE_TEXT_CHUNKS: "LIGHTRAG_DOC_CHUNKS",
-     NameSpace.VECTOR_STORE_CHUNKS: "LIGHTRAG_DOC_CHUNKS",
-     NameSpace.VECTOR_STORE_ENTITIES: "LIGHTRAG_GRAPH_NODES",
-     NameSpace.VECTOR_STORE_RELATIONSHIPS: "LIGHTRAG_GRAPH_EDGES",
- }
-
-
- def namespace_to_table_name(namespace: str) -> str:
-     for k, v in N_T.items():
-         if is_namespace(namespace, k):
-             return v
-
-
- TABLES = {
-     "LIGHTRAG_DOC_FULL": {
-         "ddl": """CREATE TABLE LIGHTRAG_DOC_FULL (
-             id varchar(256),
-             workspace varchar(1024),
-             doc_name varchar(1024),
-             content CLOB,
-             meta JSON,
-             content_summary varchar(1024),
-             content_length NUMBER,
-             status varchar(256),
-             chunks_count NUMBER,
-             createtime TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-             updatetime TIMESTAMP DEFAULT NULL,
-             error varchar(4096)
-         )"""
-     },
-     "LIGHTRAG_DOC_CHUNKS": {
-         "ddl": """CREATE TABLE LIGHTRAG_DOC_CHUNKS (
-             id varchar(256),
-             workspace varchar(1024),
-             full_doc_id varchar(256),
-             status varchar(256),
-             chunk_order_index NUMBER,
-             tokens NUMBER,
-             content CLOB,
-             content_vector VECTOR,
-             createtime TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-             updatetime TIMESTAMP DEFAULT NULL
-         )"""
-     },
-     "LIGHTRAG_GRAPH_NODES": {
-         "ddl": """CREATE TABLE LIGHTRAG_GRAPH_NODES (
-             id NUMBER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
-             workspace varchar(1024),
-             name varchar(2048),
-             entity_type varchar(1024),
-             description CLOB,
-             source_chunk_id varchar(256),
-             content CLOB,
-             content_vector VECTOR,
-             createtime TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-             updatetime TIMESTAMP DEFAULT NULL
-         )"""
-     },
-     "LIGHTRAG_GRAPH_EDGES": {
-         "ddl": """CREATE TABLE LIGHTRAG_GRAPH_EDGES (
-             id NUMBER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
-             workspace varchar(1024),
-             source_name varchar(2048),
-             target_name varchar(2048),
-             weight NUMBER,
-             keywords CLOB,
-             description CLOB,
-             source_chunk_id varchar(256),
-             content CLOB,
-             content_vector VECTOR,
-             createtime TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-             updatetime TIMESTAMP DEFAULT NULL
-         )"""
-     },
-     "LIGHTRAG_LLM_CACHE": {
-         "ddl": """CREATE TABLE LIGHTRAG_LLM_CACHE (
-             id varchar(256) PRIMARY KEY,
-             workspace varchar(1024),
-             cache_mode varchar(256),
-             model_name varchar(256),
-             original_prompt clob,
-             return_value clob,
-             embedding CLOB,
-             embedding_shape NUMBER,
-             embedding_min NUMBER,
-             embedding_max NUMBER,
-             createtime TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-             updatetime TIMESTAMP DEFAULT NULL
-         )"""
-     },
-     "LIGHTRAG_GRAPH": {
-         "ddl": """CREATE OR REPLACE PROPERTY GRAPH lightrag_graph
-             VERTEX TABLES (
-                 lightrag_graph_nodes KEY (id)
-                 LABEL entity
-                 PROPERTIES (id,workspace,name) -- ,entity_type,description,source_chunk_id)
-             )
-             EDGE TABLES (
-                 lightrag_graph_edges KEY (id)
-                 SOURCE KEY (source_name) REFERENCES lightrag_graph_nodes(name)
-                 DESTINATION KEY (target_name) REFERENCES lightrag_graph_nodes(name)
-                 LABEL has_relation
-                 PROPERTIES (id,workspace,source_name,target_name) -- ,weight, keywords,description,source_chunk_id)
-             ) OPTIONS(ALLOW MIXED PROPERTY TYPES)"""
-     },
- }
-
-
- SQL_TEMPLATES = {
-     # SQL for KVStorage
-     "get_by_id_full_docs": "select ID,content,status from LIGHTRAG_DOC_FULL where workspace=:workspace and ID=:id",
-     "get_by_id_text_chunks": "select ID,TOKENS,content,CHUNK_ORDER_INDEX,FULL_DOC_ID,status from LIGHTRAG_DOC_CHUNKS where workspace=:workspace and ID=:id",
-     "get_by_id_llm_response_cache": """SELECT id, original_prompt, NVL(return_value, '') as "return", cache_mode as "mode"
-         FROM LIGHTRAG_LLM_CACHE WHERE workspace=:workspace AND id=:id""",
-     "get_by_mode_id_llm_response_cache": """SELECT id, original_prompt, NVL(return_value, '') as "return", cache_mode as "mode"
-         FROM LIGHTRAG_LLM_CACHE WHERE workspace=:workspace AND cache_mode=:cache_mode AND id=:id""",
-     "get_by_ids_llm_response_cache": """SELECT id, original_prompt, NVL(return_value, '') as "return", cache_mode as "mode"
-         FROM LIGHTRAG_LLM_CACHE WHERE workspace=:workspace AND id IN ({ids})""",
-     "get_by_ids_full_docs": "select t.*,createtime as created_at from LIGHTRAG_DOC_FULL t where workspace=:workspace and ID in ({ids})",
-     "get_by_ids_text_chunks": "select ID,TOKENS,content,CHUNK_ORDER_INDEX,FULL_DOC_ID from LIGHTRAG_DOC_CHUNKS where workspace=:workspace and ID in ({ids})",
-     "get_by_status_ids_full_docs": "select id,status from LIGHTRAG_DOC_FULL t where workspace=:workspace AND status=:status and ID in ({ids})",
-     "get_by_status_ids_text_chunks": "select id,status from LIGHTRAG_DOC_CHUNKS where workspace=:workspace and status=:status ID in ({ids})",
-     "get_by_status_full_docs": "select id,status from LIGHTRAG_DOC_FULL t where workspace=:workspace AND status=:status",
-     "get_by_status_text_chunks": "select id,status from LIGHTRAG_DOC_CHUNKS where workspace=:workspace and status=:status",
-     "filter_keys": "select id from {table_name} where workspace=:workspace and id in ({ids})",
-     "merge_doc_full": """MERGE INTO LIGHTRAG_DOC_FULL a
-         USING DUAL
-         ON (a.id = :id and a.workspace = :workspace)
-         WHEN NOT MATCHED THEN
-         INSERT(id,content,workspace) values(:id,:content,:workspace)""",
-     "merge_chunk": """MERGE INTO LIGHTRAG_DOC_CHUNKS
-         USING DUAL
-         ON (id = :id and workspace = :workspace)
-         WHEN NOT MATCHED THEN INSERT
-         (id,content,workspace,tokens,chunk_order_index,full_doc_id,content_vector,status)
-         values (:id,:content,:workspace,:tokens,:chunk_order_index,:full_doc_id,:content_vector,:status) """,
-     "upsert_llm_response_cache": """MERGE INTO LIGHTRAG_LLM_CACHE a
-         USING DUAL
-         ON (a.id = :id)
-         WHEN NOT MATCHED THEN
-         INSERT (workspace,id,original_prompt,return_value,cache_mode)
-         VALUES (:workspace,:id,:original_prompt,:return_value,:cache_mode)
-         WHEN MATCHED THEN UPDATE
-         SET original_prompt = :original_prompt,
-         return_value = :return_value,
-         cache_mode = :cache_mode,
-         updatetime = SYSDATE""",
-     # SQL for VectorStorage
-     "entities": """SELECT name as entity_name FROM
-         (SELECT id,name,VECTOR_DISTANCE(content_vector,vector(:embedding_string,{dimension},{dtype}),COSINE) as distance
-         FROM LIGHTRAG_GRAPH_NODES WHERE workspace=:workspace)
-         WHERE distance>:better_than_threshold ORDER BY distance ASC FETCH FIRST :top_k ROWS ONLY""",
-     "relationships": """SELECT source_name as src_id, target_name as tgt_id FROM
-         (SELECT id,source_name,target_name,VECTOR_DISTANCE(content_vector,vector(:embedding_string,{dimension},{dtype}),COSINE) as distance
-         FROM LIGHTRAG_GRAPH_EDGES WHERE workspace=:workspace)
-         WHERE distance>:better_than_threshold ORDER BY distance ASC FETCH FIRST :top_k ROWS ONLY""",
-     "chunks": """SELECT id FROM
-         (SELECT id,VECTOR_DISTANCE(content_vector,vector(:embedding_string,{dimension},{dtype}),COSINE) as distance
-         FROM LIGHTRAG_DOC_CHUNKS WHERE workspace=:workspace)
-         WHERE distance>:better_than_threshold ORDER BY distance ASC FETCH FIRST :top_k ROWS ONLY""",
-     # SQL for GraphStorage
-     "has_node": """SELECT * FROM GRAPH_TABLE (lightrag_graph
-         MATCH (a)
-         WHERE a.workspace=:workspace AND a.name=:node_id
-         COLUMNS (a.name))""",
-     "has_edge": """SELECT * FROM GRAPH_TABLE (lightrag_graph
-         MATCH (a) -[e]-> (b)
-         WHERE e.workspace=:workspace and a.workspace=:workspace and b.workspace=:workspace
-         AND a.name=:source_node_id AND b.name=:target_node_id
-         COLUMNS (e.source_name,e.target_name) )""",
-     "node_degree": """SELECT count(1) as degree FROM GRAPH_TABLE (lightrag_graph
-         MATCH (a)-[e]->(b)
-         WHERE e.workspace=:workspace and a.workspace=:workspace and b.workspace=:workspace
-         AND a.name=:node_id or b.name = :node_id
-         COLUMNS (a.name))""",
-     "get_node": """SELECT t1.name,t2.entity_type,t2.source_chunk_id as source_id,NVL(t2.description,'') AS description
-         FROM GRAPH_TABLE (lightrag_graph
-             MATCH (a)
-             WHERE a.workspace=:workspace AND a.name=:node_id
-             COLUMNS (a.name)
-         ) t1 JOIN LIGHTRAG_GRAPH_NODES t2 on t1.name=t2.name
-         WHERE t2.workspace=:workspace""",
-     "get_edge": """SELECT t1.source_id,t2.weight,t2.source_chunk_id as source_id,t2.keywords,
-         NVL(t2.description,'') AS description,NVL(t2.KEYWORDS,'') AS keywords
-         FROM GRAPH_TABLE (lightrag_graph
-             MATCH (a)-[e]->(b)
-             WHERE e.workspace=:workspace and a.workspace=:workspace and b.workspace=:workspace
-             AND a.name=:source_node_id and b.name = :target_node_id
-             COLUMNS (e.id,a.name as source_id)
-         ) t1 JOIN LIGHTRAG_GRAPH_EDGES t2 on t1.id=t2.id""",
-     "get_node_edges": """SELECT source_name,target_name
-         FROM GRAPH_TABLE (lightrag_graph
-             MATCH (a)-[e]->(b)
-             WHERE e.workspace=:workspace and a.workspace=:workspace and b.workspace=:workspace
-             AND a.name=:source_node_id
-             COLUMNS (a.name as source_name,b.name as target_name))""",
-     "merge_node": """MERGE INTO LIGHTRAG_GRAPH_NODES a
-         USING DUAL
-         ON (a.workspace=:workspace and a.name=:name)
-         WHEN NOT MATCHED THEN
-         INSERT(workspace,name,entity_type,description,source_chunk_id,content,content_vector)
-         values (:workspace,:name,:entity_type,:description,:source_chunk_id,:content,:content_vector)
-         WHEN MATCHED THEN
-         UPDATE SET
-         entity_type=:entity_type,description=:description,source_chunk_id=:source_chunk_id,content=:content,content_vector=:content_vector,updatetime=SYSDATE""",
-     "merge_edge": """MERGE INTO LIGHTRAG_GRAPH_EDGES a
-         USING DUAL
-         ON (a.workspace=:workspace and a.source_name=:source_name and a.target_name=:target_name)
-         WHEN NOT MATCHED THEN
-         INSERT(workspace,source_name,target_name,weight,keywords,description,source_chunk_id,content,content_vector)
-         values (:workspace,:source_name,:target_name,:weight,:keywords,:description,:source_chunk_id,:content,:content_vector)
-         WHEN MATCHED THEN
-         UPDATE SET
-         weight=:weight,keywords=:keywords,description=:description,source_chunk_id=:source_chunk_id,content=:content,content_vector=:content_vector,updatetime=SYSDATE""",
-     "get_all_nodes": """WITH t0 AS (
-         SELECT name AS id, entity_type AS label, entity_type, description,
-             '["' || replace(source_chunk_id, '<SEP>', '","') || '"]' source_chunk_ids
-         FROM lightrag_graph_nodes
-         WHERE workspace = :workspace
-         ORDER BY createtime DESC fetch first :limit rows only
-     ), t1 AS (
-         SELECT t0.id, source_chunk_id
-         FROM t0, JSON_TABLE ( source_chunk_ids, '$[*]' COLUMNS ( source_chunk_id PATH '$' ) )
-     ), t2 AS (
-         SELECT t1.id, LISTAGG(t2.content, '\n') content
-         FROM t1 LEFT JOIN lightrag_doc_chunks t2 ON t1.source_chunk_id = t2.id
-         GROUP BY t1.id
-     )
-     SELECT t0.id, label, entity_type, description, t2.content
-     FROM t0 LEFT JOIN t2 ON t0.id = t2.id""",
-     "get_all_edges": """SELECT t1.id,t1.keywords as label,t1.keywords, t1.source_name as source, t1.target_name as target,
-         t1.weight,t1.DESCRIPTION,t2.content
-         FROM LIGHTRAG_GRAPH_EDGES t1
-         LEFT JOIN LIGHTRAG_DOC_CHUNKS t2 on t1.source_chunk_id=t2.id
-         WHERE t1.workspace=:workspace
-         order by t1.CREATETIME DESC
-         fetch first :limit rows only""",
-     "get_statistics": """select count(distinct CASE WHEN type='node' THEN id END) as nodes_count,
-         count(distinct CASE WHEN type='edge' THEN id END) as edges_count
-         FROM (
-             select 'node' as type, id FROM GRAPH_TABLE (lightrag_graph
-                 MATCH (a) WHERE a.workspace=:workspace columns(a.name as id))
-             UNION
-             select 'edge' as type, TO_CHAR(id) id FROM GRAPH_TABLE (lightrag_graph
-                 MATCH (a)-[e]->(b) WHERE e.workspace=:workspace columns(e.id))
-         )""",
-     # SQL for deletion
-     "delete_vectors": "DELETE FROM LIGHTRAG_DOC_CHUNKS WHERE workspace=:workspace AND id IN ({ids})",
-     "delete_entity": "DELETE FROM LIGHTRAG_GRAPH_NODES WHERE workspace=:workspace AND name=:entity_name",
-     "delete_entity_relations": "DELETE FROM LIGHTRAG_GRAPH_EDGES WHERE workspace=:workspace AND (source_name=:entity_name OR target_name=:entity_name)",
-     "delete_node": """DELETE FROM GRAPH_TABLE (lightrag_graph
-         MATCH (a)
-         WHERE a.workspace=:workspace AND a.name=:node_id
-         ACTION DELETE a)""",
-     # Drop tables
-     "drop_specifiy_table_workspace": "DELETE FROM {table_name} WHERE workspace=:workspace",
- }