Commit 62a507f by yangdx
Parent: 99ce99e

Remove Oracle storage implementation

Changed files:
- README-zh.md +4 -4
- README.md +4 -4
- config.ini.example +0 -9
- env.example +0 -10
- examples/lightrag_api_oracle_demo.py +0 -267 (deleted)
- examples/lightrag_oracle_demo.py +0 -141 (deleted)
- lightrag/api/README-zh.md +5 -10
- lightrag/api/README.md +6 -11
- lightrag/kg/__init__.py +0 -25
- lightrag/kg/oracle_impl.py +0 -1463 (deleted)
README-zh.md
CHANGED

```diff
@@ -11,7 +11,6 @@
 - [X] [2024.12.31]🎯📢LightRAG now supports [deletion by document ID](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#delete).
 - [X] [2024.11.25]🎯📢LightRAG now supports seamless integration of [custom knowledge graphs](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#insert-custom-kg), empowering users to enhance the system with their own domain expertise.
 - [X] [2024.11.19]🎯📢A comprehensive guide to LightRAG is now available on [LearnOpenCV](https://learnopencv.com/lightrag). Many thanks to the blog author.
-- [X] [2024.11.12]🎯📢LightRAG now supports [Oracle Database 23ai for all storage types (KV, vector, and graph)](https://github.com/HKUDS/LightRAG/blob/main/examples/lightrag_oracle_demo.py).
 - [X] [2024.11.11]🎯📢LightRAG now supports [deleting entities by their names](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#delete).
 - [X] [2024.11.09]🎯📢Introducing the [LightRAG Gui](https://lightrag-gui.streamlit.app), which allows you to insert, query, visualize, and download LightRAG knowledge.
 - [X] [2024.11.04]🎯📢You can now [use Neo4J for Storage](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#using-neo4j-for-storage).
@@ -1037,9 +1036,10 @@ rag.clear_cache(modes=["local"])
 | **Parameter** | **Type** | **Explanation** | **Default** |
 |--------------|----------|-----------------|-------------|
 | **working_dir** | `str` | Directory where the cache will be stored | `lightrag_cache+timestamp` |
-| **kv_storage** | `str` |
-| **vector_storage** | `str` |
-| **graph_storage** | `str` |
+| **kv_storage** | `str` | Storage type for documents and text chunks. Supported types: `JsonKVStorage`, `PGKVStorage`, `RedisKVStorage`, `MongoKVStorage`, `TiDBKVStorage` | `JsonKVStorage` |
+| **vector_storage** | `str` | Storage type for embedding vectors. Supported types: `NanoVectorDBStorage`, `PGVectorStorage`, `MilvusVectorDBStorage`, `ChromaVectorDBStorage`, `FaissVectorDBStorage`, `TiDBVectorDBStorage`, `MongoVectorDBStorage`, `QdrantVectorDBStorage` | `NanoVectorDBStorage` |
+| **graph_storage** | `str` | Storage type for graph edges and nodes. Supported types: `NetworkXStorage`, `Neo4JStorage`, `PGGraphStorage`, `AGEStorage`, `GremlinStorage` | `NetworkXStorage` |
+| **doc_status_storage** | `str` | Storage type for document processing status. Supported types: `JsonDocStatusStorage`, `PGDocStatusStorage`, `MongoDocStatusStorage` | `JsonDocStatusStorage` |
 | **chunk_token_size** | `int` | Maximum token size per chunk when splitting documents | `1200` |
 | **chunk_overlap_token_size** | `int` | Overlap token size between two chunks when splitting documents | `100` |
 | **tiktoken_model_name** | `str` | Model name for the Tiktoken encoder used to calculate token numbers | `gpt-4o-mini` |
```
README.md
CHANGED

```diff
@@ -41,7 +41,6 @@
 - [X] [2024.12.31]🎯📢LightRAG now supports [deletion by document ID](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#delete).
 - [X] [2024.11.25]🎯📢LightRAG now supports seamless integration of [custom knowledge graphs](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#insert-custom-kg), empowering users to enhance the system with their own domain expertise.
 - [X] [2024.11.19]🎯📢A comprehensive guide to LightRAG is now available on [LearnOpenCV](https://learnopencv.com/lightrag). Many thanks to the blog author.
-- [X] [2024.11.12]🎯📢LightRAG now supports [Oracle Database 23ai for all storage types (KV, vector, and graph)](https://github.com/HKUDS/LightRAG/blob/main/examples/lightrag_oracle_demo.py).
 - [X] [2024.11.11]🎯📢LightRAG now supports [deleting entities by their names](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#delete).
 - [X] [2024.11.09]🎯📢Introducing the [LightRAG Gui](https://lightrag-gui.streamlit.app), which allows you to insert, query, visualize, and download LightRAG knowledge.
 - [X] [2024.11.04]🎯📢You can now [use Neo4J for Storage](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#using-neo4j-for-storage).
@@ -1065,9 +1064,10 @@ Valid modes are:
 | **Parameter** | **Type** | **Explanation** | **Default** |
 |--------------|----------|-----------------|-------------|
 | **working_dir** | `str` | Directory where the cache will be stored | `lightrag_cache+timestamp` |
-| **kv_storage** | `str` | Storage type for documents and text chunks. Supported types: `JsonKVStorage
-| **vector_storage** | `str` | Storage type for embedding vectors. Supported types: `NanoVectorDBStorage
-| **graph_storage** | `str` | Storage type for graph edges and nodes. Supported types: `NetworkXStorage
+| **kv_storage** | `str` | Storage type for documents and text chunks. Supported types: `JsonKVStorage`, `PGKVStorage`, `RedisKVStorage`, `MongoKVStorage`, `TiDBKVStorage` | `JsonKVStorage` |
+| **vector_storage** | `str` | Storage type for embedding vectors. Supported types: `NanoVectorDBStorage`, `PGVectorStorage`, `MilvusVectorDBStorage`, `ChromaVectorDBStorage`, `FaissVectorDBStorage`, `TiDBVectorDBStorage`, `MongoVectorDBStorage`, `QdrantVectorDBStorage` | `NanoVectorDBStorage` |
+| **graph_storage** | `str` | Storage type for graph edges and nodes. Supported types: `NetworkXStorage`, `Neo4JStorage`, `PGGraphStorage`, `AGEStorage`, `GremlinStorage` | `NetworkXStorage` |
+| **doc_status_storage** | `str` | Storage type for document processing status. Supported types: `JsonDocStatusStorage`, `PGDocStatusStorage`, `MongoDocStatusStorage` | `JsonDocStatusStorage` |
 | **chunk_token_size** | `int` | Maximum token size per chunk when splitting documents | `1200` |
 | **chunk_overlap_token_size** | `int` | Overlap token size between two chunks when splitting documents | `100` |
 | **tiktoken_model_name** | `str` | Model name for the Tiktoken encoder used to calculate token numbers | `gpt-4o-mini` |
```
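The `chunk_token_size` and `chunk_overlap_token_size` rows above describe a sliding-window split over a document's tokens. A minimal sketch of that windowing, with token counting simplified to a plain list rather than the Tiktoken encoder the table names (`chunk_tokens` is a hypothetical helper for illustration, not LightRAG's API):

```python
def chunk_tokens(
    tokens: list[str], chunk_size: int = 1200, overlap: int = 100
) -> list[list[str]]:
    """Split a token list into windows of `chunk_size` tokens,
    each overlapping the previous window by `overlap` tokens."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    return [tokens[i : i + chunk_size] for i in range(0, len(tokens), step)]
```

With the defaults from the table (1200/100), consecutive chunks share 100 tokens of context, which helps entity extraction across chunk boundaries.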
config.ini.example
CHANGED

```diff
@@ -13,15 +13,6 @@ uri=redis://localhost:6379/1
 [qdrant]
 uri = http://localhost:16333
 
-[oracle]
-dsn = localhost:1521/XEPDB1
-user = your_username
-password = your_password
-config_dir = /path/to/oracle/config
-wallet_location = /path/to/wallet  # optional
-wallet_password = your_wallet_password  # optional
-workspace = default  # optional, defaults to "default"
-
 [tidb]
 host = localhost
 port = 4000
```
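With the `[oracle]` section gone from `config.ini.example`, code that previously probed for it should treat the section as optional rather than assume it exists. A small stdlib sketch of reading the remaining sections defensively (section and key names are taken from the example file; the fallback behavior is this sketch's choice):

```python
import configparser

# Inline stand-in for the post-commit config.ini.example (no [oracle] section).
SAMPLE = """
[qdrant]
uri = http://localhost:16333

[tidb]
host = localhost
port = 4000
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

# `fallback=` covers both a missing option and a missing section,
# so a lookup of the removed [oracle] section degrades gracefully.
oracle_dsn = config.get("oracle", "dsn", fallback=None)
qdrant_uri = config.get("qdrant", "uri", fallback=None)
tidb_port = config.getint("tidb", "port", fallback=4000)
```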
env.example
CHANGED

```diff
@@ -109,16 +109,6 @@ LIGHTRAG_VECTOR_STORAGE=NanoVectorDBStorage
 LIGHTRAG_GRAPH_STORAGE=NetworkXStorage
 LIGHTRAG_DOC_STATUS_STORAGE=JsonDocStatusStorage
 
-### Oracle Database Configuration
-ORACLE_DSN=localhost:1521/XEPDB1
-ORACLE_USER=your_username
-ORACLE_PASSWORD='your_password'
-ORACLE_CONFIG_DIR=/path/to/oracle/config
-#ORACLE_WALLET_LOCATION=/path/to/wallet
-#ORACLE_WALLET_PASSWORD='your_password'
-### Separating all data from different LightRAG instances (deprecating)
-#ORACLE_WORKSPACE=default
-
 ### TiDB Configuration
 TIDB_HOST=localhost
 TIDB_PORT=4000
```
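The `LIGHTRAG_*_STORAGE` variables kept in `env.example` select a backend by class name, and after this commit the Oracle names are no longer valid choices. A sketch of resolving those variables with `os.environ.get` and validating against the remaining implementation lists (the `SUPPORTED` dict below is transcribed from the diff, not imported from LightRAG):

```python
import os

# Transcribed from the updated storage lists; the Oracle entries are gone.
SUPPORTED = {
    "LIGHTRAG_KV_STORAGE": {"JsonKVStorage", "PGKVStorage", "RedisKVStorage",
                            "MongoKVStorage", "TiDBKVStorage"},
    "LIGHTRAG_VECTOR_STORAGE": {"NanoVectorDBStorage", "PGVectorStorage",
                                "MilvusVectorDBStorage", "ChromaVectorDBStorage",
                                "FaissVectorDBStorage", "TiDBVectorDBStorage",
                                "MongoVectorDBStorage", "QdrantVectorDBStorage"},
    "LIGHTRAG_GRAPH_STORAGE": {"NetworkXStorage", "Neo4JStorage", "PGGraphStorage",
                               "AGEStorage", "GremlinStorage"},
    "LIGHTRAG_DOC_STATUS_STORAGE": {"JsonDocStatusStorage", "PGDocStatusStorage",
                                    "MongoDocStatusStorage"},
}

# Defaults as shown in env.example.
DEFAULTS = {
    "LIGHTRAG_KV_STORAGE": "JsonKVStorage",
    "LIGHTRAG_VECTOR_STORAGE": "NanoVectorDBStorage",
    "LIGHTRAG_GRAPH_STORAGE": "NetworkXStorage",
    "LIGHTRAG_DOC_STATUS_STORAGE": "JsonDocStatusStorage",
}

def resolve_storage(var: str) -> str:
    """Read a storage selector from the environment, falling back to the
    documented default, and reject names that are no longer supported."""
    value = os.environ.get(var, DEFAULTS[var])
    if value not in SUPPORTED[var]:
        raise ValueError(f"{var}={value!r} is not a supported implementation")
    return value
```

A deployment that still sets `LIGHTRAG_KV_STORAGE=OracleKVStorage` would fail this check, which is the desired behavior after the removal.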
examples/lightrag_api_oracle_demo.py
DELETED
@@ -1,267 +0,0 @@

```python
from fastapi import FastAPI, HTTPException, File, UploadFile
from fastapi import Query
from contextlib import asynccontextmanager
from pydantic import BaseModel
from typing import Optional, Any

import sys
import os

from pathlib import Path

import asyncio
import nest_asyncio
from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.utils import EmbeddingFunc
import numpy as np
from lightrag.kg.shared_storage import initialize_pipeline_status


print(os.getcwd())
script_directory = Path(__file__).resolve().parent.parent
sys.path.append(os.path.abspath(script_directory))


# Apply nest_asyncio to solve event loop issues
nest_asyncio.apply()

DEFAULT_RAG_DIR = "index_default"


# We use an OpenAI-compatible API to call the LLM on Oracle Cloud
# More docs here https://github.com/jin38324/OCI_GenAI_access_gateway
BASE_URL = "http://xxx.xxx.xxx.xxx:8088/v1/"
APIKEY = "ocigenerativeai"

# Configure working directory
WORKING_DIR = os.environ.get("RAG_DIR", f"{DEFAULT_RAG_DIR}")
print(f"WORKING_DIR: {WORKING_DIR}")
LLM_MODEL = os.environ.get("LLM_MODEL", "cohere.command-r-plus-08-2024")
print(f"LLM_MODEL: {LLM_MODEL}")
EMBEDDING_MODEL = os.environ.get("EMBEDDING_MODEL", "cohere.embed-multilingual-v3.0")
print(f"EMBEDDING_MODEL: {EMBEDDING_MODEL}")
EMBEDDING_MAX_TOKEN_SIZE = int(os.environ.get("EMBEDDING_MAX_TOKEN_SIZE", 512))
print(f"EMBEDDING_MAX_TOKEN_SIZE: {EMBEDDING_MAX_TOKEN_SIZE}")

if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

os.environ["ORACLE_USER"] = ""
os.environ["ORACLE_PASSWORD"] = ""
os.environ["ORACLE_DSN"] = ""
os.environ["ORACLE_CONFIG_DIR"] = "path_to_config_dir"
os.environ["ORACLE_WALLET_LOCATION"] = "path_to_wallet_location"
os.environ["ORACLE_WALLET_PASSWORD"] = "wallet_password"
os.environ["ORACLE_WORKSPACE"] = "company"


async def llm_model_func(
    prompt, system_prompt=None, history_messages=[], keyword_extraction=False, **kwargs
) -> str:
    return await openai_complete_if_cache(
        LLM_MODEL,
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        api_key=APIKEY,
        base_url=BASE_URL,
        **kwargs,
    )


async def embedding_func(texts: list[str]) -> np.ndarray:
    return await openai_embed(
        texts,
        model=EMBEDDING_MODEL,
        api_key=APIKEY,
        base_url=BASE_URL,
    )


async def get_embedding_dim():
    test_text = ["This is a test sentence."]
    embedding = await embedding_func(test_text)
    embedding_dim = embedding.shape[1]
    return embedding_dim


async def init():
    # Detect embedding dimension
    embedding_dimension = await get_embedding_dim()
    print(f"Detected embedding dimension: {embedding_dimension}")
    # Create Oracle DB connection
    # The `config` parameter is the connection configuration of Oracle DB
    # More docs here https://python-oracledb.readthedocs.io/en/latest/user_guide/connection_handling.html
    # We store data in unified tables, so we need to set a `workspace` parameter to specify which docs we want to store and query
    # Below is an example of how to connect to Oracle Autonomous Database on Oracle Cloud

    # Initialize LightRAG
    # We use Oracle DB as the KV/vector/graph storage
    rag = LightRAG(
        enable_llm_cache=False,
        working_dir=WORKING_DIR,
        chunk_token_size=512,
        llm_model_func=llm_model_func,
        embedding_func=EmbeddingFunc(
            embedding_dim=embedding_dimension,
            max_token_size=512,
            func=embedding_func,
        ),
        graph_storage="OracleGraphStorage",
        kv_storage="OracleKVStorage",
        vector_storage="OracleVectorDBStorage",
    )

    await rag.initialize_storages()
    await initialize_pipeline_status()

    return rag


# Extract and insert into LightRAG storage
# with open("./dickens/book.txt", "r", encoding="utf-8") as f:
#     await rag.ainsert(f.read())

# # Perform search in different modes
# modes = ["naive", "local", "global", "hybrid"]
# for mode in modes:
#     print("=" * 20, mode, "=" * 20)
#     print(await rag.aquery("What is this document about?", param=QueryParam(mode=mode)))
#     print("-" * 100, "\n")

# Data models


class QueryRequest(BaseModel):
    query: str
    mode: str = "hybrid"
    only_need_context: bool = False
    only_need_prompt: bool = False


class DataRequest(BaseModel):
    limit: int = 100


class InsertRequest(BaseModel):
    text: str


class Response(BaseModel):
    status: str
    data: Optional[Any] = None
    message: Optional[str] = None


# API routes

rag = None


@asynccontextmanager
async def lifespan(app: FastAPI):
    global rag
    rag = await init()
    print("done!")
    yield


app = FastAPI(
    title="LightRAG API", description="API for RAG operations", lifespan=lifespan
)


@app.post("/query", response_model=Response)
async def query_endpoint(request: QueryRequest):
    # try:
    #     loop = asyncio.get_event_loop()
    if request.mode == "naive":
        top_k = 3
    else:
        top_k = 60
    result = await rag.aquery(
        request.query,
        param=QueryParam(
            mode=request.mode,
            only_need_context=request.only_need_context,
            only_need_prompt=request.only_need_prompt,
            top_k=top_k,
        ),
    )
    return Response(status="success", data=result)
    # except Exception as e:
    #     raise HTTPException(status_code=500, detail=str(e))


@app.get("/data", response_model=Response)
async def query_all_nodes(type: str = Query("nodes"), limit: int = Query(100)):
    if type == "nodes":
        result = await rag.chunk_entity_relation_graph.get_all_nodes(limit=limit)
    elif type == "edges":
        result = await rag.chunk_entity_relation_graph.get_all_edges(limit=limit)
    elif type == "statistics":
        result = await rag.chunk_entity_relation_graph.get_statistics()
    return Response(status="success", data=result)


@app.post("/insert", response_model=Response)
async def insert_endpoint(request: InsertRequest):
    try:
        loop = asyncio.get_event_loop()
        await loop.run_in_executor(None, lambda: rag.insert(request.text))
        return Response(status="success", message="Text inserted successfully")
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


@app.post("/insert_file", response_model=Response)
async def insert_file(file: UploadFile = File(...)):
    try:
        file_content = await file.read()
        # Read file content
        try:
            content = file_content.decode("utf-8")
        except UnicodeDecodeError:
            # If UTF-8 decoding fails, try other encodings
            content = file_content.decode("gbk")
        # Insert file content
        loop = asyncio.get_event_loop()
        await loop.run_in_executor(None, lambda: rag.insert(content))

        return Response(
            status="success",
            message=f"File content from {file.filename} inserted successfully",
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


@app.get("/health")
async def health_check():
    return {"status": "healthy"}


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="127.0.0.1", port=8020)

# Usage example
# To run the server, use the following command in your terminal:
# python lightrag_api_openai_compatible_demo.py

# Example requests:
# 1. Query:
# curl -X POST "http://127.0.0.1:8020/query" -H "Content-Type: application/json" -d '{"query": "your query here", "mode": "hybrid"}'

# 2. Insert text:
# curl -X POST "http://127.0.0.1:8020/insert" -H "Content-Type: application/json" -d '{"text": "your text here"}'

# 3. Insert file:
# curl -X POST "http://127.0.0.1:8020/insert_file" -H "Content-Type: multipart/form-data" -F "file=@path/to/your/file.txt"

# 4. Health check:
# curl -X GET "http://127.0.0.1:8020/health"
```
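One detail of the deleted API demo worth keeping in any replacement server is the decoding fallback its `/insert_file` route used: try UTF-8 first, then GBK for legacy Chinese-encoded uploads. Extracted as a standalone helper (the function name here is ours, not from the repository):

```python
def decode_upload(data: bytes) -> str:
    """Decode an uploaded file's bytes as UTF-8, falling back to GBK,
    mirroring the strategy of the deleted demo's /insert_file route."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("gbk")
```

A stricter server might instead reject undecodable uploads with a 400 rather than silently guessing a second encoding.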
examples/lightrag_oracle_demo.py
DELETED
@@ -1,141 +0,0 @@

```python
import sys
import os
from pathlib import Path
import asyncio
from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.utils import EmbeddingFunc
import numpy as np
from lightrag.kg.shared_storage import initialize_pipeline_status

print(os.getcwd())
script_directory = Path(__file__).resolve().parent.parent
sys.path.append(os.path.abspath(script_directory))

WORKING_DIR = "./dickens"

# We use an OpenAI-compatible API to call the LLM on Oracle Cloud
# More docs here https://github.com/jin38324/OCI_GenAI_access_gateway
BASE_URL = "http://xxx.xxx.xxx.xxx:8088/v1/"
APIKEY = "ocigenerativeai"
CHATMODEL = "cohere.command-r-plus"
EMBEDMODEL = "cohere.embed-multilingual-v3.0"
CHUNK_TOKEN_SIZE = 1024
MAX_TOKENS = 4000

if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

os.environ["ORACLE_USER"] = "username"
os.environ["ORACLE_PASSWORD"] = "xxxxxxxxx"
os.environ["ORACLE_DSN"] = "xxxxxxx_medium"
os.environ["ORACLE_CONFIG_DIR"] = "path_to_config_dir"
os.environ["ORACLE_WALLET_LOCATION"] = "path_to_wallet_location"
os.environ["ORACLE_WALLET_PASSWORD"] = "wallet_password"
os.environ["ORACLE_WORKSPACE"] = "company"


async def llm_model_func(
    prompt, system_prompt=None, history_messages=[], keyword_extraction=False, **kwargs
) -> str:
    return await openai_complete_if_cache(
        CHATMODEL,
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        api_key=APIKEY,
        base_url=BASE_URL,
        **kwargs,
    )


async def embedding_func(texts: list[str]) -> np.ndarray:
    return await openai_embed(
        texts,
        model=EMBEDMODEL,
        api_key=APIKEY,
        base_url=BASE_URL,
    )


async def get_embedding_dim():
    test_text = ["This is a test sentence."]
    embedding = await embedding_func(test_text)
    embedding_dim = embedding.shape[1]
    return embedding_dim


async def initialize_rag():
    # Detect embedding dimension
    embedding_dimension = await get_embedding_dim()
    print(f"Detected embedding dimension: {embedding_dimension}")

    # Initialize LightRAG
    # We use Oracle DB as the KV/vector/graph storage
    # You can add `addon_params={"example_number": 1, "language": "Simplified Chinese"}` to control the prompt
    rag = LightRAG(
        # log_level="DEBUG",
        working_dir=WORKING_DIR,
        entity_extract_max_gleaning=1,
        enable_llm_cache=True,
        enable_llm_cache_for_entity_extract=True,
        embedding_cache_config=None,  # {"enabled": True, "similarity_threshold": 0.90},
        chunk_token_size=CHUNK_TOKEN_SIZE,
        llm_model_max_token_size=MAX_TOKENS,
        llm_model_func=llm_model_func,
        embedding_func=EmbeddingFunc(
            embedding_dim=embedding_dimension,
            max_token_size=500,
            func=embedding_func,
        ),
        graph_storage="OracleGraphStorage",
        kv_storage="OracleKVStorage",
        vector_storage="OracleVectorDBStorage",
        addon_params={
            "example_number": 1,
            "language": "Simplified Chinese",
            "entity_types": ["organization", "person", "geo", "event"],
            "insert_batch_size": 2,
        },
    )
    await rag.initialize_storages()
    await initialize_pipeline_status()

    return rag


async def main():
    try:
        # Initialize RAG instance
        rag = await initialize_rag()

        # Extract and insert into LightRAG storage
        with open(WORKING_DIR + "/docs.txt", "r", encoding="utf-8") as f:
            all_text = f.read()
            texts = [x for x in all_text.split("\n") if x]

        # New mode: use the pipeline
        await rag.apipeline_enqueue_documents(texts)
        await rag.apipeline_process_enqueue_documents()

        # Old method: use ainsert
        # await rag.ainsert(texts)

        # Perform search in different modes
        modes = ["naive", "local", "global", "hybrid"]
        for mode in modes:
            print("=" * 20, mode, "=" * 20)
            print(
                await rag.aquery(
                    "What are the top themes in this story?",
                    param=QueryParam(mode=mode),
                )
            )
            print("-" * 100, "\n")

    except Exception as e:
        print(f"An error occurred: {e}")


if __name__ == "__main__":
    asyncio.run(main())
```
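Users of the deleted demo need to point the three `*_storage` constructor arguments at backends that still exist. A pure-data sketch of that migration, using the Postgres names that remain in `STORAGE_IMPLEMENTATIONS` after this commit (the helper function and the choice of Postgres as the replacement are this sketch's assumptions, not guidance from the commit itself):

```python
# Storage kwargs used by the deleted demo (no longer valid after this commit).
oracle_kwargs = {
    "kv_storage": "OracleKVStorage",
    "vector_storage": "OracleVectorDBStorage",
    "graph_storage": "OracleGraphStorage",
}

# One possible replacement using backends that remain supported;
# these names come from STORAGE_IMPLEMENTATIONS in lightrag/kg/__init__.py.
postgres_kwargs = {
    "kv_storage": "PGKVStorage",
    "vector_storage": "PGVectorStorage",
    "graph_storage": "PGGraphStorage",
}

def migrate_storage_kwargs(kwargs: dict) -> dict:
    """Map removed Oracle backend names to Postgres counterparts;
    leave any other storage names untouched."""
    mapping = {
        "OracleKVStorage": "PGKVStorage",
        "OracleVectorDBStorage": "PGVectorStorage",
        "OracleGraphStorage": "PGGraphStorage",
    }
    return {key: mapping.get(value, value) for key, value in kwargs.items()}
```

Note this only renames the backends; the underlying data would still have to be re-ingested, since the commit removes the Oracle implementation without a data migration path.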
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
lightrag/api/README-zh.md
CHANGED

````diff
@@ -291,11 +291,10 @@ LightRAG uses 4 types of storage for different purposes:
 ```
 JsonKVStorage    JsonFile (default)
+PGKVStorage      Postgres
 RedisKVStorage   Redis
+MongoKVStorage   MongoDB
 TiDBKVStorage    TiDB
-PGKVStorage      Postgres
-OracleKVStorage  Oracle
 ```
 
 * GRAPH_STORAGE supported implementation names
@@ -303,25 +302,21 @@
 ```
 NetworkXStorage  NetworkX (default)
 Neo4JStorage     Neo4J
-TiDBGraphStorage TiDB
+PGGraphStorage   Postgres
 AGEStorage       AGE
 GremlinStorage   Gremlin
-PGGraphStorage   Postgres
-OracleGraphStorage Oracle
 ```
 
 * VECTOR_STORAGE supported implementation names
 
 ```
 NanoVectorDBStorage   NanoVector (default)
+PGVectorStorage       Postgres
 MilvusVectorDBStorage Milvus
 ChromaVectorDBStorage Chroma
-TiDBVectorDBStorage   TiDB
-PGVectorStorage       Postgres
 FaissVectorDBStorage  Faiss
 QdrantVectorDBStorage Qdrant
+TiDBVectorDBStorage   TiDB
 MongoVectorDBStorage  MongoDB
 ```
````
lightrag/api/README.md
CHANGED

````diff
@@ -302,11 +302,10 @@ Each storage type has several implementations:
 ```
 JsonKVStorage    JsonFile (default)
+PGKVStorage      Postgres
 RedisKVStorage   Redis
+MongoKVStorage   MongoDB
 TiDBKVStorage    TiDB
-PGKVStorage      Postgres
-OracleKVStorage  Oracle
 ```
 
 * GRAPH_STORAGE supported implementation names
@@ -314,25 +313,21 @@
 ```
 NetworkXStorage  NetworkX (default)
 Neo4JStorage     Neo4J
-TiDBGraphStorage TiDB
+PGGraphStorage   Postgres
 AGEStorage       AGE
 GremlinStorage   Gremlin
-PGGraphStorage   Postgres
-OracleGraphStorage Oracle
 ```
 
 * VECTOR_STORAGE supported implementation names
 
 ```
 NanoVectorDBStorage   NanoVector (default)
+PGVectorStorage       Postgres
 MilvusVectorDBStorage Milvus
 ChromaVectorDBStorage Chroma
-TiDBVectorDBStorage   TiDB
-PGVectorStorage       Postgres
 FaissVectorDBStorage  Faiss
 QdrantVectorDBStorage Qdrant
+TiDBVectorDBStorage   TiDB
 MongoVectorDBStorage  MongoDB
 ```
````
lightrag/kg/__init__.py
CHANGED

@@ -6,7 +6,6 @@ STORAGE_IMPLEMENTATIONS = {
             "RedisKVStorage",
             "TiDBKVStorage",
             "PGKVStorage",
-            "OracleKVStorage",
         ],
         "required_methods": ["get_by_id", "upsert"],
     },
@@ -19,7 +18,6 @@ STORAGE_IMPLEMENTATIONS = {
             "AGEStorage",
             "GremlinStorage",
             "PGGraphStorage",
-            # "OracleGraphStorage",
         ],
         "required_methods": ["upsert_node", "upsert_edge"],
     },
@@ -32,7 +30,6 @@ STORAGE_IMPLEMENTATIONS = {
             "PGVectorStorage",
             "FaissVectorDBStorage",
             "QdrantVectorDBStorage",
-            "OracleVectorDBStorage",
             "MongoVectorDBStorage",
         ],
         "required_methods": ["query", "upsert"],
@@ -41,7 +38,6 @@ STORAGE_IMPLEMENTATIONS = {
         "implementations": [
             "JsonDocStatusStorage",
             "PGDocStatusStorage",
-            "PGDocStatusStorage",
             "MongoDocStatusStorage",
         ],
         "required_methods": ["get_docs_by_status"],
@@ -56,12 +52,6 @@ STORAGE_ENV_REQUIREMENTS: dict[str, list[str]] = {
     "RedisKVStorage": ["REDIS_URI"],
     "TiDBKVStorage": ["TIDB_USER", "TIDB_PASSWORD", "TIDB_DATABASE"],
     "PGKVStorage": ["POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DATABASE"],
-    "OracleKVStorage": [
-        "ORACLE_DSN",
-        "ORACLE_USER",
-        "ORACLE_PASSWORD",
-        "ORACLE_CONFIG_DIR",
-    ],
     # Graph Storage Implementations
     "NetworkXStorage": [],
     "Neo4JStorage": ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"],
@@ -78,12 +68,6 @@ STORAGE_ENV_REQUIREMENTS: dict[str, list[str]] = {
         "POSTGRES_PASSWORD",
         "POSTGRES_DATABASE",
     ],
-    "OracleGraphStorage": [
-        "ORACLE_DSN",
-        "ORACLE_USER",
-        "ORACLE_PASSWORD",
-        "ORACLE_CONFIG_DIR",
-    ],
     # Vector Storage Implementations
     "NanoVectorDBStorage": [],
     "MilvusVectorDBStorage": [],
@@ -92,12 +76,6 @@ STORAGE_ENV_REQUIREMENTS: dict[str, list[str]] = {
     "PGVectorStorage": ["POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DATABASE"],
     "FaissVectorDBStorage": [],
     "QdrantVectorDBStorage": ["QDRANT_URL"],  # QDRANT_API_KEY has default value None
-    "OracleVectorDBStorage": [
-        "ORACLE_DSN",
-        "ORACLE_USER",
-        "ORACLE_PASSWORD",
-        "ORACLE_CONFIG_DIR",
-    ],
     "MongoVectorDBStorage": [],
     # Document Status Storage Implementations
     "JsonDocStatusStorage": [],
@@ -112,9 +90,6 @@ STORAGES = {
     "NanoVectorDBStorage": ".kg.nano_vector_db_impl",
     "JsonDocStatusStorage": ".kg.json_doc_status_impl",
     "Neo4JStorage": ".kg.neo4j_impl",
-    "OracleKVStorage": ".kg.oracle_impl",
-    "OracleGraphStorage": ".kg.oracle_impl",
-    "OracleVectorDBStorage": ".kg.oracle_impl",
     "MilvusVectorDBStorage": ".kg.milvus_impl",
     "MongoKVStorage": ".kg.mongo_impl",
    "MongoDocStatusStorage": ".kg.mongo_impl",
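Each category in `STORAGE_IMPLEMENTATIONS` pairs its implementation names with a `required_methods` list; a backend instance can be sanity-checked against that list with `getattr`. A minimal sketch (the `check_backend` helper is illustrative, not part of LightRAG):

```python
# Subset of the structure shown in the diff above.
STORAGE_IMPLEMENTATIONS = {
    "KV_STORAGE": {
        "implementations": ["RedisKVStorage", "TiDBKVStorage", "PGKVStorage"],
        "required_methods": ["get_by_id", "upsert"],
    },
}


def check_backend(category: str, backend: object) -> list[str]:
    """Return the required method names that the backend instance lacks."""
    required = STORAGE_IMPLEMENTATIONS[category]["required_methods"]
    return [m for m in required if not callable(getattr(backend, m, None))]
```

A backend that implements only part of the interface is reported with the names of the missing methods, which makes the mismatch obvious at startup rather than at first use.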
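The `STORAGE_ENV_REQUIREMENTS` map removed-from and kept in the diff above drives a check that each selected backend has its required environment variables set. A minimal sketch of that validation (the `missing_env` helper is illustrative, not LightRAG's API):

```python
import os

# Subset of the requirements map shown in the diff above.
STORAGE_ENV_REQUIREMENTS = {
    "NetworkXStorage": [],
    "Neo4JStorage": ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"],
    "RedisKVStorage": ["REDIS_URI"],
}


def missing_env(storage_name: str) -> list[str]:
    """Return the required variables that are not set for this backend."""
    required = STORAGE_ENV_REQUIREMENTS.get(storage_name, [])
    return [var for var in required if not os.environ.get(var)]
```

Backends with an empty requirements list (such as `NetworkXStorage`) always pass; others report exactly which variables still need to be exported.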
lightrag/kg/oracle_impl.py
DELETED

@@ -1,1463 +0,0 @@
-import array
-import asyncio
-
-# import html
-import os
-from dataclasses import dataclass, field
-from typing import Any, Union, final
-import numpy as np
-import configparser
-
-from lightrag.types import KnowledgeGraph, KnowledgeGraphNode, KnowledgeGraphEdge
-
-from ..base import (
-    BaseGraphStorage,
-    BaseKVStorage,
-    BaseVectorStorage,
-)
-from ..namespace import NameSpace, is_namespace
-from ..utils import logger
-
-import pipmaster as pm
-
-if not pm.is_installed("graspologic"):
-    pm.install("graspologic")
-
-if not pm.is_installed("oracledb"):
-    pm.install("oracledb")
-
-from graspologic import embed
-import oracledb  # type: ignore
-
-
-class OracleDB:
-    def __init__(self, config, **kwargs):
-        self.host = config.get("host", None)
-        self.port = config.get("port", None)
-        self.user = config.get("user", None)
-        self.password = config.get("password", None)
-        self.dsn = config.get("dsn", None)
-        self.config_dir = config.get("config_dir", None)
-        self.wallet_location = config.get("wallet_location", None)
-        self.wallet_password = config.get("wallet_password", None)
-        self.workspace = config.get("workspace", None)
-        self.max = 12
-        self.increment = 1
-        logger.info(f"Using the label {self.workspace} for Oracle Graph as identifier")
-        if self.user is None or self.password is None:
-            raise ValueError("Missing database user or password")
-
-        try:
-            oracledb.defaults.fetch_lobs = False
-
-            self.pool = oracledb.create_pool_async(
-                user=self.user,
-                password=self.password,
-                dsn=self.dsn,
-                config_dir=self.config_dir,
-                wallet_location=self.wallet_location,
-                wallet_password=self.wallet_password,
-                min=1,
-                max=self.max,
-                increment=self.increment,
-            )
-            logger.info(f"Connected to Oracle database at {self.dsn}")
-        except Exception as e:
-            logger.error(f"Failed to connect to Oracle database at {self.dsn}")
-            logger.error(f"Oracle database error: {e}")
-            raise
-
-    def numpy_converter_in(self, value):
-        """Convert numpy array to array.array"""
-        if value.dtype == np.float64:
-            dtype = "d"
-        elif value.dtype == np.float32:
-            dtype = "f"
-        else:
-            dtype = "b"
-        return array.array(dtype, value)
-
-    def input_type_handler(self, cursor, value, arraysize):
-        """Set the type handler for the input data"""
-        if isinstance(value, np.ndarray):
-            return cursor.var(
-                oracledb.DB_TYPE_VECTOR,
-                arraysize=arraysize,
-                inconverter=self.numpy_converter_in,
-            )
-
-    def numpy_converter_out(self, value):
-        """Convert array.array to numpy array"""
-        if value.typecode == "b":
-            dtype = np.int8
-        elif value.typecode == "f":
-            dtype = np.float32
-        else:
-            dtype = np.float64
-        return np.array(value, copy=False, dtype=dtype)
-
-    def output_type_handler(self, cursor, metadata):
-        """Set the type handler for the output data"""
-        if metadata.type_code is oracledb.DB_TYPE_VECTOR:
-            return cursor.var(
-                metadata.type_code,
-                arraysize=cursor.arraysize,
-                outconverter=self.numpy_converter_out,
-            )
-
-    async def check_tables(self):
-        for k, v in TABLES.items():
-            try:
-                if k.lower() == "lightrag_graph":
-                    await self.query(
-                        "SELECT id FROM GRAPH_TABLE (lightrag_graph MATCH (a) COLUMNS (a.id)) fetch first row only"
-                    )
-                else:
-                    await self.query(f"SELECT 1 FROM {k}")
-            except Exception as e:
-                logger.error(f"Failed to check table {k} in Oracle database")
-                logger.error(f"Oracle database error: {e}")
-                try:
-                    # print(v["ddl"])
-                    await self.execute(v["ddl"])
-                    logger.info(f"Created table {k} in Oracle database")
-                except Exception as e:
-                    logger.error(f"Failed to create table {k} in Oracle database")
-                    logger.error(f"Oracle database error: {e}")
-
-        logger.info("Finished check all tables in Oracle database")
-
-    async def query(
-        self, sql: str, params: dict = None, multirows: bool = False
-    ) -> Union[dict, None]:
-        async with self.pool.acquire() as connection:
-            connection.inputtypehandler = self.input_type_handler
-            connection.outputtypehandler = self.output_type_handler
-            with connection.cursor() as cursor:
-                try:
-                    await cursor.execute(sql, params)
-                except Exception as e:
-                    logger.error(f"Oracle database error: {e}")
-                    raise
-                columns = [column[0].lower() for column in cursor.description]
-                if multirows:
-                    rows = await cursor.fetchall()
-                    if rows:
-                        data = [dict(zip(columns, row)) for row in rows]
-                    else:
-                        data = []
-                else:
-                    row = await cursor.fetchone()
-                    if row:
-                        data = dict(zip(columns, row))
-                    else:
-                        data = None
-                return data
-
-    async def execute(self, sql: str, data: Union[list, dict] = None):
-        # logger.info("go into OracleDB execute method")
-        try:
-            async with self.pool.acquire() as connection:
-                connection.inputtypehandler = self.input_type_handler
-                connection.outputtypehandler = self.output_type_handler
-                with connection.cursor() as cursor:
-                    if data is None:
-                        await cursor.execute(sql)
-                    else:
-                        await cursor.execute(sql, data)
-                await connection.commit()
-        except Exception as e:
-            logger.error(f"Oracle database error: {e}")
-            raise
-
-
-class ClientManager:
-    _instances: dict[str, Any] = {"db": None, "ref_count": 0}
-    _lock = asyncio.Lock()
-
-    @staticmethod
-    def get_config() -> dict[str, Any]:
-        config = configparser.ConfigParser()
-        config.read("config.ini", "utf-8")
-
-        return {
-            "user": os.environ.get(
-                "ORACLE_USER",
-                config.get("oracle", "user", fallback=None),
-            ),
-            "password": os.environ.get(
-                "ORACLE_PASSWORD",
-                config.get("oracle", "password", fallback=None),
-            ),
-            "dsn": os.environ.get(
-                "ORACLE_DSN",
-                config.get("oracle", "dsn", fallback=None),
-            ),
-            "config_dir": os.environ.get(
-                "ORACLE_CONFIG_DIR",
-                config.get("oracle", "config_dir", fallback=None),
-            ),
-            "wallet_location": os.environ.get(
-                "ORACLE_WALLET_LOCATION",
-                config.get("oracle", "wallet_location", fallback=None),
-            ),
-            "wallet_password": os.environ.get(
-                "ORACLE_WALLET_PASSWORD",
-                config.get("oracle", "wallet_password", fallback=None),
-            ),
-            "workspace": os.environ.get(
-                "ORACLE_WORKSPACE",
-                config.get("oracle", "workspace", fallback="default"),
-            ),
-        }
-
-    @classmethod
-    async def get_client(cls) -> OracleDB:
-        async with cls._lock:
-            if cls._instances["db"] is None:
-                config = ClientManager.get_config()
-                db = OracleDB(config)
-                await db.check_tables()
-                cls._instances["db"] = db
-                cls._instances["ref_count"] = 0
-            cls._instances["ref_count"] += 1
-            return cls._instances["db"]
-
-    @classmethod
-    async def release_client(cls, db: OracleDB):
-        async with cls._lock:
-            if db is not None:
-                if db is cls._instances["db"]:
-                    cls._instances["ref_count"] -= 1
-                    if cls._instances["ref_count"] == 0:
-                        await db.pool.close()
-                        logger.info("Closed OracleDB database connection pool")
-                        cls._instances["db"] = None
-                else:
-                    await db.pool.close()
-
-
-@final
-@dataclass
-class OracleKVStorage(BaseKVStorage):
-    db: OracleDB = field(default=None)
-    meta_fields = None
-
-    def __post_init__(self):
-        self._data = {}
-        self._max_batch_size = self.global_config.get("embedding_batch_num", 10)
-
-    async def initialize(self):
-        if self.db is None:
-            self.db = await ClientManager.get_client()
-
-    async def finalize(self):
-        if self.db is not None:
-            await ClientManager.release_client(self.db)
-            self.db = None
-
-    ################ QUERY METHODS ################
-
-    async def get_by_id(self, id: str) -> dict[str, Any] | None:
-        """Get doc_full data based on id."""
-        SQL = SQL_TEMPLATES["get_by_id_" + self.namespace]
-        params = {"workspace": self.db.workspace, "id": id}
-        # print("get_by_id:"+SQL)
-        if is_namespace(self.namespace, NameSpace.KV_STORE_LLM_RESPONSE_CACHE):
-            array_res = await self.db.query(SQL, params, multirows=True)
-            res = {}
-            for row in array_res:
-                res[row["id"]] = row
-            if res:
-                return res
-            else:
-                return None
-        else:
-            return await self.db.query(SQL, params)
-
-    async def get_by_mode_and_id(self, mode: str, id: str) -> Union[dict, None]:
-        """Specifically for llm_response_cache."""
-        SQL = SQL_TEMPLATES["get_by_mode_id_" + self.namespace]
-        params = {"workspace": self.db.workspace, "cache_mode": mode, "id": id}
-        if is_namespace(self.namespace, NameSpace.KV_STORE_LLM_RESPONSE_CACHE):
-            array_res = await self.db.query(SQL, params, multirows=True)
-            res = {}
-            for row in array_res:
-                res[row["id"]] = row
-            return res
-        else:
-            return None
-
-    async def get_by_ids(self, ids: list[str]) -> list[dict[str, Any]]:
-        """Get doc_chunks data based on id"""
-        SQL = SQL_TEMPLATES["get_by_ids_" + self.namespace].format(
-            ids=",".join([f"'{id}'" for id in ids])
-        )
-        params = {"workspace": self.db.workspace}
-        # print("get_by_ids:"+SQL)
-        res = await self.db.query(SQL, params, multirows=True)
-        if is_namespace(self.namespace, NameSpace.KV_STORE_LLM_RESPONSE_CACHE):
-            modes = set()
-            dict_res: dict[str, dict] = {}
-            for row in res:
-                modes.add(row["mode"])
-            for mode in modes:
-                if mode not in dict_res:
-                    dict_res[mode] = {}
-            for row in res:
-                dict_res[row["mode"]][row["id"]] = row
-            res = [{k: v} for k, v in dict_res.items()]
-        return res
-
-    async def filter_keys(self, keys: set[str]) -> set[str]:
-        """Return keys that don't exist in storage"""
-        SQL = SQL_TEMPLATES["filter_keys"].format(
-            table_name=namespace_to_table_name(self.namespace),
-            ids=",".join([f"'{id}'" for id in keys]),
-        )
-        params = {"workspace": self.db.workspace}
-        res = await self.db.query(SQL, params, multirows=True)
-        if res:
-            exist_keys = [key["id"] for key in res]
-            data = set([s for s in keys if s not in exist_keys])
-            return data
-        else:
-            return set(keys)
-
-    ################ INSERT METHODS ################
-    async def upsert(self, data: dict[str, dict[str, Any]]) -> None:
-        logger.info(f"Inserting {len(data)} to {self.namespace}")
-        if not data:
-            return
-
-        if is_namespace(self.namespace, NameSpace.KV_STORE_TEXT_CHUNKS):
-            list_data = [
-                {
-                    "id": k,
-                    **{k1: v1 for k1, v1 in v.items()},
-                }
-                for k, v in data.items()
-            ]
-            contents = [v["content"] for v in data.values()]
-            batches = [
-                contents[i : i + self._max_batch_size]
-                for i in range(0, len(contents), self._max_batch_size)
-            ]
-            embeddings_list = await asyncio.gather(
-                *[self.embedding_func(batch) for batch in batches]
-            )
-            embeddings = np.concatenate(embeddings_list)
-            for i, d in enumerate(list_data):
-                d["__vector__"] = embeddings[i]
-
-            merge_sql = SQL_TEMPLATES["merge_chunk"]
-            for item in list_data:
-                _data = {
-                    "id": item["id"],
-                    "content": item["content"],
-                    "workspace": self.db.workspace,
-                    "tokens": item["tokens"],
-                    "chunk_order_index": item["chunk_order_index"],
-                    "full_doc_id": item["full_doc_id"],
-                    "content_vector": item["__vector__"],
-                    "status": item["status"],
-                }
-                await self.db.execute(merge_sql, _data)
-        if is_namespace(self.namespace, NameSpace.KV_STORE_FULL_DOCS):
-            for k, v in data.items():
-                # values.clear()
-                merge_sql = SQL_TEMPLATES["merge_doc_full"]
-                _data = {
-                    "id": k,
-                    "content": v["content"],
-                    "workspace": self.db.workspace,
-                }
-                await self.db.execute(merge_sql, _data)
-
-        if is_namespace(self.namespace, NameSpace.KV_STORE_LLM_RESPONSE_CACHE):
-            for mode, items in data.items():
-                for k, v in items.items():
-                    upsert_sql = SQL_TEMPLATES["upsert_llm_response_cache"]
-                    _data = {
-                        "workspace": self.db.workspace,
-                        "id": k,
-                        "original_prompt": v["original_prompt"],
-                        "return_value": v["return"],
-                        "cache_mode": mode,
-                    }
-
-                    await self.db.execute(upsert_sql, _data)
-
-    async def index_done_callback(self) -> None:
-        # Oracle handles persistence automatically
-        pass
-
-    async def delete(self, ids: list[str]) -> None:
-        """Delete records with specified IDs from the storage.
-
-        Args:
-            ids: List of record IDs to be deleted
-        """
-        if not ids:
-            return
-
-        try:
-            table_name = namespace_to_table_name(self.namespace)
-            if not table_name:
-                logger.error(f"Unknown namespace for deletion: {self.namespace}")
-                return
-
-            ids_list = ",".join([f"'{id}'" for id in ids])
-            delete_sql = f"DELETE FROM {table_name} WHERE workspace=:workspace AND id IN ({ids_list})"
-
-            await self.db.execute(delete_sql, {"workspace": self.db.workspace})
-            logger.info(
-                f"Successfully deleted {len(ids)} records from {self.namespace}"
-            )
-        except Exception as e:
-            logger.error(f"Error deleting records from {self.namespace}: {e}")
-
-    async def drop_cache_by_modes(self, modes: list[str] | None = None) -> bool:
-        """Delete specific records from storage by cache mode
-
-        Args:
-            modes (list[str]): List of cache modes to be dropped from storage
-
-        Returns:
-            bool: True if successful, False otherwise
-        """
-        if not modes:
-            return False
-
-        try:
-            table_name = namespace_to_table_name(self.namespace)
-            if not table_name:
-                return False
-
-            if table_name != "LIGHTRAG_LLM_CACHE":
-                return False
-
-            # Build an Oracle-style IN clause
-            modes_list = ", ".join([f"'{mode}'" for mode in modes])
-            sql = f"""
-            DELETE FROM {table_name}
-            WHERE workspace = :workspace
-            AND cache_mode IN ({modes_list})
-            """
-
-            logger.info(f"Deleting cache by modes: {modes}")
-            await self.db.execute(sql, {"workspace": self.db.workspace})
-            return True
-        except Exception as e:
-            logger.error(f"Error deleting cache by modes {modes}: {e}")
-            return False
-
-    async def drop(self) -> dict[str, str]:
-        """Drop the storage"""
-        try:
-            table_name = namespace_to_table_name(self.namespace)
-            if not table_name:
-                return {
-                    "status": "error",
-                    "message": f"Unknown namespace: {self.namespace}",
-                }
-
-            drop_sql = SQL_TEMPLATES["drop_specifiy_table_workspace"].format(
-                table_name=table_name
-            )
-            await self.db.execute(drop_sql, {"workspace": self.db.workspace})
-            return {"status": "success", "message": "data dropped"}
-        except Exception as e:
-            return {"status": "error", "message": str(e)}
-
-
-@final
-@dataclass
-class OracleVectorDBStorage(BaseVectorStorage):
-    db: OracleDB | None = field(default=None)
-
-    def __post_init__(self):
-        config = self.global_config.get("vector_db_storage_cls_kwargs", {})
-        cosine_threshold = config.get("cosine_better_than_threshold")
-        if cosine_threshold is None:
-            raise ValueError(
-                "cosine_better_than_threshold must be specified in vector_db_storage_cls_kwargs"
-            )
-        self.cosine_better_than_threshold = cosine_threshold
-
-    async def initialize(self):
-        if self.db is None:
-            self.db = await ClientManager.get_client()
-
-    async def finalize(self):
-        if self.db is not None:
-            await ClientManager.release_client(self.db)
-            self.db = None
-
-    #################### query method ###############
-    async def query(
-        self, query: str, top_k: int, ids: list[str] | None = None
-    ) -> list[dict[str, Any]]:
-        embeddings = await self.embedding_func([query])
-        embedding = embeddings[0]
-        # Convert precision
-        dtype = str(embedding.dtype).upper()
-        dimension = embedding.shape[0]
-        embedding_string = "[" + ", ".join(map(str, embedding.tolist())) + "]"
-
-        SQL = SQL_TEMPLATES[self.namespace].format(dimension=dimension, dtype=dtype)
-        params = {
-            "embedding_string": embedding_string,
-            "workspace": self.db.workspace,
-            "top_k": top_k,
-            "better_than_threshold": self.cosine_better_than_threshold,
-        }
-        results = await self.db.query(SQL, params=params, multirows=True)
-        return results
-
-    async def upsert(self, data: dict[str, dict[str, Any]]) -> None:
-        raise NotImplementedError
-
-    async def index_done_callback(self) -> None:
-        # Oracle handles persistence automatically
-        pass
-
-    async def delete(self, ids: list[str]) -> None:
-        """Delete vectors with specified IDs
-
-        Args:
-            ids: List of vector IDs to be deleted
-        """
-        if not ids:
-            return
-
-        try:
-            SQL = SQL_TEMPLATES["delete_vectors"].format(
-                ids=",".join([f"'{id}'" for id in ids])
-            )
-            params = {"workspace": self.db.workspace}
-            await self.db.execute(SQL, params)
-            logger.info(
-                f"Successfully deleted {len(ids)} vectors from {self.namespace}"
-            )
-        except Exception as e:
-            logger.error(f"Error while deleting vectors from {self.namespace}: {e}")
-            raise
-
-    async def delete_entity(self, entity_name: str) -> None:
-        """Delete entity by name
-
-        Args:
-            entity_name: Name of the entity to delete
-        """
-        try:
-            SQL = SQL_TEMPLATES["delete_entity"]
-            params = {"workspace": self.db.workspace, "entity_name": entity_name}
-            await self.db.execute(SQL, params)
-            logger.info(f"Successfully deleted entity {entity_name}")
-        except Exception as e:
-            logger.error(f"Error deleting entity {entity_name}: {e}")
-            raise
-
-    async def delete_entity_relation(self, entity_name: str) -> None:
-        """Delete all relations connected to an entity
-
-        Args:
-            entity_name: Name of the entity whose relations should be deleted
-        """
-        try:
-            SQL = SQL_TEMPLATES["delete_entity_relations"]
-            params = {"workspace": self.db.workspace, "entity_name": entity_name}
-            await self.db.execute(SQL, params)
-            logger.info(f"Successfully deleted relations for entity {entity_name}")
-        except Exception as e:
-            logger.error(f"Error deleting relations for entity {entity_name}: {e}")
-            raise
-
-    async def search_by_prefix(self, prefix: str) -> list[dict[str, Any]]:
-        """Search for records with IDs starting with a specific prefix.
-
-        Args:
-            prefix: The prefix to search for in record IDs
-
-        Returns:
-            List of records with matching ID prefixes
-        """
-        try:
-            # Determine the appropriate table based on namespace
-            table_name = namespace_to_table_name(self.namespace)
-
-            # Create SQL query to find records with IDs starting with prefix
-            search_sql = f"""
-                SELECT * FROM {table_name}
-                WHERE workspace = :workspace
-                AND id LIKE :prefix_pattern
-                ORDER BY id
-            """
-
-            params = {"workspace": self.db.workspace, "prefix_pattern": f"{prefix}%"}
-
-            # Execute query and get results
-            results = await self.db.query(search_sql, params, multirows=True)
-
-            logger.debug(
-                f"Found {len(results) if results else 0} records with prefix '{prefix}'"
-            )
-            return results or []
-
-        except Exception as e:
-            logger.error(f"Error searching records with prefix '{prefix}': {e}")
-            return []
-
-    async def get_by_id(self, id: str) -> dict[str, Any] | None:
-        """Get vector data by its ID
-
-        Args:
-            id: The unique identifier of the vector
-
-        Returns:
-            The vector data if found, or None if not found
-        """
-        try:
-            # Determine the table name based on namespace
-            table_name = namespace_to_table_name(self.namespace)
-            if not table_name:
-                logger.error(f"Unknown namespace for ID lookup: {self.namespace}")
-                return None
-
-            # Create the appropriate ID field name based on namespace
-            id_field = "entity_id" if "NODES" in table_name else "relation_id"
-            if "CHUNKS" in table_name:
-                id_field = "chunk_id"
-
-            # Prepare and execute the query
-            query = f"""
-                SELECT * FROM {table_name}
-                WHERE {id_field} = :id AND workspace = :workspace
-            """
-            params = {"id": id, "workspace": self.db.workspace}
-
-            result = await self.db.query(query, params)
-            return result
-        except Exception as e:
-            logger.error(f"Error retrieving vector data for ID {id}: {e}")
-            return None
-
-    async def get_by_ids(self, ids: list[str]) -> list[dict[str, Any]]:
-        """Get multiple vector data by their IDs
-
-        Args:
-            ids: List of unique identifiers
-
-        Returns:
-            List of vector data objects that were found
-        """
-        if not ids:
-            return []
-
-        try:
-            # Determine the table name based on namespace
-            table_name = namespace_to_table_name(self.namespace)
-            if not table_name:
-                logger.error(f"Unknown namespace for IDs lookup: {self.namespace}")
-                return []
-
-            # Create the appropriate ID field name based on namespace
-            id_field = "entity_id" if "NODES" in table_name else "relation_id"
-            if "CHUNKS" in table_name:
-                id_field = "chunk_id"
-
-            # Format the list of IDs for SQL IN clause
-            ids_list = ", ".join([f"'{id}'" for id in ids])
-
-            # Prepare and execute the query
-            query = f"""
-                SELECT * FROM {table_name}
-                WHERE {id_field} IN ({ids_list}) AND workspace = :workspace
-            """
-            params = {"workspace": self.db.workspace}
-
-            results = await self.db.query(query, params, multirows=True)
-            return results or []
-        except Exception as e:
-            logger.error(f"Error retrieving vector data for IDs {ids}: {e}")
-            return []
-
-    async def drop(self) -> dict[str, str]:
-        """Drop the storage"""
-        try:
-            table_name = namespace_to_table_name(self.namespace)
-            if not table_name:
-                return {
-                    "status": "error",
-                    "message": f"Unknown namespace: {self.namespace}",
-                }
-
-            drop_sql = SQL_TEMPLATES["drop_specifiy_table_workspace"].format(
-                table_name=table_name
-            )
-            await self.db.execute(drop_sql, {"workspace": self.db.workspace})
-            return {"status": "success", "message": "data dropped"}
-        except Exception as e:
-            return {"status": "error", "message": str(e)}
-
-
-@final
-@dataclass
-class OracleGraphStorage(BaseGraphStorage):
-    db: OracleDB = field(default=None)
-
-    def __post_init__(self):
-        self._max_batch_size = self.global_config.get("embedding_batch_num", 10)
-
-    async def initialize(self):
-        if self.db is None:
-            self.db = await ClientManager.get_client()
-
-    async def finalize(self):
-        if self.db is not None:
-            await ClientManager.release_client(self.db)
-            self.db = None
-
-    #################### insert method ################
-
-    async def upsert_node(self, node_id: str, node_data: dict[str, str]) -> None:
-        entity_name = node_id
-        entity_type = node_data["entity_type"]
-        description = node_data["description"]
-        source_id = node_data["source_id"]
-        logger.debug(f"entity_name:{entity_name}, entity_type:{entity_type}")
-
-        content = entity_name + description
-        contents = [content]
-        batches = [
-            contents[i : i + self._max_batch_size]
-            for i in range(0, len(contents), self._max_batch_size)
-        ]
-        embeddings_list = await asyncio.gather(
-            *[self.embedding_func(batch) for batch in batches]
-        )
-        embeddings = np.concatenate(embeddings_list)
-        content_vector = embeddings[0]
-        merge_sql = SQL_TEMPLATES["merge_node"]
-        data = {
-            "workspace": self.db.workspace,
-            "name": entity_name,
-            "entity_type": entity_type,
-            "description": description,
-            "source_chunk_id": source_id,
-            "content": content,
-            "content_vector": content_vector,
-        }
-        await self.db.execute(merge_sql, data)
-        # self._graph.add_node(node_id, **node_data)
-
-    async def upsert_edge(
-        self, source_node_id: str, target_node_id: str, edge_data: dict[str, str]
-    ) -> None:
-        """Insert or update an edge"""
-        # print("go into upsert edge method")
-        source_name = source_node_id
-        target_name = target_node_id
-        weight = edge_data["weight"]
-        keywords = edge_data["keywords"]
-        description = edge_data["description"]
-        source_chunk_id = edge_data["source_id"]
-        logger.debug(
-            f"source_name:{source_name}, target_name:{target_name}, keywords: {keywords}"
-        )
-
-        content = keywords + source_name + target_name + description
-        contents = [content]
-        batches = [
-            contents[i : i + self._max_batch_size]
-            for i in range(0, len(contents), self._max_batch_size)
-        ]
-        embeddings_list = await asyncio.gather(
-            *[self.embedding_func(batch) for batch in batches]
-        )
-        embeddings = np.concatenate(embeddings_list)
-        content_vector = embeddings[0]
-        merge_sql = SQL_TEMPLATES["merge_edge"]
-        data = {
-            "workspace": self.db.workspace,
-            "source_name": source_name,
-            "target_name": target_name,
-            "weight": weight,
-            "keywords": keywords,
-            "description": description,
-            "source_chunk_id": source_chunk_id,
-            "content": content,
-            "content_vector": content_vector,
-        }
-        # print(merge_sql)
-        await self.db.execute(merge_sql, data)
-        # self._graph.add_edge(source_node_id, target_node_id, **edge_data)
-
-    async def embed_nodes(
-        self, algorithm: str
-    ) -> tuple[np.ndarray[Any, Any], list[str]]:
-        if algorithm not in self._node_embed_algorithms:
-            raise ValueError(f"Node embedding algorithm {algorithm} not supported")
-        return await self._node_embed_algorithms[algorithm]()
-
-    async def _node2vec_embed(self):
-        """Generate embeddings for nodes"""
-        embeddings, nodes = embed.node2vec_embed(
-            self._graph,
-            **self.config["node2vec_params"],
-        )
-
-        nodes_ids = [self._graph.nodes[node_id]["id"] for node_id in nodes]
-        return embeddings, nodes_ids
-
-    async def index_done_callback(self) -> None:
-        # Oracle handles persistence automatically
-        pass
-
-    #################### query method #################
-    async def has_node(self, node_id: str) -> bool:
|
787 |
-
"keywords": keywords,
|
788 |
-
"description": description,
|
789 |
-
"source_chunk_id": source_chunk_id,
|
790 |
-
"content": content,
|
791 |
-
"content_vector": content_vector,
|
792 |
-
}
|
793 |
-
# print(merge_sql)
|
794 |
-
await self.db.execute(merge_sql, data)
|
795 |
-
# self._graph.add_edge(source_node_id, target_node_id, **edge_data)
|
796 |
-
|
797 |
-
async def embed_nodes(
|
798 |
-
self, algorithm: str
|
799 |
-
) -> tuple[np.ndarray[Any, Any], list[str]]:
|
800 |
-
if algorithm not in self._node_embed_algorithms:
|
801 |
-
raise ValueError(f"Node embedding algorithm {algorithm} not supported")
|
802 |
-
return await self._node_embed_algorithms[algorithm]()
|
803 |
-
|
804 |
-
async def _node2vec_embed(self):
|
805 |
-
"""为节点生成向量"""
|
806 |
-
embeddings, nodes = embed.node2vec_embed(
|
807 |
-
self._graph,
|
808 |
-
**self.config["node2vec_params"],
|
809 |
-
)
|
810 |
-
|
811 |
-
nodes_ids = [self._graph.nodes[node_id]["id"] for node_id in nodes]
|
812 |
-
return embeddings, nodes_ids
|
813 |
-
|
814 |
-
async def index_done_callback(self) -> None:
|
815 |
-
# Oracles handles persistence automatically
|
816 |
-
pass
|
817 |
-
|
818 |
-
#################### query method #################
|
819 |
-
async def has_node(self, node_id: str) -> bool:
|
820 |
-
"""根据节点id检查节点是否存在"""
|
821 |
-
SQL = SQL_TEMPLATES["has_node"]
|
822 |
-
params = {"workspace": self.db.workspace, "node_id": node_id}
|
823 |
-
res = await self.db.query(SQL, params)
|
824 |
-
if res:
|
825 |
-
# print("Node exist!",res)
|
826 |
-
return True
|
827 |
-
else:
|
828 |
-
# print("Node not exist!")
|
829 |
-
return False
|
830 |
-
|
831 |
-
async def has_edge(self, source_node_id: str, target_node_id: str) -> bool:
|
832 |
-
SQL = SQL_TEMPLATES["has_edge"]
|
833 |
-
params = {
|
834 |
-
"workspace": self.db.workspace,
|
835 |
-
"source_node_id": source_node_id,
|
836 |
-
"target_node_id": target_node_id,
|
837 |
-
}
|
838 |
-
res = await self.db.query(SQL, params)
|
839 |
-
if res:
|
840 |
-
# print("Edge exist!",res)
|
841 |
-
return True
|
842 |
-
else:
|
843 |
-
# print("Edge not exist!")
|
844 |
-
return False
|
845 |
-
|
846 |
-
async def node_degree(self, node_id: str) -> int:
|
847 |
-
SQL = SQL_TEMPLATES["node_degree"]
|
848 |
-
params = {"workspace": self.db.workspace, "node_id": node_id}
|
849 |
-
res = await self.db.query(SQL, params)
|
850 |
-
if res:
|
851 |
-
return res["degree"]
|
852 |
-
else:
|
853 |
-
return 0
|
854 |
-
|
855 |
-
async def edge_degree(self, src_id: str, tgt_id: str) -> int:
|
856 |
-
"""根据源和目标节点id获取边的度"""
|
857 |
-
degree = await self.node_degree(src_id) + await self.node_degree(tgt_id)
|
858 |
-
return degree
|
859 |
-
|
860 |
-
async def get_node(self, node_id: str) -> dict[str, str] | None:
|
861 |
-
"""根据节点id获取节点数据"""
|
862 |
-
SQL = SQL_TEMPLATES["get_node"]
|
863 |
-
params = {"workspace": self.db.workspace, "node_id": node_id}
|
864 |
-
res = await self.db.query(SQL, params)
|
865 |
-
if res:
|
866 |
-
return res
|
867 |
-
else:
|
868 |
-
return None
|
869 |
-
|
870 |
-
async def get_edge(
|
871 |
-
self, source_node_id: str, target_node_id: str
|
872 |
-
) -> dict[str, str] | None:
|
873 |
-
SQL = SQL_TEMPLATES["get_edge"]
|
874 |
-
params = {
|
875 |
-
"workspace": self.db.workspace,
|
876 |
-
"source_node_id": source_node_id,
|
877 |
-
"target_node_id": target_node_id,
|
878 |
-
}
|
879 |
-
res = await self.db.query(SQL, params)
|
880 |
-
if res:
|
881 |
-
# print("Get edge!",self.db.workspace, source_node_id, target_node_id,res[0])
|
882 |
-
return res
|
883 |
-
else:
|
884 |
-
# print("Edge not exist!",self.db.workspace, source_node_id, target_node_id)
|
885 |
-
return None
|
886 |
-
|
887 |
-
async def get_node_edges(self, source_node_id: str) -> list[tuple[str, str]] | None:
|
888 |
-
if await self.has_node(source_node_id):
|
889 |
-
SQL = SQL_TEMPLATES["get_node_edges"]
|
890 |
-
params = {"workspace": self.db.workspace, "source_node_id": source_node_id}
|
891 |
-
res = await self.db.query(sql=SQL, params=params, multirows=True)
|
892 |
-
if res:
|
893 |
-
data = [(i["source_name"], i["target_name"]) for i in res]
|
894 |
-
# print("Get node edge!",self.db.workspace, source_node_id,data)
|
895 |
-
return data
|
896 |
-
else:
|
897 |
-
# print("Node Edge not exist!",self.db.workspace, source_node_id)
|
898 |
-
return []
|
899 |
-
|
900 |
-
async def get_all_nodes(self, limit: int):
|
901 |
-
"""查询所有节点"""
|
902 |
-
SQL = SQL_TEMPLATES["get_all_nodes"]
|
903 |
-
params = {"workspace": self.db.workspace, "limit": str(limit)}
|
904 |
-
res = await self.db.query(sql=SQL, params=params, multirows=True)
|
905 |
-
if res:
|
906 |
-
return res
|
907 |
-
|
908 |
-
async def get_all_edges(self, limit: int):
|
909 |
-
"""查询所有边"""
|
910 |
-
SQL = SQL_TEMPLATES["get_all_edges"]
|
911 |
-
params = {"workspace": self.db.workspace, "limit": str(limit)}
|
912 |
-
res = await self.db.query(sql=SQL, params=params, multirows=True)
|
913 |
-
if res:
|
914 |
-
return res
|
915 |
-
|
916 |
-
async def get_statistics(self):
|
917 |
-
SQL = SQL_TEMPLATES["get_statistics"]
|
918 |
-
params = {"workspace": self.db.workspace}
|
919 |
-
res = await self.db.query(sql=SQL, params=params, multirows=True)
|
920 |
-
if res:
|
921 |
-
return res
|
922 |
-
|
923 |
-
async def delete_node(self, node_id: str) -> None:
|
924 |
-
"""Delete a node from the graph
|
925 |
-
|
926 |
-
Args:
|
927 |
-
node_id: ID of the node to delete
|
928 |
-
"""
|
929 |
-
try:
|
930 |
-
# First delete all relations connected to this node
|
931 |
-
delete_relations_sql = SQL_TEMPLATES["delete_entity_relations"]
|
932 |
-
params_relations = {"workspace": self.db.workspace, "entity_name": node_id}
|
933 |
-
await self.db.execute(delete_relations_sql, params_relations)
|
934 |
-
|
935 |
-
# Then delete the node itself
|
936 |
-
delete_node_sql = SQL_TEMPLATES["delete_entity"]
|
937 |
-
params_node = {"workspace": self.db.workspace, "entity_name": node_id}
|
938 |
-
await self.db.execute(delete_node_sql, params_node)
|
939 |
-
|
940 |
-
logger.info(
|
941 |
-
f"Successfully deleted node {node_id} and all its relationships"
|
942 |
-
)
|
943 |
-
except Exception as e:
|
944 |
-
logger.error(f"Error deleting node {node_id}: {e}")
|
945 |
-
raise
|
946 |
-
|
947 |
-
async def remove_nodes(self, nodes: list[str]) -> None:
|
948 |
-
"""Delete multiple nodes from the graph
|
949 |
-
|
950 |
-
Args:
|
951 |
-
nodes: List of node IDs to be deleted
|
952 |
-
"""
|
953 |
-
if not nodes:
|
954 |
-
return
|
955 |
-
|
956 |
-
try:
|
957 |
-
for node in nodes:
|
958 |
-
# For each node, first delete all its relationships
|
959 |
-
delete_relations_sql = SQL_TEMPLATES["delete_entity_relations"]
|
960 |
-
params_relations = {"workspace": self.db.workspace, "entity_name": node}
|
961 |
-
await self.db.execute(delete_relations_sql, params_relations)
|
962 |
-
|
963 |
-
# Then delete the node itself
|
964 |
-
delete_node_sql = SQL_TEMPLATES["delete_entity"]
|
965 |
-
params_node = {"workspace": self.db.workspace, "entity_name": node}
|
966 |
-
await self.db.execute(delete_node_sql, params_node)
|
967 |
-
|
968 |
-
logger.info(
|
969 |
-
f"Successfully deleted {len(nodes)} nodes and their relationships"
|
970 |
-
)
|
971 |
-
except Exception as e:
|
972 |
-
logger.error(f"Error during batch node deletion: {e}")
|
973 |
-
raise
|
974 |
-
|
975 |
-
async def remove_edges(self, edges: list[tuple[str, str]]) -> None:
|
976 |
-
"""Delete multiple edges from the graph
|
977 |
-
|
978 |
-
Args:
|
979 |
-
edges: List of edges to be deleted, each edge is a (source, target) tuple
|
980 |
-
"""
|
981 |
-
if not edges:
|
982 |
-
return
|
983 |
-
|
984 |
-
try:
|
985 |
-
for source, target in edges:
|
986 |
-
# Check if the edge exists before attempting to delete
|
987 |
-
if await self.has_edge(source, target):
|
988 |
-
# Delete the edge using a SQL query that matches both source and target
|
989 |
-
delete_edge_sql = """
|
990 |
-
DELETE FROM LIGHTRAG_GRAPH_EDGES
|
991 |
-
WHERE workspace = :workspace
|
992 |
-
AND source_name = :source_name
|
993 |
-
AND target_name = :target_name
|
994 |
-
"""
|
995 |
-
params = {
|
996 |
-
"workspace": self.db.workspace,
|
997 |
-
"source_name": source,
|
998 |
-
"target_name": target,
|
999 |
-
}
|
1000 |
-
await self.db.execute(delete_edge_sql, params)
|
1001 |
-
|
1002 |
-
logger.info(f"Successfully deleted {len(edges)} edges from the graph")
|
1003 |
-
except Exception as e:
|
1004 |
-
logger.error(f"Error during batch edge deletion: {e}")
|
1005 |
-
raise
|
1006 |
-
|
1007 |
-
async def get_all_labels(self) -> list[str]:
|
1008 |
-
"""Get all unique entity types (labels) in the graph
|
1009 |
-
|
1010 |
-
Returns:
|
1011 |
-
List of unique entity types/labels
|
1012 |
-
"""
|
1013 |
-
try:
|
1014 |
-
SQL = """
|
1015 |
-
SELECT DISTINCT entity_type
|
1016 |
-
FROM LIGHTRAG_GRAPH_NODES
|
1017 |
-
WHERE workspace = :workspace
|
1018 |
-
ORDER BY entity_type
|
1019 |
-
"""
|
1020 |
-
params = {"workspace": self.db.workspace}
|
1021 |
-
results = await self.db.query(SQL, params, multirows=True)
|
1022 |
-
|
1023 |
-
if results:
|
1024 |
-
labels = [row["entity_type"] for row in results]
|
1025 |
-
return labels
|
1026 |
-
else:
|
1027 |
-
return []
|
1028 |
-
except Exception as e:
|
1029 |
-
logger.error(f"Error retrieving entity types: {e}")
|
1030 |
-
return []
|
1031 |
-
|
1032 |
-
async def drop(self) -> dict[str, str]:
|
1033 |
-
"""Drop the storage"""
|
1034 |
-
try:
|
1035 |
-
# 使用图形查询删除所有节点和关系
|
1036 |
-
delete_edges_sql = (
|
1037 |
-
"""DELETE FROM LIGHTRAG_GRAPH_EDGES WHERE workspace=:workspace"""
|
1038 |
-
)
|
1039 |
-
await self.db.execute(delete_edges_sql, {"workspace": self.db.workspace})
|
1040 |
-
|
1041 |
-
delete_nodes_sql = (
|
1042 |
-
"""DELETE FROM LIGHTRAG_GRAPH_NODES WHERE workspace=:workspace"""
|
1043 |
-
)
|
1044 |
-
await self.db.execute(delete_nodes_sql, {"workspace": self.db.workspace})
|
1045 |
-
|
1046 |
-
return {"status": "success", "message": "graph data dropped"}
|
1047 |
-
except Exception as e:
|
1048 |
-
logger.error(f"Error dropping graph: {e}")
|
1049 |
-
return {"status": "error", "message": str(e)}
|
1050 |
-
|
1051 |
-
async def get_knowledge_graph(
|
1052 |
-
self, node_label: str, max_depth: int = 5
|
1053 |
-
) -> KnowledgeGraph:
|
1054 |
-
"""Retrieve a connected subgraph starting from nodes matching the given label
|
1055 |
-
|
1056 |
-
Maximum number of nodes is constrained by MAX_GRAPH_NODES environment variable.
|
1057 |
-
Prioritizes nodes by:
|
1058 |
-
1. Nodes matching the specified label
|
1059 |
-
2. Nodes directly connected to matching nodes
|
1060 |
-
3. Node degree (number of connections)
|
1061 |
-
|
1062 |
-
Args:
|
1063 |
-
node_label: Label to match for starting nodes (use "*" for all nodes)
|
1064 |
-
max_depth: Maximum depth of traversal from starting nodes
|
1065 |
-
|
1066 |
-
Returns:
|
1067 |
-
KnowledgeGraph object containing nodes and edges
|
1068 |
-
"""
|
1069 |
-
result = KnowledgeGraph()
|
1070 |
-
|
1071 |
-
try:
|
1072 |
-
# Define maximum number of nodes to return
|
1073 |
-
max_graph_nodes = int(os.environ.get("MAX_GRAPH_NODES", 1000))
|
1074 |
-
|
1075 |
-
if node_label == "*":
|
1076 |
-
# For "*" label, get all nodes up to the limit
|
1077 |
-
nodes_sql = """
|
1078 |
-
SELECT name, entity_type, description, source_chunk_id
|
1079 |
-
FROM LIGHTRAG_GRAPH_NODES
|
1080 |
-
WHERE workspace = :workspace
|
1081 |
-
ORDER BY id
|
1082 |
-
FETCH FIRST :limit ROWS ONLY
|
1083 |
-
"""
|
1084 |
-
nodes_params = {
|
1085 |
-
"workspace": self.db.workspace,
|
1086 |
-
"limit": max_graph_nodes,
|
1087 |
-
}
|
1088 |
-
nodes = await self.db.query(nodes_sql, nodes_params, multirows=True)
|
1089 |
-
else:
|
1090 |
-
# For specific label, find matching nodes and related nodes
|
1091 |
-
nodes_sql = """
|
1092 |
-
WITH matching_nodes AS (
|
1093 |
-
SELECT name
|
1094 |
-
FROM LIGHTRAG_GRAPH_NODES
|
1095 |
-
WHERE workspace = :workspace
|
1096 |
-
AND (name LIKE '%' || :node_label || '%' OR entity_type LIKE '%' || :node_label || '%')
|
1097 |
-
)
|
1098 |
-
SELECT n.name, n.entity_type, n.description, n.source_chunk_id,
|
1099 |
-
CASE
|
1100 |
-
WHEN n.name IN (SELECT name FROM matching_nodes) THEN 2
|
1101 |
-
WHEN EXISTS (
|
1102 |
-
SELECT 1 FROM LIGHTRAG_GRAPH_EDGES e
|
1103 |
-
WHERE workspace = :workspace
|
1104 |
-
AND ((e.source_name = n.name AND e.target_name IN (SELECT name FROM matching_nodes))
|
1105 |
-
OR (e.target_name = n.name AND e.source_name IN (SELECT name FROM matching_nodes)))
|
1106 |
-
) THEN 1
|
1107 |
-
ELSE 0
|
1108 |
-
END AS priority,
|
1109 |
-
(SELECT COUNT(*) FROM LIGHTRAG_GRAPH_EDGES e
|
1110 |
-
WHERE workspace = :workspace
|
1111 |
-
AND (e.source_name = n.name OR e.target_name = n.name)) AS degree
|
1112 |
-
FROM LIGHTRAG_GRAPH_NODES n
|
1113 |
-
WHERE workspace = :workspace
|
1114 |
-
ORDER BY priority DESC, degree DESC
|
1115 |
-
FETCH FIRST :limit ROWS ONLY
|
1116 |
-
"""
|
1117 |
-
nodes_params = {
|
1118 |
-
"workspace": self.db.workspace,
|
1119 |
-
"node_label": node_label,
|
1120 |
-
"limit": max_graph_nodes,
|
1121 |
-
}
|
1122 |
-
nodes = await self.db.query(nodes_sql, nodes_params, multirows=True)
|
1123 |
-
|
1124 |
-
if not nodes:
|
1125 |
-
logger.warning(f"No nodes found matching '{node_label}'")
|
1126 |
-
return result
|
1127 |
-
|
1128 |
-
# Create mapping of node IDs to be used to filter edges
|
1129 |
-
node_names = [node["name"] for node in nodes]
|
1130 |
-
|
1131 |
-
# Add nodes to result
|
1132 |
-
seen_nodes = set()
|
1133 |
-
for node in nodes:
|
1134 |
-
node_id = node["name"]
|
1135 |
-
if node_id in seen_nodes:
|
1136 |
-
continue
|
1137 |
-
|
1138 |
-
# Create node properties dictionary
|
1139 |
-
properties = {
|
1140 |
-
"entity_type": node["entity_type"],
|
1141 |
-
"description": node["description"] or "",
|
1142 |
-
"source_id": node["source_chunk_id"] or "",
|
1143 |
-
}
|
1144 |
-
|
1145 |
-
# Add node to result
|
1146 |
-
result.nodes.append(
|
1147 |
-
KnowledgeGraphNode(
|
1148 |
-
id=node_id, labels=[node["entity_type"]], properties=properties
|
1149 |
-
)
|
1150 |
-
)
|
1151 |
-
seen_nodes.add(node_id)
|
1152 |
-
|
1153 |
-
# Get edges between these nodes
|
1154 |
-
edges_sql = """
|
1155 |
-
SELECT source_name, target_name, weight, keywords, description, source_chunk_id
|
1156 |
-
FROM LIGHTRAG_GRAPH_EDGES
|
1157 |
-
WHERE workspace = :workspace
|
1158 |
-
AND source_name IN (SELECT COLUMN_VALUE FROM TABLE(CAST(:node_names AS SYS.ODCIVARCHAR2LIST)))
|
1159 |
-
AND target_name IN (SELECT COLUMN_VALUE FROM TABLE(CAST(:node_names AS SYS.ODCIVARCHAR2LIST)))
|
1160 |
-
ORDER BY id
|
1161 |
-
"""
|
1162 |
-
edges_params = {"workspace": self.db.workspace, "node_names": node_names}
|
1163 |
-
edges = await self.db.query(edges_sql, edges_params, multirows=True)
|
1164 |
-
|
1165 |
-
# Add edges to result
|
1166 |
-
seen_edges = set()
|
1167 |
-
for edge in edges:
|
1168 |
-
source = edge["source_name"]
|
1169 |
-
target = edge["target_name"]
|
1170 |
-
edge_id = f"{source}-{target}"
|
1171 |
-
|
1172 |
-
if edge_id in seen_edges:
|
1173 |
-
continue
|
1174 |
-
|
1175 |
-
# Create edge properties dictionary
|
1176 |
-
properties = {
|
1177 |
-
"weight": edge["weight"] or 0.0,
|
1178 |
-
"keywords": edge["keywords"] or "",
|
1179 |
-
"description": edge["description"] or "",
|
1180 |
-
"source_id": edge["source_chunk_id"] or "",
|
1181 |
-
}
|
1182 |
-
|
1183 |
-
# Add edge to result
|
1184 |
-
result.edges.append(
|
1185 |
-
KnowledgeGraphEdge(
|
1186 |
-
id=edge_id,
|
1187 |
-
type="RELATED",
|
1188 |
-
source=source,
|
1189 |
-
target=target,
|
1190 |
-
properties=properties,
|
1191 |
-
)
|
1192 |
-
)
|
1193 |
-
seen_edges.add(edge_id)
|
1194 |
-
|
1195 |
-
logger.info(
|
1196 |
-
f"Subgraph query successful | Node count: {len(result.nodes)} | Edge count: {len(result.edges)}"
|
1197 |
-
)
|
1198 |
-
|
1199 |
-
except Exception as e:
|
1200 |
-
logger.error(f"Error retrieving knowledge graph: {e}")
|
1201 |
-
|
1202 |
-
return result
|
1203 |
-
|
1204 |
-
|
1205 |
-
N_T = {
    NameSpace.KV_STORE_FULL_DOCS: "LIGHTRAG_DOC_FULL",
    NameSpace.KV_STORE_TEXT_CHUNKS: "LIGHTRAG_DOC_CHUNKS",
    NameSpace.VECTOR_STORE_CHUNKS: "LIGHTRAG_DOC_CHUNKS",
    NameSpace.VECTOR_STORE_ENTITIES: "LIGHTRAG_GRAPH_NODES",
    NameSpace.VECTOR_STORE_RELATIONSHIPS: "LIGHTRAG_GRAPH_EDGES",
}


def namespace_to_table_name(namespace: str) -> str:
    for k, v in N_T.items():
        if is_namespace(namespace, k):
            return v

TABLES = {
    "LIGHTRAG_DOC_FULL": {
        "ddl": """CREATE TABLE LIGHTRAG_DOC_FULL (
            id varchar(256),
            workspace varchar(1024),
            doc_name varchar(1024),
            content CLOB,
            meta JSON,
            content_summary varchar(1024),
            content_length NUMBER,
            status varchar(256),
            chunks_count NUMBER,
            createtime TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            updatetime TIMESTAMP DEFAULT NULL,
            error varchar(4096)
        )"""
    },
    "LIGHTRAG_DOC_CHUNKS": {
        "ddl": """CREATE TABLE LIGHTRAG_DOC_CHUNKS (
            id varchar(256),
            workspace varchar(1024),
            full_doc_id varchar(256),
            status varchar(256),
            chunk_order_index NUMBER,
            tokens NUMBER,
            content CLOB,
            content_vector VECTOR,
            createtime TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            updatetime TIMESTAMP DEFAULT NULL
        )"""
    },
    "LIGHTRAG_GRAPH_NODES": {
        "ddl": """CREATE TABLE LIGHTRAG_GRAPH_NODES (
            id NUMBER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
            workspace varchar(1024),
            name varchar(2048),
            entity_type varchar(1024),
            description CLOB,
            source_chunk_id varchar(256),
            content CLOB,
            content_vector VECTOR,
            createtime TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            updatetime TIMESTAMP DEFAULT NULL
        )"""
    },
    "LIGHTRAG_GRAPH_EDGES": {
        "ddl": """CREATE TABLE LIGHTRAG_GRAPH_EDGES (
            id NUMBER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
            workspace varchar(1024),
            source_name varchar(2048),
            target_name varchar(2048),
            weight NUMBER,
            keywords CLOB,
            description CLOB,
            source_chunk_id varchar(256),
            content CLOB,
            content_vector VECTOR,
            createtime TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            updatetime TIMESTAMP DEFAULT NULL
        )"""
    },
    "LIGHTRAG_LLM_CACHE": {
        "ddl": """CREATE TABLE LIGHTRAG_LLM_CACHE (
            id varchar(256) PRIMARY KEY,
            workspace varchar(1024),
            cache_mode varchar(256),
            model_name varchar(256),
            original_prompt clob,
            return_value clob,
            embedding CLOB,
            embedding_shape NUMBER,
            embedding_min NUMBER,
            embedding_max NUMBER,
            createtime TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            updatetime TIMESTAMP DEFAULT NULL
        )"""
    },
    "LIGHTRAG_GRAPH": {
        "ddl": """CREATE OR REPLACE PROPERTY GRAPH lightrag_graph
        VERTEX TABLES (
            lightrag_graph_nodes KEY (id)
            LABEL entity
            PROPERTIES (id,workspace,name) -- ,entity_type,description,source_chunk_id)
        )
        EDGE TABLES (
            lightrag_graph_edges KEY (id)
            SOURCE KEY (source_name) REFERENCES lightrag_graph_nodes(name)
            DESTINATION KEY (target_name) REFERENCES lightrag_graph_nodes(name)
            LABEL has_relation
            PROPERTIES (id,workspace,source_name,target_name) -- ,weight, keywords,description,source_chunk_id)
        ) OPTIONS(ALLOW MIXED PROPERTY TYPES)"""
    },
}

SQL_TEMPLATES = {
    # SQL for KVStorage
    "get_by_id_full_docs": "select ID,content,status from LIGHTRAG_DOC_FULL where workspace=:workspace and ID=:id",
    "get_by_id_text_chunks": "select ID,TOKENS,content,CHUNK_ORDER_INDEX,FULL_DOC_ID,status from LIGHTRAG_DOC_CHUNKS where workspace=:workspace and ID=:id",
    "get_by_id_llm_response_cache": """SELECT id, original_prompt, NVL(return_value, '') as "return", cache_mode as "mode"
        FROM LIGHTRAG_LLM_CACHE WHERE workspace=:workspace AND id=:id""",
    "get_by_mode_id_llm_response_cache": """SELECT id, original_prompt, NVL(return_value, '') as "return", cache_mode as "mode"
        FROM LIGHTRAG_LLM_CACHE WHERE workspace=:workspace AND cache_mode=:cache_mode AND id=:id""",
    "get_by_ids_llm_response_cache": """SELECT id, original_prompt, NVL(return_value, '') as "return", cache_mode as "mode"
        FROM LIGHTRAG_LLM_CACHE WHERE workspace=:workspace AND id IN ({ids})""",
    "get_by_ids_full_docs": "select t.*,createtime as created_at from LIGHTRAG_DOC_FULL t where workspace=:workspace and ID in ({ids})",
    "get_by_ids_text_chunks": "select ID,TOKENS,content,CHUNK_ORDER_INDEX,FULL_DOC_ID from LIGHTRAG_DOC_CHUNKS where workspace=:workspace and ID in ({ids})",
    "get_by_status_ids_full_docs": "select id,status from LIGHTRAG_DOC_FULL t where workspace=:workspace AND status=:status and ID in ({ids})",
    "get_by_status_ids_text_chunks": "select id,status from LIGHTRAG_DOC_CHUNKS where workspace=:workspace and status=:status and ID in ({ids})",
    "get_by_status_full_docs": "select id,status from LIGHTRAG_DOC_FULL t where workspace=:workspace AND status=:status",
    "get_by_status_text_chunks": "select id,status from LIGHTRAG_DOC_CHUNKS where workspace=:workspace and status=:status",
    "filter_keys": "select id from {table_name} where workspace=:workspace and id in ({ids})",
    "merge_doc_full": """MERGE INTO LIGHTRAG_DOC_FULL a
        USING DUAL
        ON (a.id = :id and a.workspace = :workspace)
        WHEN NOT MATCHED THEN
        INSERT(id,content,workspace) values(:id,:content,:workspace)""",
    "merge_chunk": """MERGE INTO LIGHTRAG_DOC_CHUNKS
        USING DUAL
        ON (id = :id and workspace = :workspace)
        WHEN NOT MATCHED THEN INSERT
        (id,content,workspace,tokens,chunk_order_index,full_doc_id,content_vector,status)
        values (:id,:content,:workspace,:tokens,:chunk_order_index,:full_doc_id,:content_vector,:status) """,
    "upsert_llm_response_cache": """MERGE INTO LIGHTRAG_LLM_CACHE a
        USING DUAL
        ON (a.id = :id)
        WHEN NOT MATCHED THEN
        INSERT (workspace,id,original_prompt,return_value,cache_mode)
        VALUES (:workspace,:id,:original_prompt,:return_value,:cache_mode)
        WHEN MATCHED THEN UPDATE
        SET original_prompt = :original_prompt,
        return_value = :return_value,
        cache_mode = :cache_mode,
        updatetime = SYSDATE""",
    # SQL for VectorStorage
    "entities": """SELECT name as entity_name FROM
        (SELECT id,name,VECTOR_DISTANCE(content_vector,vector(:embedding_string,{dimension},{dtype}),COSINE) as distance
        FROM LIGHTRAG_GRAPH_NODES WHERE workspace=:workspace)
        WHERE distance>:better_than_threshold ORDER BY distance ASC FETCH FIRST :top_k ROWS ONLY""",
    "relationships": """SELECT source_name as src_id, target_name as tgt_id FROM
        (SELECT id,source_name,target_name,VECTOR_DISTANCE(content_vector,vector(:embedding_string,{dimension},{dtype}),COSINE) as distance
        FROM LIGHTRAG_GRAPH_EDGES WHERE workspace=:workspace)
        WHERE distance>:better_than_threshold ORDER BY distance ASC FETCH FIRST :top_k ROWS ONLY""",
    "chunks": """SELECT id FROM
        (SELECT id,VECTOR_DISTANCE(content_vector,vector(:embedding_string,{dimension},{dtype}),COSINE) as distance
        FROM LIGHTRAG_DOC_CHUNKS WHERE workspace=:workspace)
        WHERE distance>:better_than_threshold ORDER BY distance ASC FETCH FIRST :top_k ROWS ONLY""",
    # SQL for GraphStorage
    "has_node": """SELECT * FROM GRAPH_TABLE (lightrag_graph
        MATCH (a)
        WHERE a.workspace=:workspace AND a.name=:node_id
        COLUMNS (a.name))""",
    "has_edge": """SELECT * FROM GRAPH_TABLE (lightrag_graph
        MATCH (a) -[e]-> (b)
        WHERE e.workspace=:workspace and a.workspace=:workspace and b.workspace=:workspace
        AND a.name=:source_node_id AND b.name=:target_node_id
        COLUMNS (e.source_name,e.target_name) )""",
    "node_degree": """SELECT count(1) as degree FROM GRAPH_TABLE (lightrag_graph
        MATCH (a)-[e]->(b)
        WHERE e.workspace=:workspace and a.workspace=:workspace and b.workspace=:workspace
        AND a.name=:node_id or b.name = :node_id
        COLUMNS (a.name))""",
    "get_node": """SELECT t1.name,t2.entity_type,t2.source_chunk_id as source_id,NVL(t2.description,'') AS description
        FROM GRAPH_TABLE (lightrag_graph
        MATCH (a)
        WHERE a.workspace=:workspace AND a.name=:node_id
        COLUMNS (a.name)
        ) t1 JOIN LIGHTRAG_GRAPH_NODES t2 on t1.name=t2.name
        WHERE t2.workspace=:workspace""",
    "get_edge": """SELECT t1.source_id,t2.weight,t2.source_chunk_id as source_id,t2.keywords,
        NVL(t2.description,'') AS description,NVL(t2.KEYWORDS,'') AS keywords
        FROM GRAPH_TABLE (lightrag_graph
        MATCH (a)-[e]->(b)
        WHERE e.workspace=:workspace and a.workspace=:workspace and b.workspace=:workspace
        AND a.name=:source_node_id and b.name = :target_node_id
        COLUMNS (e.id,a.name as source_id)
        ) t1 JOIN LIGHTRAG_GRAPH_EDGES t2 on t1.id=t2.id""",
    "get_node_edges": """SELECT source_name,target_name
        FROM GRAPH_TABLE (lightrag_graph
        MATCH (a)-[e]->(b)
        WHERE e.workspace=:workspace and a.workspace=:workspace and b.workspace=:workspace
        AND a.name=:source_node_id
        COLUMNS (a.name as source_name,b.name as target_name))""",
    "merge_node": """MERGE INTO LIGHTRAG_GRAPH_NODES a
        USING DUAL
        ON (a.workspace=:workspace and a.name=:name)
        WHEN NOT MATCHED THEN
        INSERT(workspace,name,entity_type,description,source_chunk_id,content,content_vector)
        values (:workspace,:name,:entity_type,:description,:source_chunk_id,:content,:content_vector)
        WHEN MATCHED THEN
        UPDATE SET
        entity_type=:entity_type,description=:description,source_chunk_id=:source_chunk_id,content=:content,content_vector=:content_vector,updatetime=SYSDATE""",
    "merge_edge": """MERGE INTO LIGHTRAG_GRAPH_EDGES a
        USING DUAL
        ON (a.workspace=:workspace and a.source_name=:source_name and a.target_name=:target_name)
        WHEN NOT MATCHED THEN
        INSERT(workspace,source_name,target_name,weight,keywords,description,source_chunk_id,content,content_vector)
        values (:workspace,:source_name,:target_name,:weight,:keywords,:description,:source_chunk_id,:content,:content_vector)
        WHEN MATCHED THEN
        UPDATE SET
        weight=:weight,keywords=:keywords,description=:description,source_chunk_id=:source_chunk_id,content=:content,content_vector=:content_vector,updatetime=SYSDATE""",
    "get_all_nodes": """WITH t0 AS (
        SELECT name AS id, entity_type AS label, entity_type, description,
        '["' || replace(source_chunk_id, '<SEP>', '","') || '"]' source_chunk_ids
        FROM lightrag_graph_nodes
        WHERE workspace = :workspace
        ORDER BY createtime DESC fetch first :limit rows only
        ), t1 AS (
        SELECT t0.id, source_chunk_id
        FROM t0, JSON_TABLE ( source_chunk_ids, '$[*]' COLUMNS ( source_chunk_id PATH '$' ) )
        ), t2 AS (
        SELECT t1.id, LISTAGG(t2.content, '\n') content
        FROM t1 LEFT JOIN lightrag_doc_chunks t2 ON t1.source_chunk_id = t2.id
        GROUP BY t1.id
        )
        SELECT t0.id, label, entity_type, description, t2.content
        FROM t0 LEFT JOIN t2 ON t0.id = t2.id""",
    "get_all_edges": """SELECT t1.id,t1.keywords as label,t1.keywords, t1.source_name as source, t1.target_name as target,
        t1.weight,t1.DESCRIPTION,t2.content
        FROM LIGHTRAG_GRAPH_EDGES t1
        LEFT JOIN LIGHTRAG_DOC_CHUNKS t2 on t1.source_chunk_id=t2.id
        WHERE t1.workspace=:workspace
        order by t1.CREATETIME DESC
        fetch first :limit rows only""",
    "get_statistics": """select count(distinct CASE WHEN type='node' THEN id END) as nodes_count,
        count(distinct CASE WHEN type='edge' THEN id END) as edges_count
        FROM (
        select 'node' as type, id FROM GRAPH_TABLE (lightrag_graph
        MATCH (a) WHERE a.workspace=:workspace columns(a.name as id))
        UNION
        select 'edge' as type, TO_CHAR(id) id FROM GRAPH_TABLE (lightrag_graph
        MATCH (a)-[e]->(b) WHERE e.workspace=:workspace columns(e.id))
        )""",
    # SQL for deletion
    "delete_vectors": "DELETE FROM LIGHTRAG_DOC_CHUNKS WHERE workspace=:workspace AND id IN ({ids})",
    "delete_entity": "DELETE FROM LIGHTRAG_GRAPH_NODES WHERE workspace=:workspace AND name=:entity_name",
    "delete_entity_relations": "DELETE FROM LIGHTRAG_GRAPH_EDGES WHERE workspace=:workspace AND (source_name=:entity_name OR target_name=:entity_name)",
    "delete_node": """DELETE FROM GRAPH_TABLE (lightrag_graph
        MATCH (a)
        WHERE a.workspace=:workspace AND a.name=:node_id
        ACTION DELETE a)""",
    # Drop tables
    "drop_specifiy_table_workspace": "DELETE FROM {table_name} WHERE workspace=:workspace",
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|