Merge branch 'main' into rerank
- README-zh.md +12 -0
- README.md +12 -0
- lightrag/lightrag.py +9 -3
README-zh.md

```diff
@@ -859,6 +859,18 @@ rag = LightRAG(
 
 </details>
 
+### Data Isolation Between LightRAG Instances
+
+The `workspace` parameter isolates stored data between different LightRAG instances. The workspace is fixed once LightRAG has been initialized; changing it afterwards has no effect. Storage types implement workspaces as follows:
+
+- **For local file-based databases, data is isolated through workspace subdirectories:** JsonKVStorage, JsonDocStatusStorage, NetworkXStorage, NanoVectorDBStorage, FaissVectorDBStorage.
+- **For databases that store data in collections, isolation is achieved by prefixing the collection name with the workspace:** RedisKVStorage, RedisDocStatusStorage, MilvusVectorDBStorage, QdrantVectorDBStorage, MongoKVStorage, MongoDocStatusStorage, MongoVectorDBStorage, MongoGraphStorage, PGGraphStorage.
+- **For relational databases, data is logically isolated by adding a `workspace` field to the tables:** PGKVStorage, PGVectorStorage, PGDocStatusStorage.
+
+- **For the Neo4j graph database, data is logically isolated through labels:** Neo4JStorage.
+
+To stay compatible with legacy data, when no workspace is configured the default workspace is `default` for PostgreSQL and `base` for Neo4j. For all external storages, the system provides dedicated workspace environment variables that override the common `WORKSPACE` environment variable. These storage-specific workspace environment variables are: `REDIS_WORKSPACE`, `MILVUS_WORKSPACE`, `QDRANT_WORKSPACE`, `MONGODB_WORKSPACE`, `POSTGRES_WORKSPACE`, `NEO4J_WORKSPACE`.
+
 ## Edit Entities and Relations
 
 LightRAG now supports comprehensive knowledge graph management, allowing you to create, edit, and delete entities and relationships in the knowledge graph.
```
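To make the four isolation mechanisms concrete, here is a minimal Python sketch that maps each listed storage backend to where one workspace's data would end up. It only restates the README grouping: the function `locate`, the `"_"` separator in collection names, the `./rag_storage` directory, the dataset name `full_docs`, and the returned description strings are hypothetical illustrations, not LightRAG's actual naming scheme.

```python
import posixpath

# Backend groupings copied from the README list; the mechanisms below are
# illustrative descriptions, not LightRAG's real storage code.
FILE_BASED = {"JsonKVStorage", "JsonDocStatusStorage", "NetworkXStorage",
              "NanoVectorDBStorage", "FaissVectorDBStorage"}
COLLECTION_BASED = {"RedisKVStorage", "RedisDocStatusStorage", "MilvusVectorDBStorage",
                    "QdrantVectorDBStorage", "MongoKVStorage", "MongoDocStatusStorage",
                    "MongoVectorDBStorage", "MongoGraphStorage", "PGGraphStorage"}
RELATIONAL = {"PGKVStorage", "PGVectorStorage", "PGDocStatusStorage"}
LABEL_BASED = {"Neo4JStorage"}


def locate(storage: str, workspace: str, name: str,
           working_dir: str = "./rag_storage") -> str:
    """Describe where dataset `name` lives for `workspace` (hypothetical helper)."""
    if storage in FILE_BASED:
        # Local files: each workspace gets its own subdirectory of working_dir.
        parts = [working_dir, workspace, name] if workspace else [working_dir, name]
        return posixpath.join(*parts)
    if storage in COLLECTION_BASED:
        # Collection stores: the workspace is prefixed onto the collection name.
        # The "_" separator is an assumption made for this sketch.
        return f"{workspace}_{name}" if workspace else name
    if storage in RELATIONAL:
        # Relational tables are shared; rows are told apart by a workspace column.
        return f"{name} WHERE workspace = '{workspace}'"
    if storage in LABEL_BASED:
        # Neo4j: nodes are tagged with the workspace as a label.
        return f"label :{workspace}"
    raise ValueError(f"unknown storage type: {storage}")


print(locate("JsonKVStorage", "team_a", "full_docs"))
print(locate("RedisKVStorage", "team_a", "full_docs"))
```

The point of the sketch is that two instances with different workspaces never resolve to the same directory, collection, row set, or label, which is what keeps their data apart.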
README.md

```diff
@@ -239,6 +239,7 @@ A full list of LightRAG init parameters:
 | **Parameter** | **Type** | **Explanation** | **Default** |
 |--------------|----------|-----------------|-------------|
 | **working_dir** | `str` | Directory where the cache will be stored | `lightrag_cache+timestamp` |
+| **workspace** | `str` | Workspace name for data isolation between different LightRAG instances | |
 | **kv_storage** | `str` | Storage type for documents and text chunks. Supported types: `JsonKVStorage`,`PGKVStorage`,`RedisKVStorage`,`MongoKVStorage` | `JsonKVStorage` |
 | **vector_storage** | `str` | Storage type for embedding vectors. Supported types: `NanoVectorDBStorage`,`PGVectorStorage`,`MilvusVectorDBStorage`,`ChromaVectorDBStorage`,`FaissVectorDBStorage`,`MongoVectorDBStorage`,`QdrantVectorDBStorage` | `NanoVectorDBStorage` |
 | **graph_storage** | `str` | Storage type for graph edges and nodes. Supported types: `NetworkXStorage`,`Neo4JStorage`,`PGGraphStorage`,`AGEStorage` | `NetworkXStorage` |
```

```diff
@@ -905,6 +906,17 @@ async def initialize_rag():
 
 </details>
 
+### Data Isolation Between LightRAG Instances
+
+The `workspace` parameter ensures data isolation between different LightRAG instances. The `workspace` is fixed once the instance is initialized; changing it afterwards has no effect. Here is how workspaces are implemented for different types of storage:
+
+- **For local file-based databases, data isolation is achieved through workspace subdirectories:** `JsonKVStorage`, `JsonDocStatusStorage`, `NetworkXStorage`, `NanoVectorDBStorage`, `FaissVectorDBStorage`.
+- **For databases that store data in collections, isolation is achieved by adding a workspace prefix to the collection name:** `RedisKVStorage`, `RedisDocStatusStorage`, `MilvusVectorDBStorage`, `QdrantVectorDBStorage`, `MongoKVStorage`, `MongoDocStatusStorage`, `MongoVectorDBStorage`, `MongoGraphStorage`, `PGGraphStorage`.
+- **For relational databases, data isolation is achieved by adding a `workspace` field to the tables for logical data separation:** `PGKVStorage`, `PGVectorStorage`, `PGDocStatusStorage`.
+- **For the Neo4j graph database, logical data isolation is achieved through labels:** `Neo4JStorage`.
+
+To maintain compatibility with legacy data, the default workspace for PostgreSQL is `default` and for Neo4j is `base` when no workspace is configured. For all external storages, the system provides dedicated workspace environment variables that override the common `WORKSPACE` environment variable. These storage-specific workspace environment variables are: `REDIS_WORKSPACE`, `MILVUS_WORKSPACE`, `QDRANT_WORKSPACE`, `MONGODB_WORKSPACE`, `POSTGRES_WORKSPACE`, `NEO4J_WORKSPACE`.
+
 ## Edit Entities and Relations
 
 LightRAG now supports comprehensive knowledge graph management capabilities, allowing you to create, edit, and delete entities and relationships within your knowledge graph.
```
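The environment-variable precedence described in the README can be sketched as a small resolver. This is a hedged illustration that assumes a storage-specific variable wins over `WORKSPACE`, an empty value counts as unset, and backends without a legacy default fall back to an empty workspace. The backend keys and the `resolve_workspace` helper are hypothetical, and the sketch does not model how an explicit `workspace=` constructor argument interacts with these variables.

```python
import os
from typing import Mapping, Optional

# Storage-specific variables named in the README, keyed by hypothetical
# short backend names chosen for this sketch.
SPECIFIC_ENV = {
    "redis": "REDIS_WORKSPACE",
    "milvus": "MILVUS_WORKSPACE",
    "qdrant": "QDRANT_WORKSPACE",
    "mongodb": "MONGODB_WORKSPACE",
    "postgres": "POSTGRES_WORKSPACE",
    "neo4j": "NEO4J_WORKSPACE",
}
# Legacy defaults the README states for the unconfigured case.
LEGACY_DEFAULT = {"postgres": "default", "neo4j": "base"}


def resolve_workspace(backend: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the workspace for `backend` (illustrative precedence, not LightRAG code)."""
    env = os.environ if env is None else env
    specific = SPECIFIC_ENV.get(backend)
    # 1. A non-empty storage-specific variable overrides everything else here.
    if specific and env.get(specific):
        return env[specific]
    # 2. Otherwise fall back to the shared WORKSPACE variable.
    if env.get("WORKSPACE"):
        return env["WORKSPACE"]
    # 3. Nothing configured: use the legacy default, or no workspace at all.
    return LEGACY_DEFAULT.get(backend, "")


print(resolve_workspace("postgres", {}))
print(resolve_workspace("redis", {"WORKSPACE": "shared", "REDIS_WORKSPACE": "r1"}))
```

Passing a plain dict instead of reading `os.environ` directly keeps the resolver easy to test.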
lightrag/lightrag.py

```diff
@@ -919,9 +919,15 @@ class LightRAG:
 # Get first document's file path and total count for job name
 first_doc_id, first_doc = next(iter(to_process_docs.items()))
 first_doc_path = first_doc.file_path
-
-
-
+
+# Handle cases where first_doc_path is None
+if first_doc_path:
+    path_prefix = first_doc_path[:20] + (
+        "..." if len(first_doc_path) > 20 else ""
+    )
+else:
+    path_prefix = "unknown_source"
+
 total_files = len(to_process_docs)
 job_name = f"{path_prefix}[{total_files} files]"
 pipeline_status["job_name"] = job_name
```
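The patched job-name logic can be exercised on its own. The helper below, `build_job_name`, is a hypothetical extraction of the inline code in the `lightrag.py` hunk; the 20-character truncation with `...` and the `unknown_source` fallback are taken directly from the diff.

```python
from typing import Optional


def build_job_name(first_doc_path: Optional[str], total_files: int) -> str:
    """Hypothetical standalone version of the inline job-name code."""
    if first_doc_path:
        # Keep at most 20 characters and mark truncation with "...".
        path_prefix = first_doc_path[:20] + (
            "..." if len(first_doc_path) > 20 else ""
        )
    else:
        # None (or an empty string) no longer crashes; use a placeholder.
        path_prefix = "unknown_source"
    return f"{path_prefix}[{total_files} files]"


print(build_job_name(None, 3))
print(build_job_name("report.pdf", 2))
```

Because the check is `if first_doc_path:` rather than `is not None`, an empty string also falls back to `unknown_source`, and neither case reaches the `len()` call that would fail on `None`.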