zrguo committed on
Commit d9fd40b · unverified
2 Parent(s): 860c8a5 a5325c0

Merge pull request #502 from ParisNeo/main
api/README_LOLLMS.md ADDED
@@ -0,0 +1,177 @@
# LightRAG API Server

A powerful FastAPI-based server for managing and querying documents using LightRAG (Light Retrieval-Augmented Generation). This server provides a REST API for document management and intelligent querying using various LLM models through LoLLMS.

## Features

- 🔍 Multiple search modes (naive, local, global, hybrid)
- 📡 Streaming and non-streaming responses
- 📝 Document management (insert, batch upload, clear)
- ⚙️ Highly configurable model parameters
- 📚 Support for text and file uploads
- 🔧 RESTful API with automatic documentation
- 🚀 Built with FastAPI for high performance

## Prerequisites

- Python 3.8+
- LoLLMS server running locally or remotely
- Required Python packages:
  - fastapi
  - uvicorn
  - lightrag
  - pydantic
## Installation

If you are using Windows, you will need to download and install the Visual C++ Build Tools from [https://visualstudio.microsoft.com/visual-cpp-build-tools/](https://visualstudio.microsoft.com/visual-cpp-build-tools/).
Make sure you install the VS 2022 C++ x64/x86 Build Tools from the Individual Components tab:
![image](https://github.com/user-attachments/assets/3723e15b-0a2c-42ed-aebf-e595a9f9c946)

This is required to build some of the modules.

1. Clone the repository:
```bash
git clone https://github.com/ParisNeo/LightRAG.git
cd LightRAG/api
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

3. Make sure LoLLMS is running and accessible.

## Configuration

The server can be configured using command-line arguments:

```bash
python lollms_lightrag_server.py --help
```
Available options:

| Parameter | Default | Description |
|-----------|---------|-------------|
| --host | 0.0.0.0 | Server host |
| --port | 9621 | Server port |
| --model | mistral-nemo:latest | LLM model name |
| --embedding-model | bge-m3:latest | Embedding model name |
| --lollms-host | http://localhost:11434 | LoLLMS host URL |
| --working-dir | ./rag_storage | Working directory for RAG storage |
| --max-async | 4 | Maximum async operations |
| --max-tokens | 32768 | Maximum token size |
| --embedding-dim | 1024 | Embedding dimensions |
| --max-embed-tokens | 8192 | Maximum embedding token size |
| --input-dir | ./inputs | Directory containing input documents |
| --log-level | INFO | Logging level |
## Quick Start

1. Basic usage with default settings:
```bash
python lollms_lightrag_server.py
```

2. Custom configuration:
```bash
python lollms_lightrag_server.py --model llama2:13b --port 8080 --working-dir ./custom_rag
```

3. Make sure the models are installed on your LoLLMS instance, then point the server at them:
```bash
python lollms_lightrag_server.py --model mistral-nemo:latest --embedding-model bge-m3 --embedding-dim 1024
```
## API Endpoints

### Query Endpoints

#### POST /query
Query the RAG system with options for different search modes.

```bash
curl -X POST "http://localhost:9621/query" \
     -H "Content-Type: application/json" \
     -d '{"query": "Your question here", "mode": "hybrid"}'
```

#### POST /query/stream
Stream responses from the RAG system.

```bash
curl -X POST "http://localhost:9621/query/stream" \
     -H "Content-Type: application/json" \
     -d '{"query": "Your question here", "mode": "hybrid"}'
```
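The same request can be issued from Python. A minimal sketch using only the standard library — the URL and payload mirror the curl example above; send the built request with `urllib.request.urlopen` once the server is up:

```python
import json
import urllib.request

def build_query_request(query: str, mode: str = "hybrid",
                        base_url: str = "http://localhost:9621") -> urllib.request.Request:
    """Build a POST /query request with the JSON body shown above."""
    payload = json.dumps({"query": query, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request("Your question here")
print(req.full_url)      # http://localhost:9621/query
print(req.get_method())  # POST
```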

### Document Management Endpoints

#### POST /documents/text
Insert text directly into the RAG system.

```bash
curl -X POST "http://localhost:9621/documents/text" \
     -H "Content-Type: application/json" \
     -d '{"text": "Your text content here", "description": "Optional description"}'
```

#### POST /documents/file
Upload a single file to the RAG system.

```bash
curl -X POST "http://localhost:9621/documents/file" \
     -F "file=@/path/to/your/document.txt" \
     -F "description=Optional description"
```

#### POST /documents/batch
Upload multiple files at once.

```bash
curl -X POST "http://localhost:9621/documents/batch" \
     -F "files=@/path/to/doc1.txt" \
     -F "files=@/path/to/doc2.txt"
```

#### DELETE /documents
Clear all documents from the RAG system.

```bash
curl -X DELETE "http://localhost:9621/documents"
```

### Utility Endpoints

#### GET /health
Check server health and configuration.

```bash
curl "http://localhost:9621/health"
```
## Development

### Running in Development Mode

```bash
uvicorn lollms_lightrag_server:app --reload --port 9621
```

### API Documentation

When the server is running, visit:
- Swagger UI: http://localhost:9621/docs
- ReDoc: http://localhost:9621/redoc

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- Built with [FastAPI](https://fastapi.tiangolo.com/)
- Uses [LightRAG](https://github.com/HKUDS/LightRAG) for document processing
- Powered by [LoLLMS](https://lollms.ai/) for LLM inference
api/lollms_lightrag_server.py ADDED
@@ -0,0 +1,401 @@
from fastapi import FastAPI, HTTPException, File, UploadFile, Form
from pydantic import BaseModel
import logging
import argparse
from lightrag import LightRAG, QueryParam
from lightrag.llm import lollms_model_complete, lollms_embed
from lightrag.utils import EmbeddingFunc
from typing import Optional, List
from enum import Enum
from pathlib import Path
import shutil
import aiofiles
from ascii_colors import trace_exception

def parse_args():
    parser = argparse.ArgumentParser(
        description="LightRAG FastAPI Server with separate working and input directories"
    )

    # Server configuration
    parser.add_argument(
        "--host", default="0.0.0.0", help="Server host (default: 0.0.0.0)"
    )
    parser.add_argument(
        "--port", type=int, default=9621, help="Server port (default: 9621)"
    )

    # Directory configuration
    parser.add_argument(
        "--working-dir",
        default="./rag_storage",
        help="Working directory for RAG storage (default: ./rag_storage)",
    )
    parser.add_argument(
        "--input-dir",
        default="./inputs",
        help="Directory containing input documents (default: ./inputs)",
    )

    # Model configuration
    parser.add_argument(
        "--model",
        default="mistral-nemo:latest",
        help="LLM model name (default: mistral-nemo:latest)",
    )
    parser.add_argument(
        "--embedding-model",
        default="bge-m3:latest",
        help="Embedding model name (default: bge-m3:latest)",
    )
    parser.add_argument(
        "--lollms-host",
        default="http://localhost:11434",
        help="lollms host URL (default: http://localhost:11434)",
    )

    # RAG configuration
    parser.add_argument(
        "--max-async", type=int, default=4, help="Maximum async operations (default: 4)"
    )
    parser.add_argument(
        "--max-tokens",
        type=int,
        default=32768,
        help="Maximum token size (default: 32768)",
    )
    parser.add_argument(
        "--embedding-dim",
        type=int,
        default=1024,
        help="Embedding dimensions (default: 1024)",
    )
    parser.add_argument(
        "--max-embed-tokens",
        type=int,
        default=8192,
        help="Maximum embedding token size (default: 8192)",
    )

    # Logging configuration
    parser.add_argument(
        "--log-level",
        default="INFO",
        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
        help="Logging level (default: INFO)",
    )

    return parser.parse_args()


class DocumentManager:
    """Handles document operations and tracking"""

    def __init__(self, input_dir: str, supported_extensions: tuple = (".txt", ".md")):
        self.input_dir = Path(input_dir)
        self.supported_extensions = supported_extensions
        self.indexed_files = set()

        # Create input directory if it doesn't exist
        self.input_dir.mkdir(parents=True, exist_ok=True)

    def scan_directory(self) -> List[Path]:
        """Scan input directory for new files"""
        new_files = []
        for ext in self.supported_extensions:
            for file_path in self.input_dir.rglob(f"*{ext}"):
                if file_path not in self.indexed_files:
                    new_files.append(file_path)
        return new_files

    def mark_as_indexed(self, file_path: Path):
        """Mark a file as indexed"""
        self.indexed_files.add(file_path)

    def is_supported_file(self, filename: str) -> bool:
        """Check if file type is supported"""
        return any(filename.lower().endswith(ext) for ext in self.supported_extensions)
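The scan/mark contract of `DocumentManager` can be exercised on its own: a file counts as new until it is marked as indexed, and unsupported extensions are skipped. The sketch below re-declares a condensed copy of the class so it runs standalone (the temporary directory and file names are illustrative):

```python
import tempfile
from pathlib import Path
from typing import List

class DocumentManager:
    """Condensed copy of the class above, for a standalone demo."""

    def __init__(self, input_dir: str, supported_extensions: tuple = (".txt", ".md")):
        self.input_dir = Path(input_dir)
        self.supported_extensions = supported_extensions
        self.indexed_files = set()
        self.input_dir.mkdir(parents=True, exist_ok=True)

    def scan_directory(self) -> List[Path]:
        new_files = []
        for ext in self.supported_extensions:
            for file_path in self.input_dir.rglob(f"*{ext}"):
                if file_path not in self.indexed_files:
                    new_files.append(file_path)
        return new_files

    def mark_as_indexed(self, file_path: Path):
        self.indexed_files.add(file_path)

tmp = Path(tempfile.mkdtemp())
(tmp / "a.txt").write_text("hello")
(tmp / "b.md").write_text("world")
(tmp / "c.pdf").write_text("skipped")  # unsupported extension is ignored

manager = DocumentManager(str(tmp))
first = manager.scan_directory()       # finds a.txt and b.md
for f in first:
    manager.mark_as_indexed(f)
second = manager.scan_directory()      # nothing new on a rescan
print(len(first), len(second))  # 2 0
```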

# Pydantic models
class SearchMode(str, Enum):
    naive = "naive"
    local = "local"
    global_ = "global"
    hybrid = "hybrid"


class QueryRequest(BaseModel):
    query: str
    mode: SearchMode = SearchMode.hybrid
    stream: bool = False


class QueryResponse(BaseModel):
    response: str


class InsertTextRequest(BaseModel):
    text: str
    description: Optional[str] = None


class InsertResponse(BaseModel):
    status: str
    message: str
    document_count: int

def create_app(args):
    # Setup logging
    logging.basicConfig(
        format="%(levelname)s:%(message)s", level=getattr(logging, args.log_level)
    )

    # Initialize FastAPI app
    app = FastAPI(
        title="LightRAG API",
        description="API for querying text using LightRAG with separate storage and input directories",
    )

    # Create working directory if it doesn't exist
    Path(args.working_dir).mkdir(parents=True, exist_ok=True)

    # Initialize document manager
    doc_manager = DocumentManager(args.input_dir)

    # Initialize RAG; pass the LoLLMS host as base_url so the lollms helpers
    # in lightrag.llm actually use it (their keyword is base_url, not host)
    rag = LightRAG(
        working_dir=args.working_dir,
        llm_model_func=lollms_model_complete,
        llm_model_name=args.model,
        llm_model_max_async=args.max_async,
        llm_model_max_token_size=args.max_tokens,
        llm_model_kwargs={
            "base_url": args.lollms_host,
            "options": {"num_ctx": args.max_tokens},
        },
        embedding_func=EmbeddingFunc(
            embedding_dim=args.embedding_dim,
            max_token_size=args.max_embed_tokens,
            func=lambda texts: lollms_embed(
                texts, embed_model=args.embedding_model, base_url=args.lollms_host
            ),
        ),
    )
    @app.on_event("startup")
    async def startup_event():
        """Index all files in input directory during startup"""
        try:
            new_files = doc_manager.scan_directory()
            for file_path in new_files:
                try:
                    # Use async file reading
                    async with aiofiles.open(file_path, "r", encoding="utf-8") as f:
                        content = await f.read()
                        # Use the async version of insert directly
                        await rag.ainsert(content)
                        doc_manager.mark_as_indexed(file_path)
                        logging.info(f"Indexed file: {file_path}")
                except Exception as e:
                    trace_exception(e)
                    logging.error(f"Error indexing file {file_path}: {str(e)}")

            logging.info(f"Indexed {len(new_files)} documents from {args.input_dir}")

        except Exception as e:
            logging.error(f"Error during startup indexing: {str(e)}")
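The startup hook above reads each new file, awaits the async insert, and only then marks the file as indexed. The control flow can be sketched with a stub in place of `rag.ainsert` (the stub and directory names are illustrative):

```python
import asyncio
import tempfile
from pathlib import Path

async def fake_ainsert(content: str, store: list) -> None:
    """Stand-in for rag.ainsert: just records the content."""
    await asyncio.sleep(0)
    store.append(content)

async def index_all(input_dir: Path, store: list) -> int:
    """Mirror the startup loop: read each file, await the insert."""
    indexed = 0
    for file_path in sorted(input_dir.glob("*.txt")):
        content = file_path.read_text(encoding="utf-8")
        await fake_ainsert(content, store)
        indexed += 1
    return indexed

tmp = Path(tempfile.mkdtemp())
(tmp / "doc1.txt").write_text("first")
(tmp / "doc2.txt").write_text("second")
store: list = []
count = asyncio.run(index_all(tmp, store))
print(count, store)  # 2 ['first', 'second']
```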

    @app.post("/documents/scan")
    async def scan_for_new_documents():
        """Manually trigger scanning for new documents"""
        try:
            new_files = doc_manager.scan_directory()
            indexed_count = 0

            for file_path in new_files:
                try:
                    with open(file_path, "r", encoding="utf-8") as f:
                        content = f.read()
                    rag.insert(content)
                    doc_manager.mark_as_indexed(file_path)
                    indexed_count += 1
                except Exception as e:
                    logging.error(f"Error indexing file {file_path}: {str(e)}")

            return {
                "status": "success",
                "indexed_count": indexed_count,
                "total_documents": len(doc_manager.indexed_files),
            }
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))

    @app.post("/documents/upload")
    async def upload_to_input_dir(file: UploadFile = File(...)):
        """Upload a file to the input directory"""
        try:
            if not doc_manager.is_supported_file(file.filename):
                raise HTTPException(
                    status_code=400,
                    detail=f"Unsupported file type. Supported types: {doc_manager.supported_extensions}",
                )

            file_path = doc_manager.input_dir / file.filename
            with open(file_path, "wb") as buffer:
                shutil.copyfileobj(file.file, buffer)

            # Immediately index the uploaded file
            with open(file_path, "r", encoding="utf-8") as f:
                content = f.read()
            rag.insert(content)
            doc_manager.mark_as_indexed(file_path)

            return {
                "status": "success",
                "message": f"File uploaded and indexed: {file.filename}",
                "total_documents": len(doc_manager.indexed_files),
            }
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))

    @app.post("/query", response_model=QueryResponse)
    async def query_text(request: QueryRequest):
        try:
            response = await rag.aquery(
                request.query,
                param=QueryParam(mode=request.mode, stream=request.stream),
            )

            if request.stream:
                result = ""
                async for chunk in response:
                    result += chunk
                return QueryResponse(response=result)
            else:
                return QueryResponse(response=response)
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))

    @app.post("/query/stream")
    async def query_text_stream(request: QueryRequest):
        try:
            # Use the async API and wrap the chunks in a StreamingResponse;
            # returning a bare async generator is not a valid FastAPI response
            from fastapi.responses import StreamingResponse

            response = await rag.aquery(
                request.query, param=QueryParam(mode=request.mode, stream=True)
            )

            async def stream_generator():
                async for chunk in response:
                    yield chunk

            return StreamingResponse(stream_generator(), media_type="text/plain")
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))
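Both query endpoints consume an async iterator of text chunks; the non-streaming branch simply concatenates them. That accumulation pattern can be sketched with a stub in place of LightRAG's iterator:

```python
import asyncio

async def chunk_source():
    """Stand-in for the async iterator LightRAG returns when stream=True."""
    for part in ["Hello", ", ", "world"]:
        await asyncio.sleep(0)
        yield part

async def collect() -> str:
    # Same accumulation as the non-streaming branch of /query
    result = ""
    async for chunk in chunk_source():
        result += chunk
    return result

text = asyncio.run(collect())
print(text)  # Hello, world
```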

    @app.post("/documents/text", response_model=InsertResponse)
    async def insert_text(request: InsertTextRequest):
        try:
            rag.insert(request.text)
            return InsertResponse(
                status="success",
                message="Text successfully inserted",
                document_count=len(rag),
            )
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))

    @app.post("/documents/file", response_model=InsertResponse)
    async def insert_file(file: UploadFile = File(...), description: str = Form(None)):
        try:
            content = await file.read()

            if file.filename.endswith((".txt", ".md")):
                text = content.decode("utf-8")
                rag.insert(text)
            else:
                raise HTTPException(
                    status_code=400,
                    detail="Unsupported file type. Only .txt and .md files are supported",
                )

            return InsertResponse(
                status="success",
                message=f"File '{file.filename}' successfully inserted",
                document_count=len(rag),
            )
        except UnicodeDecodeError:
            raise HTTPException(status_code=400, detail="File encoding not supported")
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))

    @app.post("/documents/batch", response_model=InsertResponse)
    async def insert_batch(files: List[UploadFile] = File(...)):
        try:
            inserted_count = 0
            failed_files = []

            for file in files:
                try:
                    content = await file.read()
                    if file.filename.endswith((".txt", ".md")):
                        text = content.decode("utf-8")
                        rag.insert(text)
                        inserted_count += 1
                    else:
                        failed_files.append(f"{file.filename} (unsupported type)")
                except Exception as e:
                    failed_files.append(f"{file.filename} ({str(e)})")

            status_message = f"Successfully inserted {inserted_count} documents"
            if failed_files:
                status_message += f". Failed files: {', '.join(failed_files)}"

            return InsertResponse(
                status="success" if inserted_count > 0 else "partial_success",
                message=status_message,
                document_count=len(rag),
            )
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))

    @app.delete("/documents", response_model=InsertResponse)
    async def clear_documents():
        try:
            rag.text_chunks = []
            rag.entities_vdb = None
            rag.relationships_vdb = None
            return InsertResponse(
                status="success",
                message="All documents cleared successfully",
                document_count=0,
            )
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))

    @app.get("/health")
    async def get_status():
        """Get current system status"""
        return {
            "status": "healthy",
            "working_directory": str(args.working_dir),
            "input_directory": str(args.input_dir),
            "indexed_files": len(doc_manager.indexed_files),
            "configuration": {
                "model": args.model,
                "embedding_model": args.embedding_model,
                "max_tokens": args.max_tokens,
                "lollms_host": args.lollms_host,
            },
        }

    return app


if __name__ == "__main__":
    args = parse_args()
    import uvicorn

    app = create_app(args)
    uvicorn.run(app, host=args.host, port=args.port)
api/ollama_lightrag_server.py CHANGED
@@ -3,7 +3,7 @@ from pydantic import BaseModel
 import logging
 import argparse
 from lightrag import LightRAG, QueryParam
-from lightrag.llm import ollama_model_complete, ollama_embedding
+from lightrag.llm import ollama_model_complete, ollama_embed
 from lightrag.utils import EmbeddingFunc
 from typing import Optional, List
 from enum import Enum
@@ -179,7 +179,7 @@ def create_app(args):
         embedding_func=EmbeddingFunc(
             embedding_dim=args.embedding_dim,
             max_token_size=args.max_embed_tokens,
-            func=lambda texts: ollama_embedding(
+            func=lambda texts: ollama_embed(
                 texts, embed_model=args.embedding_model, host=args.ollama_host
             ),
         ),
lightrag/llm.py CHANGED
@@ -339,6 +339,62 @@ async def ollama_model_if_cache(
     return response["message"]["content"]


+async def lollms_model_if_cache(
+    model,
+    prompt,
+    system_prompt=None,
+    history_messages=[],
+    base_url="http://localhost:9600",
+    **kwargs,
+) -> Union[str, AsyncIterator[str]]:
+    """Client implementation for lollms generation."""
+
+    stream = True if kwargs.get("stream") else False
+
+    # Extract lollms specific parameters
+    request_data = {
+        "prompt": prompt,
+        "model_name": model,
+        "personality": kwargs.get("personality", -1),
+        "n_predict": kwargs.get("n_predict", None),
+        "stream": stream,
+        "temperature": kwargs.get("temperature", 0.1),
+        "top_k": kwargs.get("top_k", 50),
+        "top_p": kwargs.get("top_p", 0.95),
+        "repeat_penalty": kwargs.get("repeat_penalty", 0.8),
+        "repeat_last_n": kwargs.get("repeat_last_n", 40),
+        "seed": kwargs.get("seed", None),
+        "n_threads": kwargs.get("n_threads", 8),
+    }
+
+    # Prepare the full prompt including history
+    full_prompt = ""
+    if system_prompt:
+        full_prompt += f"{system_prompt}\n"
+    for msg in history_messages:
+        full_prompt += f"{msg['role']}: {msg['content']}\n"
+    full_prompt += prompt
+
+    request_data["prompt"] = full_prompt
+
+    if stream:
+
+        async def inner():
+            # Open the session inside the generator so it is still alive
+            # while the caller consumes the stream
+            async with aiohttp.ClientSession() as session:
+                async with session.post(
+                    f"{base_url}/lollms_generate", json=request_data
+                ) as response:
+                    async for line in response.content:
+                        yield line.decode().strip()
+
+        return inner()
+    else:
+        async with aiohttp.ClientSession() as session:
+            async with session.post(
+                f"{base_url}/lollms_generate", json=request_data
+            ) as response:
+                return await response.text()
+
+
 @lru_cache(maxsize=1)
 def initialize_lmdeploy_pipeline(
     model,
@@ -597,6 +653,32 @@ async def ollama_model_complete(
     )


+async def lollms_model_complete(
+    prompt, system_prompt=None, history_messages=[], keyword_extraction=False, **kwargs
+) -> Union[str, AsyncIterator[str]]:
+    """Complete function for lollms model generation."""
+
+    # Extract and remove keyword_extraction from kwargs if present
+    keyword_extraction = kwargs.pop("keyword_extraction", None)
+
+    # Get model name from config
+    model_name = kwargs["hashing_kv"].global_config["llm_model_name"]
+
+    # If keyword extraction is needed, we might need to modify the prompt
+    # or add specific parameters for JSON output (if lollms supports it)
+    if keyword_extraction:
+        # Note: You might need to adjust this based on how lollms handles structured output
+        pass
+
+    return await lollms_model_if_cache(
+        model_name,
+        prompt,
+        system_prompt=system_prompt,
+        history_messages=history_messages,
+        **kwargs,
+    )
+
+
 @retry(
     stop=stop_after_attempt(3),
     wait=wait_exponential(multiplier=1, min=4, max=10),
@@ -1026,6 +1108,35 @@ async def ollama_embed(texts: list[str], embed_model, **kwargs) -> np.ndarray:
     return data["embeddings"]


+async def lollms_embed(
+    texts: List[str], embed_model=None, base_url="http://localhost:9600", **kwargs
+) -> np.ndarray:
+    """
+    Generate embeddings for a list of texts using lollms server.
+
+    Args:
+        texts: List of strings to embed
+        embed_model: Model name (not used directly as lollms uses configured vectorizer)
+        base_url: URL of the lollms server
+        **kwargs: Additional arguments passed to the request
+
+    Returns:
+        np.ndarray: Array of embeddings
+    """
+    async with aiohttp.ClientSession() as session:
+        embeddings = []
+        for text in texts:
+            request_data = {"text": text}
+
+            async with session.post(
+                f"{base_url}/lollms_embed", json=request_data
+            ) as response:
+                result = await response.json()
+                embeddings.append(result["vector"])
+
+        return np.array(embeddings)
+
+
 class Model(BaseModel):
     """
     This is a Pydantic model class named 'Model' that is used to define a custom language model.
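The new `lollms_embed` helper issues one POST per input text and reads each embedding from the response's `"vector"` field. The response-handling step alone, with canned JSON bodies standing in for the HTTP calls (values are illustrative), can be sketched as:

```python
import json

# Canned responses in the shape lollms_embed expects from /lollms_embed
raw_responses = [
    '{"vector": [0.1, 0.2, 0.3]}',
    '{"vector": [0.4, 0.5, 0.6]}',
]

# One vector per input text, collected in order
embeddings = [json.loads(r)["vector"] for r in raw_responses]
print(len(embeddings), len(embeddings[0]))  # 2 3
```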