Martin Perez-Guevara, Young Jin Kim
Commit 883904b · 1 Parent(s): 833afae

feat: Integrate Opik for Enhanced Observability in LlamaIndex LLM Interactions


This pull request demonstrates how to create a new Opik project when using LiteLLM for LlamaIndex-based LLM calls. The primary goal is to enable detailed tracing, monitoring, and logging of LLM interactions under a dedicated Opik project_name, particularly when LiteLLM is used as an API proxy. This enhancement allows for better debugging, performance analysis, and observability when using LightRAG with LiteLLM and Opik.

**Motivation:**

As our application's reliance on Large Language Models (LLMs) grows, robust observability becomes crucial for maintaining system health, optimizing performance, and understanding usage patterns. Integrating Opik provides the following key benefits:

1. **Improved Debugging:** Enables end-to-end tracing of requests through the LlamaIndex and LiteLLM layers, making it easier to identify and resolve issues or performance bottlenecks.
2. **Comprehensive Performance Monitoring:** Allows for the collection of vital metrics such as LLM call latency, token usage, and error rates. This data can be filtered and analyzed within Opik using project names and tags.
3. **Effective Cost Management:** Facilitates tracking of token consumption associated with specific requests or projects, leading to better cost control and optimization.
4. **Deeper Usage Insights:** Provides a clearer understanding of how different components of the application or various projects are utilizing LLM capabilities.

These changes empower developers to seamlessly add observability to their LlamaIndex-based LLM workflows, especially when leveraging LiteLLM, by passing necessary Opik metadata.

**Changes Made:**

1. **`lightrag/llm/llama_index_impl.py`:**
* Modified the `llama_index_complete_if_cache` function:
* The catch-all `**kwargs` parameter has been replaced by a dedicated `chat_kwargs={}` parameter whose contents are passed directly to the `model.achat()` method. This ensures that vendor-specific parameters, such as LiteLLM's `litellm_params` carrying Opik metadata, are propagated correctly (a condensed sketch follows this item).
* The logic for retrieving `llm_instance` from `kwargs` was removed as `model` is now a direct parameter, simplifying the function.
* Updated the `llama_index_complete` function:
* Ensured that `**kwargs` (which may include `chat_kwargs` or other parameters intended for `llama_index_complete_if_cache`) is still passed down correctly.
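
For orientation, a condensed sketch of the refactored helper is shown below. The authoritative version is the `lightrag/llm/llama_index_impl.py` diff at the bottom of this page; the import paths and the omitted history/caching handling here are illustrative assumptions only.

```python
from typing import List, Optional

# Import paths assume llama_index >= 0.10; the real module defines its own imports.
from llama_index.core.llms import ChatMessage, ChatResponse, MessageRole


# Condensed sketch of the refined helper. The real implementation (full diff below)
# also formats history_messages and handles caching and error cases.
async def llama_index_complete_if_cache(
    model,                       # LlamaIndex LLM instance (e.g. LiteLLM), now an explicit parameter
    prompt: str,
    system_prompt: Optional[str] = None,
    history_messages: List[dict] = [],
    chat_kwargs={},              # forwarded verbatim to model.achat()
) -> str:
    formatted_messages = []
    if system_prompt:
        formatted_messages.append(ChatMessage(role=MessageRole.SYSTEM, content=system_prompt))
    formatted_messages.append(ChatMessage(role=MessageRole.USER, content=prompt))

    # Vendor-specific options (e.g. LiteLLM's litellm_params) ride along in chat_kwargs.
    response: ChatResponse = await model.achat(messages=formatted_messages, **chat_kwargs)
    return response.message.content
```

Because `chat_kwargs` is unpacked into `achat()`, any vendor-specific option accepted by the underlying LlamaIndex LLM can be forwarded without further signature changes.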

2. **`examples/unofficial-sample/lightrag_llamaindex_litellm_demo.py`:**
* This existing demo file was updated to align with the changes in `llama_index_impl.py`.
* The `llm_model_func` no longer forwards `**kwargs` to `llama_index_complete_if_cache`; when no specific chat arguments are needed, the call simply relies on the helper's new empty `chat_kwargs={}` default, keeping the demo compatible with the updated signature (see the sketch below this item). This file serves as a baseline example without Opik integration.
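
Condensed from the demo diff below, the baseline call now looks roughly like this; since no `chat_kwargs` argument is supplied, the helper's empty default applies (the surrounding function body is omitted here).

```python
# Baseline demo (no Opik metadata): chat_kwargs falls back to its empty default.
# kwargs["llm_instance"] is the LiteLLM instance the demo creates earlier.
response = await llama_index_complete_if_cache(
    kwargs["llm_instance"],
    prompt,
    system_prompt=system_prompt,
    history_messages=history_messages,
)
```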

3. **`examples/unofficial-sample/lightrag_llamaindex_litellm_opik_demo.py` (New File):**
* A new example script has been added to specifically demonstrate the integration of LightRAG with LlamaIndex, LiteLLM, and Opik for observability.
* The `llm_model_func` in this demo showcases how to construct the `chat_kwargs` dictionary.
* It includes `litellm_params` with a `metadata` field for Opik, containing `project_name` and `tags`. This provides a clear example of how to send observability data to Opik.
* The call to `llama_index_complete_if_cache` within `llm_model_func` passes these `chat_kwargs`, ensuring the Opik metadata is included in the LiteLLM request (see the excerpt below).
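
Excerpted from the new demo file (full listing in the diff below), the core of `llm_model_func` builds `chat_kwargs` and hands it to the helper; `project_name` and `tags` are the values used by the demo and can be adjusted per deployment.

```python
# Excerpt from llm_model_func in the new demo: attach Opik metadata to the LiteLLM request.
chat_kwargs = {
    "litellm_params": {
        "metadata": {
            "opik": {
                "project_name": "lightrag_llamaindex_litellm_opik_demo",
                "tags": ["lightrag", "litellm"],
            }
        }
    }
}

response = await llama_index_complete_if_cache(
    kwargs["llm_instance"],  # LiteLLM instance created earlier in the demo
    prompt,
    system_prompt=system_prompt,
    history_messages=history_messages,
    chat_kwargs=chat_kwargs,  # forwarded to model.achat() and from there to LiteLLM
)
```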

These modifications provide a more robust and extensible way to pass parameters to the underlying LLM calls, specifically enabling the integration of observability tools like Opik.

Co-authored-by: Martin Perez-Guevara <[email protected]>
Co-authored-by: Young Jin Kim <[email protected]>

examples/unofficial-sample/lightrag_llamaindex_litellm_demo.py CHANGED
@@ -53,7 +53,6 @@ async def llm_model_func(prompt, system_prompt=None, history_messages=[], **kwargs):
             prompt,
             system_prompt=system_prompt,
             history_messages=history_messages,
-            **kwargs,
         )
         return response
     except Exception as e:
examples/unofficial-sample/lightrag_llamaindex_litellm_opik_demo.py ADDED
@@ -0,0 +1,155 @@
+import os
+from lightrag import LightRAG, QueryParam
+from lightrag.llm.llama_index_impl import (
+    llama_index_complete_if_cache,
+    llama_index_embed,
+)
+from lightrag.utils import EmbeddingFunc
+from llama_index.llms.litellm import LiteLLM
+from llama_index.embeddings.litellm import LiteLLMEmbedding
+import asyncio
+import nest_asyncio
+
+nest_asyncio.apply()
+
+from lightrag.kg.shared_storage import initialize_pipeline_status
+
+# Configure working directory
+WORKING_DIR = "./index_default"
+print(f"WORKING_DIR: {WORKING_DIR}")
+
+# Model configuration
+LLM_MODEL = os.environ.get("LLM_MODEL", "gemma-3-4b")
+print(f"LLM_MODEL: {LLM_MODEL}")
+EMBEDDING_MODEL = os.environ.get("EMBEDDING_MODEL", "arctic-embed")
+print(f"EMBEDDING_MODEL: {EMBEDDING_MODEL}")
+EMBEDDING_MAX_TOKEN_SIZE = int(os.environ.get("EMBEDDING_MAX_TOKEN_SIZE", 8192))
+print(f"EMBEDDING_MAX_TOKEN_SIZE: {EMBEDDING_MAX_TOKEN_SIZE}")
+
+# LiteLLM configuration
+LITELLM_URL = os.environ.get("LITELLM_URL", "http://localhost:4000")
+print(f"LITELLM_URL: {LITELLM_URL}")
+LITELLM_KEY = os.environ.get("LITELLM_KEY", "sk-4JdvGFKqSA3S0k_5p0xufw")
+
+if not os.path.exists(WORKING_DIR):
+    os.mkdir(WORKING_DIR)
+
+
+# Initialize LLM function
+async def llm_model_func(prompt, system_prompt=None, history_messages=[], **kwargs):
+    try:
+        # Initialize LiteLLM if not in kwargs
+        if "llm_instance" not in kwargs:
+            llm_instance = LiteLLM(
+                model=f"openai/{LLM_MODEL}",  # Format: "provider/model_name"
+                api_base=LITELLM_URL,
+                api_key=LITELLM_KEY,
+                temperature=0.7,
+            )
+            kwargs["llm_instance"] = llm_instance
+
+        chat_kwargs = {}
+        chat_kwargs["litellm_params"] = {
+            "metadata": {
+                "opik": {
+                    "project_name": "lightrag_llamaindex_litellm_opik_demo",
+                    "tags": ["lightrag", "litellm"],
+                }
+            }
+        }
+
+        response = await llama_index_complete_if_cache(
+            kwargs["llm_instance"],
+            prompt,
+            system_prompt=system_prompt,
+            history_messages=history_messages,
+            chat_kwargs=chat_kwargs,
+        )
+        return response
+    except Exception as e:
+        print(f"LLM request failed: {str(e)}")
+        raise
+
+
+# Initialize embedding function
+async def embedding_func(texts):
+    try:
+        embed_model = LiteLLMEmbedding(
+            model_name=f"openai/{EMBEDDING_MODEL}",
+            api_base=LITELLM_URL,
+            api_key=LITELLM_KEY,
+        )
+        return await llama_index_embed(texts, embed_model=embed_model)
+    except Exception as e:
+        print(f"Embedding failed: {str(e)}")
+        raise
+
+
+# Get embedding dimension
+async def get_embedding_dim():
+    test_text = ["This is a test sentence."]
+    embedding = await embedding_func(test_text)
+    embedding_dim = embedding.shape[1]
+    print(f"embedding_dim={embedding_dim}")
+    return embedding_dim
+
+
+async def initialize_rag():
+    embedding_dimension = await get_embedding_dim()
+
+    rag = LightRAG(
+        working_dir=WORKING_DIR,
+        llm_model_func=llm_model_func,
+        embedding_func=EmbeddingFunc(
+            embedding_dim=embedding_dimension,
+            max_token_size=EMBEDDING_MAX_TOKEN_SIZE,
+            func=embedding_func,
+        ),
+    )
+
+    await rag.initialize_storages()
+    await initialize_pipeline_status()
+
+    return rag
+
+
+def main():
+    # Initialize RAG instance
+    rag = asyncio.run(initialize_rag())
+
+    # Insert example text
+    with open("./book.txt", "r", encoding="utf-8") as f:
+        rag.insert(f.read())
+
+    # Test different query modes
+    print("\nNaive Search:")
+    print(
+        rag.query(
+            "What are the top themes in this story?", param=QueryParam(mode="naive")
+        )
+    )
+
+    print("\nLocal Search:")
+    print(
+        rag.query(
+            "What are the top themes in this story?", param=QueryParam(mode="local")
+        )
+    )
+
+    print("\nGlobal Search:")
+    print(
+        rag.query(
+            "What are the top themes in this story?", param=QueryParam(mode="global")
+        )
+    )
+
+    print("\nHybrid Search:")
+    print(
+        rag.query(
+            "What are the top themes in this story?", param=QueryParam(mode="hybrid")
+        )
+    )
+
+
+if __name__ == "__main__":
+    main()
lightrag/llm/llama_index_impl.py CHANGED
@@ -95,7 +95,7 @@ async def llama_index_complete_if_cache(
     prompt: str,
     system_prompt: Optional[str] = None,
     history_messages: List[dict] = [],
-    **kwargs,
+    chat_kwargs = {},
 ) -> str:
     """Complete the prompt using LlamaIndex."""
     try:
@@ -122,13 +122,7 @@ async def llama_index_complete_if_cache(
         # Add current prompt
         formatted_messages.append(ChatMessage(role=MessageRole.USER, content=prompt))
 
-        # Get LLM instance from kwargs
-        if "llm_instance" not in kwargs:
-            raise ValueError("llm_instance must be provided in kwargs")
-        llm = kwargs["llm_instance"]
-
-        # Get response
-        response: ChatResponse = await llm.achat(messages=formatted_messages)
+        response: ChatResponse = await model.achat(messages=formatted_messages, **chat_kwargs)
 
         # In newer versions, the response is in message.content
         content = response.message.content