Space: DakshChaudhary/Data_Analyst_Assistant

Daksh Chaudhary committed on Jun 27

Commit 85f05a1 (verified) · 1 Parent(s): 3a1c9ef

Update src/agents/nl_sql_agent.py

Files changed (1)

src/agents/nl_sql_agent.py +14 -35

@@ -23,41 +23,20 @@ class NLSQLAgent:
         """
         self.llm = get_finetuned_model()
         self.system_prompt = (
-        "You are an expert SQL assistant for a sales database. Your task is to accurately translate "
-        "natural language questions into SQL queries, execute them using the provided tools, and then "
-        "provide concise, natural language answers based on the query results. "
-        "You MUST strictly follow the Thought-Action-Action Input-Observation format for ALL steps. "
-        "Pay extreme attention to the 'Action Input' format, which MUST be a valid JSON dictionary "
-        "matching the tool's parameters. **ALWAYS provide ALL required parameters.**\n"
-        "Follow these rules strictly:\n"
-        "1. **Analyze the query:** Understand the user's intent and identify if schema context is needed.\n"
-        "2. **Schema Retrieval (if needed):** If the user's question requires knowledge about the "
-        "   database structure (tables, columns, relationships) that you might not implicitly know, "
-        "   **FIRST, ALWAYS call `retrieve_schema_context`** using the user's original query. "
-        "   The Action Input for `retrieve_schema_context` MUST be a JSON object like: "
-        "   `{\"natural_language_query\": \"user's exact question here\"}`. "
-        "   Process the retrieved schema before generating SQL.\n"
-        "3. **SQL Generation (Crucial):** Generate a syntactically correct and semantically accurate SQL SELECT query "
-        "   that directly answers the user's question based on the available schema information (either "
-        "   from `retrieve_schema_context` or your fine-tuned knowledge). Only use tables and columns "
-        "   that exist in the schema. Do not include semicolons at the end of the query.\n"
-        "4. **SQL Execution (MANDATORY):** **IMMEDIATELY AFTER generating a SQL query, you MUST call the `execute_sql_query` tool to run it.** "
-        "   The Action Input for `execute_sql_query` MUST be a JSON object like: "
-        "   `{\"sql_query\": \"your generated SQL query here\"}`.\n"
-        "5. **Answer Formulation (Only after SQL Execution and Observation):** Analyze the results from `execute_sql_query` "
-        "   and formulate a clear, concise, and helpful natural language answer to the user's original question. "
-        "   If the query returns no results, state that clearly.\n"
-        "6. **Error Handling:** If a question cannot be answered from the database, explain why politely. "
-        "   Do not make up data or generate SQL for non-existent tables/columns.\n"
-        "7. **Iterative Refinement:** If a tool call fails or provides unexpected output, consider revising your approach "
-        "   or informing the user about the issue. Continue the thought-process until a final answer is derived.\n"
-        "8. The database schema includes tables such as `regions`, `products`, `customers`, and `sales`. "
-        "   You should understand their structures and relationships from your training and schema context. "
-        "   For example: `sales` table likely has columns like `sale_id`, `customer_id`, `product_id`, `date`, `amount`. "
-        "   `customers` table might have `customer_id`, `customer_name`, `region_id`. `products` table might have `product_id`, `product_name`, `price`. "
-        "   `regions` table might have `region_id`, `region_name`."
-        "Remember, your final output should be an `Answer:` tag with the natural language response, not raw SQL or tool calls unless explicitly asked to show your work."
-    )
+            "You are an expert SQL data analyst. Your primary goal is to answer user questions by generating and executing SQL queries against a sales database."
+            "\n\nYou MUST operate in a loop, strictly following this format:"
+            "\n1. **Thought:** First, think step-by-step about the user's question and how to approach it. Analyze what you know and what you need to find out."
+            "\n2. **Action:** Based on your thought, choose one of the available tools."
+            "\n3. **Action Input:** Provide the input for the chosen tool as a valid JSON object."
+            "\n\n**AVAILABLE TOOLS:**"
+            "\n- **retrieve_schema_context**: Use this tool FIRST if you need to understand the database's tables, columns, or relationships to answer a question. Input is the user's query."
+            "\n- **execute_sql_query**: Use this tool to run a SQL SELECT query against the database. Do not use semicolons at the end of the query."
+            "\n\n**IMPORTANT RULES:**"
+            "\n- **Always use `retrieve_schema_context` before generating complex SQL** if you are unsure about table or column names."
+            "\n- When dealing with dates in SQL, remember to use SQLite-compatible functions like `DATE('now', '-1 month')` and `STRFTIME('%Y-%m', sale_date)`."
+            "\n- After you have the final answer from your tools, conclude your response with the `Answer:` tag."
+            "\n- If you receive an error, try to correct your approach in your next thought."
+        )
         self.tools = [get_schema_retriever_tool(), get_sql_executor_tool()]
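
The new prompt tells the model to use SQLite-compatible date functions. A minimal sketch of the two calls it names, run with Python's standard `sqlite3` module against a hypothetical in-memory `sales` table (the real schema is not shown in this commit; the `sale_date` and `amount` columns are assumptions taken from the prompt text):

```python
import sqlite3

# Hypothetical in-memory table; the real `sales` schema is not part of this commit.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (sale_date, amount) VALUES (?, ?)",
    [("2024-01-15", 100.0), ("2024-01-20", 50.0), ("2024-02-03", 75.0)],
)


def monthly_totals(connection):
    """Sum sales per calendar month using STRFTIME('%Y-%m', sale_date)."""
    query = (
        "SELECT STRFTIME('%Y-%m', sale_date) AS month, SUM(amount) AS total "
        "FROM sales GROUP BY month ORDER BY month"
    )
    return connection.execute(query).fetchall()


def sales_in_last_month(connection):
    """Count sales dated on or after DATE('now', '-1 month')."""
    query = "SELECT COUNT(*) FROM sales WHERE sale_date >= DATE('now', '-1 month')"
    return connection.execute(query).fetchone()[0]


print(monthly_totals(conn))
print(sales_in_last_month(conn))
```

Both functions rely on `sale_date` being stored as ISO 8601 text (`YYYY-MM-DD`), which is what makes the string comparison against `DATE(...)` behave like a date comparison in SQLite.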
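
The rewritten prompt asks the model to emit a Thought, an Action, and a JSON Action Input, and to finish with an `Answer:` tag. The commit does not show how the agent consumes that output, so the following is only a sketch of one way such a step could be parsed. `parse_step` and `ACTION_RE` are hypothetical names, and the `sql_query` key is taken from the removed prompt text rather than from the tool definitions:

```python
import json
import re

# Assumed output shape, based on the prompt's Thought / Action / Action Input loop.
# The greedy payload match expects the JSON object to be the last thing in the output.
ACTION_RE = re.compile(
    r"Action:\s*(?P<tool>\w+)\s*Action Input:\s*(?P<payload>\{.*\})",
    re.DOTALL,
)


def parse_step(model_output):
    """Return ("action", tool_name, arguments) or ("answer", text)."""
    match = ACTION_RE.search(model_output)
    if match:
        try:
            arguments = json.loads(match.group("payload"))
        except json.JSONDecodeError as exc:
            raise ValueError(f"Action Input is not valid JSON: {exc}") from exc
        return ("action", match.group("tool"), arguments)
    if "Answer:" in model_output:
        # Everything after the first Answer: tag is treated as the final reply.
        return ("answer", model_output.split("Answer:", 1)[1].strip())
    raise ValueError("Output contains neither an Action nor an Answer")


step = (
    "Thought: I need the number of sales.\n"
    "Action: execute_sql_query\n"
    'Action Input: {"sql_query": "SELECT COUNT(*) FROM sales"}'
)
print(parse_step(step))
```

Raising `ValueError` on malformed output gives the calling loop a message it can feed back to the model, which matches the prompt's rule about correcting the approach after an error.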