
### 3.5. `research_agent.py` Refactoring

*   **Rationale:** To improve browser instance management, error handling, and configuration.
*   **Proposals:**
    1.  **Browser Lifecycle Management:** Instead of initializing the browser (`start_chrome`) at the module level, manage its lifecycle explicitly. Options:
        *   Initialize the browser within the agent's initialization and provide a method or tool to explicitly close it (`kill_browser`) when the agent's task is done or the application shuts down.
        *   Use a context manager (`with start_chrome(...) as browser:`) if the browser is only needed for a specific scope within a tool call (less likely for a persistent agent).
        *   Ensure `kill_browser` is reliably called, for example by having the `planner_agent` invoke a cleanup tool/method on the `research_agent` after its tasks are complete (see the lifecycle sketch after the diff below).
    2.  **Configuration:** Move hardcoded Chrome options to configuration. Externalize API keys/IDs if not already done (they seem to be using `os.getenv`, which is good).
    3.  **Robust Error Handling:** For browser interaction tools (`visit`, `get_text_by_css`, `click_element`), raise specific custom exceptions instead of returning error strings. This allows for more structured error handling by the agent or workflow.
    4.  **Tool Consolidation (Optional):** The agent has many tools. Consider if some related tools (e.g., different search APIs) could be consolidated behind a single tool that internally chooses the best source, or if the LLM handles the large toolset effectively.

*   **Diff Patch (Illustrative - Configuration & Browser Init):**

    ```diff
    --- a/research_agent.py
    +++ b/research_agent.py
    @@ -1,5 +1,6 @@
     import os
     import time
+    import logging
     from typing import List
 
     from llama_index.core.agent.workflow import ReActAgent
@@ -15,17 +16,21 @@
     from helium import start_chrome, go_to, find_all, Text, kill_browser
     from helium import get_driver
 
+    logger = logging.getLogger(__name__)
+
 # 1. Helium
-chrome_options = webdriver.ChromeOptions()
-chrome_options.add_argument("--no-sandbox")
-chrome_options.add_argument("--disable-dev-shm-usage")
-chrome_options.add_experimental_option("prefs", {
-    "download.prompt_for_download": False,
-    "plugins.always_open_pdf_externally": True,
-    "profile.default_content_settings.popups": 0
-})
-
-browser = start_chrome(headless=True, options=chrome_options)
+# Browser instance should be managed, not global at module level
+# browser = start_chrome(headless=True, options=chrome_options)
+
+def get_chrome_options():
+    options = webdriver.ChromeOptions()
+    if os.getenv("RESEARCH_AGENT_CHROME_NO_SANDBOX", "true").lower() == "true":
+        options.add_argument("--no-sandbox")
+    if os.getenv("RESEARCH_AGENT_CHROME_DISABLE_DEV_SHM", "true").lower() == "true":
+        options.add_argument("--disable-dev-shm-usage")
+    # Add other options from config as needed
+    # options.add_experimental_option(...) # Example
+    return options
 
 def visit(url: str, wait_seconds: float = 2.0) -> str |None:
     """
@@ -36,10 +41,11 @@
         wait_seconds (float): Time to wait after navigation.
     """
     try:
+        # Assumes browser is available in context (e.g., class member)
         go_to(url)
         time.sleep(wait_seconds)
         return f"Visited: {url}"
     except Exception as e:
+       logger.error(f"Error visiting {url}: {e}", exc_info=True)
        return f"Error visiting {url}: {e}"
 
 def get_text_by_css(selector: str) -> List[str] | str:
@@ -52,13 +58,15 @@
         List[str]: List of text contents.
     """
     try:
+        # Assumes browser/helium context is active
         if selector.lower() == 'body':
             elements = find_all(Text())
         else:
             elements = find_all(selector)
         texts = [elem.web_element.text for elem in elements]
-        print(f"Extracted {len(texts)} elements for selector \'{selector}\'")
+        logger.info(f"Extracted {len(texts)} elements for selector \'{selector}\'")
         return texts
     except Exception as e:
+        logger.error(f"Error extracting text for selector {selector}: {e}", exc_info=True)
         return f"Error extracting text for selector {selector}: {e}"
 
 def get_page_html() -> str:
@@ -70,9 +78,11 @@
         str: HTML content, or empty string on error.
     """
     try:
+        # Assumes browser/helium context is active
         driver = get_driver()
         html = driver.page_source
         return html
     except Exception as e:
+        logger.error(f"Error extracting HTML: {e}", exc_info=True)
         return f"Error extracting HTML: {e}"
 
 def click_element(selector: str, index_element: int = 0) -> str:
@@ -83,10 +93,12 @@
         selector (str): CSS selector of the element to click.
     """
     try:
+        # Assumes browser/helium context is active
         element = find_all(selector)[index_element]
         element.click()
         time.sleep(1)
         return f"Clicked element matching selector \'{selector}\'"
     except Exception as e:
+        logger.error(f"Error clicking element {selector}: {e}", exc_info=True)
         return f"Error clicking element {selector}: {e}"
 
 def search_item_ctrl_f(text: str, nth_result: int = 1) -> str:
@@ -97,6 +109,7 @@
         nth_result: Which occurrence to jump to (default: 1)
     """
     elements = browser.find_elements(By.XPATH, f"//*[contains(text(), \'{text}\')]")
+    # Assumes browser is available in context
     if nth_result > len(elements):
         return f"Match n°{nth_result} not found (only {len(elements)} matches found)"
     result = f"Found {len(elements)} matches for \'{text}\'."
@@ -107,19 +120,22 @@
 
def go_back() -> None:
     """Goes back to previous page."""
     browser.back()
+    # Assumes browser is available in context
 
 def close_popups() -> None:
     """
     Closes any visible modal or pop-up on the page. Use this to dismiss pop-up windows! This does not work on cookie consent banners.
     """
     webdriver.ActionChains(browser).send_keys(Keys.ESCAPE).perform()
+    # Assumes browser is available in context
 
 def close() -> None:
     """
     Close the browser instance.
     """
     try:
+        # Assumes kill_browser is appropriate here
         kill_browser()
-        print("Browser closed")
+        logger.info("Browser closed via kill_browser()")
     except Exception as e:
-        print(f"Error closing browser: {e}")
+        logger.error(f"Error closing browser: {e}", exc_info=True)
 
 visit_tool = FunctionTool.from_defaults(
     fn=visit,
@@ -240,9 +256,14 @@
 
 
def initialize_research_agent() -> ReActAgent:
+    # Browser initialization should happen here or be managed externally
+    # Example: browser = start_chrome(headless=True, options=get_chrome_options())
+    # Ensure browser instance is passed to tools or accessible via agent state/class
+
+    llm_model_name = os.getenv("RESEARCH_AGENT_LLM_MODEL", "models/gemini-1.5-pro")
     llm = GoogleGenAI(
         api_key=os.getenv("GEMINI_API_KEY"),
-        model="models/gemini-1.5-pro",
+        model=llm_model_name,
     )
 
     system_prompt = """\
    ```


### 3.6. `text_analyzer_agent.py` Refactoring

*   **Rationale:** To improve configuration management and error handling.
*   **Proposals:**
    1.  **Configuration:** Move the hardcoded LLM model name (`models/gemini-1.5-pro`) to environment variables or a configuration file.
    2.  **Prompt Management:** Move the `analyze_text` prompt to a separate template file.
    3.  **Error Handling:** In `extract_text_from_pdf`, consider raising specific exceptions (e.g., `PDFDownloadError`, `PDFParsingError`) instead of returning error strings, allowing the agent to handle failures more gracefully.

*   **Diff Patch (Illustrative - Configuration & Error Handling):**

    ```diff
    --- a/text_analyzer_agent.py
    +++ b/text_analyzer_agent.py
    @@ -6,6 +6,14 @@
 
     logger = logging.getLogger(__name__)
 
+    class PDFExtractionError(Exception):
+        """Custom exception for PDF extraction failures."""
+        pass
+
+    class PDFDownloadError(PDFExtractionError):
+        """Custom exception for PDF download failures."""
+        pass
+
 def extract_text_from_pdf(source: str) -> str:
     """
     Extract raw text from a PDF file on disk or at a URL.
@@ -19,21 +27,21 @@
         try:
             resp = requests.get(source, timeout=10)
             resp.raise_for_status()
-        except Exception as e:
-            return f"Error downloading PDF from {source}: {e}"
+        except requests.exceptions.RequestException as e:
+            raise PDFDownloadError(f"Error downloading PDF from {source}: {e}") from e
 
         try:
             tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".pdf")
             tmp.write(resp.content)
             tmp.flush()
             tmp_path = tmp.name
             tmp.close()
-        except Exception as e:
-            return f"Error writing temp PDF file: {e}"
+        except IOError as e:
+            raise PDFExtractionError(f"Error writing temp PDF file: {e}") from e
         path = tmp_path
     else:
         path = source
 
     # Now extract text from the PDF on disk
     if not os.path.isfile(path):
-        return f"PDF not found: {path}"
+        raise PDFExtractionError(f"PDF not found: {path}")
 
     text = ""
 
@@ -41,10 +49,10 @@
         reader = PdfReader(path)
         pages = [page.extract_text() or "" for page in reader.pages]
         text = "\n".join(pages)
-        print(f"Extracted {len(pages)} pages of text from PDF")
+        logger.info(f"Extracted {len(pages)} pages of text from PDF: {path}")
     except Exception as e:
         # Catch specific PyPDF2 errors if possible, otherwise general Exception
-        return f"Error reading PDF: {e}"
+        raise PDFExtractionError(f"Error reading PDF {path}: {e}") from e
 
     # Clean up temporary file if one was created
     if source.lower().startswith(("http://", "https://")):
@@ -67,6 +75,14 @@
         str: A plain-text string containing:
              • A “Summary:” section with bullet points.
              • A “Facts:” section with bullet points.
     """
+    # Ideally load this template (with a {text} placeholder) from a separate prompt file
+    prompt_template = """You are an expert analyst.
+
+    Please analyze the following text and produce a plain-text response
+    with two sections:
+
+    Summary:
+    • Provide 2–3 concise bullet points summarizing the main ideas.
+
+    Facts:
+    • List each verifiable fact found in the text as a bullet point.
+
+    Respond with exactly that format—no JSON, no extra commentary.
+
+    Text to analyze:
+    \"\"\"
+    {text}
+    \"\"\"
+    """
-    # Build the prompt to guide the LLM’s output format
-    input_prompt = f"""You are an expert analyst.
@@ -84,13 +100,14 @@
-    {text}
-    \"\"\"
-    """
+    input_prompt = prompt_template.format(text=text)
 
     # Use the LLM to generate the analysis
+    llm_model_name = os.getenv("TEXT_ANALYZER_LLM_MODEL", "models/gemini-1.5-pro")
     llm = GoogleGenAI(
         api_key=os.getenv("GEMINI_API_KEY"),
-        model="models/gemini-1.5-pro",
+        model=llm_model_name,
     )
 
     generated = llm.complete(input_prompt)
@@ -124,9 +141,10 @@
         FunctionAgent: Configured analysis agent.
     """
 
+    llm_model_name = os.getenv("TEXT_ANALYZER_AGENT_LLM_MODEL", "models/gemini-1.5-pro")
     llm = GoogleGenAI(
         api_key=os.getenv("GEMINI_API_KEY"),
-        model="models/gemini-1.5-pro",
+        model=llm_model_name,
     )
 
     system_prompt = """\
    ```


### 3.7. `reasoning_agent.py` Refactoring

*   **Rationale:** To simplify the agent structure, improve configuration, and potentially optimize LLM usage.
*   **Proposals:**
    1.  **Configuration:** Move hardcoded LLM model names (`models/gemini-1.5-pro`, `o4-mini`) and the API key environment variable name (`ALPAFLOW_OPENAI_API_KEY`) to configuration.
    2.  **Prompt Management:** Move the detailed CoT prompt from `reasoning_tool_fn` to a separate template file.
    3.  **Agent Structure Simplification:** Given the rigid workflow (call tool -> handoff), consider replacing the `ReActAgent` with a simpler `FunctionAgent` that directly calls the `reasoning_tool` and formats the output before handing off (see the sketch after the diff below). Alternatively, evaluate whether the `reasoning_tool` logic could be integrated as a direct LLM call within agents that need CoT (like `planner_agent`), potentially removing the need for a separate `reasoning_agent` altogether, unless its specific CoT prompt/model (`o4-mini`) is crucial.

*   **Diff Patch (Illustrative - Configuration & Prompt Loading):**

    ```diff
    --- a/reasoning_agent.py
    +++ b/reasoning_agent.py
    @@ -1,10 +1,19 @@
     import os
+    import logging
 
     from llama_index.core.agent.workflow import ReActAgent
     from llama_index.llms.google_genai import GoogleGenAI
     from llama_index.core.tools import FunctionTool
     from llama_index.llms.openai import OpenAI
 
+    logger = logging.getLogger(__name__)
+
+    def load_prompt_from_file(filename="reasoning_tool_prompt.txt") -> str:
+        try:
+            with open(filename, "r") as f:
+                return f.read()
+        except FileNotFoundError:
+            logger.error(f"Prompt file {filename} not found.")
+            return "Perform chain-of-thought reasoning on the context: {context}"
+
 def reasoning_tool_fn(context: str) -> str:
     """
     Perform end-to-end chain-of-thought reasoning over the full multi-agent workflow context,
@@ -17,45 +26,12 @@
         str: A structured reasoning trace with numbered thought steps, intermediate checks,
              and a concise final recommendation or conclusion.
     """
-    prompt = f"""You are an expert reasoning engine.  You have the following full context of a multi-agent workflow:
-
-    {context}
-    
-    Your job is to:
-    1. **Comprehension**  
-       - Read the entire question or problem statement carefully.  
-       - Identify key terms, constraints, and desired outcomes.
-
-    2. **Decomposition**  
-       - Break down the problem into logical sub-steps or sub-questions.  
-       - Ensure each sub-step is necessary and sufficient to progress toward a solution.
-
-    3. **Chain-of-Thought**  
-       - Articulate your internal reasoning in clear, numbered steps.  
-       - At each step, state your assumptions, derive implications, and check for consistency.
-
-    4. **Intermediate Verification**  
-       - After each reasoning step, validate your conclusion against the problem’s constraints.  
-       - If a contradiction or uncertainty arises, revisit and refine the previous step.
-
-    5. **Synthesis**  
-       - Once all sub-steps are resolved, integrate the intermediate results into a cohesive answer.  
-       - Ensure the final answer directly addresses the user’s request and all specified criteria.
-
-    6. **Clarity & Precision**  
-       - Use formal, precise language.  
-       - Avoid ambiguity: define any technical terms you introduce.  
-       - Provide just enough detail to justify each conclusion without digression.
-
-    7. **Final Answer**  
-       - Present a concise, well-structured response.  
-       - If appropriate, include a brief summary of your reasoning steps.
-    
-    Respond with your reasoning steps followed by the final recommendation.
-    """
+    prompt_template = load_prompt_from_file()
+    prompt = prompt_template.format(context=context)
 
+    reasoning_llm_model = os.getenv("REASONING_TOOL_LLM_MODEL", "o4-mini")
+    # Use specific API key if needed, e.g., ALPAFLOW_OPENAI_API_KEY
+    reasoning_api_key_env = os.getenv("REASONING_TOOL_API_KEY_ENV", "ALPAFLOW_OPENAI_API_KEY")
+    reasoning_api_key = os.getenv(reasoning_api_key_env)
     llm = OpenAI(
-        model="o4-mini",
-        api_key=os.getenv("ALPAFLOW_OPENAI_API_KEY"),
+        model=reasoning_llm_model,
+        api_key=reasoning_api_key,
         reasoning_effort="high"
     )
     response = llm.complete(prompt)
@@ -74,9 +50,10 @@
     """
     Create a pure reasoning agent with no tools, relying solely on chain-of-thought.
     """
+    agent_llm_model = os.getenv("REASONING_AGENT_LLM_MODEL", "models/gemini-1.5-pro")
     llm = GoogleGenAI(
         api_key=os.getenv("GEMINI_API_KEY"),
-        model="models/gemini-1.5-pro",
+        model=agent_llm_model,
     )
 
     system_prompt = """\
    ```


### 3.8. `planner_agent.py` Refactoring

*   **Rationale:** To improve configuration management and prompt handling.
*   **Proposals:**
    1.  **Configuration:** Move the hardcoded LLM model name (`models/gemini-1.5-pro`) to environment variables or a configuration file.
    2.  **Prompt Management:** Move the system prompt and the prompts within the `plan` and `synthesize_and_respond` functions to separate template files for better readability and maintainability.

*   **Diff Patch (Illustrative - Configuration & Prompt Loading):**

    ```diff
    --- a/planner_agent.py
    +++ b/planner_agent.py
    @@ -1,10 +1,19 @@
     import os
+    import logging
     from typing import List, Any
 
     from llama_index.core.agent.workflow import FunctionAgent, ReActAgent
     from llama_index.core.tools import FunctionTool
     from llama_index.llms.google_genai import GoogleGenAI
 
+    logger = logging.getLogger(__name__)
+
+    def load_prompt_from_file(filename: str, default_prompt: str) -> str:
+        try:
+            with open(filename, "r") as f:
+                return f.read()
+        except FileNotFoundError:
+            logger.warning(f"Prompt file {filename} not found. Using default.")
+            return default_prompt
+
 def plan(objective: str) -> List[str]:
     """
     Generate a list of sub-questions from the given objective.
@@ -15,14 +24,16 @@
     Returns:
         List[str]: A list of sub-steps as strings.
     """
-    input_prompt: str = (
+    default_plan_prompt = (
         "You are a research assistant. "
         "Given an objective, break it down into a list of concise, actionable sub-steps.\n"
         f"Objective: {objective}\n"
         "Sub-steps (one per line):"
     )
+    plan_prompt_template = load_prompt_from_file("planner_plan_prompt.txt", default_plan_prompt)
+    input_prompt = plan_prompt_template.format(objective=objective)
 
+    llm_model_name = os.getenv("PLANNER_TOOL_LLM_MODEL", "models/gemini-1.5-pro")
     llm = GoogleGenAI(
         api_key=os.getenv("GEMINI_API_KEY"),
-        model="models/gemini-1.5-pro",
+        model=llm_model_name,
     )
 
 
@@ -44,13 +55,16 @@
     Returns:
         str: A unified, well-structured response addressing the original objective.
     """
-    # Join each ready-made QA block directly
     summary_blocks = "\n".join(results)
-    input_prompt = f"""You are an expert synthesizer. Given the following sub-questions and their answers,
+    default_synth_prompt = """You are an expert synthesizer. Given the following sub-questions and their answers,
     produce a single, coherent, comprehensive report that addresses the original objective:
     
     {summary_blocks}
     
     Final Report:
     """
+    synth_prompt_template = load_prompt_from_file("planner_synthesize_prompt.txt", default_synth_prompt)
+    input_prompt = synth_prompt_template.format(summary_blocks=summary_blocks)
+
+    llm_model_name = os.getenv("PLANNER_TOOL_LLM_MODEL", "models/gemini-1.5-pro") # Can use same model as plan
     llm = GoogleGenAI(
         api_key=os.getenv("GEMINI_API_KEY"),
-        model="models/gemini-1.5-pro",
+        model=llm_model_name,
     )
     response = llm.complete(input_prompt)
     return response.text
@@ -77,9 +91,10 @@
     """
     Initialize a LlamaIndex agent specialized in research planning and question engineering.
     """
+    agent_llm_model = os.getenv("PLANNER_AGENT_LLM_MODEL", "models/gemini-1.5-pro")
     llm = GoogleGenAI(
         api_key=os.getenv("GEMINI_API_KEY"),
-        model="models/gemini-1.5-pro",
+        model=agent_llm_model,
     )
 
     system_prompt = """\
@@ -108,6 +123,7 @@
     **Completion & Synthesis**  
     If the final result fully completes the original objective, produce a consolidated synthesis of the roadmap and send it as your concluding output.
     """
+    system_prompt = load_prompt_from_file("planner_system_prompt.txt", system_prompt) # Load from file if exists
 
     agent = ReActAgent(
         name="planner_agent",
    ```


### 3.9. `code_agent.py` Refactoring

*   **Rationale:** To address the critical security vulnerability of the `SimpleCodeExecutor`, improve configuration management, and align code execution with safer practices.
*   **Proposals:**
    1.  **Remove `SimpleCodeExecutor`:** This class and its `execute` method using `subprocess` with raw code strings are fundamentally insecure and **must be removed entirely**.
    2.  **Use `CodeInterpreterToolSpec`:** Rely *exclusively* on the `code_interpreter` tool derived from LlamaIndex's `CodeInterpreterToolSpec` for code execution. This tool is designed for safer, sandboxed execution (an initialization sketch follows the diff below).
    3.  **Update `CodeActAgent` Initialization:** Remove the `code_execute_fn` parameter when initializing `CodeActAgent`, as the agent should use the provided `code_interpreter` tool for execution via the standard ReAct/Act loop, not a direct execution function.
    4.  **Configuration:** Move hardcoded LLM model names (`o4-mini`, `models/gemini-1.5-pro`) and the API key environment variable name (`ALPAFLOW_OPENAI_API_KEY`) to configuration.
    5.  **Prompt Management:** Move the `generate_python_code` prompt to a separate template file.

*   **Diff Patch (Illustrative - Security Fix & Configuration):**

    ```diff
    --- a/code_agent.py
    +++ b/code_agent.py
    @@ -1,5 +1,6 @@
     import os
     import subprocess
+    import logging
 
     from llama_index.core.agent.workflow import ReActAgent, CodeActAgent
     from llama_index.core.tools import FunctionTool
@@ -7,6 +8,16 @@
     from llama_index.llms.openai import OpenAI
     from llama_index.tools.code_interpreter import CodeInterpreterToolSpec
 
+    logger = logging.getLogger(__name__)
+
+    def load_prompt_from_file(filename: str, default_prompt: str) -> str:
+        try:
+            with open(filename, "r") as f:
+                return f.read()
+        except FileNotFoundError:
+            logger.warning(f"Prompt file {filename} not found. Using default.")
+            return default_prompt
+
 def generate_python_code(prompt: str) -> str:
     """
     Generate valid Python code from a natural language description.
@@ -27,7 +38,7 @@
           it before execution.
         - This function only generates code and does not execute it.
     """
-
-    input_prompt = f"""You are also a helpful assistant that writes Python code. 
+    default_gen_prompt = """You are also a helpful assistant that writes Python code. 
     You will be given a prompt and you must generate Python code based on that prompt. 
     You must only generate Python code and nothing else. 
     Do not include any explanations or any other text. 
@@ -40,10 +51,14 @@
     Code:\n
     """
 
+    gen_prompt_template = load_prompt_from_file("code_gen_prompt.txt", default_gen_prompt)
+    input_prompt = gen_prompt_template.format(prompt=prompt)
+
+    gen_llm_model = os.getenv("CODE_GEN_LLM_MODEL", "o4-mini")
+    gen_api_key_env = os.getenv("CODE_GEN_API_KEY_ENV", "ALPAFLOW_OPENAI_API_KEY")
+    gen_api_key = os.getenv(gen_api_key_env)
     llm = OpenAI(
-        model="o4-mini",
-        api_key=os.getenv("ALPAFLOW_OPENAI_API_KEY")
+        model=gen_llm_model,
+        api_key=gen_api_key
     )
 
     generated_code = llm.complete(input_prompt)
@@ -74,60 +89,11 @@
     ),
 )
 
-from typing import Any, Dict, Tuple
-import io
-import contextlib
-import ast
-import traceback
-
-
-class SimpleCodeExecutor:
-    """
-    A simple code executor that runs Python code with state persistence.
-
-    This executor maintains a global and local state between executions,
-    allowing for variables to persist across multiple code runs.
-
-    NOTE: not safe for production use! Use with caution.
-    """
-
-    def __init__(self):
-        pass
-
-    def execute(self, code: str) -> str:
-        """
-        Execute Python code and capture output and return values.
-
-        Args:
-            code: Python code to execute
-
-        Returns:
-            Dict with keys `success`, `output`, and `return_value`
-        """
-        print(f"Executing code: {code}")
-        try:
-            result = subprocess.run(
-                ["python", code],
-                stdout=subprocess.PIPE,
-                stderr=subprocess.PIPE,
-                text=True,
-                timeout=60
-            )
-            if result.returncode != 0:
-                print(f"Execution failed with error: {result.stderr.strip()}")
-                return f"Error: {result.stderr.strip()}"
-            else:
-                output = result.stdout.strip()
-                print(f"Captured Output: {output}")
-                return output
-        except subprocess.TimeoutExpired:
-            print("Execution timed out.")
-            return "Error: Timeout"
-        except Exception as e:
-            print(f"Execution failed with error: {e}")
-            return f"Error: {e}"
-
 def initialize_code_agent() -> CodeActAgent:
-    code_executor = SimpleCodeExecutor()
+    # DO NOT USE SimpleCodeExecutor - it is insecure.
+    # Rely on the code_interpreter tool provided below.
 
+    agent_llm_model = os.getenv("CODE_AGENT_LLM_MODEL", "models/gemini-1.5-pro")
     llm = GoogleGenAI(
         api_key=os.getenv("GEMINI_API_KEY"),
-        model="models/gemini-1.5-pro",
+        model=agent_llm_model,
     )
 
     system_prompt = """\
@@ -151,6 +117,7 @@
        - If further logical reasoning or verification is needed, delegate to **reasoning_agent**.  
        - Otherwise, once you have the final code or execution result, pass your output to **planner_agent** for overall synthesis and presentation.
     """
+    system_prompt = load_prompt_from_file("code_agent_system_prompt.txt", system_prompt)
 
     agent = CodeActAgent(
         name="code_agent",
@@ -161,7 +128,7 @@
             "pipelines, and library development, CodeAgent delivers production-ready Python solutions."
         ),
         # REMOVED: code_execute_fn=code_executor.execute, # Use code_interpreter tool instead
-        code_execute_fn=code_executor.execute,
         tools=[
             python_code_generator_tool,
             code_interpreter_tool,
    ```


### 3.10. `math_agent.py` Refactoring

*   **Rationale:** To improve configuration management and potentially simplify the tool interface for the LLM.
*   **Proposals:**
    1.  **Configuration:** Move the hardcoded agent LLM model name (`models/gemini-1.5-pro`) to configuration. Ensure the WolframAlpha App ID is configured via environment variable (`WOLFRAM_ALPHA_APP_ID`) as intended.
    2.  **Tool Granularity:** The current approach creates a separate tool for almost every single math function (solve, derivative, integral, add, multiply, inverse, mean, median, etc.). While explicit, this results in a very large number of tools for the `ReActAgent` to manage. Consider:
        *   **Grouping:** Group related functions under fewer tools. For example, a `symbolic_math_tool` that takes the operation type (solve, diff, integrate) as a parameter, or a `matrix_ops_tool` (see the grouping sketch after the diff below).
        *   **Natural Language Interface:** Create a single `calculate` tool that takes a natural language math query (e.g., "solve x**2 - 4 = 0 for x", "mean of [1, 2, 3]") and uses an LLM (or rule-based parsing) internally to dispatch to the appropriate NumPy/SciPy/SymPy function. This simplifies the interface for the main agent LLM but adds complexity within the tool.
        *   **WolframAlpha Prioritization:** Evaluate if WolframAlpha can handle many of these requests directly, potentially reducing the need for numerous specific SymPy/NumPy tools, especially for symbolic tasks.
    3.  **Truncated File:** The copy of the file reviewed here was truncated; review the complete file if possible, since it may contain additional tools or issues not covered by these proposals.

*   **Diff Patch (Illustrative - Configuration):**

    ```diff
    --- a/math_agent.py
    +++ b/math_agent.py
    @@ -1,5 +1,6 @@
     import os
     from typing import List, Optional, Union
+    import logging
     import sympy as sp
     import numpy as np
     from llama_index.core.agent.workflow import ReActAgent
    @@ -12,6 +13,8 @@
     from scipy.integrate import odeint
     import numpy.fft as fft
 
+    logger = logging.getLogger(__name__)
+
     # --- Symbolic math functions ---
 
 
    @@ -451,10 +454,11 @@
 
 
 def initialize_math_agent() -> ReActAgent:
+    agent_llm_model = os.getenv("MATH_AGENT_LLM_MODEL", "models/gemini-1.5-pro")
     llm = GoogleGenAI(
         api_key=os.getenv("GEMINI_API_KEY"),
-        model="models/gemini-1.5-pro",
+        model=agent_llm_model,
     )
 
     # Ensure WolframAlpha App ID is set
    ```

*(Refactoring proposals section complete)*


## 4. New Feature Designs

This section outlines the design for the new features requested: YouTube Ingestion and Generic Audio Transcription.

### 4.1. YouTube Ingestion

*   **Rationale:** To enable the framework to process YouTube videos by extracting audio, transcribing it, and summarizing the content, as requested by the user.
*   **Design Proposal:**
    *   **Implementation:** Introduce a new dedicated agent, `youtube_agent`, or add tools to the existing `research_agent` or `text_analyzer_agent`. A dedicated agent seems cleaner given the specific multi-step workflow.
    *   **Agent (`youtube_agent`):**
        *   **Purpose:** Manages the end-to-end process of downloading YouTube audio, chunking, transcribing, and summarizing.
        *   **Tools:**
            1.  `download_youtube_audio`: Takes a YouTube URL, uses a library like `yt-dlp` (or potentially `pytube`) to download the audio stream into a temporary file (e.g., `.mp3` or `.opus`). Returns the path to the audio file.
            2.  `chunk_audio_file`: Takes an audio file path and a maximum chunk duration (e.g., 60 seconds). Uses a library like `pydub` or `librosa`+`soundfile` to split the audio into smaller, sequentially numbered temporary files. Returns a list of chunk file paths.
            3.  `transcribe_audio_chunk_gemini`: Takes an audio file path (representing a chunk). Uses the Google Generative AI SDK (`google.generativeai`) to call the Gemini 1.5 Pro model with the audio file for transcription. Returns the transcribed text.
            4.  `summarize_transcript`: Takes the full concatenated transcript text. Uses a Gemini model (e.g., 1.5 Pro or Flash) with a specific prompt to generate a one-paragraph summary. Returns the summary text.
        *   **Workflow (ReAct or Function sequence):**
            1.  Receive YouTube URL.
            2.  Call `download_youtube_audio`.
            3.  Call `chunk_audio_file` with the downloaded audio path.
            4.  Iterate through the list of chunk paths:
                *   Call `transcribe_audio_chunk_gemini` for each chunk.
                *   Collect transcribed text segments.
            5.  Concatenate all transcribed text segments into a full transcript.
            6.  Call `summarize_transcript` with the full transcript.
            7.  Return the full transcript and the summary.
            8.  Clean up temporary audio files (downloaded and chunks).
        *   **Handoff:** Could hand off the transcript and summary to `planner_agent` or `text_analyzer_agent` for further processing or integration.
    *   **Dependencies:** `yt-dlp`, `pydub` (requires `ffmpeg` or `libav`), `google-generativeai`.
    *   **Configuration:** Gemini API Key, chunk duration.


### 4.2. Generic Audio Transcription

*   **Rationale:** To provide a flexible audio transcription capability for local files or remote URLs, using Gemini Pro for quality/latency tolerance and Whisper.cpp as a fallback, exposing it via a Python API as requested.
*   **Design Proposal:**
    *   **Implementation:** Introduce a new dedicated agent, `transcription_agent`, or add tools to `text_analyzer_agent`. A dedicated agent allows for clearer separation of concerns, especially managing the Whisper.cpp dependency and logic.
    *   **Agent (`transcription_agent`):**
        *   **Purpose:** Transcribes audio from various sources (local path, URL) using either Gemini or Whisper.cpp based on latency requirements or availability.
        *   **Tools:**
            1.  `prepare_audio_source`: Takes a source string (URL or local path). If it's a URL, downloads it to a temporary file using `requests`. Validates the local file path. Returns the path to the local audio file.
            2.  `transcribe_gemini`: Takes an audio file path. Uses the `google-generativeai` SDK to call Gemini 1.5 Pro for transcription. Returns the transcribed text. This is the preferred method when latency is acceptable.
            3.  `transcribe_whisper_cpp`: Takes an audio file path. Invokes a locally built `whisper.cpp` (typically compiled from source) via `subprocess`, or through a Python binding if one is available, to perform local transcription. Returns the transcribed text. This is the fallback or low-latency option (a `subprocess` sketch follows at the end of this section).
            4.  `choose_transcription_method`: (Internal logic or a simple tool) Takes latency preference (e.g., 'high_quality' vs 'low_latency') or checks Gemini availability/quota. Decides whether to use `transcribe_gemini` or `transcribe_whisper_cpp`.
        *   **Workflow (ReAct or Function sequence):**
            1.  Receive audio source (URL/path) and potentially a latency preference.
            2.  Call `prepare_audio_source` to get a local file path.
            3.  Call `choose_transcription_method` (or execute internal logic) to decide between Gemini and Whisper.
            4.  If Gemini: Call `transcribe_gemini`.
            5.  If Whisper: Call `transcribe_whisper_cpp`.
            6.  Return the resulting transcript.
            7.  Clean up temporary downloaded audio file if applicable.
        *   **Handoff:** Could hand off the transcript to `planner_agent` or `text_analyzer_agent`.
    *   **Python API:**
        *   Define a simple Python function (e.g., in a `transcription_api.py` module) that encapsulates the agent's logic or directly calls the underlying transcription functions.
        ```python
        # Example API function in transcription_api.py
        import logging

        from .transcription_agent import transcribe_audio  # assuming the agent logic is refactored

        logger = logging.getLogger(__name__)


        class TranscriptionError(Exception):
            """Raised when transcription fails with every available backend."""


        def get_transcript(source: str, prefer_gemini: bool = True) -> str:
            """Transcribes audio from a local path or URL.

            Args:
                source: Path to the local audio file or URL.
                prefer_gemini: If True, attempts to use Gemini Pro first.
                               If False or Gemini fails, falls back to Whisper.cpp.

            Returns:
                The transcribed text.

            Raises:
                TranscriptionError: If transcription fails.
            """
            # Simplified logic - the real implementation selects Gemini or
            # Whisper.cpp based on preference and availability.
            try:
                return transcribe_audio(source, prefer_gemini)
            except Exception as e:
                logger.error("Transcription of %s failed: %s", source, e, exc_info=True)
                raise TranscriptionError(f"Failed to transcribe {source}: {e}") from e
        ```
    *   **Dependencies:** `requests`, `google-generativeai`, `whisper.cpp` (requires separate installation/compilation), potentially Python bindings for `whisper.cpp`.
    *   **Configuration:** Gemini API Key, path to `whisper.cpp` executable or library, Whisper model selection.


## 5. Extra Agent Designs

This section proposes three additional specialized agents designed to enhance performance on the GAIA benchmark by addressing common challenges like complex fact verification, interpreting visual data representations, and handling long contexts.

### 5.1. Agent Design 1: Advanced Validation Agent (`validation_agent`)

*   **Purpose:** To perform rigorous validation of factual claims or intermediate results generated by other agents, going beyond the simple contradiction check of the current `verifier_agent`. This agent aims to improve the accuracy and trustworthiness of the final answer by cross-referencing information and performing checks.
*   **Key Tool Calls:**
    *   `web_search` (from `research_agent` or similar): To find external evidence supporting or refuting a claim.
    *   `browse_and_extract` (from `research_agent` or similar): To access specific URLs found during search and extract relevant text snippets.
    *   `code_interpreter` (from `code_agent`): To perform calculations or simple data manipulations needed for verification (e.g., checking unit conversions, calculating percentages).
    *   `knowledge_base_lookup` (New Tool - Optional): Interface with a structured knowledge base (e.g., Wikidata, internal DB) to verify entities, relationships, or properties.
    *   `llm_check_consistency` (New Tool or LLM call): Use a powerful LLM with a specific prompt to assess the logical consistency between a claim and a set of provided evidence snippets or existing context.
*   **Agent Loop Sketch (ReAct style):**
    1.  **Input:** A specific claim or statement to validate, along with relevant context or source information.
    2.  **Thought:** Identify the core assertion in the claim. Determine the best validation strategy (e.g., web search for current events, calculation for numerical claims, consistency check for logical statements).
    3.  **Action:** Call the appropriate tool (`web_search`, `code_interpreter`, `llm_check_consistency`).
    4.  **Observation:** Analyze the tool's output (search results, calculation result, consistency assessment).
    5.  **Thought:** Does the observation confirm, refute, or remain inconclusive about the claim? Is more information needed? (e.g., need to browse a specific search result).
    6.  **Action (if needed):** Call another tool (`browse_and_extract`, `llm_check_consistency` with new evidence).
    7.  **Observation:** Analyze new output.
    8.  **Thought:** Synthesize findings. Assign a final validation status (e.g., Confirmed, Refuted, Uncertain) and provide supporting evidence or reasoning.
    9.  **Output:** Validation status and justification.
    10. **Handoff:** Return result to `planner_agent` or `verifier_agent` (if this agent replaces the contradiction part).

### 5.2. Agent Design 2: Figure Interpretation Agent (`figure_interpretation_agent`)

*   **Purpose:** To specialize in extracting structured data and meaning from figures, charts, graphs, and tables embedded within images or documents, which are common in GAIA tasks and often require more than just a textual description.
*   **Key Tool Calls:**
    *   `image_ocr` (New Tool or enhanced `image_analyzer_agent` capability): High-precision OCR focused on extracting text specifically from figures, including axes labels, legends, titles, and data points.
    *   `chart_data_extractor` (New Tool): Utilizes specialized vision models (e.g., DePlot, ChartOCR, or similar fine-tuned models) designed to parse chart types (bar, line, pie) and extract underlying data series or key values.
    *   `table_parser` (New Tool): Uses vision or document AI models to detect table structures in images/PDFs and extract cell content into a structured format (e.g., list of lists, Pandas DataFrame via code execution).
    *   `code_interpreter` (from `code_agent`): To process extracted data (e.g., load into DataFrame, perform simple analysis, re-plot for verification).
    *   `llm_interpret_figure` (New Tool or LLM call): Takes extracted text, data, and potentially the image itself (multimodal) to provide a semantic interpretation of the figure's message or trends.
*   **Agent Loop Sketch (Function sequence or ReAct):**
    1.  **Input:** An image or document page containing a figure/table, potentially with context or a specific question about it.
    2.  **Action:** Call `image_ocr` to get all text elements.
    3.  **Action:** Call `chart_data_extractor` or `table_parser` based on visual analysis (or try both) to get structured data.
    4.  **Action (Optional):** Call `code_interpreter` to load structured data into a DataFrame for easier handling.
    5.  **Action:** Call `llm_interpret_figure`, providing the extracted text, data (raw or DataFrame), and potentially the original image, asking it to answer the specific question or summarize the figure's key insights.
    6.  **Output:** Structured data (if requested) and/or the semantic interpretation/answer.
    7.  **Handoff:** Return results to `planner_agent` or `reasoning_agent`.

### 5.3. Agent Design 3: Long Context Management Agent (`long_context_agent`)

*   **Purpose:** To effectively manage and query information from very long documents or conversation histories that exceed the context window limits of standard models or require efficient information retrieval techniques.
*   **Key Tool Calls:**
    *   `document_chunker` (New Tool): Splits long text into semantically meaningful chunks (e.g., using `SentenceSplitter` from LlamaIndex or more advanced methods).
    *   `vector_store_builder` (New Tool): Takes text chunks and builds an in-memory or persistent vector index (using libraries like `llama-index`, `langchain`, `faiss`, `chromadb`).
    *   `vector_retriever` (New Tool): Queries the built vector index with a specific question to find the most relevant chunks.
    *   `summarizer_tool` (New Tool or LLM call): Generates summaries of long text or selected chunks, potentially using different levels of detail.
    *   `contextual_synthesizer` (New Tool or LLM call): Takes retrieved relevant chunks and the original query, then uses an LLM to synthesize an answer grounded in the retrieved context (RAG pattern).
*   **Agent Loop Sketch (Can be stateful):**
    1.  **Input:** A long document (text or path) or a long conversation history, and a specific query or task related to it.
    2.  **(Initialization/First Use):**
        *   **Action:** Call `document_chunker`.
        *   **Action:** Call `vector_store_builder` to create an index from the chunks. Store the index reference.
    3.  **(Querying):**
        *   **Action:** Call `vector_retriever` with the user's query to get relevant chunks.
        *   **Action:** Call `contextual_synthesizer`, providing the query and retrieved chunks, to generate the final answer.
    4.  **(Alternative: Summarization Task):**
        *   **Action:** Call `summarizer_tool` on the full text (if feasible for the tool) or on retrieved chunks based on a high-level query.
    5.  **Output:** The synthesized answer or the summary.
    6.  **Handoff:** Return results to `planner_agent`.


## 6. Migration Plan

This section details the recommended steps for applying the proposed changes, lists new dependencies, and outlines minimal validation tests.

### 6.1. Order of Implementation

It is recommended to apply changes in the following order to minimize disruption and build upon stable foundations:

1.  **Core Refactoring (`app.py`, Configuration, Logging):**
    *   Implement centralized configuration (e.g., `.env` file) and update all agents to use it for API keys, model names, etc.
    *   Integrate Python's `logging` module throughout `app.py` and all agent files, replacing `print` statements.
    *   Refactor `app.py`: Implement singleton agent initialization and break down `run_and_submit_all`.
    *   Apply structural refactors to agents (class-based structure, avoiding globals) like `role_agent`, `verifier_agent`, `research_agent`.
2.  **Critical Security Fix (`code_agent`):**
    *   Immediately remove the `SimpleCodeExecutor` and modify `code_agent` to rely solely on the `code_interpreter` tool.
3.  **Core Functionality Refactoring (`verifier_agent`, `math_agent`):**
    *   Improve `verifier_agent`'s contradiction detection (e.g., using an LLM or NLI model).
    *   Refactor `math_agent` tools if choosing to group them or use a natural language interface.
4.  **New Feature: Generic Audio Transcription (`transcription_agent`):**
    *   Install `whisper.cpp` and its dependencies.
    *   Implement the `transcription_agent` and its tools (`prepare_audio_source`, `transcribe_gemini`, `transcribe_whisper_cpp`).
    *   Implement the Python API function `get_transcript`.
5.  **New Feature: YouTube Ingestion (`youtube_agent`):**
    *   Install `yt-dlp` and `pydub` (and `ffmpeg`).
    *   Implement the `youtube_agent` and its tools (`download_youtube_audio`, `chunk_audio_file`, `transcribe_audio_chunk_gemini`, `summarize_transcript`).
6.  **New Agent Implementation (Validation, Figure, Long Context):**
    *   Implement `validation_agent` and its tools.
    *   Implement `figure_interpretation_agent` and its tools (requires sourcing/installing chart/table parsing models/libraries).
    *   Implement `long_context_agent` and its tools (requires vector DB setup like `faiss` or `chromadb`).
7.  **Integration and Workflow Adjustments:**
    *   Update `planner_agent`'s system prompt and handoff logic to incorporate the new agents.
    *   Update other agents' handoff targets as needed.
    *   Update `app.py` if the overall agent initialization or workflow invocation changes.

### 6.2. New Dependencies (`requirements.txt`)

Based on the refactoring and new features, the following dependencies might need to be added or updated in `requirements.txt` (or managed via environment setup):

*   `python-dotenv`: For loading configuration from `.env` files.
*   `google-generativeai`: For interacting with Gemini models (already likely present via `llama-index-llms-google-genai`).
*   `yt-dlp`: For downloading YouTube videos.
*   `pydub`: For audio manipulation (chunking). Requires `ffmpeg` or `libav` system dependency.
*   `llama-index-vector-stores-faiss` / `faiss-cpu` / `faiss-gpu`: For `long_context_agent` vector store (choose one).
*   `chromadb` / `llama-index-vector-stores-chroma`: Alternative vector store for `long_context_agent`.
*   `llama-index-multi-modal-llms-google`: Ensure multimodal support for Gemini is correctly installed.
*   *Possibly*: Libraries for NLI models (e.g., `transformers`, `torch`) if used in `validation_agent`.
*   *Possibly*: Libraries for chart/table parsing (e.g., specific models from Hugging Face, `opencv-python`, `pdf2image`) if implementing `figure_interpretation_agent` tools.
*   *Possibly*: Python bindings for `whisper.cpp` if not using `subprocess`.

**System Dependencies:**

*   `ffmpeg` or `libav`: Required by `pydub`.
*   `whisper.cpp`: Needs to be compiled or installed separately. Follow its specific instructions.

### 6.3. Validation Tests

Minimal tests should be implemented to validate key changes:

1.  **Configuration:** Test loading of API keys and model names from the configuration source.
2.  **Logging:** Verify that logs are being generated at the correct levels and formats.
3.  **`code_agent` Security:** Test that `code_agent` uses `code_interpreter` and *not* the removed `SimpleCodeExecutor`. Attempt a malicious code execution via prompt to ensure it fails safely within the interpreter's sandbox.
4.  **`verifier_agent` Contradiction:** Test the improved contradiction detection with sample pairs of contradictory and non-contradictory statements.
5.  **`transcription_agent`:**
    *   Test with a short local audio file using both Gemini and Whisper.cpp, comparing output quality/speed.
    *   Test with an audio URL.
    *   Test the Python API function `get_transcript`.
6.  **`youtube_agent`:**
    *   Test with a short YouTube video URL.
    *   Verify audio download, chunking, transcription of chunks, and final summary generation.
    *   Check cleanup of temporary files.
7.  **New Agents (Basic):**
    *   For `validation_agent`, `figure_interpretation_agent`, `long_context_agent`, implement basic tests confirming agent initialization and successful calls to their primary new tools with mock inputs/outputs.
8.  **End-to-End Smoke Test:** Run `app.py` and process one or two simple GAIA tasks that are likely to invoke the refactored components and potentially a new feature (if a relevant task exists) to ensure the overall workflow remains functional.

*(Implementation plan complete. Ready for user confirmation.)*