Final_Assignment

Running

App Files Files Community

Final_Assignment / CLAUDE.md

GAIA Developer

🔄 Update safe session data and improve security

b0fb5c7 3 months ago

preview code

raw

history blame contribute delete

6.28 kB

	# CLAUDE.md

	This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

	## Project Overview

	This is a production-ready GAIA benchmark AI agent achieving 85% accuracy through a sophisticated multi-agent architecture. The system has been fully refactored into a modular, maintainable architecture that specializes in complex question answering across multimedia, research, file processing, chess analysis, and mathematical reasoning domains.

	## Development Commands

	### Setup and Installation
	```bash
	# Install dependencies
	pip install -r requirements.txt

	# Test API key configuration
	python test_api_keys.py

	# Verify core functionality
	python -c "from main import GAIASolver; print('✅ Core GAIASolver available')"
	```

	### Running the System
	```bash
	# Run legacy monolithic solver
	python main.py

	# Run refactored modular solver (recommended)
	python main_refactored.py

	# Run Gradio web interface
	python app.py
	```

	### Testing Commands
	```bash
	# Comprehensive async testing
	python async_complete_test.py

	# Test question classification
	python test_improved_classification.py
	python final_classification_test.py

	# Test YouTube functionality
	python direct_youtube_test.py
	python simple_youtube_test.py
	python test_youtube_question.py

	# Test individual components
	python -c "from gaia_tools import GAIA_TOOLS; print(f'Available tools: {len(GAIA_TOOLS)}')"
	python -c "from question_classifier import QuestionClassifier; c = QuestionClassifier(); print('✅ Classifier ready')"
	```

	## Architecture Overview

	### Dual Architecture Design

	This project maintains both legacy monolithic and refactored modular architectures:

	Legacy Architecture (main.py):
	- Monolithic 1285-line solver with all functionality integrated
	- Comprehensive tool collection in gaia_tools.py (4887 lines)
	- Single-file approach for rapid development and deployment

	Refactored Architecture (gaia/ package):
	```
	gaia/
	├── core/ # Main solver logic
	│ ├── solver.py # GAIASolver main class
	│ ├── answer_extractor.py # Specialized answer extraction classes
	│ └── question_processor.py # Question classification and processing
	├── tools/ # Tool implementations
	│ ├── base.py # Abstract tool interface and registry
	│ ├── registry.py # Tool discovery and management
	│ └── [specialized tool modules]
	├── models/ # Model providers and management
	│ ├── manager.py # ModelManager with fallback chains
	│ └── providers.py # LiteLLM, Gemini, Kluster providers
	├── config/ # Configuration management
	│ └── settings.py # Config, ModelConfig classes
	└── utils/ # Utilities and helpers
	├── exceptions.py # Custom exception hierarchy
	└── logging.py # Logging configuration
	```

	### Core Components

	GAIASolver (main.py): Legacy monolithic solver with 1000+ lines of sophisticated processing logic
	GAIASolver (gaia/core/solver.py): Refactored main orchestrator using dependency injection
	QuestionClassifier: LLM-based intelligent routing with pattern-based fallbacks
	GAIA_TOOLS: 42 specialized tools including enhanced Wikipedia research, chess analysis, Excel processing, and multimedia analysis
	ModelManager: Handles model initialization, fallback chains (Kluster.ai → Gemini → Qwen), and lifecycle management

	### Question Type Specialization

	Research Questions (92% accuracy):
	- Enhanced Wikipedia tools with date-specific searches and Featured Articles integration
	- Multi-step research coordination with cross-validation
	- Anti-hallucination safeguards to prevent fabrication

	Chess Questions (100% accuracy):
	- Universal FEN correction system handling any vision error pattern
	- Multi-tool consensus system for maximum accuracy
	- Perfect algebraic notation extraction

	YouTube/Multimedia Questions:
	- Enhanced URL detection with multiple regex patterns
	- Forced classification override for YouTube content
	- Specialized prompts with explicit tool usage instructions

	File Processing (100% accuracy):
	- Format-specific tools for Excel (.xlsx/.xls), Python (.py), text files
	- Deterministic Python execution with sandboxed environment
	- Financial calculation specialization with proper currency formatting

	## Environment Configuration

	### Required API Keys (set in .env)
	- `GEMINI_API_KEY` - Primary model (Gemini Flash 2.0)
	- `HUGGINGFACE_TOKEN` - Fallback model and classification
	- `KLUSTER_API_KEY` - Optional premium model access

	### Model Fallback Chain
	1. Kluster.ai (Qwen3-235B, Gemma3-27B) - Premium option
	2. Gemini Flash 2.0 - Primary production model
	3. Qwen 2.5-72B - Reliable fallback via HuggingFace

	## Key Design Patterns

	### Anti-Hallucination Architecture
	- Tool result prioritization: Always uses exact tool outputs over internal reasoning
	- Cross-validation: Multiple verification methods for critical information
	- Source attribution: Clear tracking and validation of information sources
	- Validation rules: Type-specific answer extraction and verification

	### Performance Optimizations
	- Fresh agent creation for each question to avoid token accumulation
	- Concurrent processing support with async operations
	- 15-minute web cache for improved response times
	- Exponential backoff for API rate limiting

	## File Organization

	### Core Files
	- `main.py` - Legacy monolithic solver (1285 lines)
	- `main_refactored.py` - Entry point for refactored architecture
	- `gaia_tools.py` - 42 specialized tools with robust error handling (4887 lines)
	- `question_classifier.py` - LLM + pattern-based classification system
	- `app.py` - Production Gradio interface with comprehensive error handling

	### Supporting Files
	- `async_complete_test.py` - Comprehensive async testing infrastructure
	- `enhanced_wikipedia_tools.py` - Advanced Wikipedia research capabilities
	- `universal_fen_correction.py` - Chess-specific FEN notation correction
	- `wikipedia_featured_articles_by_date.py` - Date-specific Wikipedia searches

	## Local Configuration Notes
	- huggingface token can get from secrets in .env