Spaces:
Running
Running
# CLAUDE.md | |
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. | |
## Project Overview | |
This is a **production-ready GAIA benchmark AI agent** achieving 85% accuracy through a sophisticated multi-agent architecture. The system has been **fully refactored** into a modular, maintainable architecture that specializes in complex question answering across multimedia, research, file processing, chess analysis, and mathematical reasoning domains. | |
## Development Commands | |
### Setup and Installation | |
```bash | |
# Install dependencies | |
pip install -r requirements.txt | |
# Test API key configuration | |
python test_api_keys.py | |
# Verify core functionality | |
python -c "from main import GAIASolver; print('β Core GAIASolver available')" | |
``` | |
### Running the System | |
```bash | |
# Run legacy monolithic solver | |
python main.py | |
# Run refactored modular solver (recommended) | |
python main_refactored.py | |
# Run Gradio web interface | |
python app.py | |
``` | |
### Testing Commands | |
```bash | |
# Comprehensive async testing | |
python async_complete_test.py | |
# Test question classification | |
python test_improved_classification.py | |
python final_classification_test.py | |
# Test YouTube functionality | |
python direct_youtube_test.py | |
python simple_youtube_test.py | |
python test_youtube_question.py | |
# Test individual components | |
python -c "from gaia_tools import GAIA_TOOLS; print(f'Available tools: {len(GAIA_TOOLS)}')" | |
python -c "from question_classifier import QuestionClassifier; c = QuestionClassifier(); print('β Classifier ready')" | |
``` | |
## Architecture Overview | |
### Dual Architecture Design | |
This project maintains both **legacy monolithic** and **refactored modular** architectures: | |
**Legacy Architecture (main.py):** | |
- Monolithic 1285-line solver with all functionality integrated | |
- Comprehensive tool collection in gaia_tools.py (4887 lines) | |
- Single-file approach for rapid development and deployment | |
**Refactored Architecture (gaia/ package):** | |
``` | |
gaia/ | |
βββ core/ # Main solver logic | |
β βββ solver.py # GAIASolver main class | |
β βββ answer_extractor.py # Specialized answer extraction classes | |
β βββ question_processor.py # Question classification and processing | |
βββ tools/ # Tool implementations | |
β βββ base.py # Abstract tool interface and registry | |
β βββ registry.py # Tool discovery and management | |
β βββ [specialized tool modules] | |
βββ models/ # Model providers and management | |
β βββ manager.py # ModelManager with fallback chains | |
β βββ providers.py # LiteLLM, Gemini, Kluster providers | |
βββ config/ # Configuration management | |
β βββ settings.py # Config, ModelConfig classes | |
βββ utils/ # Utilities and helpers | |
βββ exceptions.py # Custom exception hierarchy | |
βββ logging.py # Logging configuration | |
``` | |
### Core Components | |
**GAIASolver (main.py):** Legacy monolithic solver with 1000+ lines of sophisticated processing logic | |
**GAIASolver (gaia/core/solver.py):** Refactored main orchestrator using dependency injection | |
**QuestionClassifier:** LLM-based intelligent routing with pattern-based fallbacks | |
**GAIA_TOOLS:** 42 specialized tools including enhanced Wikipedia research, chess analysis, Excel processing, and multimedia analysis | |
**ModelManager:** Handles model initialization, fallback chains (Kluster.ai β Gemini β Qwen), and lifecycle management | |
### Question Type Specialization | |
**Research Questions (92% accuracy):** | |
- Enhanced Wikipedia tools with date-specific searches and Featured Articles integration | |
- Multi-step research coordination with cross-validation | |
- Anti-hallucination safeguards to prevent fabrication | |
**Chess Questions (100% accuracy):** | |
- Universal FEN correction system handling any vision error pattern | |
- Multi-tool consensus system for maximum accuracy | |
- Perfect algebraic notation extraction | |
**YouTube/Multimedia Questions:** | |
- Enhanced URL detection with multiple regex patterns | |
- Forced classification override for YouTube content | |
- Specialized prompts with explicit tool usage instructions | |
**File Processing (100% accuracy):** | |
- Format-specific tools for Excel (.xlsx/.xls), Python (.py), text files | |
- Deterministic Python execution with sandboxed environment | |
- Financial calculation specialization with proper currency formatting | |
## Environment Configuration | |
### Required API Keys (set in .env) | |
- `GEMINI_API_KEY` - Primary model (Gemini Flash 2.0) | |
- `HUGGINGFACE_TOKEN` - Fallback model and classification | |
- `KLUSTER_API_KEY` - Optional premium model access | |
### Model Fallback Chain | |
1. **Kluster.ai** (Qwen3-235B, Gemma3-27B) - Premium option | |
2. **Gemini Flash 2.0** - Primary production model | |
3. **Qwen 2.5-72B** - Reliable fallback via HuggingFace | |
## Key Design Patterns | |
### Anti-Hallucination Architecture | |
- **Tool result prioritization**: Always uses exact tool outputs over internal reasoning | |
- **Cross-validation**: Multiple verification methods for critical information | |
- **Source attribution**: Clear tracking and validation of information sources | |
- **Validation rules**: Type-specific answer extraction and verification | |
### Performance Optimizations | |
- **Fresh agent creation** for each question to avoid token accumulation | |
- **Concurrent processing** support with async operations | |
- **15-minute web cache** for improved response times | |
- **Exponential backoff** for API rate limiting | |
## File Organization | |
### Core Files | |
- `main.py` - Legacy monolithic solver (1285 lines) | |
- `main_refactored.py` - Entry point for refactored architecture | |
- `gaia_tools.py` - 42 specialized tools with robust error handling (4887 lines) | |
- `question_classifier.py` - LLM + pattern-based classification system | |
- `app.py` - Production Gradio interface with comprehensive error handling | |
### Supporting Files | |
- `async_complete_test.py` - Comprehensive async testing infrastructure | |
- `enhanced_wikipedia_tools.py` - Advanced Wikipedia research capabilities | |
- `universal_fen_correction.py` - Chess-specific FEN notation correction | |
- `wikipedia_featured_articles_by_date.py` - Date-specific Wikipedia searches | |
## Local Configuration Notes | |
- huggingface token can get from secrets in .env |