ramsi-k committed on
Commit
7a28b51
·
1 Parent(s): bce4c09

docs: update and add memory handling and tech specs

Files changed (4)
  1. README.md +187 -131
  2. memory_handling.md +337 -0
  3. run_pipeline.py +155 -0
  4. tech_specs.md +25 -58
README.md CHANGED
@@ -1,171 +1,227 @@
1
- # Agentic Comic Generator
2
 
3
  ![Python](https://img.shields.io/badge/language-python-blue)
4
  ![Gradio](https://img.shields.io/badge/frontend-Gradio-orange)
5
- ![Modal](https://img.shields.io/badge/backend-Modal-lightgrey)
6
-
7
- > 🎨 Multi-agent AI system for generating comic panels from story prompts
8
-
9
- A multi-agent AI system that transforms user prompts into illustrated comic panels. Agent Brown handles narrative logic and dialogue. Agent Bayko renders the visuals. Designed as an experiment in agent collaboration, creative storytelling, and generative visuals.
10
-
11
- ## πŸŽ—οΈKey Features
12
-
13
- - Modular agents for dialogue and image generation
14
- - Prompt-to-panel storytelling pipeline
15
- - Gradio-powered web interface
16
- - Easily extendable for TTS, styles, or emotion tagging
17
-
18
- ## ✍️ Status
19
-
20
- Currently under active development for experimentation and portfolio.
21
-
22
- ## πŸ“ Directory Structure
23
-
24
- ```text
25
- project-root/
26
- β”œβ”€β”€ app.py # Entrypoint for Gradio
27
- β”œβ”€β”€ api/ # FastAPI routes and logic
28
- β”œβ”€β”€ agents/
29
- β”‚ β”œβ”€β”€ brown.py
30
- β”‚ └── bayko.py
31
- β”œβ”€β”€ plugins/
32
- β”‚ β”œβ”€β”€ base.py
33
- β”‚ └── tts_plugin.py
34
- β”œβ”€β”€ services/
35
- β”‚ └── ai_service.py
36
- β”œβ”€β”€ config.py
37
- β”œβ”€β”€ modal_app.py
38
- β”œβ”€β”€ storyboard/ # Where all output sessions go
39
- β”‚ └── session_xxx/
40
- β”œβ”€β”€ requirements.txt
41
- β”œβ”€β”€ README.md
42
- └── tech_specs.md
43
- ```
44
 
45
- ## πŸ’‘ Use Case
46
 
47
- A user enters a storytelling prompt via a secure WebUI.
48
- The system responds with:
49
 
50
- - Stylized dialogue
51
- - Rendered comic panels
52
- - Optional voiceover narration
53
 
54
- Behind the scenes, two agents β€” Bayko and Brown β€” process and generate the comic collaboratively while remaining isolated via network boundaries.
55
 
56
  ---
57
 
58
- ## πŸ“ž Agent Communication & Storage
 
 
59
 
60
- ## πŸ‘₯ Agent Roles
61
 
62
- Two core agents form the backbone of this system:
63
 
64
- - πŸ€– **Agent Brown** – The front-facing orchestrator. It receives the user’s prompt, tags the style, validates inputs, and packages the story plan for execution.
65
- - 🧠 **Agent Bayko** – The creative engine. It handles image, audio, and subtitle generation based on the structured story plan from Brown.
66
 
67
- Each agent operates in isolation but contributes to the shared goal of generating cohesive, stylized comic outputs.
68
 
69
- ### Agent Brown
70
 
71
- - πŸ”Ή Input validator, formatter, and storyboard author
72
- - ✨ Adds style tags ("Ghibli", "tragedy", etc.)
73
- - πŸ“¦ Writes JSON packages for Bayko
74
- - πŸ›‘οΈ Includes moderation tools, profanity filter
75
 
76
- ### Agent Bayko
77
 
78
- - 🧠 Reads storyboard.json and routes via MCP
79
- - πŸ› οΈ Toolchain orchestration (SDXL, TTS, Subtitler)
80
- - 🎞️ Output assembly logic
81
- - πŸ”„ Writes final output + metadata
82
 
83
- Brown and Bayko operate in a feedback loop, refining outputs collaboratively across multiple turns, simulating human editorial workflows.
84
 
85
- ## πŸ” Agent Feedback Loop
86
 
87
- This system features a multi-turn agent interaction flow, where Brown and Bayko collaborate via structured JSON messaging.
88
 
89
- ### Step-by-Step Collaboration
90
 
91
- 1. **User submits prompt via WebUI**
92
- β†’ Brown tags style, checks profanity, and prepares a `storyboard.json`.
93
 
94
- 2. **Brown sends JSON to Bayko via shared storage**
95
- β†’ Includes panel count, style tags, narration request, and subtitles config.
96
 
97
- 3. **Bayko processes each panel sequentially**
98
- β†’ For each, it generates:
99
 
100
- - `panel_X.png` (image)
101
- - `panel_X.mp3` (narration)
102
- - `panel_X.vtt` (subtitles)
 
 
103
 
104
- 4. **Brown reviews Bayko’s output against the prompt**
105
 
106
- - If all panels match: compile final comic.
107
- - If mismatch: returns annotated JSON with `refinement_request`.
108
 
109
- 5. **UI reflects agent decisions**
110
- β†’ Shows messages like β€œWaiting on Bayko…” or β€œRefining… hang tight!”
111
 
112
- This feedback loop allows for **multi-turn refinement**, **moderation hooks**, and extensibility (like emotion tagging or memory-based rejections).
113
 
114
- ### User Interaction
115
 
116
- - When the user submits a prompt, the system enters a "processing" state.
117
- - If Brown flags an issue, the UI displays a message such as β€œRefining content… please wait.”
118
- - This feedback loop can be extended for multi-turn interactions, allowing further refinement for higher-quality outputs.
 
119
 
120
- This modular design not only demonstrates the agentic behavior of the system but also allows for future expansions such as incorporating memory and adaptive feedback over multiple turns.
121
 
122
- ## βš™οΈ Example Prompt
123
 
124
- ```text
125
- Prompt: β€œA moody K-pop idol finds a puppy on the street. It changes everything.”
126
- Style: 4-panel, Studio Ghibli, whisper-soft lighting
127
- Language: Korean with English subtitles
128
- Extras: Narration + backing music
129
- ```
130
 
131
- For detailed multi-turn logic and JSON schemas, see [Feedback Loop Implementation](./tech_specs.md#-multi-turn-agent-communication).
132
 
133
  ---
134
 
135
- ## 🧠 System Architecture
136
-
137
- ### πŸ—οΈ Technical Overview
138
-
139
- The system combines **FastAPI** backend services, **Gradio** frontend, **Modal** compute scaling, and **LlamaIndex** agent orchestration to create a sophisticated multi-agent workflow.
140
-
141
- ```mermaid
142
- graph TD
143
- A[πŸ‘€ User Input<br/>Gradio Interface] --> B[πŸ€– Agent Brown<br/>Orchestrator]
144
- B --> C[🧠 LlamaIndex<br/>Memory & State]
145
- B --> D[πŸ“¨ JSON Message Queue<br/>Agent Communication]
146
- D --> E[🎨 Agent Bayko<br/>Content Generator]
147
- E --> F[☁️ Modal Inference<br/>Compute Layer]
148
-
149
- subgraph "🎯 Sponsor Tool Integration"
150
- G[πŸ€– OpenAI API<br/>Dialogue Generation]
151
- H[πŸ¦™ Mistral API<br/>Style & Tone]
152
- I[πŸ€— HuggingFace<br/>SDXL Models]
153
- J[⚑ Modal Labs<br/>Serverless Compute]
154
- end
155
-
156
- F --> G
157
- F --> H
158
- F --> I
159
- E --> J
160
-
161
- E --> K[βœ… Content Validation]
162
- K --> L{Quality Check}
163
- L -->|❌ Needs Refinement| D
164
- L -->|βœ… Approved| M[πŸ“¦ Final Assembly]
165
- M --> N[🎨 Comic Output<br/>Gradio Display]
166
-
167
- style A fill:#e1f5fe
168
- style B fill:#f3e5f5
169
- style E fill:#e8f5e8
170
- style F fill:#fff3e0
171
  ```
1
+ ---
2
+ title: Agentic Comic Generator - Bayko & Brown
3
+ emoji: πŸ¦™πŸŽ¨
4
+ colorFrom: blue
5
+ colorTo: pink
6
+ sdk: gradio
7
+ sdk_version: 4.44.0
8
+ app_file: app.py
9
+ tags:
10
+ - agent-demo-track
11
+ - mcp-server-track
12
+ - llamaindex
13
+ - multi-agent
14
+ - comic-generation
15
+ pinned: false
16
+ ---
17
+
18
+ πŸ“« [LinkedIn](https://www.linkedin.com/in/ramsikalia/)
19
+ πŸ”— [GitHub](https://github.com/Ramsi-K)
20
+ πŸ“¬ Drop me a message if you want to collaborate or hire!
21
+
22
+ # 🎨 Bayko & Brown: The Agentic Comic Generator
23
+
24
+ > ✨ **An ambitious multi-agent system for the [Hugging Face Hackathon](https://huggingface.co/competitions/llamaindex-hackathon)**
25
+ >
26
+ > πŸš€ **Demonstrating advanced agent coordination, LlamaIndex workflows, and creative AI storytelling**
27
+
28
+ **⚠️ HACKATHON TRANSPARENCY:** This is a complex, experimental system that pushes the boundaries of what's possible with current AI infrastructure. While some components face integration challenges (Modal deployment, OpenAI rate limits, LlamaIndex workflow complexity), the architecture and implementation represent significant technical achievement and innovation.
29
 
30
  ![Python](https://img.shields.io/badge/language-python-blue)
31
  ![Gradio](https://img.shields.io/badge/frontend-Gradio-orange)
32
+ ![Modal](https://img.shields.io/badge/running-Modal-lightgrey)
33
+ ![LlamaIndex](https://img.shields.io/badge/orchestrator-LlamaIndex-9cf)
34
 
35
+ ---
36
 
37
+ ### πŸ’‘ Tech Sponsors
 
38
 
39
+ This project integrates all key hackathon sponsors:
 
 
40
 
41
+ | Tool | Used For |
42
+ | -------------- | ---------------------------------------------- |
43
+ | πŸ¦™ LlamaIndex | ReActAgent + FunctionTools |
44
+ | πŸ€– OpenAI | GPT-4o reasoning and multimodal |
45
+ | 🧠 Mistral | Code Generation and Execution in Modal Sandbox |
46
+ | 🎨 HuggingFace | SDXL image generation on Modal |
47
+ | ⚑ Modal | Serverless compute + sandbox exec |
48
+ | πŸ’» Claude | Coding Assistant |
49
 
50
  ---
51
 
52
+ ## 🎯 What This Project Achieves
53
+
54
+ **This is a sophisticated exploration of multi-agent AI systems** that demonstrates:
55
 
56
+ ### πŸ—οΈ **Advanced Architecture**
57
 
58
+ - **Dual-Agent Coordination**: Brown (orchestrator) and Bayko (generator) with distinct roles
59
+ - **LlamaIndex Workflows**: Custom event-driven workflows with `ComicGeneratedEvent`, `CritiqueStartEvent`, `WorkflowPauseEvent` (sketched after this list)
60
+ - **ReAct Agent Pattern**: Visible Thought/Action/Observation cycles for transparent reasoning
61
+ - **Async/Sync Integration**: Complex Modal function calls within async LlamaIndex workflows
62
 
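+ A minimal sketch of how such custom events can be declared with LlamaIndex workflow primitives (the event names come from this project; field names and step bodies below are illustrative assumptions, not the exact implementation):
+
+ ```python
+ from llama_index.core.workflow import Event, StartEvent, StopEvent, Workflow, step
+
+ class ComicGeneratedEvent(Event):
+     panel_paths: list  # illustrative field: paths to rendered panels
+
+ class CritiqueStartEvent(Event):
+     session_id: str  # illustrative field: which session Brown should review
+
+ class WorkflowPauseEvent(Event):
+     reason: str  # illustrative field: e.g. waiting out an API rate limit
+
+ class ComicWorkflow(Workflow):
+     @step
+     async def generate(self, ev: StartEvent) -> ComicGeneratedEvent:
+         # Bayko's generation call would go here
+         return ComicGeneratedEvent(panel_paths=[])
+
+     @step
+     async def critique(self, ev: ComicGeneratedEvent) -> StopEvent:
+         # Brown's quality review would go here
+         return StopEvent(result="approved")
+ ```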
63
+ ### 🧠 **Technical Innovation**
 
64
 
65
+ - **Custom Event System**: Built sophisticated workflow control beyond basic LlamaIndex patterns
66
+ - **Multi-Modal Processing**: GPT-4o for image analysis, SDXL for generation, Mistral for enhancement
67
+ - **Memory Management**: Persistent conversation history across agent interactions
68
+ - **Error Handling**: Robust fallback systems and rate limit management
69
 
70
+ ### 🎨 **Creative AI Pipeline**
71
 
72
+ - **Prompt Enhancement**: Brown intelligently expands user prompts with narrative structure
73
+ - **Style-Aware Generation**: Automatic tagging and style consistency across panels
74
+ - **Quality Assessment**: Brown critiques Bayko's output with approval/refinement cycles
75
+ - **Multi-Format Output**: Images, subtitles, and interactive code generation
76
 
77
+ ## 🚧 **Hackathon Reality Check**
78
 
79
+ **What Works:**
80
 
81
+ - βœ… Complete agent architecture and workflow design
82
+ - βœ… LlamaIndex integration with custom events and memory
83
+ - βœ… Gradio interface with real-time progress updates
84
+ - βœ… Modal function definitions for SDXL and code execution
85
+ - βœ… Comprehensive error handling and fallback systems
86
 
87
+ **Current Challenges:**
88
 
89
+ - ⚠️ Modal deployment complexity in hackathon timeframe
90
+ - ⚠️ OpenAI rate limiting (3 requests/minute) affecting workflow
91
+ - ⚠️ LlamaIndex workflow async/sync integration edge cases
92
+ - ⚠️ Infrastructure coordination between multiple cloud services
93
 
94
+ **The Achievement:** Building a working multi-agent system with this level of sophistication in a hackathon timeframe represents significant technical accomplishment, even with deployment challenges.
95
 
96
+ ## πŸ“Έ Example Prompt
 
97
 
98
+ > "A moody K-pop idol finds a puppy. Studio Ghibli style. 4 panels."
 
99
 
100
+ **What happens:**
 
101
 
102
+ 1. Brown validates the prompt and tags it with style metadata.
103
+ 2. Brown uses LlamaIndex tools to call Bayko.
104
+ 3. Bayko generates 4 images, plus optional TTS and subtitles (planned).
105
+ 4. Brown reviews and decides to approve/refine.
106
+ 5. Output is saved in `storyboard/session_xxx/` (a programmatic sketch of this flow follows the list).
107
 
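+ The same flow can be driven programmatically. `run_pipeline.py` (added in this commit) wraps it in an interactive CLI; a minimal call looks roughly like this, assuming `OPENAI_API_KEY` is set:
+
+ ```python
+ from agents.brown_workflow import create_brown_workflow
+
+ # Build Brown's ReAct workflow with a cap of 3 refinement iterations
+ workflow = create_brown_workflow(max_iterations=3)
+
+ # Run the full Brown -> Bayko loop for a single prompt
+ result = workflow.process_comic_request(
+     "A moody K-pop idol finds a puppy. Studio Ghibli style. 4 panels."
+ )
+ print(result)
+ ```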
108
+ ---
109
 
110
+ ## 🧱 Agent Roles
 
111
 
112
+ ### πŸ€– Agent Brown
 
113
 
114
+ - Built with `LlamaIndex ReActAgent`
115
+ - Calls tools like `validate_input`, `process_request`, `review_output` (see the sketch after this list)
116
+ - Uses GPT-4 or GPT-4V for reasoning
117
+ - Controls the flow: validation β†’ generation β†’ quality review
118
 
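+ A minimal sketch of how Brown's tools can be wired into a ReActAgent (the tool bodies here are placeholders; the real implementations live in `agents/brown_tools.py`):
+
+ ```python
+ from llama_index.core.agent import ReActAgent
+ from llama_index.core.tools import FunctionTool
+ from llama_index.llms.openai import OpenAI
+
+ def validate_input(prompt: str) -> str:
+     """Placeholder: check the prompt for length, profanity, etc."""
+     return "ok" if prompt.strip() else "empty prompt"
+
+ def process_request(prompt: str) -> str:
+     """Placeholder: package the prompt into a generation request for Bayko."""
+     return f"storyboard request for: {prompt}"
+
+ tools = [
+     FunctionTool.from_defaults(fn=validate_input),
+     FunctionTool.from_defaults(fn=process_request),
+ ]
+
+ agent = ReActAgent.from_tools(
+     tools,
+     llm=OpenAI(model="gpt-4o"),
+     max_iterations=3,
+     verbose=True,  # print the Thought/Action/Observation trace
+ )
+ print(agent.chat("A robot learns to paint. 3 panels."))
+ ```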
119
+ ### 🎨 Agent Bayko
120
 
121
+ - Deterministic generation engine
122
+ - Uses Modal to run SDXL via Hugging Face Diffusers (see the sketch after this list)
123
+ - Can generate: images, TTS audio, subtitles
124
+ - Responds to structured messages only – no LLM inside
125
 
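+ A rough sketch of the Modal + Diffusers pattern behind Bayko's image step (GPU type, packages and model id are assumptions; the signature mirrors `generate_comic_panel` in `tech_specs.md`):
+
+ ```python
+ import io
+ import modal
+
+ app = modal.App("agentic-comic-generator")
+ image = modal.Image.debian_slim().pip_install(
+     "diffusers", "transformers", "accelerate", "torch"
+ )
+
+ @app.function(image=image, gpu="A10G", timeout=300)
+ def generate_comic_panel(prompt: str, style: str) -> bytes:
+     """Render a single panel with SDXL and return PNG bytes."""
+     import torch
+     from diffusers import StableDiffusionXLPipeline
+
+     pipe = StableDiffusionXLPipeline.from_pretrained(
+         "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
+     ).to("cuda")
+     panel = pipe(f"{prompt}, {style} style").images[0]
+     buf = io.BytesIO()
+     panel.save(buf, format="PNG")
+     return buf.getvalue()
+ ```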
126
+ ---
127
 
128
+ ## 🧠 LlamaIndex Memory & Workflow Highlights
129
 
130
+ This project integrates **LlamaIndex** to power both agent memory and the ReAct workflow. Brown and Bayko share a persistent memory buffer so decisions can be reviewed across multiple iterations. LlamaIndex also provides the FunctionTool and workflow abstractions that make the agent interactions transparent and replayable. The [`memory_handling.md`](./memory_handling.md) document covers the integration in detail and shows how messages are stored and evaluated.
131
 
132
+ Additional highlights:
133
+
134
+ - **Multi-modal GPT-4o** is used by Brown for image analysis and tool calling.
135
+ - **ReActAgent** drives Bayko's creative process with visible Thought/Action/Observation steps.
136
+ - **Modal** functions run heavy generation jobs (SDXL image creation, Codestral code execution) on serverless GPUs.
137
+ - A **unified memory** service combines in-memory chat logs with SQLite persistence for easy debugging and replay.
138
+ - Comprehensive tests under `tests/` demonstrate LLM integration, session management and end-to-end generation.
139
+
140
+ ---
141
+
142
+ ## πŸ’‘ Use Cases
143
+
144
+ The system is designed for quick story prototyping and creative experiments.
145
+ Typical scenarios include:
146
+
147
+ - Generating short comics from a single prompt with automatic style tagging.
148
+ - Running demo stories such as _"K-pop Idol & Puppy"_ via `run_pipeline.py`.
149
+ - Creating custom panels with narration and subtitles for accessibility.
150
+ - Experimenting with the `tools/fries.py` script for fun ASCII art or code generation using Mistral Codestral.
151
 
152
  ---
153
 
154
+ ## πŸš€ Future Enhancements
155
+
156
+ - **Richer Memory Backends** – plug in Redis or Postgres for cross-session persistence.
157
+ - **Advanced Evaluation** – leverage multimodal scoring to automatically rate image quality and narrative flow.
158
+ - **Interactive Web App** – combine the FastAPI backend and Gradio interface for real-time progress updates.
159
+ - **Additional Tools** – new Modal functions for style transfer, video exports and interactive AR panels.
160
+
161
+ ---
162
+
163
+ ## πŸ“‚ File Layout
164
+
165
  ```
166
+ agents/
167
+ β”œβ”€β”€ brown.py # AgentBrown core class
168
+ β”œβ”€β”€ brown_tools.py # LlamaIndex tool wrappers
169
+ β”œβ”€β”€ brown_workflow.py # ReActAgent setup and toolflow
170
+ β”œβ”€β”€ bayko.py # AgentBayko executor
171
+ services/
172
+ β”œβ”€β”€ agent_memory.py # LlamaIndex memory wrapper
173
+ β”œβ”€β”€ simple_evaluator.py # Refinement logic
174
+ β”œβ”€β”€ session_manager.py # Handles session IDs and state
175
+ demo_pipeline.py # Run full Brown→Bayko test
176
+ app.py # Gradio interface
177
+ requirements.txt
178
+ ```
179
+
180
+ ---
181
+
182
+ ## 🏁 **Hackathon Submission Summary**
183
+
184
+ **Submitted for:**
185
+
186
+ - 🧠 **Track 1 – Agent Demo Track**
187
+ - πŸ“‘ **Track 2 – MCP Server Track**
188
+
189
+ **Key Innovation Highlights:**
190
+
191
+ ### πŸš€ **Technical Innovation**
192
+
193
+ - **Custom Workflow Events**: `ComicGeneratedEvent`, `CritiqueStartEvent`, `WorkflowPauseEvent`
194
+ - **Async Modal Integration**: Complex bridge between sync Modal functions and async LlamaIndex workflows (see the sketch after this list)
195
+ - **Multi-Modal Reasoning**: GPT-4V analyzing generated images for quality assessment
196
+ - **Agent Memory Persistence**: Cross-session conversation history with LlamaIndex Memory
197
+
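+ One way such a bridge can look (illustrative; `generate_comic_panel` is the Modal function sketched in `tech_specs.md`, and `modal_app` as its home module is an assumption):
+
+ ```python
+ import asyncio
+
+ from modal_app import generate_comic_panel  # assumed location of the Modal function
+
+ async def render_panel_async(prompt: str, style: str) -> bytes:
+     """Run the blocking Modal .remote() call without stalling the workflow's event loop."""
+     return await asyncio.to_thread(generate_comic_panel.remote, prompt, style)
+ ```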
198
+ ### 🎨 **Creative Vision**
199
+
200
+ - **Interactive Elements**: Code generation for comic viewers and interactive features
201
+ - **Accessibility Focus**: Multi-format output including subtitles and narration
202
+
203
+ ## 🌟 **Why This Matters**
204
+
205
+ **This isn't just a demo - it's a blueprint for sophisticated AI agent coordination.**
206
+
207
+ In a hackathon timeframe, building a system that:
208
+
209
+ - Coordinates multiple AI agents with distinct personalities and capabilities
210
+ - Integrates 5+ different AI services seamlessly
211
+ - Implements custom workflow patterns beyond existing frameworks
212
+ - Handles real-world challenges like rate limiting and async complexity
213
+ - Maintains code quality with comprehensive testing
214
+
215
+ **...represents significant technical achievement and innovation in the multi-agent AI space.**
216
+
217
+ ## 🎬 **Demo & Documentation**
218
+
219
+ - **Architecture Deep Dive**: [Memory Handling Guide](./memory_handling.md)
220
+ - **Test Suite**: Comprehensive tests in `tests/` directory
221
+ - **Modal Functions**: SDXL and code execution functions in `tools/`
222
+
223
+ ---
224
+
225
+ _Let Bayko cook. Let Brown judge. Let comics happen._
226
+
227
+ **⭐ If you appreciate ambitious hackathon projects that push boundaries, this one's for you!**
memory_handling.md ADDED
@@ -0,0 +1,337 @@
1
+ # Memory Handling for Bayko & Brown
2
+
3
+ ## Hackathon Implementation Guide
4
+
5
+ > 🎯 **Simple, real, shippable memory and evaluation for multi-agent comic generation**
6
+
7
+ ---
8
+
9
+ ## 🧠 LlamaIndex Memory Integration
10
+
11
+ ### Real Memory Class (Based on LlamaIndex Docs)
12
+
13
+ ```python
14
+ # services/agent_memory.py
15
+ from llama_index.core.memory import Memory
16
+ from llama_index.core.llms import ChatMessage
17
+
18
+ class AgentMemory:
19
+ """Simple wrapper around LlamaIndex Memory for agent conversations"""
20
+
21
+ def __init__(self, session_id: str, agent_name: str):
22
+ self.session_id = session_id
23
+ self.agent_name = agent_name
24
+
25
+ # Use LlamaIndex Memory with session-specific ID
26
+ self.memory = Memory.from_defaults(
27
+ session_id=f"{session_id}_{agent_name}",
28
+ token_limit=4000
29
+ )
30
+
31
+ def add_message(self, role: str, content: str):
32
+ """Add a message to memory"""
33
+ message = ChatMessage(role=role, content=content)
34
+ self.memory.put_messages([message])
35
+
36
+ def get_history(self):
37
+ """Get conversation history"""
38
+ return self.memory.get()
39
+
40
+ def clear(self):
41
+ """Clear memory for new session"""
42
+ self.memory.reset()
43
+ ```
44
+
45
+ ### Integration with Existing Agents
46
+
47
+ **Update Brown's memory (api/agents/brown.py):**
48
+
49
+ ```python
50
+ # Replace the LlamaIndexMemoryStub with real memory
51
+ from services.agent_memory import AgentMemory
52
+
53
+ class AgentBrown:
54
+ def __init__(self, max_iterations: int = 3):
55
+ self.max_iterations = max_iterations
56
+ self.session_id = None
57
+ self.iteration_count = 0
58
+
59
+ # Real LlamaIndex memory
60
+ self.memory = None # Initialize when session starts
61
+
62
+ # ... rest of existing code
63
+
64
+ def process_request(self, request: StoryboardRequest):
65
+ # Initialize memory for new session
66
+ self.session_id = f"session_{uuid.uuid4().hex[:8]}"
67
+ self.memory = AgentMemory(self.session_id, "brown")
68
+
69
+ # Log user request
70
+ self.memory.add_message("user", request.prompt)
71
+
72
+ # ... existing validation and processing logic
73
+
74
+ # Log Brown's decision
75
+ self.memory.add_message("assistant", f"Created generation request for Bayko")
76
+
77
+ return message
78
+ ```
79
+
80
+ **Update Bayko's memory (api/agents/bayko.py):**
81
+
82
+ ```python
83
+ # Add memory to Bayko
84
+ from services.agent_memory import AgentMemory
85
+
86
+ class AgentBayko:
87
+ def __init__(self):
88
+ # ... existing initialization
89
+ self.memory = None # Initialize when processing starts
90
+
91
+ async def process_generation_request(self, message: Dict[str, Any]):
92
+ session_id = message.get("context", {}).get("session_id")
93
+ self.memory = AgentMemory(session_id, "bayko")
94
+
95
+ # Log received request
96
+ self.memory.add_message("user", f"Received generation request: {message['payload']['prompt']}")
97
+
98
+ # ... existing generation logic
99
+
100
+ # Log completion
101
+ self.memory.add_message("assistant", f"Generated {len(panels)} panels successfully")
102
+
103
+ return result
104
+ ```
105
+
106
+ ### Optional: Sync with SQLite
107
+
108
+ ```python
109
+ # services/memory_sync.py
110
+ from services.turn_memory import AgentMemory as SQLiteMemory
111
+ from services.agent_memory import AgentMemory as LlamaMemory
112
+
113
+ def sync_to_sqlite(llama_memory: LlamaMemory, sqlite_memory: SQLiteMemory):
114
+ """Sync LlamaIndex memory to SQLite for persistence"""
115
+ history = llama_memory.get_history()
116
+
117
+ for message in history:
118
+ sqlite_memory.add_message(
119
+ session_id=llama_memory.session_id,
120
+ agent_name=llama_memory.agent_name,
121
+ content=message.content,
122
+ step_type="message"
123
+ )
124
+ ```
125
+
126
+ ---
127
+
128
+ ## βœ… Simple Evaluation Logic
129
+
130
+ ### Basic Evaluator Class
131
+
132
+ ```python
133
+ # services/simple_evaluator.py
134
+
135
+ class SimpleEvaluator:
136
+ """Basic evaluation logic for Brown's decision making"""
137
+
138
+ MAX_ATTEMPTS = 3 # Original + 2 revisions
139
+
140
+ def __init__(self):
141
+ self.attempt_count = 0
142
+
143
+ def evaluate(self, bayko_output: dict, original_prompt: str) -> dict:
144
+ """Evaluate Bayko's output and decide: approve, reject, or refine"""
145
+ self.attempt_count += 1
146
+
147
+ print(f"πŸ” Brown evaluating attempt {self.attempt_count}/{self.MAX_ATTEMPTS}")
148
+
149
+ # Rule 1: Auto-reject if dialogue in images
150
+ if self._has_dialogue_in_images(bayko_output):
151
+ return {
152
+ "decision": "reject",
153
+ "reason": "Images contain dialogue text - use subtitles instead",
154
+ "final": True
155
+ }
156
+
157
+ # Rule 2: Auto-reject if story is incoherent
158
+ if not self._is_story_coherent(bayko_output):
159
+ return {
160
+ "decision": "reject",
161
+ "reason": "Story panels don't follow logical sequence",
162
+ "final": True
163
+ }
164
+
165
+ # Rule 3: Force approve if max attempts reached
166
+ if self.attempt_count >= self.MAX_ATTEMPTS:
167
+ return {
168
+ "decision": "approve",
169
+ "reason": f"Max attempts ({self.MAX_ATTEMPTS}) reached - accepting current quality",
170
+ "final": True
171
+ }
172
+
173
+ # Rule 4: Check if output matches prompt intent
174
+ if self._matches_prompt_intent(bayko_output, original_prompt):
175
+ return {
176
+ "decision": "approve",
177
+ "reason": "Output matches prompt and quality is acceptable",
178
+ "final": True
179
+ }
180
+ else:
181
+ return {
182
+ "decision": "refine",
183
+ "reason": "Output needs improvement to better match prompt",
184
+ "final": False
185
+ }
186
+
187
+ def _has_dialogue_in_images(self, output: dict) -> bool:
188
+ """Check if panels mention dialogue in the image"""
189
+ panels = output.get("panels", [])
190
+
191
+ dialogue_keywords = [
192
+ "speech bubble", "dialogue", "talking", "saying",
193
+ "text in image", "speech", "conversation"
194
+ ]
195
+
196
+ for panel in panels:
197
+ description = panel.get("description", "").lower()
198
+ if any(keyword in description for keyword in dialogue_keywords):
199
+ print(f"❌ Found dialogue in image: {description}")
200
+ return True
201
+
202
+ return False
203
+
204
+ def _is_story_coherent(self, output: dict) -> bool:
205
+ """Basic check for story coherence"""
206
+ panels = output.get("panels", [])
207
+
208
+ if len(panels) < 2:
209
+ return True # Single panel is always coherent
210
+
211
+ # Check 1: All panels should have descriptions
212
+ descriptions = [p.get("description", "") for p in panels]
213
+ if any(not desc.strip() for desc in descriptions):
214
+ print("❌ Some panels missing descriptions")
215
+ return False
216
+
217
+ # Check 2: Panels shouldn't be identical (no progression)
218
+ if len(set(descriptions)) == 1:
219
+ print("❌ All panels are identical - no story progression")
220
+ return False
221
+
222
+ # Check 3: Look for obvious incoherence keywords
223
+ incoherent_keywords = [
224
+ "unrelated", "random", "doesn't make sense",
225
+ "no connection", "contradictory"
226
+ ]
227
+
228
+ full_text = " ".join(descriptions).lower()
229
+ if any(keyword in full_text for keyword in incoherent_keywords):
230
+ print("❌ Story contains incoherent elements")
231
+ return False
232
+
233
+ return True
234
+
235
+ def _matches_prompt_intent(self, output: dict, prompt: str) -> bool:
236
+ """Check if output generally matches the original prompt"""
237
+ panels = output.get("panels", [])
238
+
239
+ if not panels:
240
+ return False
241
+
242
+ # Simple keyword matching
243
+ prompt_words = set(prompt.lower().split())
244
+ panel_text = " ".join([p.get("description", "") for p in panels]).lower()
245
+ panel_words = set(panel_text.split())
246
+
247
+ # At least 20% of prompt words should appear in panel descriptions
248
+ overlap = len(prompt_words.intersection(panel_words))
249
+ match_ratio = overlap / len(prompt_words) if prompt_words else 0
250
+
251
+ print(f"πŸ“Š Prompt match ratio: {match_ratio:.2f}")
252
+ return match_ratio >= 0.2
253
+
254
+ def reset(self):
255
+ """Reset for new session"""
256
+ self.attempt_count = 0
257
+ ```
258
+
259
+ ### Integration with Brown
260
+
261
+ ```python
262
+ # Update Brown's review_output method
263
+ from services.simple_evaluator import SimpleEvaluator
264
+
265
+ class AgentBrown:
266
+ def __init__(self, max_iterations: int = 3):
267
+ # ... existing code
268
+ self.evaluator = SimpleEvaluator()
269
+
270
+ def review_output(self, bayko_response: Dict[str, Any], original_request: StoryboardRequest):
271
+ """Review Bayko's output using simple evaluation logic"""
272
+
273
+ print(f"πŸ€– Brown reviewing Bayko's output...")
274
+
275
+ # Use simple evaluator
276
+ evaluation = self.evaluator.evaluate(
277
+ bayko_response,
278
+ original_request.prompt
279
+ )
280
+
281
+ # Log to memory
282
+ self.memory.add_message(
283
+ "assistant",
284
+ f"Evaluation: {evaluation['decision']} - {evaluation['reason']}"
285
+ )
286
+
287
+ if evaluation["decision"] == "approve":
288
+ print(f"βœ… Brown approved: {evaluation['reason']}")
289
+ return self._create_approval_message(bayko_response, evaluation)
290
+
291
+ elif evaluation["decision"] == "reject":
292
+ print(f"❌ Brown rejected: {evaluation['reason']}")
293
+ return self._create_rejection_message(bayko_response, evaluation)
294
+
295
+ else: # refine
296
+ print(f"πŸ”„ Brown requesting refinement: {evaluation['reason']}")
297
+ return self._create_refinement_message(bayko_response, evaluation)
298
+ ```
299
+
300
+ ---
301
+
302
+ ## πŸš€ Implementation Steps
303
+
304
+ ### Day 1: Memory Integration
305
+
306
+ 1. **Install LlamaIndex**: `pip install llama-index`
307
+ 2. **Create `services/agent_memory.py`** with the Memory wrapper above
308
+ 3. **Update Brown and Bayko** to use real memory instead of stubs
309
+ 4. **Test**: Verify agents can store and retrieve conversation history (a minimal check is sketched below)
310
+
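+ A quick round-trip check for step 4, using only the wrapper above (illustrative):
+
+ ```python
+ from services.agent_memory import AgentMemory
+
+ mem = AgentMemory(session_id="session_demo", agent_name="brown")
+ mem.add_message("user", "A robot learns to paint.")
+ mem.add_message("assistant", "Created generation request for Bayko")
+
+ history = mem.get_history()
+ assert len(history) == 2 and history[0].content == "A robot learns to paint."
+ print("memory round-trip OK")
+ ```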
311
+ ### Day 2: Evaluation Logic
312
+
313
+ 1. **Create `services/simple_evaluator.py`** with the evaluation class above
314
+ 2. **Update Brown's `review_output` method** to use SimpleEvaluator
315
+ 3. **Test**: Verify 3-attempt limit and rejection rules work
316
+ 4. **Optional**: Add memory sync to SQLite for persistence
317
+
318
+ ### Day 3: Testing & Polish
319
+
320
+ 1. **End-to-end testing** with various prompts
321
+ 2. **Console logging** to show evaluation decisions
322
+ 3. **Bug fixes** and edge case handling
323
+ 4. **Demo preparation**
324
+
325
+ ---
326
+
327
+ ## πŸ“‹ Success Criteria
328
+
329
+ - [ ] **Memory Works**: Agents store multi-turn conversations using LlamaIndex
330
+ - [ ] **Evaluation Works**: Brown makes approve/reject/refine decisions
331
+ - [ ] **3-Attempt Limit**: System stops after original + 2 revisions
332
+ - [ ] **Auto-Rejection**: Dialogue-in-images and incoherent stories are rejected
333
+ - [ ] **End-to-End**: Complete user prompt β†’ comic generation β†’ evaluation cycle
334
+
335
+ ---
336
+
337
+ _Simple, real, shippable. Perfect for a hackathon demo._
run_pipeline.py ADDED
@@ -0,0 +1,155 @@
1
+ """
2
+ Agentic Comic Generator - Main Pipeline
3
+ Hackathon demo showcasing Agent Brown with LlamaIndex ReActAgent
4
+ """
5
+
6
+ import os
7
+ import asyncio
8
+ from agents.brown_workflow import create_brown_workflow
9
+
10
+
11
+ def main():
12
+ """
13
+ Main pipeline for the Agentic Comic Generator
14
+ Demonstrates Agent Brown using LlamaIndex ReActAgent for hackathon
15
+ """
16
+
17
+ print("🎨 Agentic Comic Generator - Hackathon Demo")
18
+ print("πŸ† Powered by LlamaIndex ReActAgent")
19
+ print("=" * 60)
20
+
21
+ # Check for OpenAI API key
22
+ if not os.getenv("OPENAI_API_KEY"):
23
+ print("❌ Error: OPENAI_API_KEY environment variable not set")
24
+ print("Please set your OpenAI API key:")
25
+ print("export OPENAI_API_KEY='your-api-key-here'")
26
+ return
27
+
28
+ # Create Brown workflow
29
+ print("πŸ€– Initializing Agent Brown MultiModal ReAct Agent...")
30
+ workflow = create_brown_workflow(max_iterations=3)
31
+ print("βœ… Agent Brown ready!")
32
+
33
+ # Example prompts for demo
34
+ demo_prompts = [
35
+ {
36
+ "title": "K-pop Idol & Puppy Story",
37
+ "prompt": "A moody K-pop idol finds a puppy on the street. It changes everything. Use Studio Ghibli style with soft colors and 4 panels.",
38
+ },
39
+ {
40
+ "title": "Robot Artist Story",
41
+ "prompt": "A robot learns to paint in a post-apocalyptic world. Make it emotional and colorful with manga style.",
42
+ },
43
+ {
44
+ "title": "Magical Portal Adventure",
45
+ "prompt": "Two friends discover a magical portal in their school library. Adventure awaits! Use whimsical style with 6 panels.",
46
+ },
47
+ ]
48
+
49
+ print(f"\nπŸ“š Available Demo Stories ({len(demo_prompts)} options):")
50
+ for i, story in enumerate(demo_prompts, 1):
51
+ print(f" {i}. {story['title']}")
52
+
53
+ print("\n" + "=" * 60)
54
+
55
+ # Interactive mode
56
+ while True:
57
+ print("\n🎯 Choose an option:")
58
+ print("1-3: Run demo story")
59
+ print("4: Enter custom prompt")
60
+ print("q: Quit")
61
+
62
+ choice = input("\nYour choice: ").strip().lower()
63
+
64
+ if choice == "q":
65
+ print("πŸ‘‹ Thanks for trying the Agentic Comic Generator!")
66
+ break
67
+
68
+ elif choice in ["1", "2", "3"]:
69
+ story_idx = int(choice) - 1
70
+ story = demo_prompts[story_idx]
71
+
72
+ print(f"\n🎬 Running Demo: {story['title']}")
73
+ print("=" * 60)
74
+
75
+ # Process the story
76
+ result = workflow.process_comic_request(story["prompt"])
77
+ print(result)
78
+
79
+ elif choice == "4":
80
+ print("\n✏️ Enter your custom story prompt:")
81
+ custom_prompt = input("Prompt: ").strip()
82
+
83
+ if custom_prompt:
84
+ print(f"\n🎬 Processing Custom Story")
85
+ print("=" * 60)
86
+
87
+ result = workflow.process_comic_request(custom_prompt)
88
+ print(result)
89
+ else:
90
+ print("❌ Empty prompt. Please try again.")
91
+
92
+ else:
93
+ print("❌ Invalid choice. Please try again.")
94
+
95
+ print("\n" + "=" * 60)
96
+
97
+
98
+ async def async_demo():
99
+ """
100
+ Async demo version for testing async capabilities
101
+ """
102
+ print("🎨 Agentic Comic Generator - Async Demo")
103
+ print("=" * 60)
104
+
105
+ if not os.getenv("OPENAI_API_KEY"):
106
+ print("❌ Error: OPENAI_API_KEY environment variable not set")
107
+ return
108
+
109
+ # Create workflow
110
+ workflow = create_brown_workflow(max_iterations=3)
111
+
112
+ # Test prompt
113
+ prompt = "A moody K-pop idol finds a puppy on the street. It changes everything. Use Studio Ghibli style."
114
+
115
+ print("πŸ”„ Processing async request...")
116
+ result = await workflow.process_comic_request_async(prompt)
117
+ print(result)
118
+
119
+
120
+ def quick_test():
121
+ """
122
+ Quick test function for development
123
+ """
124
+ print("πŸ§ͺ Quick Test - Agent Brown ReAct Demo")
125
+ print("=" * 50)
126
+
127
+ if not os.getenv("OPENAI_API_KEY"):
128
+ print("❌ Error: OPENAI_API_KEY environment variable not set")
129
+ return
130
+
131
+ # Create workflow
132
+ workflow = create_brown_workflow(max_iterations=3)
133
+
134
+ # Test prompt
135
+ test_prompt = "A robot learns to paint. Make it emotional with 3 panels."
136
+
137
+ print(f"πŸ“ Test Prompt: {test_prompt}")
138
+ print("\nπŸ”„ Processing...")
139
+
140
+ result = workflow.process_comic_request(test_prompt)
141
+ print(result)
142
+
143
+
144
+ if __name__ == "__main__":
145
+ import sys
146
+
147
+ if len(sys.argv) > 1:
148
+ if sys.argv[1] == "test":
149
+ quick_test()
150
+ elif sys.argv[1] == "async":
151
+ asyncio.run(async_demo())
152
+ else:
153
+ print("Usage: python run_pipeline.py [test|async]")
154
+ else:
155
+ main()
tech_specs.md CHANGED
@@ -110,12 +110,14 @@ def generate_comic_panel(prompt: str, style: str) -> bytes:
110
 
111
  ### Sponsor API Integration
112
 
113
- - **OpenAI GPT-4**: Dialogue generation and character voice consistency
114
- - **Mistral**: Style adaptation and tone refinement
115
- - **HuggingFace**: SDXL model hosting and inference
116
- - **Modal**: Serverless GPU compute for image/audio generation
 
 
117
 
118
- > Mistral Agents: Investigated experimental client.beta.agents framework for dynamic task routing, but deferred due to limited stability at time of build.
119
 
120
  ### LlamaIndex Agent Memory
121
 
@@ -178,63 +180,28 @@ def create_comic_interface():
178
 
179
  ## πŸš€ Deployment Configuration
180
 
181
- ### HuggingFace Spaces Frontend
182
-
183
- ```yaml
184
- # spaces_config.yml
185
- title: Agentic Comic Generator
186
- emoji: 🎨
187
- colorFrom: blue
188
- colorTo: purple
189
- sdk: gradio
190
- sdk_version: '4.0.0'
191
- app_file: app.py
192
- pinned: false
193
- license: mit
194
- ```
195
-
196
- ### Modal Backend Services
197
-
198
- ```python
199
- # modal_app.py
200
- import modal
201
-
202
- app = modal.App("agentic-comic-generator")
203
 
204
- # Shared volume for agent state persistence
205
- volume = modal.Volume.from_name("comic-generator-storage")
206
-
207
- @app.function(
208
- image=modal.Image.debian_slim().pip_install_from_requirements("requirements.txt"),
209
- volumes={"/storage": volume},
210
- keep_warm=1
211
- )
212
- def agent_orchestrator():
213
- # Main agent coordination logic
214
- pass
215
- ```
216
 
217
- ### Environment Configuration
218
 
219
  ```python
220
- # config.py
221
- import os
222
- from pydantic import BaseSettings
223
-
224
- class Settings(BaseSettings):
225
- # Sponsor API Keys
226
- openai_api_key: str = os.getenv("OPENAI_API_KEY")
227
- mistral_api_key: str = os.getenv("MISTRAL_API_KEY")
228
- hf_token: str = os.getenv("HF_TOKEN")
229
-
230
- # Modal configuration
231
- modal_token_id: str = os.getenv("MODAL_TOKEN_ID")
232
- modal_token_secret: str = os.getenv("MODAL_TOKEN_SECRET")
233
-
234
- # Application settings
235
- max_iterations: int = 3
236
- timeout_seconds: int = 300
237
- debug_mode: bool = False
238
  ```
239
 
240
  ---
 
110
 
111
  ### Sponsor API Integration
112
 
113
+ | Service | Primary Use | Secondary Use |
114
+ | ---------------- | ------------------------------ | ------------------- |
115
+ | **OpenAI GPT-4** | Agent reasoning & tool calling | Dialogue generation |
116
+ | **Mistral** | Code generation & execution | Style adaptation |
117
+ | **HuggingFace** | SDXL model hosting | Model inference |
118
+ | **Modal** | Serverless GPU compute | Sandbox execution |
119
 
120
+ > **Note**: Investigated Mistral's experimental `client.beta.agents` framework for dynamic task routing, but deferred it due to limited stability during the hackathon timeframe.
121
 
122
  ### LlamaIndex Agent Memory
123
 
 
180
 
181
  ## πŸš€ Deployment Configuration
182
 
183
+ ### Multi-Service Architecture
184
 
185
+ | Component | Platform | Configuration |
186
+ | ----------------- | ------------------ | ------------------------------- |
187
+ | **Frontend** | HuggingFace Spaces | Gradio 4.44.0, Real-time UI |
188
+ | **Backend** | Modal Functions | GPU compute, persistent storage |
189
+ | **Orchestration** | LlamaIndex | Agent coordination & memory |
190
 
191
+ ### Environment Variables
192
 
193
  ```python
194
+ # Required API keys for sponsor integrations
195
+ OPENAI_API_KEY=your_openai_key
196
+ MISTRAL_API_KEY=your_mistral_key
197
+ HF_TOKEN=your_huggingface_token
198
+ MODAL_TOKEN_ID=your_modal_id
199
+ MODAL_TOKEN_SECRET=your_modal_secret
200
+
201
+ # Application settings
202
+ MAX_ITERATIONS=3
203
+ TIMEOUT_SECONDS=300
204
+ DEBUG_MODE=false
205
  ```
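+ A minimal sketch of reading these variables at startup (variable names match the list above; the validation logic is illustrative, and the removed `config.py` did the same job with a pydantic `BaseSettings` class):
+
+ ```python
+ import os
+
+ REQUIRED_KEYS = [
+     "OPENAI_API_KEY", "MISTRAL_API_KEY", "HF_TOKEN",
+     "MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET",
+ ]
+ missing = [key for key in REQUIRED_KEYS if not os.getenv(key)]
+ if missing:
+     raise RuntimeError(f"Missing required environment variables: {missing}")
+
+ MAX_ITERATIONS = int(os.getenv("MAX_ITERATIONS", "3"))
+ TIMEOUT_SECONDS = int(os.getenv("TIMEOUT_SECONDS", "300"))
+ DEBUG_MODE = os.getenv("DEBUG_MODE", "false").lower() == "true"
+ ```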
206
 
207
  ---