CHARIOT

Published October 6, 2025

A signal-based model router built inside the Augustus framework

Hi, I’m Marcus, founder of Augustus.
I build systems that make AI infrastructure faster, leaner, and more predictable.
My work usually sits somewhere between logic, geometry, and rhythm,
trying to make machines act with precision rather than probability.

Over the past few weeks, I’ve been working on Chariot,
a small but ambitious project inside the Augustus framework.
It’s a pre-LLM router that decides which model to use (GPT-4.1-mini, GPT-5-mini, or GPT-5)
based purely on signal, not semantics.

🧠 Why I built it

Most routers today use an LLM to decide which LLM to use.
That’s circular logic: you spend tokens just to decide where to spend more tokens.
I wanted to break that pattern.

Chariot doesn’t read language; it reads form.
It analyzes a prompt as rhythm, density, and structure.
It turns words into numbers, and numbers into a waveform.
From that waveform comes one value, the signal score, which decides which model fits best.
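
To make that concrete, here’s a deliberately simplified sketch of the idea, not the production code: the feature definitions and weights below are illustrative stand-ins, and they won’t reproduce the exact scores shown later in this post.

```python
import math

def signal_score(prompt: str) -> float:
    """Toy version: score a prompt by its form, never its meaning.

    Rhythm, density, and structure here are stand-ins for the real
    features; the weights are illustrative only.
    """
    words = prompt.split()
    if not words:
        return 0.0

    # Rhythm: how much word lengths vary across the prompt.
    lengths = [len(w) for w in words]
    mean_len = sum(lengths) / len(lengths)
    rhythm = math.sqrt(sum((l - mean_len) ** 2 for l in lengths) / len(lengths))

    # Density: average characters per word.
    density = mean_len

    # Structure: prompt length on a log scale.
    structure = math.log1p(len(words))

    # Collapse the feature "waveform" into one number.
    return round(0.3 * rhythm + 0.2 * density + 0.4 * structure, 2)
```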

To me, it feels like a bridge between human and machine.
A system that doesn’t interpret; it recognizes.

⚙️ The first results

The first version is built in Python, running on Flask and Gunicorn.
The entire API is under 10 KB, lightweight and minimal.
No dependencies beyond Flask and Gunicorn, no extra logic, just signal analysis and a decision.

Example API call

POST /analyze

```json
{
  "prompt": "Why do we fear death?"
}
```

Response

```json
{
  "model": "gpt-5-mini",
  "signal_score": 1.72
}
```
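
A stripped-down version of the service could be wired together like this, reusing the signal_score sketch from earlier; the cutoffs are illustrative stand-ins read off the results table below, not the production thresholds:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.post("/analyze")
def analyze():
    prompt = request.get_json(force=True).get("prompt", "")
    score = signal_score(prompt)  # form-based score, sketched earlier

    # Illustrative cutoffs only; the real decision boundaries
    # are not part of this sketch.
    if score < 1.5:
        model = "gpt-4.1-mini"
    elif score < 2.0:
        model = "gpt-5-mini"
    else:
        model = "gpt-5"

    return jsonify({"model": model, "signal_score": score})
```

Served with Gunicorn, that is essentially the whole router.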

The API processes requests with an average response time below 200 ms, even under load.
Decisions feel nearly instantaneous.
Across 1,000 diverse prompts, Chariot reduced GPT-5 usage by 55%
without any measurable loss in result quality.

| Prompt | Selected Model | Signal Score |
|---|---|---|
| good morning | GPT-4.1-mini | 0.87 |
| what is 2 + 2 | GPT-4.1-mini | 1.18 |
| why do we fear death | GPT-5-mini | 1.72 |
| can a computer learn love | GPT-5 | 2.37 |

📊 Benchmarks

| Test | Without Chariot | With Chariot | Savings |
|---|---|---|---|
| 1,000 mixed prompts | 100% GPT-5 usage | 44% GPT-5 usage | −55% cost |

Routing latency: <200 ms
Integration overhead: none
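
Integration is just one HTTP call before the model call you were already making. A minimal client, assuming a local deployment (the URL is a placeholder; point it at your own instance):

```python
import requests

# Ask Chariot which model fits, then call that model as usual.
route = requests.post(
    "http://localhost:8000/analyze",
    json={"prompt": "Why do we fear death?"},
    timeout=5,
).json()

print(route["model"], route["signal_score"])  # e.g. gpt-5-mini 1.72
```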

💬 What it’s teaching me

It’s not perfect, and it’s not finished.
But it’s alive: a deterministic system that listens to signal instead of meaning.
And maybe that’s enough for now.

If you’ve built something similar or have ideas on where this could evolve,
I’d really like to hear your thoughts.
