metadata
title: Rhodawk AI DevSecOps Engine
emoji: 🦅
colorFrom: indigo
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0





"The next generation of security tooling does not find known CVEs. It finds the assumptions that developers got wrong, before attackers do."

🚀 Mythos-Level Upgrade

A complete blueprint for elevating Rhodawk to Claude Mythos-class autonomous vulnerability research lives under mythos/. See mythos/MYTHOS_PLAN.md for the full living plan: multi-agent framework, probabilistic reasoning, advanced static, dynamic, and exploit tooling, RL self-improvement, new MCP servers, and FastAPI productization. Enable it with RHODAWK_MYTHOS=1, or start the new productization API with uvicorn mythos.api.fastapi_server:app and send requests to POST /v1/analyze_target.

| Layer | Module |
|---|---|
| Multi-agent (Planner / Explorer / Executor) | mythos/agents/ |
| Probabilistic hypothesis engine + attack graphs | mythos/reasoning/ |
| Static (Tree-sitter, Joern, CodeQL, Semgrep) | mythos/static/ |
| Dynamic (AFL++, KLEE, QEMU, Frida, GDB) | mythos/dynamic/ |
| Exploit (Pwntools, ROPGadget, heap, privesc) | mythos/exploit/ |
| Self-improvement (RL, MLflow, LoRA, curriculum, episodic) | mythos/learning/ |
| New MCP servers (5×) | mythos/mcp/ (registered in mcp_config.json) |
| Productization API | mythos/api/ |

What Rhodawk Actually Is

Rhodawk is a fully autonomous code repair and vulnerability research system. Point it at any GitHub repository. It clones the code, runs the tests, generates fixes using state-of-the-art LLMs, passes every fix through a 7-layer security pipeline, and opens a verified pull request, with no human involvement in the loop unless you require it.

When tests are already passing, it switches into attack mode: it autonomously generates property-based fuzz tests, discovers invariant violations, and hands the crash payloads back to itself for patching. It is a self-healing system that simultaneously acts as its own red team.
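The attack-mode idea (find an input that breaks an invariant, then hand that input back for patching) can be illustrated without any fuzzing library. The sketch below is a hypothetical, dependency-free stand-in for the Hypothesis harnesses the red team generates: a seeded random search for a property violation, followed by greedy shrinking of the first counterexample. `clamp_add`, `prop_saturates`, `shrink`, and `find_counterexample` are illustrative names, not functions from red_team_fuzzer.py.

```python
import random


def clamp_add(a: int, b: int) -> int:
    # Toy target with a deliberate bug: it is meant to saturate at 255,
    # but it only clamps when `a` alone exceeds the limit.
    return 255 if a > 255 else a + b


def prop_saturates(a: int, b: int) -> bool:
    # Invariant under test: the result never exceeds the 8-bit ceiling.
    return clamp_add(a, b) <= 255


def shrink(a: int, b: int, prop) -> tuple[int, int]:
    # Greedy shrinking: accept any smaller candidate that still violates
    # the property, and stop when no candidate makes progress.
    improved = True
    while improved:
        improved = False
        for cand in ((a // 2, b), (a, b // 2), (a - 1, b), (a, b - 1)):
            if min(cand) >= 0 and cand != (a, b) and not prop(*cand):
                a, b = cand
                improved = True
                break
    return a, b


def find_counterexample(prop, trials: int = 10_000, seed: int = 0):
    rng = random.Random(seed)  # fixed seed keeps the fuzz loop reproducible
    for _ in range(trials):
        a, b = rng.randint(0, 1000), rng.randint(0, 1000)
        if not prop(a, b):
            return shrink(a, b, prop)
    return None  # no violation found within the trial budget


print(find_counterexample(prop_saturates))
```

The shrunk pair plays the role of a crash payload handed back to the fix loop; Hypothesis itself generates and shrinks inputs far more cleverly than this.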


The Full Autonomous Loop

flowchart TD
    A([🎯 Target Repository]) --> B[Clone & Fingerprint\nLanguage Detection]
    B --> C{Tests Passing?}

    C -->|FAILING| D[🧠 Retrieve Similar Fixes\nVector Memory - CodeBERT / MiniLM]
    D --> E[⚑ Dispatch Aider via MCP\nLLM Fix Generation]
    E --> F[βœ… Re-run Test Suite\nVerification Loop]
    F --> G{Fixed?}
    G -->|NO - retry| E
    G -->|YES| H

    C -->|PASSING| RT[πŸ”΄ Red Team CEGIS Engine\nAutonomous Attack Mode]
    RT --> RTA[AST Complexity Scoring\nAttack Surface Ranking]
    RTA --> RTB[Red Team LLM\nHypothesis PBT Synthesis]
    RTB --> RTC[Deterministic Fuzz Loop\nCounter-Example Extraction]
    RTC --> RTD[Crash Payload Found]
    RTD --> E

    H[πŸ”¬ SAST Gate\nBandit + 16-Pattern Secret Scanner]
    H --> I[πŸ”— Supply Chain Gate\npip-audit + Typosquatting Detection]
    I --> J[βš”οΈ Z3 Formal Verification\nInteger Overflow + Invariant Proofs]
    J --> K[🗳️ 3-Model Adversarial Consensus\nDeepSeek-R1 ∥ Llama-3.3-70B ∥ Gemma-3-27B]
    K --> L{2/3 Majority?}
    L -->|REJECTED| E
    L -->|APPROVED| M[πŸ† Conviction Engine\n7-Criteria Auto-Merge Gate]
    M --> N[πŸ“‹ Open PR + Audit Trail\nSHA-256 Tamper-Evident Log]
    N --> O[🎓 Training Store\nData Flywheel - JSONL Export]

    style A fill:#e94560,color:#fff
    style RT fill:#d63031,color:#fff
    style K fill:#6C5CE7,color:#fff
    style M fill:#00b894,color:#fff
    style O fill:#0984e3,color:#fff

Architecture at a Glance

graph LR
    subgraph UI["πŸ–₯️  Gradio Control Plane  (port 7860)"]
        APP[app.py\n2,311 lines]
    end

    subgraph ORCH["🧠  Intelligence Layer"]
        HO[hermes_orchestrator.py\nAutonomous Security Research]
        VL[verification_loop.py\nRetry-with-Context]
        CE[conviction_engine.py\nAuto-Merge Gate]
        AR[adversarial_reviewer.py\n3-Model Consensus]
    end

    subgraph ANALYSIS["πŸ”¬  Analysis Engines"]
        LR2[language_runtime.py\n7 Languages]
        RTF[red_team_fuzzer.py\nCEGIS Engine]
        FV[formal_verifier.py\nZ3 SMT Solver]
        SYM[symbolic_engine.py\nAngr Symbolic Exec]
        TA[taint_analyzer.py\nDataflow Analysis]
        CI[cve_intel.py\nNVD + SSEC Algorithm]
    end

    subgraph SECURITY["πŸ›‘οΈ  Security Gates"]
        SG[sast_gate.py\nBandit + Secrets]
        SC[supply_chain.py\npip-audit + Typosquat]
        VC[vuln_classifier.py\nCWE → CVSS]
    end

    subgraph MEMORY["πŸ’Ύ  Memory & Learning"]
        EM[embedding_memory.py\nSQLite / Qdrant]
        TS[training_store.py\nData Flywheel]
        LS[lora_scheduler.py\nFine-tune Export]
    end

    subgraph OUTPUT["πŸ“€  Output Layer"]
        BG[bounty_gateway.py\nHackerOne / Bugcrowd]
        WH[webhook_server.py\nPort 7861]
        AL[audit_logger.py\nSHA-256 Chain]
    end

    APP --> ORCH
    APP --> ANALYSIS
    APP --> SECURITY
    ORCH --> MEMORY
    ANALYSIS --> SECURITY
    SECURITY --> OUTPUT
    MEMORY --> TS --> LS

Five Custom Algorithms, Built From Scratch

VES
Vulnerability Entropy Score
Quantifies how surprising a code path is. Combines cyclomatic complexity (via Radon), dataflow depth, and deviation from the repository's own baseline. High-VES paths are statistically anomalous execution routes that warrant deeper analysis: the mathematical definition of "this shouldn't work this way."
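The README does not publish the VES formula, so the following is only a sketch under stated assumptions: cyclomatic complexity is approximated by counting decision points with Python's `ast` module (a simplified stand-in for Radon), dataflow depth is taken as a precomputed input, and each metric's deviation from the repository baseline is expressed as a z-score. `cyclomatic` and `ves` are hypothetical names, and summing the two above-baseline terms with equal weight is an assumption.

```python
import ast
import statistics

# Decision points counted as a simplified stand-in for Radon: each branch,
# loop or exception handler adds one independent path through the code.
_DECISIONS = (ast.If, ast.IfExp, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler)


def cyclomatic(func_src: str) -> int:
    score = 1
    for node in ast.walk(ast.parse(func_src)):
        if isinstance(node, _DECISIONS):
            score += 1
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1  # each extra `and` / `or` operand
    return score


def _z(value: float, baseline: list[float]) -> float:
    # Deviation from the repository's own baseline, in standard deviations.
    if len(baseline) < 2:
        return 0.0
    sd = statistics.pstdev(baseline)
    return 0.0 if sd == 0 else (value - statistics.mean(baseline)) / sd


def ves(cc: int, dataflow_depth: int, baseline_cc: list[int], baseline_depth: list[int]) -> float:
    # Only above-baseline deviations count, so ordinary code scores 0.
    return max(0.0, _z(cc, baseline_cc)) + max(0.0, _z(dataflow_depth, baseline_depth))


src = "def f(x):\n    if x > 0 and x < 10:\n        return 1\n    for i in range(x):\n        pass\n    return 0\n"
print(cyclomatic(src))                         # 4: base 1 + if + and + for
print(ves(4, 3, [1, 1, 2, 2], [1, 1, 1, 1]))   # 5.0: depth baseline has no spread
```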
TVG
Temporal Vulnerability Graph
A directed graph over commit history that models how a single faulty assumption propagates through the codebase as other developers build on top of it. Identifies the root-cause commit, computes blast radius, and scores the danger of downstream dependents, giving patches a priority order.
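A minimal sketch of the blast-radius part of TVG, assuming the commit graph has already been built as an adjacency map from each commit to the later commits that build on its changes. `blast_radius` and `priority_order` are hypothetical helpers; how Rhodawk actually derives the edges from history, and how it scores danger beyond a simple count of dependents, is not shown here.

```python
from collections import deque


def blast_radius(graph: dict[str, list[str]], root: str) -> set[str]:
    # Breadth-first walk over every commit downstream of `root`
    # (the graph is assumed to be acyclic, as commit history is).
    seen: set[str] = set()
    queue = deque([root])
    while queue:
        for child in graph.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def priority_order(graph: dict[str, list[str]]) -> list[str]:
    # Commits whose assumption reaches the most dependents come first;
    # the stable sort keeps the original order among ties.
    return sorted(graph, key=lambda c: len(blast_radius(graph, c)), reverse=True)


history = {"a1": ["b2", "c3"], "b2": ["d4"], "c3": ["d4"], "d4": []}
print(blast_radius(history, "a1"))
print(priority_order(history))
```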
ACTS
Adversarial Consensus Trust Score
Bayesian aggregation of three independent LLM adversarial reviews run concurrently. Each model votes APPROVE / REJECT / CONDITIONAL. The final score weights vote consistency, argument specificity, and historical calibration of each model against this codebase's fix patterns. Requires 2/3 majority.
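One way to read "Bayesian aggregation" is a naive-Bayes log-odds update, sketched below. Each model's historical calibration (its past accuracy on this codebase) sets how far its vote moves the posterior, argument specificity scales that weight, and CONDITIONAL is treated as neutral evidence. The `Review` fields, the log-odds weighting, and the prior of 0.5 are assumptions; only the three vote labels and the 2/3 majority rule come from the description above.

```python
import math
from dataclasses import dataclass


@dataclass
class Review:
    model: str
    vote: str            # "APPROVE", "REJECT" or "CONDITIONAL"
    specificity: float   # 0..1, how concrete the model's argument was
    calibration: float   # historical accuracy on this repo, strictly in (0.5, 1)


def acts(reviews: list[Review], prior: float = 0.5) -> tuple[float, bool]:
    log_odds = math.log(prior / (1 - prior))
    for r in reviews:
        # A well-calibrated, specific vote moves the posterior further.
        weight = math.log(r.calibration / (1 - r.calibration)) * r.specificity
        if r.vote == "APPROVE":
            log_odds += weight
        elif r.vote == "REJECT":
            log_odds -= weight
    trust = 1 / (1 + math.exp(-log_odds))
    approvals = sum(r.vote == "APPROVE" for r in reviews)
    return trust, approvals >= 2  # 2-of-3 majority gate for a 3-model panel


panel = [
    Review("deepseek-r1", "APPROVE", 0.9, 0.8),
    Review("llama-3.3-70b", "APPROVE", 0.6, 0.7),
    Review("gemma-3-27b", "REJECT", 0.5, 0.6),
]
print(acts(panel))
```

Keeping the trust score separate from the majority decision lets the score rank approved fixes while the 2/3 rule stays a hard gate.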
CAD
Commit Anomaly Detection
Statistical outlier detection over git history. Computes a distribution of diff characteristics (size, churn, file types touched, message entropy) and flags commits that pattern-match against known silent security patches: the ones developers push without saying what they really fixed.
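The following sketch shows the statistical core of CAD under simplifying assumptions: each commit is reduced to lines changed, files touched, and the Shannon entropy of its message, and any commit lying more than two standard deviations from the repository's own distribution on some feature is flagged. The feature set, the threshold, and `flag_anomalies` are hypothetical; matching against known silent-patch signatures is not modeled.

```python
import math
import statistics
from collections import Counter


def message_entropy(msg: str) -> float:
    # Shannon entropy of the commit message, in bits per character.
    if not msg:
        return 0.0
    n = len(msg)
    return -sum(c / n * math.log2(c / n) for c in Counter(msg).values())


def flag_anomalies(commits: list[dict], threshold: float = 2.0) -> list[str]:
    # Each commit dict needs: sha, lines_changed, files_touched, message.
    rows = [
        {"lines_changed": c["lines_changed"], "files_touched": c["files_touched"],
         "entropy": message_entropy(c["message"])}
        for c in commits
    ]
    flagged = []
    for commit, row in zip(commits, rows):
        for feature, value in row.items():
            column = [r[feature] for r in rows]
            sd = statistics.pstdev(column)
            # Flag if this feature is an outlier against the repo's history.
            if sd > 0 and abs(value - statistics.mean(column)) / sd > threshold:
                flagged.append(commit["sha"])
                break
    return flagged
```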
SSEC
Semantic Similarity Exploit Chain
Embeds known CVE exploit patterns using microsoft/codebert-base and runs cosine similarity against repository code at the function level. Surfaces "structurally resembles CWE-X" findings even before any test failure or crash: purely static semantic matching against 100+ historical exploit primitives.
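The matching step of SSEC reduces to cosine similarity between embedding vectors. The sketch below assumes the embeddings have already been computed (for example with microsoft/codebert-base) and only ranks function/exploit-pattern pairs; `ssec_matches` and the 0.85 threshold are illustrative choices, not values taken from cve_intel.py.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 0.0 if norm == 0 else dot / norm


def ssec_matches(func_vecs: dict[str, list[float]],
                 exploit_vecs: dict[str, list[float]],
                 threshold: float = 0.85) -> list[tuple[str, str, float]]:
    # Score every (function, exploit pattern) pair and keep those above
    # the threshold, most similar first.
    hits = [(fn, cwe, cosine(fv, ev))
            for fn, fv in func_vecs.items()
            for cwe, ev in exploit_vecs.items()]
    return sorted((h for h in hits if h[2] >= threshold), key=lambda h: h[2], reverse=True)


functions = {"parse_header": [0.9, 0.1]}
patterns = {"CWE-787": [1.0, 0.0], "CWE-89": [0.0, 1.0]}
print(ssec_matches(functions, patterns))
```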

The Security Research Pipeline (Hermes)

Beyond repair, the Hermes orchestrator runs a full autonomous vulnerability research sweep in six phases:

sequenceDiagram
    participant H as 🧠 Hermes
    participant R as πŸ”­ RECON
    participant S as πŸ”¬ STATIC
    participant D as πŸ’₯ DYNAMIC
    participant E as βš”οΈ EXPLOIT
    participant C as πŸ—³οΈ CONSENSUS
    participant O as πŸ“‹ HUMAN OPERATOR

    H->>R: Clone + fingerprint + map attack surface
    R->>S: Attack surface map + complexity scores
    S->>S: Taint analysis + CWE matching + SSEC
    S->>D: Confirmed code paths + VES scores
    D->>D: Generate Hypothesis PBT harnesses
    D->>E: Crash payloads + stack traces
    E->>E: Classify primitives: overflow/UAF/race/injection
    E->>C: All findings + exploit chains
    C->>C: 3-model adversarial verdict
    C->>O: PENDING_HUMAN_APPROVAL
    O-->>O: Human reviews + clicks Approve
    O->>O: Submit to HackerOne / GitHub Advisory

Nothing is submitted to any bug bounty platform without a human clicking "Approve & Submit." The gate is enforced at the API call level in bounty_gateway.py, not just in the UI.
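Enforcing the gate "at the API call level" means the submission function itself refuses to run without a recorded approval, so bypassing the UI gains nothing. A minimal sketch of that pattern follows; `BountyGateway`, `approve`, `submit`, and `ApprovalRequired` are hypothetical names and do not describe bounty_gateway.py's actual interface.

```python
class ApprovalRequired(Exception):
    """Raised when a submission is attempted without a recorded human approval."""


class BountyGateway:
    # The check lives inside submit() itself, so every caller (UI button,
    # script, or autonomous agent) passes through the same gate.
    def __init__(self) -> None:
        self._approved: dict[str, str] = {}  # finding_id -> approving operator

    def approve(self, finding_id: str, operator: str) -> None:
        self._approved[finding_id] = operator

    def submit(self, finding_id: str, platform: str) -> str:
        if finding_id not in self._approved:
            raise ApprovalRequired(f"{finding_id} is PENDING_HUMAN_APPROVAL")
        # A real gateway would call the platform's API here.
        return f"submitted {finding_id} to {platform} (approved by {self._approved[finding_id]})"
```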


Supported Languages

| Language | Detection | Test Runner | SAST Tool | Supply Chain |
|---|---|---|---|---|
| Python | pytest.ini / setup.py | pytest / uv | Bandit + Semgrep | pip-audit |
| JavaScript | package.json | Jest / Mocha / Vitest | eslint-security | npm audit |
| TypeScript | tsconfig.json | Same as JS + tsc | Same as JS | npm audit |
| Java | pom.xml / build.gradle | JUnit / TestNG | Semgrep-Java | OWASP dep-check |
| Go | go.mod | go test | gosec | govulncheck |
| Rust | Cargo.toml | cargo test | clippy | cargo-audit |
| Ruby | Gemfile | RSpec / Minitest | brakeman | bundle-audit |

Language detection is automatic and requires no configuration: Rhodawk fingerprints the cloned repository and selects the correct runtime, test runner, SAST tool, and dependency auditor.
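Marker-file detection of the kind the Supported Languages table describes can be sketched as an ordered lookup. The markers below are exactly the ones listed in that table; the ordering rule (TypeScript before JavaScript, because TypeScript projects normally also ship a package.json) and the name `detect_language` are assumptions, and the real language_runtime.py may check additional files.

```python
from typing import Iterable, Optional

# Checked in order; the first language with a matching marker file wins.
MARKERS = [
    ("typescript", ("tsconfig.json",)),
    ("javascript", ("package.json",)),
    ("python", ("pytest.ini", "setup.py")),
    ("java", ("pom.xml", "build.gradle")),
    ("go", ("go.mod",)),
    ("rust", ("Cargo.toml",)),
    ("ruby", ("Gemfile",)),
]


def detect_language(filenames: Iterable[str]) -> Optional[str]:
    names = set(filenames)
    for language, markers in MARKERS:
        if any(marker in names for marker in markers):
            return language
    return None  # unknown project layout
```

Called as `detect_language(p.name for p in repo_dir.iterdir())`, it inspects only the repository's top-level listing, which keeps the function free of filesystem access and easy to test.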


The MCP Server Suite: 25 Integrated Tools

| Server | Command | What It Does |
|---|---|---|
| fetch-docs | uvx mcp-server-fetch | Fetch CVE advisories, exploit PoCs, vendor bulletins; 40+ security domains allowlisted |
| github-manager | npx @modelcontextprotocol/server-github | Create PRs, open security advisories, query commit history |
| filesystem-research | npx @modelcontextprotocol/server-filesystem | Read-only access to cloned repos and research scratch space |
| memory-store | npx @modelcontextprotocol/server-memory | Persistent knowledge graph: exploit chains, CWE patterns, cross-session memory |
| sequential-thinking | npx @modelcontextprotocol/server-sequential-thinking | Structured chain-of-thought for multi-step vulnerability reasoning |
| web-search | npx @modelcontextprotocol/server-brave-search | Search CVEs, exploit PoCs, bug bounty writeups, research papers |
| git-forensics | npx @modelcontextprotocol/server-git | Deep git history: silent patches (CAD), blame tracking, anomaly detection |
| postgres-intelligence | npx @modelcontextprotocol/server-postgres | Query findings DB, scan history, vulnerability intelligence store |
| sqlite-findings | npx @modelcontextprotocol/server-sqlite | Fast queries on vulnerability metadata, CVSS scores, bounty estimates |
| nuclei-scanner | uvx mcp-server-shell (nuclei) | Template-based DAST, CVE detection, misconfiguration scanning |
| semgrep-sast | uvx mcp-server-shell (semgrep) | Taint analysis, CWE pattern matching, secrets detection across 30+ languages |
| trufflehog-secrets | uvx mcp-server-shell (trufflehog) | High-signal secret scanning with 700+ detectors across git history |
| bandit-sast | uvx mcp-server-shell (bandit) | AST-level Python SAST: injection sinks, insecure APIs, dangerous patterns |
| pip-audit-sca | uvx mcp-server-shell (pip-audit) | SCA via OSV and PyPI Advisory DB: known vulnerabilities in Python deps |
| osv-scanner | uvx mcp-server-shell (osv-scanner) | Multi-ecosystem SCA using Google's Open Source Vulnerability (OSV) database |
| z3-formal-verifier | uvx mcp-server-shell (python3) | Z3 SMT solver: formal verification of integer bounds and overflow invariants |
| hypothesis-fuzzer | uvx mcp-server-shell (hypothesis) | Property-based testing: arithmetic overflow, encoding bugs, aliasing |
| atheris-fuzzer | uvx mcp-server-shell (atheris) | Coverage-guided, libFuzzer-backed Python fuzzing for parser bugs |
| angr-symbolic | uvx mcp-server-shell (python3) | angr symbolic execution: binary analysis, path exploration, constraint solving |
| radon-complexity | uvx mcp-server-shell (radon) | Cyclomatic complexity + Halstead metrics + attack surface ranking |
| ruff-linter | uvx mcp-server-shell (ruff) | Ultra-fast linter detecting anti-patterns that correlate with security bugs |
| aider-patcher | uvx mcp-server-shell (aider) | Applies LLM-generated patches with diff verification and test re-run |
| cve-intelligence | uvx mcp-server-fetch (NVD) | Full CVE details, CVSS vectors, CWE mappings, affected version ranges |
| bounty-platform | uvx mcp-server-fetch | HackerOne / Bugcrowd / Intigriti / YesWeHack report submission |
| supply-chain-monitor | uvx mcp-server-fetch | PyPI typosquatting, dependency confusion, malicious package detection |
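The domain allowlist behind fetch-docs (the FETCH_ALLOWED_DOMAINS setting mentioned under Security by Design) can be sketched as a host check that accepts exact matches and true subdomains only. The comma-separated format of FETCH_ALLOWED_DOMAINS and both function names are assumptions. The check looks at the URL's host only; redirects, and DNS answers that resolve to internal addresses, need separate handling in a complete SSRF defense.

```python
import os
from urllib.parse import urlsplit


def allowed_domains() -> set[str]:
    # Assumed format: FETCH_ALLOWED_DOMAINS="nvd.nist.gov,github.com,..."
    raw = os.environ.get("FETCH_ALLOWED_DOMAINS", "")
    return {d.strip().lower() for d in raw.split(",") if d.strip()}


def is_allowed(url: str, domains: set[str]) -> bool:
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme not in ("http", "https") or not host:
        return False  # rejects file://, gopher:// and host-less URLs
    # Exact match or a true subdomain, so "evilgithub.com" does not pass
    # for "github.com" and neither does "github.com.attacker.net".
    return any(host == d or host.endswith("." + d) for d in domains)
```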

The Data Flywheel

Every fix attempt, successful or failed, is written to a structured training store. The schema captures the complete chain:

failing test → memory retrieval query → LLM prompt →
generated diff → SAST results → adversarial verdict → test outcome → human decision

This creates a proprietary fine-tuning dataset that compounds in value over time. After 50+ high-quality fixes accumulate, the LoRA scheduler exports a JSONL file ready for HuggingFace PEFT/TRL or AutoTrain:

{
  "messages": [
    {"role": "user", "content": "<test failure trace + repo context + retrieved similar fixes>"},
    {"role": "assistant", "content": "<verified diff that passed all 7 gates>"}
  ]
}

Each training cycle makes the model progressively better at fixing failures in your specific codebase. No external vendor has access to this data. It is yours.
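A sketch of the export step, producing the chat-style JSONL records shown above once at least 50 verified fixes exist. The record fields (`passed_all_gates`, `context`, `diff`) and `build_jsonl` are hypothetical; lora_scheduler.py's real schema may differ.

```python
import json

MIN_EXAMPLES = 50  # export threshold named in the text above


def build_jsonl(records: list[dict]) -> str:
    # Keep only fixes that passed every gate.
    verified = [r for r in records if r.get("passed_all_gates")]
    if len(verified) < MIN_EXAMPLES:
        return ""  # not enough data yet to justify a fine-tuning run
    lines = []
    for r in verified:
        example = {"messages": [
            {"role": "user", "content": r["context"]},
            {"role": "assistant", "content": r["diff"]},
        ]}
        lines.append(json.dumps(example))  # one JSON object per line
    return "\n".join(lines) + "\n"
```

Returning a string rather than writing a file keeps the sketch testable; the caller decides where the JSONL lands before handing it to PEFT/TRL or AutoTrain.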


Required API Keys

| Variable | Required | Details |
|---|---|---|
| GITHUB_TOKEN | ✅ Yes | Personal Access Token with repo + security_events scopes. Used to clone repos, open PRs, and create GitHub Security Advisories. Create one under GitHub Settings → Developer settings → Personal access tokens. |
| OPENROUTER_API_KEY | ✅ Yes | All LLM calls route through OpenRouter. Default models are on the free tier, so you can run this system at zero LLM cost. Get a key at openrouter.ai. |
| GITHUB_REPO | ⬜ Optional | Target in owner/repo format. Can also be supplied at runtime via the chat UI. |
| RHODAWK_AUTO_MERGE | ⬜ Optional | Default: false. Set to true to enable autonomous PR merge when all 7 conviction criteria pass. |
| RHODAWK_LORA_ENABLED | ⬜ Optional | Default: false. Set to true to activate the LoRA fine-tune export pipeline. |
| DB_BACKEND | ⬜ Optional | Default: sqlite. Set to postgres and provide DATABASE_URL for production persistence. |
| HACKERONE_API_KEY | ⬜ Optional | Enables HackerOne report submission from the bounty gateway (human approval still required). |
| NVD_API_KEY | ⬜ Optional | Unlocks higher rate limits on the NIST NVD CVE API. Free to request at nvd.nist.gov. |
| BRAVE_API_KEY | ⬜ Optional | Enables the Brave Search MCP tool used by Hermes for web search. |
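Start-up validation of these variables might look like the sketch below: it fails fast when a required key is missing and applies the documented defaults. `load_config`, the accepted truthy spellings, and the returned dictionary shape are assumptions, not Rhodawk's actual configuration code.

```python
from collections.abc import Mapping

REQUIRED = ("GITHUB_TOKEN", "OPENROUTER_API_KEY")


def _flag(env: Mapping[str, str], name: str) -> bool:
    # Optional switches default to false; accepted spellings are an assumption.
    return env.get(name, "false").strip().lower() in ("1", "true", "yes")


def load_config(env: Mapping[str, str]) -> dict:
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    backend = env.get("DB_BACKEND", "sqlite")
    if backend == "postgres" and not env.get("DATABASE_URL"):
        raise RuntimeError("DB_BACKEND=postgres requires DATABASE_URL")
    return {
        "github_repo": env.get("GITHUB_REPO"),  # may instead come from the chat UI
        "auto_merge": _flag(env, "RHODAWK_AUTO_MERGE"),
        "lora_enabled": _flag(env, "RHODAWK_LORA_ENABLED"),
        "db_backend": backend,
    }
```

In the app it would be called as `load_config(os.environ)`; accepting any mapping keeps the function easy to test without touching the real environment.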

Running Locally

Step 1 β€” Clone

git clone https://github.com/Rhodawk-AI/Rhodawk-devops-engine.git
cd Rhodawk-devops-engine

Step 2 β€” Install Python dependencies

pip install -r requirements.txt

atheris is excluded from requirements.txt because it needs Clang and libFuzzer at compile time, which most CI images lack. The system automatically falls back to Hypothesis for all fuzzing tasks.
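The atheris-to-Hypothesis fallback amounts to checking which fuzzing library is available. A minimal sketch, with `pick_fuzz_backend` as a hypothetical name; `importlib.util.find_spec` only locates a module without importing it, so a broken atheris build could still fail later at import time.

```python
import importlib.util


def pick_fuzz_backend() -> str:
    # Prefer coverage-guided atheris when it is installed; otherwise use
    # Hypothesis property-based testing; report "none" if neither exists.
    if importlib.util.find_spec("atheris") is not None:
        return "atheris"
    if importlib.util.find_spec("hypothesis") is not None:
        return "hypothesis"
    return "none"


print(pick_fuzz_backend())
```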

Step 3 β€” Install MCP servers

npm install -g \
  @modelcontextprotocol/server-github \
  @modelcontextprotocol/server-memory \
  @modelcontextprotocol/server-filesystem \
  @modelcontextprotocol/server-sequential-thinking \
  @modelcontextprotocol/server-brave-search \
  @modelcontextprotocol/server-git

Step 4 β€” Configure environment

export GITHUB_TOKEN="ghp_your_token_here"
export OPENROUTER_API_KEY="sk-or-your_key_here"
export GITHUB_REPO="owner/repo"      # optional; can also be set in the UI
mkdir -p /data

Step 5 β€” Run

python -u app.py

Gradio UI: http://localhost:7860
Webhook server: http://localhost:7861


Docker

# Build
docker build -t rhodawk-ai .

# Run
docker run -d \
  -p 7860:7860 \
  -p 7861:7861 \
  -v rhodawk_data:/data \
  -e GITHUB_TOKEN="ghp_your_token_here" \
  -e OPENROUTER_API_KEY="sk-or-your_key_here" \
  -e GITHUB_REPO="owner/target-repo" \
  rhodawk-ai

HuggingFace Spaces Deployment

1. Go to:  huggingface.co/spaces/Architect8999/rhodawk-ai-devops-engine
2. Duplicate the Space (top-right button)
3. Add Secrets in Space Settings:
      GITHUB_TOKEN  →  your GitHub PAT
      OPENROUTER_API_KEY  →  your OpenRouter key
4. The Space builds and runs automatically via the included Dockerfile

Event-Driven Mode β€” GitHub Webhook

Make Rhodawk trigger automatically on every CI failure:

GitHub repo → Settings → Webhooks → Add webhook

  Payload URL:   https://your-space.hf.space/webhook/github
  Content type:  application/json
  Secret:        (set RHODAWK_WEBHOOK_SECRET to the same value)
  Events:        Push, Check runs, Status

From this point forward, every failing CI run triggers the full autonomous repair loop with no manual intervention.

Supported webhook endpoints:

POST /webhook/github     GitHub push / check_run / status (HMAC-SHA256 validated)
POST /webhook/ci         Generic CI failure payload (any CI system)
POST /webhook/trigger    Manual trigger with repo + test path
GET  /webhook/health     Health check
GET  /webhook/queue      Current job queue status
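GitHub signs each webhook delivery with the shared secret and sends the result in the `X-Hub-Signature-256` header as `sha256=` followed by the hex HMAC-SHA256 of the raw request body. A sketch of the check that /webhook/github needs, assuming the secret is the RHODAWK_WEBHOOK_SECRET value; `verify_github_signature` is a hypothetical name, not the function in webhook_server.py.

```python
import hashlib
import hmac
from typing import Optional


def verify_github_signature(secret: str, body: bytes, header: Optional[str]) -> bool:
    # The HMAC must be computed over the raw bytes, before any JSON parsing.
    if not header or not header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, header)
```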

Repository Structure

All 42 source files, with descriptions:
rhodawk-devops-engine/
│
├── 🎛️  CONTROL PLANE
│   ├── app.py                      Main entry point. Gradio UI + full audit loop. (2,311 lines)
│   └── webhook_server.py           Event-driven server on port 7861. GitHub/CI webhooks.
│
├── 🧠  INTELLIGENCE
│   ├── hermes_orchestrator.py      6-phase autonomous security research agent. (715 lines)
│   ├── adversarial_reviewer.py     3-model concurrent consensus code review.
│   ├── verification_loop.py        Retry-with-context fix loop.
│   └── conviction_engine.py        7-criteria auto-merge gate.
│
├── 🌐  LANGUAGE RUNTIMES
│   └── language_runtime.py         Python/JS/TS/Java/Go/Rust/Ruby abstraction. (1,540 lines)
│
├── 🔴  RED TEAM ENGINE
│   └── red_team_fuzzer.py          CEGIS autonomous attack engine. (1,561 lines)
│
├── 🔬  ANALYSIS
│   ├── taint_analyzer.py           Dataflow taint: source-to-sink tracking.
│   ├── symbolic_engine.py          Angr symbolic execution + path exploration.
│   ├── formal_verifier.py          Z3 SMT: integer overflow + invariant proofs.
│   ├── fuzzing_engine.py           Hypothesis PBT harness generator.
│   ├── exploit_primitives.py       Overflow / UAF / race / injection classification.
│   ├── harness_factory.py          PoC harness compiler for operator-reviewed gaps.
│   ├── chain_analyzer.py           Multi-primitive vulnerability chain synthesizer.
│   ├── commit_watcher.py           CAD: silent security patch detection.
│   ├── repo_harvester.py           Autonomous target repository selection.
│   └── semantic_extractor.py       AST-level feature extraction for VES scoring.
│
├── 🛡️  SECURITY GATES
│   ├── sast_gate.py                Bandit + 16-pattern secret scanner.
│   ├── supply_chain.py             pip-audit + typosquatting detection.
│   ├── vuln_classifier.py          CWE taxonomy → CVSS scoring → severity.
│   └── cve_intel.py                NVD/CVE API + SSEC algorithm.
│
├── 💾  MEMORY & LEARNING
│   ├── embedding_memory.py         Dual-backend: SQLite/MiniLM or Qdrant/CodeBERT.
│   ├── memory_engine.py            Fix outcome tracking + similarity retrieval.
│   ├── training_store.py           SQLite/Postgres training data flywheel.
│   └── lora_scheduler.py           LoRA fine-tune export scheduler.
│
├── 📤  OUTPUT & DISCLOSURE
│   ├── bounty_gateway.py           HackerOne / Bugcrowd / GitHub Advisories gateway.
│   ├── disclosure_vault.py         90-day coordinated disclosure timeline vault.
│   ├── audit_logger.py             Append-only SHA-256 tamper-evident audit trail.
│   └── public_leaderboard.py       Fix success rate leaderboard.
│
├── ⚙️  INFRASTRUCTURE
│   ├── github_app.py               GitHub App JWT authentication.
│   ├── job_queue.py                Job queue with status tracking + metrics.
│   ├── worker_pool.py              Parallel audit worker pool.
│   ├── notifier.py                 Slack/webhook notification dispatch.
│   └── swebench_harness.py         SWE-bench Verified evaluation harness.
│
└── 📦  CONFIGURATION
    ├── mcp_config.json             25-server MCP suite configuration (template, no secrets).
    ├── Dockerfile                  Two-stage build: Python 3.12-slim + Node.js for MCP.
    ├── requirements.txt            Python dependencies (31 packages).
    ├── FOUNDER_PLAYBOOK.md         Full technical + investor documentation. (1,119 lines)
    └── SECURITY_RESEARCH_PLAYBOOK.md  Ethical AVR operator guide.

Security by Design

| Principle | Implementation |
|---|---|
| No hardcoded secrets | Every credential is loaded from environment variables. The codebase contains zero API keys. |
| MCP runtime injection | mcp_config.json is a template. Secrets are written to /tmp/mcp_runtime.json at startup and never committed. |
| Tamper-evident audit trail | audit_logger.py maintains a SHA-256 chain across all log entries. Any modification to historical records is detectable. |
| Human-gated disclosure | bounty_gateway.py enforces approval at the API call level. Removing the UI button does not bypass the gate. |
| Formal patch verification | Z3 proves bounded integer invariants on every AI-generated diff before any merge can occur. |
| SSRF prevention | All MCP fetch tools operate against FETCH_ALLOWED_DOMAINS allowlists. Outbound requests are restricted to explicitly permitted security domains. |
| Coordinated disclosure | 90-day disclosure timeline (the Google Project Zero standard) tracked per finding in disclosure_vault.py. |
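The hash-chain idea behind the tamper-evident audit trail can be sketched in a few lines: each entry stores the previous entry's hash, and its own hash covers both that link and the event, so editing any historical record breaks verification from that point on. The entry layout, the all-zero genesis value, and the function names are assumptions, not audit_logger.py's format. A chain like this only detects edits if the latest hash is also kept somewhere an attacker cannot rewrite.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev: str, event: dict) -> str:
    # sort_keys makes the serialization, and therefore the hash, deterministic.
    return hashlib.sha256((prev + json.dumps(event, sort_keys=True)).encode()).hexdigest()


def append_entry(chain: list[dict], event: dict) -> dict:
    prev = chain[-1]["hash"] if chain else GENESIS
    entry = {"event": event, "prev": prev, "hash": _digest(prev, event)}
    chain.append(entry)
    return entry


def verify_chain(chain: list[dict]) -> bool:
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```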

Default LLM Models

All default models are on OpenRouter's free tier. This system runs at zero LLM cost out of the box.

| Role | Default Model | Override Variable |
|---|---|---|
| Code Fix Generation | qwen/qwen-2.5-coder-32b-instruct:free | RHODAWK_MODEL |
| Hermes Orchestrator | deepseek/deepseek-r1:free | HERMES_MODEL |
| Hermes Fast Tasks | deepseek/deepseek-v3:free | HERMES_FAST_MODEL |
| Adversarial Review #1 | deepseek/deepseek-r1:free | RHODAWK_ADVERSARY_MODEL |
| Adversarial Review #2 | meta-llama/llama-3.3-70b-instruct:free | hardcoded fallback |
| Adversarial Review #3 | google/gemma-3-27b-it:free | hardcoded fallback |
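Resolving the models in the Default LLM Models table can be sketched as environment lookups with the free-tier defaults as fallbacks. The dictionary layout and the function names are hypothetical; the model IDs and override variable names are the ones listed in that table.

```python
import os
from collections.abc import Mapping

DEFAULT_MODELS = {
    "RHODAWK_MODEL": "qwen/qwen-2.5-coder-32b-instruct:free",
    "HERMES_MODEL": "deepseek/deepseek-r1:free",
    "HERMES_FAST_MODEL": "deepseek/deepseek-v3:free",
    "RHODAWK_ADVERSARY_MODEL": "deepseek/deepseek-r1:free",
}
# Reviewers #2 and #3 have no override variable.
ADVERSARY_FALLBACKS = ["meta-llama/llama-3.3-70b-instruct:free", "google/gemma-3-27b-it:free"]


def resolve_model(var: str, env: Mapping[str, str] = os.environ) -> str:
    # An empty or whitespace-only override falls back to the default.
    return env.get(var, "").strip() or DEFAULT_MODELS[var]


def adversarial_panel(env: Mapping[str, str] = os.environ) -> list[str]:
    return [resolve_model("RHODAWK_ADVERSARY_MODEL", env), *ADVERSARY_FALLBACKS]
```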

Every feature in this README is implemented in the files above. No mocks. No stubs. No vaporware. The pipeline runs end-to-end.




Rhodawk AI · Autonomous DevSecOps Control Plane v4.0 · Proprietary License