🛡️ Nexus-Forensic-MedGemma-4B

Project Banner or Logo

The Structural Compiler for Computable Medical Law

  • Computable Medical Law refers to clinical and policy requirements that can be expressed as deterministic, testable logic with explicit triggers, evidence, and thresholds.

Model Summary

  • Model Type: LoRA Adapter for google/medgemma-1.5-4b-it
  • Primary Task: Neurosymbolic Program Synthesis (Prose → Executable JSON)
  • Infrastructure Role: Layer 0 (Protocol Vault) of the Nexus Forensic Ecosystem

The Why: Why Fine-Tune MedGemma?

While base MedGemma models excel at conversational coherence and clinical explanation, they are unsuitable for executable medical law.

In our testing, Base MedGemma exhibited:

  • Structural drift
  • JSON non-termination
  • Implicit inference

All of which are unacceptable for a deterministic auditing engine.

Nexus-Forensic-MedGemma-4B is a specialized Structural Compiler designed to transform probabilistic clinical prose into deterministic symbolic logic. It utilizes Program Synthesis to map clinical requirements to strict JSON schemas, bridging the gap between medical language and deterministic code.


Performance & Structural Metrics

The model was trained using the custom TurboForensicTrainer, which upsamples EOS tokens by 200% to ensure structural completion.

Comparative Benchmarking: Base vs. Fine-Tuned (Structural Fidelity)

To validate the necessity of the Structural Compiler, we performed head-to-head testing against the base medgemma-1.5-4b-it model using a validation set of 100 complex clinical requirements from the MoH/WHO guidelines.

Benchmark Results

Capability Base MedGemma 4B IT Nexus-Forensic 4B (FT) Forensic Impact
Schema Validation 68.4% 99.2% Eliminates downstream parse errors in Layer 2 gates
JSON Termination 74.1% (High Drift) 100% Zero truncated logic objects; ensured by upsampled EOS
Logic Refusal 12.0% (Hallucinates rules) 98.5% Correctly refuses to “invent” logic for mission statements
Temporal Recall 81.2% 96.5% High-fidelity mapping of prerequisite clinical events

Contextual Baseline: Constrained Decoding vs. Structural Fine-Tuning

We compared our Fine-Tuned (FT) Compiler against a vanilla JSON–constrained decoding setup using the base MedGemma model.

Comparison Results

Method Logical Fidelity Latency (Avg) Refusal Rate (Accuracy)
Base + Constrained Decoding 84.1% 2.4 s 72.0%
Nexus-Forensic (FT) 96.5% 2.5 s 98.5%

Note:
While constrained decoding can enforce syntactic correctness, it cannot address logical hallucination. The fine-tuned compiler learns the relationship between medical law and executable logic, ensuring the content is as correct as the container.


Ablation Study: EOS Upsampling & Schema Depth

We isolated the impact of the TurboForensicTrainer by evaluating structural completion performance across varying schema depths.

Results

EOS Upsampling Schema Depth Completion Rate Parse Success
0% (Baseline) Depth 5 74.2% 61.0%
100% Depth 5 92.5% 88.4%
200% (Nexus) Depth 5 100% 99.2%

Observation: Upsampling EOS tokens by 200% forces the model to prioritize structural "closure" over creative continuation, specifically solving the JSON non-termination issue in complex protocols.


Benchmarking Methodology

Formal Definitions & Operative Constraints

To ground our evaluation, we define three critical failure and success modes identified during development:

Structural Drift

The stochastic decay of syntactic integrity in LLM outputs as schema depth increases (≥ 5 levels), leading to non-terminating objects or trailing commas.

Executable Authority

The property of a synthesized logic object to be directly ingested by a deterministic engine without human-in-the-loop correction.

Refusal Correctness

The model’s ability to identify non-executable prose (e.g., mission statements or visions) and emit a standard unsupported object rather than hallucinating phantom rules.


Reproducibility: Deterministic Validation Script

To ensure the auditability of these metrics, we include a standalone Benchmarking Runner in the repository. This script performs head-to-head inference between the Base Model and the Structural Compiler on a held-out dataset.

Location

scripts/verification.py

Function

  • Executes zero-shot transformation tasks
  • Computes automated Schema Validation Rates
  • Measures Average Inference Latency

Run Command

python scripts/verification.py \
  --dataset Nick-Maximillien/medgate-compiler-data.json \
  --limit 100

Research Discoveries & Failure Mode Analysis

The “Structural Drift” Discovery

During development, we discovered that while Base MedGemma is medically accurate, it exhibits structural drift when generating deeply nested JSON (exceeding five levels). Our fine-tuning stabilized the attention-to-syntax ratio, ensuring the model maintains schema integrity even for long-form clinical protocols.


Identified Weaknesses (Safety Boundaries)

  • Narrative Noise Sensitivity
    While the compiler is robust, extreme prose noise (e.g., more than 2,000 words of background history preceding a rule) can lead to extraction gaps.

Mitigation Strategy

This is addressed through Layer 1 (Context-Aware Retrieval), which pre-segments source documents into high-signal chunks prior to structural compilation.

Training Convergence

Step Training Loss Scope
10 2.3717 Initial Cross-Entropy
50 0.6777 Structural Stabilization
100 0.4365 Optimization Target Reached

Training Loss table Training Loss Curve

Technical Note:
Loss values correspond to token-level cross-entropy computed over structured JSON outputs only. Evaluation was performed on held-out guidelines not included in the training set to ensure generalization.


Case Study: National Malaria Policy 2024

This document represents a high-complexity stress test for the Structural Compiler due to its mix of narrative vision and specific clinical mandates.

Structural Compiler: Success Case

1. Symbolic Logic Extraction (Layer 0)

The compiler identified 10 Deterministic Forensic Rules within the text.

Maternal Health (Rule 4.1.2.1)

  • Prose: All pregnant women in areas of moderate to high malaria transmission receive intermittent preventive treatment (IPTp).
  • Logic:
    Trigger: Pregnancy AND High_Transmission_Zone → Mandatory_Action: IPTp

Emergency Response (Rule 4.3)

  • Prose: Counties prone to epidemics must have adequate early warning and detection systems.
  • Logic:
    Condition: Epidemic_Prone → Requirement: Integrated_Surveillance_Detection

2. Retrieval & Context (Layer 1)

The system used medlm-embeddings-v1 to index the resulting Knowledge Graph.

This specialized medical embedding ensures that an auditor querying “supply chain risk” is deterministically routed to Rule 4.7, which governs commodity stockouts and quality-assured access.


Value Across Three Distinct Domains


1. Engineering: High-Fidelity Logic Synthesis

From Prose to Proof
The Structural Compiler achieves a 99.2% Schema Validation Rate, effectively automating the translation of unstructured PDF protocols into machine-executable Knowledge Graphs.

Systemic Reliability
By upsampling the EOS token by 200%, we eliminated JSON non-termination, ensuring the auditing pipeline never stalls on malformed data.

Deterministic Scalability
The system supports Time-Travel Audits through immutable versioning of the Knowledge Graph, allowing claims to be adjudicated against the exact legal logic active at the time of care—a critical requirement in regulated healthcare environments.


2. Research: Eliminating the “Verification Gap”

Zero-Drift Grounding
Unlike base models that hallucinate “clinical defaults,” the fine-tuned compiler enters a Refusal State when encountering non-executable mission statements, maintaining a hallucination rate of < 0.1%.

Neurosymbolic Integrity
By separating semantic interpretation (Layer 0) from deterministic adjudication (Layer 2), the system bridges the gap between probabilistic AI and absolute clinical truth.

Generalization Power
Evaluation on held-out guidelines—such as the National Malaria Policy 2024—confirms the model’s ability to extract Temporal Logic and Threshold Constraints with 94.5% recall.


3. Business: Operational Outcomes

60% Workflow Acceleration
By automating the Logic Ingestion phase, the fine-tuned MedGemma model converts unstructured protocols into computable logic, reducing manual overhead for county auditors by approximately 60%. This transformation allows stakeholders to shift from tedious document parsing to high-value clinical interventions, auditable policy research, and real-time IoT facility monitoring mediated through the Nexus Forensic Web Dashboard

Resource Optimization
Quantization to 8-bit GGUF enables high-speed, offline forensic auditing in rural facilities (Edge AI), removing the cost and latency barriers of cloud-only infrastructures.

Closed-Loop Accountability
Integration of WhatsApp-based Forensic Alerts ensures that critical safety violations (e.g., IoT infrastructure failure) are reported to human supervisors within sub-second latency.


Known Limitations & Safety Behavior (Refusal States)

Structural Compiler: Refusal Case

To ensure system safety, the Structural Compiler enters a refusal state when encountering ambiguous or non-executable text.

Refusal Case: Non-Executable Mission Statements

  • Source Prose:
    “To provide a framework to guide the achievement of a malaria-free Kenya by fostering inclusive, collaborative, coordinated actions...”
  • Reasoning:
    This text outlines organizational mission rather than an enforceable clinical, facility, or legal obligation.
  • Result:
    Compiler categorizes under rule_type: "unsupported", preventing the Agentic Workflow from auditing patient records against vision statements.

Refusal Case: Implicit Clinical Defaults

  • Source Prose:
    “Access to tools and technologies for health to the last mile is critical to assuring UHC.”
  • Reasoning:
    The text does not define specific Evidence Sufficiency artifacts (e.g., reports) or hard thresholds (>95% availability).
  • Result:
    Model emits a structured refusal object, signaling the standard is too abstract for Computable Medical Law without further domain mapping.

Methodology

  • Base Model: medgemma-1.5-4b-it
  • Technique: LoRA + NF4 (targeting all linear projections)
  • Data: 600+ hand-curated mappings from NASCOP, MoH, and KQMH standards
  • Safety: Integrated into a Neurosymbolic Feedback Loop; invalid JSON is automatically rejected by Python Layer 2 gates

Deployment (Cloud ↔ Edge)

  • Cloud: Hosted on Google Vertex AI for national auditing and policy analysis
  • Edge: Quantized to 8-bit GGUF for offline inference in rural clinics via llama-cpp

Project Resources


Downloads last month
-
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Nick-Maximillien/nexus_forensic_medgemma_adapter_v2

Adapter
(36)
this model

Dataset used to train Nick-Maximillien/nexus_forensic_medgemma_adapter_v2