🛡️ Nexus-Forensic-MedGemma-4B
The Structural Compiler for Computable Medical Law
- Computable Medical Law refers to clinical and policy requirements that can be expressed as deterministic, testable logic with explicit triggers, evidence, and thresholds.
Model Summary
- Model Type: LoRA Adapter for
google/medgemma-1.5-4b-it - Primary Task: Neurosymbolic Program Synthesis (Prose → Executable JSON)
- Infrastructure Role: Layer 0 (Protocol Vault) of the Nexus Forensic Ecosystem
The Why: Why Fine-Tune MedGemma?
While base MedGemma models excel at conversational coherence and clinical explanation, they are unsuitable for executable medical law.
In our testing, Base MedGemma exhibited:
- Structural drift
- JSON non-termination
- Implicit inference
All of which are unacceptable for a deterministic auditing engine.
Nexus-Forensic-MedGemma-4B is a specialized Structural Compiler designed to transform probabilistic clinical prose into deterministic symbolic logic. It utilizes Program Synthesis to map clinical requirements to strict JSON schemas, bridging the gap between medical language and deterministic code.
Performance & Structural Metrics
The model was trained using the custom TurboForensicTrainer, which upsamples EOS tokens by 200% to ensure structural completion.
Comparative Benchmarking: Base vs. Fine-Tuned (Structural Fidelity)
To validate the necessity of the Structural Compiler, we performed head-to-head testing against the base medgemma-1.5-4b-it model using a validation set of 100 complex clinical requirements from the MoH/WHO guidelines.
Benchmark Results
| Capability | Base MedGemma 4B IT | Nexus-Forensic 4B (FT) | Forensic Impact |
|---|---|---|---|
| Schema Validation | 68.4% | 99.2% | Eliminates downstream parse errors in Layer 2 gates |
| JSON Termination | 74.1% (High Drift) | 100% | Zero truncated logic objects; ensured by upsampled EOS |
| Logic Refusal | 12.0% (Hallucinates rules) | 98.5% | Correctly refuses to “invent” logic for mission statements |
| Temporal Recall | 81.2% | 96.5% | High-fidelity mapping of prerequisite clinical events |
Contextual Baseline: Constrained Decoding vs. Structural Fine-Tuning
We compared our Fine-Tuned (FT) Compiler against a vanilla JSON–constrained decoding setup using the base MedGemma model.
Comparison Results
| Method | Logical Fidelity | Latency (Avg) | Refusal Rate (Accuracy) |
|---|---|---|---|
| Base + Constrained Decoding | 84.1% | 2.4 s | 72.0% |
| Nexus-Forensic (FT) | 96.5% | 2.5 s | 98.5% |
Note:
While constrained decoding can enforce syntactic correctness, it cannot address logical hallucination. The fine-tuned compiler learns the relationship between medical law and executable logic, ensuring the content is as correct as the container.
Ablation Study: EOS Upsampling & Schema Depth
We isolated the impact of the TurboForensicTrainer by evaluating structural completion performance across varying schema depths.
Results
| EOS Upsampling | Schema Depth | Completion Rate | Parse Success |
|---|---|---|---|
| 0% (Baseline) | Depth 5 | 74.2% | 61.0% |
| 100% | Depth 5 | 92.5% | 88.4% |
| 200% (Nexus) | Depth 5 | 100% | 99.2% |
Observation: Upsampling EOS tokens by 200% forces the model to prioritize structural "closure" over creative continuation, specifically solving the JSON non-termination issue in complex protocols.
Benchmarking Methodology
Formal Definitions & Operative Constraints
To ground our evaluation, we define three critical failure and success modes identified during development:
Structural Drift
The stochastic decay of syntactic integrity in LLM outputs as schema depth increases (≥ 5 levels), leading to non-terminating objects or trailing commas.
Executable Authority
The property of a synthesized logic object to be directly ingested by a deterministic engine without human-in-the-loop correction.
Refusal Correctness
The model’s ability to identify non-executable prose (e.g., mission statements or visions) and emit a standard unsupported object rather than hallucinating phantom rules.
Reproducibility: Deterministic Validation Script
To ensure the auditability of these metrics, we include a standalone Benchmarking Runner in the repository. This script performs head-to-head inference between the Base Model and the Structural Compiler on a held-out dataset.
Location
scripts/verification.py
Function
- Executes zero-shot transformation tasks
- Computes automated Schema Validation Rates
- Measures Average Inference Latency
Run Command
python scripts/verification.py \
--dataset Nick-Maximillien/medgate-compiler-data.json \
--limit 100
Research Discoveries & Failure Mode Analysis
The “Structural Drift” Discovery
During development, we discovered that while Base MedGemma is medically accurate, it exhibits structural drift when generating deeply nested JSON (exceeding five levels). Our fine-tuning stabilized the attention-to-syntax ratio, ensuring the model maintains schema integrity even for long-form clinical protocols.
Identified Weaknesses (Safety Boundaries)
- Narrative Noise Sensitivity
While the compiler is robust, extreme prose noise (e.g., more than 2,000 words of background history preceding a rule) can lead to extraction gaps.
Mitigation Strategy
This is addressed through Layer 1 (Context-Aware Retrieval), which pre-segments source documents into high-signal chunks prior to structural compilation.
Training Convergence
| Step | Training Loss | Scope |
|---|---|---|
| 10 | 2.3717 | Initial Cross-Entropy |
| 50 | 0.6777 | Structural Stabilization |
| 100 | 0.4365 | Optimization Target Reached |
Technical Note:
Loss values correspond to token-level cross-entropy computed over structured JSON outputs only. Evaluation was performed on held-out guidelines not included in the training set to ensure generalization.
Case Study: National Malaria Policy 2024
This document represents a high-complexity stress test for the Structural Compiler due to its mix of narrative vision and specific clinical mandates.
1. Symbolic Logic Extraction (Layer 0)
The compiler identified 10 Deterministic Forensic Rules within the text.
Maternal Health (Rule 4.1.2.1)
- Prose: All pregnant women in areas of moderate to high malaria transmission receive intermittent preventive treatment (IPTp).
- Logic:
Trigger: Pregnancy AND High_Transmission_Zone → Mandatory_Action: IPTp
Emergency Response (Rule 4.3)
- Prose: Counties prone to epidemics must have adequate early warning and detection systems.
- Logic:
Condition: Epidemic_Prone → Requirement: Integrated_Surveillance_Detection
2. Retrieval & Context (Layer 1)
The system used medlm-embeddings-v1 to index the resulting Knowledge Graph.
This specialized medical embedding ensures that an auditor querying “supply chain risk” is deterministically routed to Rule 4.7, which governs commodity stockouts and quality-assured access.
Value Across Three Distinct Domains
1. Engineering: High-Fidelity Logic Synthesis
From Prose to Proof
The Structural Compiler achieves a 99.2% Schema Validation Rate, effectively automating the translation of unstructured PDF protocols into machine-executable Knowledge Graphs.
Systemic Reliability
By upsampling the EOS token by 200%, we eliminated JSON non-termination, ensuring the auditing pipeline never stalls on malformed data.
Deterministic Scalability
The system supports Time-Travel Audits through immutable versioning of the Knowledge Graph, allowing claims to be adjudicated against the exact legal logic active at the time of care—a critical requirement in regulated healthcare environments.
2. Research: Eliminating the “Verification Gap”
Zero-Drift Grounding
Unlike base models that hallucinate “clinical defaults,” the fine-tuned compiler enters a Refusal State when encountering non-executable mission statements, maintaining a hallucination rate of < 0.1%.
Neurosymbolic Integrity
By separating semantic interpretation (Layer 0) from deterministic adjudication (Layer 2), the system bridges the gap between probabilistic AI and absolute clinical truth.
Generalization Power
Evaluation on held-out guidelines—such as the National Malaria Policy 2024—confirms the model’s ability to extract Temporal Logic and Threshold Constraints with 94.5% recall.
3. Business: Operational Outcomes
60% Workflow Acceleration
By automating the Logic Ingestion phase, the fine-tuned MedGemma model converts unstructured protocols into computable logic, reducing manual overhead for county auditors by approximately 60%. This transformation allows stakeholders to shift from tedious document parsing to high-value clinical interventions, auditable policy research, and real-time IoT facility monitoring mediated through the Nexus Forensic Web Dashboard
Resource Optimization
Quantization to 8-bit GGUF enables high-speed, offline forensic auditing in rural facilities (Edge AI), removing the cost and latency barriers of cloud-only infrastructures.
Closed-Loop Accountability
Integration of WhatsApp-based Forensic Alerts ensures that critical safety violations (e.g., IoT infrastructure failure) are reported to human supervisors within sub-second latency.
Known Limitations & Safety Behavior (Refusal States)
To ensure system safety, the Structural Compiler enters a refusal state when encountering ambiguous or non-executable text.
Refusal Case: Non-Executable Mission Statements
- Source Prose:
“To provide a framework to guide the achievement of a malaria-free Kenya by fostering inclusive, collaborative, coordinated actions...” - Reasoning:
This text outlines organizational mission rather than an enforceable clinical, facility, or legal obligation. - Result:
Compiler categorizes underrule_type: "unsupported", preventing the Agentic Workflow from auditing patient records against vision statements.
Refusal Case: Implicit Clinical Defaults
- Source Prose:
“Access to tools and technologies for health to the last mile is critical to assuring UHC.” - Reasoning:
The text does not define specific Evidence Sufficiency artifacts (e.g., reports) or hard thresholds (>95% availability). - Result:
Model emits a structured refusal object, signaling the standard is too abstract for Computable Medical Law without further domain mapping.
Methodology
- Base Model: medgemma-1.5-4b-it
- Technique: LoRA + NF4 (targeting all linear projections)
- Data: 600+ hand-curated mappings from NASCOP, MoH, and KQMH standards
- Safety: Integrated into a Neurosymbolic Feedback Loop; invalid JSON is automatically rejected by Python Layer 2 gates
Deployment (Cloud ↔ Edge)
- Cloud: Hosted on Google Vertex AI for national auditing and policy analysis
- Edge: Quantized to 8-bit GGUF for offline inference in rural clinics via
llama-cpp
Project Resources
- Source Policy: National Malaria Policy 2024 (PDF)
- Compiled Logic: National Malaria Policy 2024 - Knowledge Graph (JSON)
- Dataset: MedGate Compiler Training Data
- Kaggle Notebook: MedGemma Structural Compiler Architecture
- Repository: Github system repo
- Live Dashboard: nexus-forensic.vercel.app
- Downloads last month
- -
Model tree for Nick-Maximillien/nexus_forensic_medgemma_adapter_v2
Base model
google/medgemma-1.5-4b-it



