---
license: mit
base_model: emilyalsentzer/Bio_ClinicalBERT
tags:
- medical
- healthcare
- clinical-notes
- medical-coding
- few-shot-learning
- prototypical-networks
- deployment-ready
- self-contained
language:
- en
metrics:
- accuracy
library_name: transformers
pipeline_tag: text-classification
widget:
- text: "Patient presents with chest pain and shortness of breath. ECG shows abnormalities."
---
# MediCoder AI v4 Complete 🏥✨
## Model Description
**MediCoder AI v4 Complete** is a fully self-contained medical coding system with **57,768 embedded prototypes** that predicts ICD/medical codes from clinical notes with **46.3% Top-1 accuracy**. This model requires **no external dataset** for inference.
## 🎯 Performance
- **Top-1 Accuracy**: 46.3%
- **Top-5 Accuracy**: ~54% (estimated)
- **Medical Codes**: 57,768 supported codes
- **Prototypes**: 57,768 embedded prototype vectors
- **Deployment**: Fully self-contained
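Top-1 and Top-5 accuracy count a note as correct when its gold code appears among the model's 1 or 5 highest-scoring codes. The following is a minimal sketch of that computation. It assumes one gold code per note and a score matrix with one row per note. The card does not describe its evaluation protocol, and `top_k_accuracy` is a hypothetical helper, not part of this repository.

```python
import numpy as np

def top_k_accuracy(scores: np.ndarray, gold: np.ndarray, k: int) -> float:
    """Fraction of rows whose gold code index is among the k highest scores.

    scores: [num_notes, num_codes] similarity scores (higher is better)
    gold:   [num_notes] index of the single correct code for each note
    """
    # Indices of the k largest scores per row (their order within the k is irrelevant)
    top_k = np.argpartition(-scores, kth=k - 1, axis=1)[:, :k]
    hits = (top_k == gold[:, None]).any(axis=1)
    return float(hits.mean())


# Toy example: 3 notes scored against 4 codes
scores = np.array([
    [0.9, 0.1, 0.0, 0.3],   # gold code 0 is ranked 1st
    [0.2, 0.8, 0.7, 0.1],   # gold code 2 is ranked 2nd
    [0.5, 0.4, 0.3, 0.6],   # gold code 1 is ranked 3rd
])
gold = np.array([0, 2, 1])
print(top_k_accuracy(scores, gold, k=1))  # 1 of 3 notes correct
print(top_k_accuracy(scores, gold, k=2))  # 2 of 3 notes correct
```

`np.argpartition` only guarantees that the k best entries land in the first k positions. That is enough for an accuracy count and avoids fully sorting tens of thousands of scores per note.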
## ✨ What's New in Complete Version
- ✅ **57,768 Prototypes Embedded**: Every supported medical code has a learned representation
- ✅ **No Dataset Required**: Fully self-contained for deployment
- ✅ **Production Ready**: Direct inference without an external dataset
- ✅ **Full 46.3% Accuracy**: The embedded prototypes preserve the complete Top-1 performance
- ✅ **Memory Optimized**: Efficient prototype storage and retrieval
## Architecture
- **Base Model**: Bio_ClinicalBERT (specialized for medical text)
- **Approach**: Few-shot Prototypical Networks with Embedded Prototypes
- **Embedding Dimension**: 768
- **Prototype Storage**: 57,768 × 768 learned medical code representations
- **Optimization**: Conservative incremental improvements (Phase 2)
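In a prototypical network, each code's prototype is typically the mean embedding of that code's training (support) examples, and a new note is assigned to the closest prototypes. The sketch below illustrates this on toy 3-dimensional vectors. It assumes plain class-mean prototypes and is not the card's Phase 2 ensemble procedure. The helper names and the ICD-10-style labels are illustrative. Scoring uses a dot product to match the Usage Example, whereas the original prototypical networks formulation uses negative squared Euclidean distance.

```python
import numpy as np

def build_prototypes(embeddings: np.ndarray, labels: list[str]) -> tuple[np.ndarray, list[str]]:
    """Average the support embeddings of each code to get one prototype per code."""
    codes = sorted(set(labels))
    label_arr = np.array(labels)
    protos = np.stack([embeddings[label_arr == c].mean(axis=0) for c in codes])
    return protos, codes

def nearest_codes(query: np.ndarray, protos: np.ndarray, codes: list[str], k: int = 1) -> list[str]:
    """Rank codes by dot-product similarity between the query and each prototype."""
    sims = protos @ query              # one score per code
    order = np.argsort(-sims)[:k]      # highest similarity first
    return [codes[i] for i in order]

# Toy support set: two notes per code, 3-dim embeddings (the real model uses 768)
emb = np.array([
    [1.0, 0.0, 0.0], [0.8, 0.2, 0.0],   # notes labeled "I21.9"
    [0.0, 1.0, 0.0], [0.1, 0.9, 0.0],   # notes labeled "J18.9"
])
labels = ["I21.9", "I21.9", "J18.9", "J18.9"]
protos, codes = build_prototypes(emb, labels)
print(nearest_codes(np.array([0.9, 0.1, 0.0]), protos, codes, k=1))
```

Because each prototype is computed once and stored, inference only needs one encoder pass plus a single matrix-vector product against the stored prototypes. This is what allows the checkpoint to work without the training dataset.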
## Quick Start
```python
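import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

repo_id = "sshan95/medicoder-ai-v4-model"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Download the checkpoint with embedded prototypes (cached locally after the first call)
checkpoint_path = hf_hub_download(repo_id=repo_id, filename="pytorch_model.bin")
# weights_only=False may be required on recent PyTorch if the checkpoint stores
# non-tensor objects (e.g. the config); only use it with files you trust
checkpoint = torch.load(checkpoint_path, map_location="cpu", weights_only=False)

prototypes = checkpoint['prototypes']            # Shape: [57768, 768]
prototype_codes = checkpoint['prototype_codes']  # 57768 code labels
print(f"Loaded {prototypes.shape[0]:,} medical code prototypes!")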
```
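Before running inference, it can help to confirm that the prototype matrix and the code list line up. This is a small sketch. It assumes the checkpoint keys listed in this card and that `prototypes` exposes a `.shape`, which is true for both PyTorch tensors and NumPy arrays. `check_prototypes` is a hypothetical helper.

```python
def check_prototypes(checkpoint: dict, expected_dim: int = 768) -> int:
    """Validate that prototypes and code labels align; return the number of codes."""
    prototypes = checkpoint["prototypes"]
    codes = checkpoint["prototype_codes"]
    num_protos, dim = prototypes.shape
    if dim != expected_dim:
        raise ValueError(f"expected {expected_dim}-dim prototypes, got {dim}")
    # len() works for lists, NumPy arrays and tensors (length of the first axis)
    if len(codes) != num_protos:
        raise ValueError(f"{num_protos} prototypes but {len(codes)} code labels")
    return num_protos
```

With the checkpoint loaded as in the Quick Start, `check_prototypes(checkpoint)` should return 57,768 if the download matches this card.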
## Usage Example
```python
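import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

repo_id = "sshan95/medicoder-ai-v4-model"

# Initialize tokenizer and checkpoint
tokenizer = AutoTokenizer.from_pretrained(repo_id)
checkpoint_path = hf_hub_download(repo_id=repo_id, filename="pytorch_model.bin")
checkpoint = torch.load(checkpoint_path, map_location="cpu", weights_only=False)

# Load model architecture (your ConservativePrototypicalNetwork).
# load_your_model_architecture() is a placeholder: a state dict holds only
# weights, so you must supply the model class yourself.
model = load_your_model_architecture()
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()  # disable dropout for deterministic inference

# Load embedded prototypes
prototypes = checkpoint['prototypes']            # [57768, 768]
prototype_codes = checkpoint['prototype_codes']  # 57768 code labels

# Example prediction
clinical_note = "Patient presents with acute chest pain, diaphoresis, and dyspnea..."

# Tokenize (BERT-based encoders accept at most 512 tokens)
inputs = tokenizer(clinical_note, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    # Get the query embedding (assumed shape [1, 768])
    query_embedding = model.encode_text(inputs['input_ids'], inputs['attention_mask'])
    # Dot-product similarity to all prototypes: [1, 57768]
    similarities = torch.mm(query_embedding, prototypes.t())

# Get top-5 predictions
top_5_scores, top_5_indices = torch.topk(similarities, k=5)
# Index element by element so this works whether codes are a list, array, or tensor
codes = prototype_codes.tolist() if hasattr(prototype_codes, "tolist") else list(prototype_codes)
predicted_codes = [codes[i] for i in top_5_indices[0].tolist()]
print("Top 5 predicted medical codes:", predicted_codes)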
```
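For larger workloads you may want to score several notes at once without fully sorting all 57,768 similarities for each one. The sketch below does this with NumPy on precomputed embeddings. It assumes the query embeddings are already available as a `[num_notes, 768]` array, for example `query_embedding.numpy()` from the example above. Like that example, it uses dot-product similarity. `top_k_codes` is a hypothetical helper, not part of the released checkpoint.

```python
import numpy as np

def top_k_codes(queries: np.ndarray, prototypes: np.ndarray, codes: list, k: int = 5) -> list[list]:
    """Return the k best codes per query, highest dot-product similarity first."""
    sims = queries @ prototypes.T                            # [num_notes, num_codes]
    # argpartition selects the k best per row in linear time; only those k get sorted
    part = np.argpartition(-sims, kth=k - 1, axis=1)[:, :k]
    part_scores = np.take_along_axis(sims, part, axis=1)
    order = np.argsort(-part_scores, axis=1)
    best = np.take_along_axis(part, order, axis=1)
    return [[codes[i] for i in row] for row in best]

# Toy check: 4 codes in 2 dimensions, 2 notes
protos = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]])
codes = ["A", "B", "C", "D"]
notes = np.array([[1.0, 0.1], [0.0, 1.0]])
print(top_k_codes(notes, protos, codes, k=2))
```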
## Model Contents
When you load this model, you get:
```python
import torch
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
# Available keys:
checkpoint['model_state_dict'] # Neural network weights
checkpoint['prototypes'] # [57768, 768] prototype embeddings
checkpoint['prototype_codes'] # [57768] medical code mappings
checkpoint['accuracies'] # Performance metrics
checkpoint['config'] # Training configuration
```
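To see what each entry actually holds in your download, the following quick sketch reports the type of every top-level key, along with its shape or length where available. `describe_checkpoint` is a hypothetical helper. It works on any dict, so it assumes no particular key layout.

```python
def describe_checkpoint(checkpoint: dict) -> dict:
    """Map each key to a short description of its value's type and size."""
    summary = {}
    for key, value in checkpoint.items():
        if hasattr(value, "shape"):                    # tensors and NumPy arrays
            summary[key] = f"{type(value).__name__} {tuple(value.shape)}"
        elif isinstance(value, (dict, list, tuple)):   # e.g. state dicts, configs
            summary[key] = f"{type(value).__name__} of length {len(value)}"
        else:
            summary[key] = type(value).__name__
    return summary

# Usage with the loaded checkpoint:
# for key, desc in describe_checkpoint(checkpoint).items():
#     print(f"{key}: {desc}")
```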
## 🎯 Key Features
### ✅ **Self-Contained Deployment**
- No external dataset required
- All medical knowledge embedded in prototypes
- Direct inference capability
### ✅ **Production Ready**
- Optimized for CPU and GPU inference
- Memory-efficient prototype storage
- Stable, tested architecture
### ✅ **Full Performance**
- Complete 46.3% Top-1 accuracy preserved
- All 57,768 medical codes supported
- Conservative optimization approach
## Training Details
- **Base Model**: Bio_ClinicalBERT
- **Training Data**: Clinical notes with medical code annotations
- **Approach**: Few-shot prototypical learning
- **Optimization**: Conservative incremental improvements
- **Phase 1**: Enhanced embeddings (+5.7pp)
- **Phase 2**: Ensemble prototypes (+1.1pp)
- **Final Step**: Prototype extraction and embedding
## Deployment Options
### **Option 1: Hugging Face Spaces**
Perfect for demos and testing with built-in UI.
### **Option 2: Local Deployment**
Download and run locally for production use.
### **Option 3: API Integration**
Integrate into existing healthcare systems.
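For API integration, predictions usually need to be serialized into a stable response format. The following is only a sketch of one possible JSON shape using the Python standard library. The field names (`rank`, `code`, `score`, `requires_review`) are hypothetical and are not defined by this model. Every response is flagged for review, in line with the usage guideline that predictions always require expert validation.

```python
import json

def to_response(codes: list[str], scores: list[float]) -> str:
    """Serialize ranked code suggestions as a JSON string."""
    suggestions = [
        {"rank": rank, "code": code, "score": round(float(score), 4)}
        for rank, (code, score) in enumerate(zip(codes, scores), start=1)
    ]
    # Suggestions are assistive only; a human coder must confirm them
    return json.dumps({"suggestions": suggestions, "requires_review": True})

print(to_response(["I21.9", "R07.9"], [0.91234, 0.8]))
```

Calling `float()` on each score lets the same function accept Python floats, NumPy scalars, or 0-d tensors from `torch.topk`.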
## ⚠️ Usage Guidelines
- **Purpose**: Research and educational use, medical coding assistance
- **Validation**: Always require human expert validation
- **Scope**: English clinical text, general medical domains
- **Limitations**: Performance varies by medical specialty
## Real-world Impact
This model helps by:
- **Reducing coding time**: Hours → Minutes
- **Improving consistency**: Standardized predictions
- **Narrowing choices**: 57,768 codes → Top suggestions
- **Supporting workflow**: Integration-ready format
## Technical Specifications
- **Model Size**: ~1.2 GB (with prototypes)
- **Inference Speed**: 3-8 seconds (CPU), <1 second (GPU)
- **Memory Usage**: ~3-4 GB during inference
- **Dependencies**: PyTorch, Transformers, NumPy
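The prototype matrix accounts for a predictable share of that footprint. As a worked example, assuming the prototypes are stored as 32-bit floats (the card does not state the dtype), 57,768 × 768 values take about 177 MB, and half precision would halve that.

```python
def matrix_bytes(rows: int, cols: int, bytes_per_value: int) -> int:
    """Storage needed for a dense rows x cols matrix."""
    return rows * cols * bytes_per_value

fp32 = matrix_bytes(57_768, 768, 4)   # float32: 4 bytes per value
fp16 = matrix_bytes(57_768, 768, 2)   # float16: 2 bytes per value
print(f"float32: {fp32:,} bytes (~{fp32 / 1e6:.0f} MB)")
print(f"float16: {fp16:,} bytes (~{fp16 / 1e6:.0f} MB)")
```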
## Citation
```bibtex
@misc{medicoder-ai-v4-complete,
title={MediCoder AI v4 Complete: Self-Contained Medical Coding with Embedded Prototypes},
author={MediCoder Team},
year={2025},
url={https://huggingface.co/sshan95/medicoder-ai-v4-model},
note={57,768 embedded prototypes, 46.3% Top-1 accuracy}
}
```
## 🏥 Community
Built for the medical coding community. For questions, issues, or collaborations, please use the repository discussions.
---
**Ready for production medical coding assistance!**
*This complete model contains all necessary components for deployment without external dependencies.*