# RobustMedCLIP: On the Robustness of Medical Vision-Language Models: Are they Truly Generalizable?
> **Accepted at Medical Image Understanding and Analysis (MIUA) 2025**
[License](LICENSE) | [Paper (arXiv)](https://arxiv.org/abs/2505.15425) | [MediMeta-C Dataset](https://huggingface.co/datasets/razaimam45/MediMeta-C) | [Model Weights](https://huggingface.co/razaimam45/RobustMedCLIP) | [Code](https://github.com/BioMedIA-MBZUAI/RobustMedCLIP)
---
## Highlights
- **MVLM Benchmarking**: Evaluate 5 major recent MVLMs across **5 modalities**, **7 corruption types**, and **5 severity levels**
- **Corruption Evaluation**: Analyze degradation under Gaussian noise, motion blur, pixelation, etc.
- **MediMeta-C**: A new benchmark simulating real-world OOD shifts in high-res medical images
- **Few-shot Robustness**: **RobustMedCLIP** uses just 1-10% of clean data for adaptation
- **LoRA Efficient Tuning**: Low-rank fine-tuning in transformer attention layers
<p align="center">
<img src="assets/pipeline.png" width="750" alt="Pipeline Overview">
</p>
<p align="center">
Overview of the RobustMedCLIP pipeline: A) Few-shot Sampling of Clean Samples from MediMeta and MedMNIST across 5 modalities; B) Fine-tuning LoRA adapters using Few-shot samples; C) Distribution Shifts of MediMeta-C compared to Clean samples; D) Evaluation Results across Top-1 Accuracy and Corruption Error for 4 baselines and RobustMedCLIP.
</p>
---
## Installation
```bash
git clone https://github.com/BioMedIA-MBZUAI/RobustMedCLIP.git
cd RobustMedCLIP
conda create -n robustmedclip python=3.12.7
conda activate robustmedclip
pip install -r requirements.txt
pip install huggingface_hub
```
You will also need to replace `<YOUR-HUGGINGFACE-TOKEN>` with your personal Hugging Face access token in order to download the datasets and model weights directly.\
To create an access token, open your Hugging Face `Settings`, go to the `Access Tokens` tab, and click the `New token` button to create a new User Access Token.
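Alternatively, you can authenticate once from Python with the `huggingface_hub` client; a minimal sketch (fill in your own token):
```python
from huggingface_hub import login

# Cache the token locally so subsequent Hub downloads in this
# environment authenticate automatically.
login(token="<YOUR-HUGGINGFACE-TOKEN>")
```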
---
## Models
All baseline and RobustMedCLIP model checkpoints are available for direct download via Hugging Face at [RobustMedCLIP](https://huggingface.co/razaimam45/RobustMedCLIP/tree/main):
```bash
huggingface-cli download razaimam45/RobustMedCLIP \
--local-dir ./outputs \
--repo-type model \
--token <YOUR-HUGGINGFACE-TOKEN>
```
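If you prefer the Python API over the CLI, the same weights can be fetched with `snapshot_download`; a sketch equivalent to the command above:
```python
from huggingface_hub import snapshot_download

# Download all RobustMedCLIP checkpoints and result logs into ./outputs
snapshot_download(
    repo_id="razaimam45/RobustMedCLIP",
    repo_type="model",
    local_dir="./outputs",
    token="<YOUR-HUGGINGFACE-TOKEN>",
)
```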
**`outputs/` Folder Structure**: The `outputs/` folder (placed in the project root) contains all trained model weights and evaluation results:
```bash
outputs/
├── checkpoints/     # Baseline MVLMs (MedCLIP, UniMedCLIP)
├── exp-rank-8/      # RobustMedCLIP (LoRA Rank = 8) for ViT and ResNet across few-shots (1/3/7/10)%
├── exp-rank-16/     # RobustMedCLIP (LoRA Rank = 16) for ViT and ResNet across few-shots (1/3/7/10)%
└── results/         # Evaluation logs across mCE/Accuracy metrics
```
---
## Datasets
This project proposes MediMeta-C as a corruption benchmark and evaluates MVLMs on the MedMNIST-C and MediMeta-C benchmarks.
| Dataset | Description | Modalities | Corruption Sets | Resolution |
|----------------|-----------------------------------|------------|-----------------------------------|-------------|
| **MediMeta-C** | Proposed multi-modality benchmark | 5 | 7 corruptions × 5 severity levels | High-res |
| **MedMNIST-C** | Public benchmark | 5 | 7 corruptions × 5 severity levels | Low-res |
### Dataset Structure
The MediMeta-C dataset is hosted on HuggingFace and organized as follows:
```bash
MediMeta-C/
├── pbc/                              # Blood Cell modality
│   ├── test/                         # Test set
│   │   ├── clean.npz                 # Clean samples
│   │   ├── brightness_severity_1.npz
│   │   ├── brightness_severity_2.npz
│   │   ├── ...                       # Other severity levels
│   │   └── brightness_severity_5.npz
│   └── val/                          # Validation set
│       ├── clean.npz
│       ├── contrast_severity_1.npz
│       ├── contrast_severity_2.npz
│       ├── ...                       # Other severity levels
│       └── contrast_severity_5.npz
├── fundus/                           # Fundus modality
│   ├── test/
│   ├── val/
│   └── ...                           # Similar structure as above
├── ...                               # Other modalities
└── README.md                         # Dataset description
```
You can download the datasets from [MediMeta-C](https://huggingface.co/datasets/razaimam45/MediMeta-C/tree/main) and [MedMNIST-C](https://github.com/francescodisalvo05/medmnistc-api). The downloaded folder `data/MediMeta-C` should be placed in the project root.
```bash
huggingface-cli download razaimam45/MediMeta-C --local-dir ./data/MediMeta-C --repo-type dataset --token <YOUR-HUGGINGFACE-TOKEN>
```
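Each `.npz` file bundles the images and labels for one corruption/severity split. Below is a minimal loading sketch; the array key names (`images`, `labels`) are assumptions, so inspect `npz.files` to confirm the actual keys:
```python
import numpy as np

# Load one corruption split, e.g. brightness at severity 3 for the blood-cell (pbc) modality.
npz = np.load("data/MediMeta-C/pbc/test/brightness_severity_3.npz")
print(npz.files)  # list the stored array names first

# Assumed key names -- adjust to whatever npz.files reports.
images, labels = npz["images"], npz["labels"]
print(images.shape, labels.shape)
```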
---
## Usage
### 1. Few-Shot Tuning
You can fine-tune RobustMedCLIP with either ViT or ResNet backbones:
```bash
# Fine-tune with ViT backbone (e.g., BioMedCLIP)
bash scripts/run_finetune_vit.sh
# Fine-tune with ResNet backbone (e.g., MedCLIP)
bash scripts/run_finetune_resnet.sh
```
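For intuition, the sketch below shows the low-rank update that LoRA adds alongside a frozen attention projection. It is a conceptual illustration only (layer choice, rank, and scaling here are illustrative); the actual adapter wiring follows the fine-tuning scripts above:
```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear projection with a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # pretrained weights stay frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # the low-rank update starts at zero
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Example: wrap the query projection of one ViT attention block (dimensions illustrative).
q_proj = nn.Linear(768, 768)
q_proj_lora = LoRALinear(q_proj, rank=8)
print(q_proj_lora(torch.randn(2, 197, 768)).shape)  # torch.Size([2, 197, 768])
```
Only the small `lora_a`/`lora_b` matrices are trained, which is what keeps the few-shot adaptation lightweight.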
### 2. Evaluation
Evaluate a fine-tuned or pretrained MVLM (including RMedCLIP):
```bash
# Evaluation for RobustMedCLIP (RMC)
bash scripts/run_eval_rmed.sh
# Custom evaluation on other models (rmedclip, biomedclip, unimedclip, medclip, clip)
python evaluate.py --model rmedclip \
--backbone vit \
--gpu 0 --corruptions all --collection medimeta
```
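To sweep all baselines with the same settings, the documented flags can be looped from Python; a convenience sketch (not a script shipped with the repository):
```python
import subprocess

# Evaluate every supported model with the flags shown above.
for model in ["rmedclip", "biomedclip", "unimedclip", "medclip", "clip"]:
    subprocess.run(
        ["python", "evaluate.py",
         "--model", model,
         "--backbone", "vit",
         "--gpu", "0",
         "--corruptions", "all",
         "--collection", "medimeta"],
        check=True,
    )
```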
---
## Results
RobustMedCLIP consistently outperforms prior MVLMs under corruptions across all modalities:
| Model | Clean Error ↓ | mCE ↓ (avg) |
| ------------ | ------------- | ----------- |
| CLIP | 100.0 | 100.0 |
| MedCLIP | 106.4 | 112.5 |
| BioMedCLIP | 116.3 | 126.8 |
| UniMedCLIP | 111.8 | 98.87 |
| **RMedCLIP** | **62.8** | **81.0** |
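
CLIP serves as the normalization baseline, hence its score of 100.0. Assuming the standard ImageNet-C convention that the corruption designs follow, mCE averages the model's corruption errors after normalizing each by CLIP's error:

$$
\mathrm{mCE} = \frac{100}{|C|} \sum_{c \in C} \frac{\sum_{s=1}^{5} E_{s,c}}{\sum_{s=1}^{5} E_{s,c}^{\mathrm{CLIP}}}
$$

where $E_{s,c}$ is the top-1 error under corruption $c$ at severity level $s$.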
Detailed benchmarks are available in the paper's Results and Discussions section.
---
## Citation
If you find this repository helpful, please cite our paper:
```bibtex
@misc{imam2025robustnessmedicalvisionlanguagemodels,
title={On the Robustness of Medical Vision-Language Models: Are they Truly Generalizable?},
author={Raza Imam and Rufael Marew and Mohammad Yaqub},
year={2025},
eprint={2505.15425},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2505.15425},
}
```
---
## Acknowledgements
* Built on top of [BioMedCLIP](https://arxiv.org/abs/2303.00915) and [MedCLIP](https://arxiv.org/abs/2210.10163)
* MediMeta-C corruption designs are inspired by [ImageNet-C](https://arxiv.org/abs/1903.12261) and [MedMNIST-C](https://arxiv.org/abs/2406.17536)
For questions, contact: **[[email protected]](mailto:[email protected])**