---
license: mit
language:
- en
---
|
# LADDER: Language-Driven Slice Discovery and Error Rectification in Vision Classifiers
|
|
|
[Project Page](https://shantanu-ai.github.io/projects/ACL-2025-Ladder/index.html) |
[Paper (ACL 2025 Findings)](https://aclanthology.org/2025.findings-acl.1177/) |
[Code](https://github.com/batmanlab/Ladder) |
[Models (Hugging Face)](https://huggingface.co/shawn24/Ladder/tree/main)
|
|
|
--- |
|
|
|
## Summary
|
|
|
**LADDER** is a general framework that enables vision classifiers to automatically discover subpopulations (or "slices") of data on which the model underperforms, without requiring group annotations. It leverages **vision-language representations** and the **reasoning capabilities of large language models (LLMs)** to detect and rectify bias-inducing features in both natural and medical imaging domains.
|
|
|
--- |
|
|
|
## Architecture & Components
|
|
|
- **Slice Discovery** using:
  - CLIP, Mammo-CLIP, and CXR-CLIP features
  - BLIP- and GPT-4o-generated captions
- **Hypothesis Generation** using:
  - GPT-4o, Claude, Gemini, LLaMA
- **Bias Mitigation** via reweighting & pseudo-labeling
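The three stages above can be sketched end-to-end on toy data. Everything in this snippet (the keyword-based slice grouping, the stubbed hypothesis step standing in for an LLM call, and the fixed upweighting factor) is a hypothetical simplification for illustration, not the paper's implementation:

```python
from collections import Counter

# Toy validation set: (caption, true label, predicted label).
val_set = [
    ("landbird on a bamboo forest background", 1, 0),
    ("landbird on a bamboo forest background", 1, 0),
    ("landbird perched on a tree branch", 1, 1),
    ("waterbird flying over the ocean", 0, 0),
    ("waterbird standing on green grass", 0, 1),
]

STOPWORDS = {"a", "an", "on", "the", "over"}

def discover_slices(examples):
    """Stage 1: find keywords shared across misclassified captions."""
    errors = [cap for cap, y, yhat in examples if y != yhat]
    counts = Counter(w for cap in errors for w in cap.split()
                     if w not in STOPWORDS)
    # Keywords recurring across several errors suggest a coherent slice.
    return sorted(w for w, c in counts.items() if c > 1)

def generate_hypothesis(keywords):
    """Stage 2: stand-in for the LLM that turns keywords into a hypothesis."""
    return "Model may rely on spurious attribute(s): " + ", ".join(keywords)

def mitigation_weights(examples, keywords, factor=3.0):
    """Stage 3: upweight examples whose captions mention a flagged keyword."""
    return [factor if any(k in cap for k in keywords) else 1.0
            for cap, _, _ in examples]

keywords = discover_slices(val_set)
print(generate_hypothesis(keywords))
print(mitigation_weights(val_set, keywords))
```

In the actual framework, captions come from BLIP or GPT-4o, the hypothesis step is a real LLM query over CLIP-style embeddings, and mitigation uses learned reweighting and pseudo-labels rather than a hand-set factor.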
|
|
|
--- |
|
|
|
## Datasets Used
|
|
|
- **Natural Images**: Waterbirds, CelebA, MetaShift |
|
- **Medical Images**: NIH ChestX-ray, RSNA Mammograms, VinDr Mammograms |
|
|
|
--- |
|
|
|
## Files Included
|
|
|
| File | Description |
|------|-------------|
| `model.pt` | Pretrained model checkpoint |
| `feature_cache.pkl` | Cached representations (CLIP/Mammo-CLIP/CXR-CLIP) |
| `metadata.csv` | Metadata with discovered slice labels |
| `caption_blip.json` | BLIP-generated captions |
| `caption_gpt4o.json` | GPT-4o-generated captions |
| `predictions.json` | Model predictions on test set |
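A minimal sketch of reading these files with standard-library loaders only. The file names come from the table above, but the exact formats and contents are assumptions; `model.pt` is omitted here because it would need `torch.load(path, map_location="cpu")` rather than a stdlib call:

```python
import csv
import json
import pickle
from pathlib import Path

# One stdlib loader per released artifact (model.pt needs torch, so it is
# left out of this sketch).
LOADERS = {
    "metadata.csv": lambda p: list(csv.DictReader(p.open())),
    "caption_blip.json": lambda p: json.loads(p.read_text()),
    "caption_gpt4o.json": lambda p: json.loads(p.read_text()),
    "predictions.json": lambda p: json.loads(p.read_text()),
    "feature_cache.pkl": lambda p: pickle.loads(p.read_bytes()),
}

def load_artifacts(root="."):
    """Load whichever artifacts exist under `root`, keyed by file name."""
    root = Path(root)
    return {name: load(root / name)
            for name, load in LOADERS.items()
            if (root / name).exists()}

print(sorted(LOADERS))  # artifact names this sketch knows how to read
```

Only unpickle `feature_cache.pkl` from a source you trust; `pickle` can execute arbitrary code on load.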
|
|
|
--- |
|
|
|
|
|
## Benchmarks
|
|
|
LADDER outperforms traditional slice-discovery methods (Domino, FACTS) across six datasets and more than 200 classifiers. It is especially effective at:
|
|
|
- Discovering hidden biases without explicit attribute labels |
|
- Reasoning about non-visual factors (e.g., preprocessing artifacts) |
|
- Operating without human-written captions |
|
|
|
--- |
|
|
|
## Citation
|
|
|
```bibtex
@article{ghosh2024ladder,
  title={LADDER: Language Driven Slice Discovery and Error Rectification},
  author={Ghosh, Shantanu and Syed, Rayan and Wang, Chenyu and Poynton, Clare B and Visweswaran, Shyam and Batmanghelich, Kayhan},
  journal={arXiv preprint arXiv:2408.07832},
  year={2024}
}
```
|
|
|
--- |
|
|
|
## Acknowledgements
|
|
|
Boston University, Stanford University, BUMC, and the University of Pittsburgh. |