---
license: mit
language:
- en
---
# 🪜 LADDER: Language-Driven Slice Discovery and Error Rectification in Vision Classifiers
[![Project](https://img.shields.io/badge/Project-%23dfb317)](https://shantanu-ai.github.io/projects/ACL-2025-Ladder/index.html)
[![Paper](https://img.shields.io/badge/Paper-ACL%202025-%23dfb317)](https://aclanthology.org/2025.findings-acl.1177/)
[![Code](https://img.shields.io/badge/GitHub-batmanlab%2FLADDER-%2312100e)](https://github.com/batmanlab/Ladder)
[![Model](https://img.shields.io/badge/HuggingFace-Pretrained--Checkpoints-blue)](https://huggingface.co/shawn24/Ladder/tree/main)
---
## 📌 Summary
**LADDER** is a general framework that enables vision classifiers to automatically discover subpopulations (or "slices") of data where the model underperforms, without requiring group annotations. It leverages **vision-language representations** and the **reasoning capabilities of large language models (LLMs)** to detect and rectify bias-inducing features in both natural and medical imaging domains.
---
## 🧠 Architecture & Components
- 🔍 **Slice Discovery** using:
  - CLIP, Mammo-CLIP, and CXR-CLIP features
  - BLIP- and GPT-4o-generated captions
- 🧠 **Hypothesis Generation** using:
  - GPT-4o, Claude, Gemini, LLaMA
- ✅ **Bias Mitigation** via reweighting & pseudo-labeling
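The reweighting idea behind the mitigation step can be sketched as a small stand-alone example: measure per-slice error rates, then upweight samples in slices where the classifier errs more than average. The slice labels, threshold, and `boost` factor below are illustrative assumptions, not the exact recipe from the paper.

```python
from collections import defaultdict

def slice_error_rates(y_true, y_pred, slices):
    """Error rate per discovered slice, from paired labels/predictions."""
    errs, counts = defaultdict(int), defaultdict(int)
    for t, p, s in zip(y_true, y_pred, slices):
        counts[s] += 1
        errs[s] += int(t != p)
    return {s: errs[s] / counts[s] for s in counts}

def reweight(slices, rates, boost=2.0):
    """Upweight samples in slices whose error rate exceeds the mean.
    `boost=2.0` is an illustrative choice, not a tuned value."""
    avg = sum(rates.values()) / len(rates)
    return [boost if rates[s] > avg else 1.0 for s in slices]

# Toy example: slice "b" (e.g. a spurious-attribute group) errs more often.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0]
slc = ["a", "b", "a", "b", "b", "a"]

rates = slice_error_rates(y_true, y_pred, slc)
weights = reweight(slc, rates)  # feed into a weighted training loss
```

The resulting per-sample weights would plug into a weighted loss during retraining; pseudo-labeling would instead use the discovered slices to assign group labels for a group-robust objective.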
---
## 📊 Datasets Used
- **Natural Images**: Waterbirds, CelebA, MetaShift
- **Medical Images**: NIH ChestX-ray, RSNA Mammograms, VinDr Mammograms
---
## 📦 Files Included
| File | Description |
|------|-------------|
| `model.pt` | Pretrained model checkpoint |
| `feature_cache.pkl` | Cached representations (CLIP/Mammo-CLIP/CXR-CLIP) |
| `metadata.csv` | Metadata with discovered slice labels |
| `caption_blip.json` | BLIP-generated captions |
| `caption_gpt4o.json` | GPT-4o-generated captions |
| `predictions.json` | Model predictions on test set |
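A minimal sketch of consuming these artifacts together, joining slice labels from `metadata.csv` with entries from `predictions.json`. The column names and JSON layout assumed here are illustrative; inspect the actual files for the exact schema. The demo writes tiny stand-in files to a temporary directory so it runs without the checkpoint repo.

```python
import json
import tempfile
from pathlib import Path

# Assumed (illustrative) schemas:
#   metadata.csv     -> header "image_id,slice_label", one row per image
#   predictions.json -> {"<image_id>": <predicted_label>, ...}
def load_slices_and_preds(root):
    root = Path(root)
    meta = {}
    rows = root.joinpath("metadata.csv").read_text().splitlines()
    for row in rows[1:]:  # skip header row
        image_id, slice_label = row.split(",")
        meta[image_id] = slice_label
    preds = json.loads(root.joinpath("predictions.json").read_text())
    return meta, preds

# Demo against stand-in files in a temp directory.
with tempfile.TemporaryDirectory() as d:
    Path(d, "metadata.csv").write_text("image_id,slice_label\nimg0,waterbird_on_land\n")
    Path(d, "predictions.json").write_text('{"img0": 1}')
    meta, preds = load_slices_and_preds(d)
```

With the real files, the same join lets you group predictions by discovered slice and audit per-slice accuracy.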
---
## 🧪 Benchmarks
LADDER outperforms traditional slice-discovery methods (Domino, FACTS) across six datasets and more than 200 classifiers. It is especially effective at:
- Discovering hidden biases without explicit attribute labels
- Reasoning about non-visual factors (e.g., preprocessing artifacts)
- Operating without human-written captions
---
## 📜 Citation
```bibtex
@article{ghosh2024ladder,
  title={LADDER: Language Driven Slice Discovery and Error Rectification},
  author={Ghosh, Shantanu and Syed, Rayan and Wang, Chenyu and Poynton, Clare B and Visweswaran, Shyam and Batmanghelich, Kayhan},
  journal={arXiv preprint arXiv:2408.07832},
  year={2024}
}
```
---
## 🤝 Acknowledgements
Boston University, Stanford University, BUMC, and the University of Pittsburgh.