erjui committed · Commit 4e5de1c · verified · 1 Parent(s): bba8cd2

Update README with model card

Files changed (1): README.md (+144 -2)

---
license: apache-2.0
tags:
- vision
- image-classification
- clip
- knowledge-distillation
- semi-supervised-learning
- imagenet
datasets:
- imagenet-1k
library_name: pytorch
pipeline_tag: image-classification
---

# DHO: Simple Few-shot Semi-supervised Knowledge Distillation

[![arXiv](https://img.shields.io/badge/arXiv-2505.07675v1-b31b1b.svg)](https://arxiv.org/abs/2505.07675v1)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/simple-semi-supervised-knowledge-distillation/semi-supervised-image-classification-on-1)](https://paperswithcode.com/sota/semi-supervised-image-classification-on-1?p=simple-semi-supervised-knowledge-distillation)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/simple-semi-supervised-knowledge-distillation/semi-supervised-image-classification-on-2)](https://paperswithcode.com/sota/semi-supervised-image-classification-on-2?p=simple-semi-supervised-knowledge-distillation)

This repository contains pretrained checkpoints for **DHO (Dual-Head Optimization)**, a simple yet effective approach for semi-supervised knowledge distillation from Vision-Language Models.

## Model Description

DHO introduces a dual-head optimization strategy that enables efficient knowledge transfer from large Vision-Language Models (e.g., CLIP) to smaller student models using minimal labeled data. The method achieves state-of-the-art performance on ImageNet semi-supervised learning benchmarks with only 1% and 10% labeled data.

**Paper:** [Simple yet Effective Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization](https://arxiv.org/abs/2505.07675)

**Authors:** Seongjae Kang, Dong Bok Lee, Hyungjoon Jang, Sung Ju Hwang

## Key Features

- ✨ **Dual-head optimization** strategy for semi-supervised distillation
- 🏆 **State-of-the-art** performance on ImageNet with 1% and 10% labeled data
- 🔄 Efficient transfer from VLMs (e.g., CLIP) to smaller student models
- 🧩 Simple, scalable, and easy to integrate into existing pipelines

## Available Checkpoints

| Checkpoint Name | Student Model | Teacher Model | Labeled Data | Top-1 Acc. | Parameters |
|:----------------|:--------------|:--------------|:-------------|:-----------|:-----------|
| `vit_b_1.pt` | ViT-B/16 | ViT-H/14 (DFN5B) | 1% | 81.6% | 86M |
| `vit_b_10.pt` | ViT-B/16 | ViT-H/14 (DFN5B) | 10% | 82.8% | 86M |
| `vit_l_1.pt` | ViT-L/14 | ViT-H/14 (DFN5B) | 1% | 84.6% | 304M |
| `vit_l_10.pt` | ViT-L/14 | ViT-H/14 (DFN5B) | 10% | 85.9% | 304M |

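Checkpoints can also be fetched with the `huggingface_hub` client instead of raw URLs. A minimal sketch, assuming the repo id `erjui/dho` taken from the download URL in the Usage section below:

```python
import torch
from huggingface_hub import hf_hub_download

# Download one checkpoint file from the Hub and load it on CPU for inspection.
ckpt_path = hf_hub_download(repo_id="erjui/dho", filename="vit_b_10.pt")
checkpoint = torch.load(ckpt_path, map_location="cpu")
```
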
## Usage

### Loading a Checkpoint

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the student model architecture (use "ViT-L/14" for the ViT-L checkpoints);
# note the OpenAI CLIP naming, "ViT-B/16" rather than "ViT-B-16"
model, preprocess = clip.load("ViT-B/16", device=device)

# Download the DHO checkpoint
checkpoint = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/erjui/dho/resolve/main/vit_b_10.pt",
    map_location=device,
)

# Load the state dict; strict=False tolerates the extra DHO head weights
# that the vanilla CLIP architecture does not define
model.load_state_dict(checkpoint["model_state_dict"], strict=False)
model.eval()

# Use the model for inference
image = preprocess(Image.open("path/to/image.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    image_features = model.encode_image(image)
    # ... your inference code
```
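
If `load_state_dict` reports missing or unexpected keys, listing the checkpoint contents shows how the backbone and the DHO head weights are actually named in the file (the `model_state_dict` layout above is an assumption carried over from the snippet):

```python
# Print the top-level entries, then a few parameter names and shapes,
# to see how the backbone and the two DHO heads are stored.
print(checkpoint.keys())
state = checkpoint["model_state_dict"]
for name, tensor in list(state.items())[:10]:
    print(name, tuple(tensor.shape))
```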

### Training Your Own Model

To train your own DHO model, please visit the [official GitHub repository](https://github.com/yourusername/DHO) for detailed instructions and training scripts.

**Example training command:**
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node=8 --master_port=29500 train_imgnet_semi.py \
    --teacher_model "apple/DFN5B-CLIP-ViT-H-14-378" \
    --student_model "ViT-B-16" \
    --lr 5e-5 \
    --train_epoch 32 \
    --batch_size 256 \
    --percent 10.0 \
    | tee ./logs/imagenet/imgnet_lowshot.log
```

## Model Architecture

The DHO student model consists of:
- **Backbone:** CLIP Vision Transformer (ViT-B/16 or ViT-L/14)
- **Two parallel heads:**
  - **CE Head:** optimized with a cross-entropy loss on labeled data
  - **KD Head:** optimized with a knowledge distillation loss against the teacher's predictions

During inference, predictions from both heads are combined using the weighting parameters (alpha, beta), as sketched below.
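
The sketch below illustrates this design. It is not the repository's training code: the head shapes, the distillation temperature, and the exact way alpha and beta enter the combination are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DHOStudent(nn.Module):
    """Illustrative dual-head student: a shared backbone feeding a CE head and a KD head."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int = 1000):
        super().__init__()
        self.backbone = backbone                         # e.g., a CLIP vision encoder
        self.ce_head = nn.Linear(feat_dim, num_classes)  # supervised head
        self.kd_head = nn.Linear(feat_dim, num_classes)  # distillation head

    def forward(self, x):
        feats = self.backbone(x)
        return self.ce_head(feats), self.kd_head(feats)

def dho_losses(ce_logits, kd_logits, labels, teacher_probs, tau: float = 2.0):
    # CE head: cross-entropy on the labeled subset.
    loss_ce = F.cross_entropy(ce_logits, labels)
    # KD head: KL divergence to temperature-softened teacher predictions.
    loss_kd = F.kl_div(
        F.log_softmax(kd_logits / tau, dim=-1),
        teacher_probs,
        reduction="batchmean",
    ) * tau * tau
    return loss_ce, loss_kd

def dho_predict(ce_logits, kd_logits, alpha: float, beta: float):
    # Combine the two heads at inference: alpha interpolates the two
    # probability distributions, beta acts as a temperature on the KD head.
    p_ce = F.softmax(ce_logits, dim=-1)
    p_kd = F.softmax(kd_logits / beta, dim=-1)
    return alpha * p_ce + (1.0 - alpha) * p_kd
```

At inference time, `dho_predict` would be applied to the pair of logits returned by the student, with alpha and beta chosen on the labeled split.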

## Performance

### ImageNet Semi-supervised Learning

| Student | Teacher | Labeled Data | Top-1 Accuracy |
|:--------|:--------|:-------------|:---------------|
| ViT-B/16 | ViT-H/14 | 1% | **81.6%** |
| ViT-B/16 | ViT-H/14 | 10% | **82.8%** |
| ViT-L/14 | ViT-H/14 | 1% | **84.6%** |
| ViT-L/14 | ViT-H/14 | 10% | **85.9%** |

These results set a new state of the art for semi-supervised image classification on ImageNet-1K.

## Citation

If you use these models in your research, please cite:

```bibtex
@article{kang2025simple,
  title={Simple yet Effective Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization},
  author={Kang, Seongjae and Lee, Dong Bok and Jang, Hyungjoon and Hwang, Sung Ju},
  journal={arXiv preprint arXiv:2505.07675},
  year={2025}
}
```

## License

This project is licensed under the Apache License 2.0; see the LICENSE file for details.

## Acknowledgments

We appreciate the open-source implementations from:
- [Tip-Adapter](https://github.com/gaopengcuhk/Tip-Adapter)
- [CLIP](https://github.com/openai/CLIP)
- [OpenCLIP](https://github.com/mlfoundations/open_clip)

## Contact

For questions or issues, please open an issue on the [GitHub repository](https://github.com/yourusername/DHO) or contact the authors.