erjui committed · Commit 4e5de1c · verified · 1 Parent(s): bba8cd2

Update README with model card

Files changed (1): README.md (+144 -2)

---
license: apache-2.0
tags:
- vision
- image-classification
- clip
- knowledge-distillation
- semi-supervised-learning
- imagenet
datasets:
- imagenet-1k
library_name: pytorch
pipeline_tag: image-classification
---

# DHO: Simple Few-shot Semi-supervised Knowledge Distillation

[![arXiv](https://img.shields.io/badge/arXiv-2505.07675v1-b31b1b.svg)](https://arxiv.org/abs/2505.07675v1)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/simple-semi-supervised-knowledge-distillation/semi-supervised-image-classification-on-1)](https://paperswithcode.com/sota/semi-supervised-image-classification-on-1?p=simple-semi-supervised-knowledge-distillation)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/simple-semi-supervised-knowledge-distillation/semi-supervised-image-classification-on-2)](https://paperswithcode.com/sota/semi-supervised-image-classification-on-2?p=simple-semi-supervised-knowledge-distillation)

This repository contains pretrained checkpoints for **DHO (Dual-Head Optimization)**, a simple yet effective approach for semi-supervised knowledge distillation from Vision-Language Models.

## Model Description

DHO introduces a dual-head optimization strategy that enables efficient knowledge transfer from large Vision-Language Models (e.g., CLIP) to smaller student models using minimal labeled data. The method achieves state-of-the-art performance on ImageNet semi-supervised learning benchmarks with only 1% and 10% labeled data.

**Paper:** [Simple yet Effective Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization](https://arxiv.org/abs/2505.07675)

**Authors:** Seongjae Kang, Dong Bok Lee, Hyungjoon Jang, Sung Ju Hwang

## Key Features

- ✨ **Dual-head optimization** strategy for semi-supervised distillation
- 🏆 **State-of-the-art** performance on ImageNet with 1% and 10% labeled data
- 🔄 Efficient transfer from VLMs (e.g., CLIP) to smaller student models
- 🧩 Simple, scalable, and easy to integrate into existing pipelines

## Available Checkpoints

| Checkpoint Name | Student Model | Teacher Model | Labeled Data | Top-1 Acc. | Parameters |
|:----------------|:--------------|:--------------|:-------------|:-----------|:-----------|
| `vit_b_1.pt` | ViT-B/16 | ViT-H/14 (DFN5B) | 1% | 81.6% | 86M |
| `vit_b_10.pt` | ViT-B/16 | ViT-H/14 (DFN5B) | 10% | 82.8% | 86M |
| `vit_l_1.pt` | ViT-L/14 | ViT-H/14 (DFN5B) | 1% | 84.6% | 304M |
| `vit_l_10.pt` | ViT-L/14 | ViT-H/14 (DFN5B) | 10% | 85.9% | 304M |

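Checkpoints can also be fetched with the `huggingface_hub` client instead of raw URLs. A minimal sketch, assuming the repo id `erjui/dho` taken from the download URL in the Usage section below:

```python
import torch
from huggingface_hub import hf_hub_download

# Download one checkpoint file from the Hub and load it on CPU for inspection.
ckpt_path = hf_hub_download(repo_id="erjui/dho", filename="vit_b_10.pt")
checkpoint = torch.load(ckpt_path, map_location="cpu")
```
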
## Usage

### Loading a Checkpoint

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the student model architecture (use "ViT-L/14" for the ViT-L checkpoints);
# note the OpenAI CLIP naming, "ViT-B/16" rather than "ViT-B-16"
model, preprocess = clip.load("ViT-B/16", device=device)

# Download the DHO checkpoint
checkpoint = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/erjui/dho/resolve/main/vit_b_10.pt",
    map_location=device,
)

# Load the state dict; strict=False tolerates the extra DHO head weights
# that the vanilla CLIP architecture does not define
model.load_state_dict(checkpoint["model_state_dict"], strict=False)
model.eval()

# Use the model for inference
image = preprocess(Image.open("path/to/image.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    image_features = model.encode_image(image)
    # ... your inference code
```
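
If `load_state_dict` reports missing or unexpected keys, listing the checkpoint contents shows how the backbone and the DHO head weights are actually named in the file (the `model_state_dict` layout above is an assumption carried over from the snippet):

```python
# Print the top-level entries, then a few parameter names and shapes,
# to see how the backbone and the two DHO heads are stored.
print(checkpoint.keys())
state = checkpoint["model_state_dict"]
for name, tensor in list(state.items())[:10]:
    print(name, tuple(tensor.shape))
```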

### Training Your Own Model

To train your own DHO model, please visit the [official GitHub repository](https://github.com/yourusername/DHO) for detailed instructions and training scripts.

**Example training command:**
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node=8 --master_port=29500 train_imgnet_semi.py \
    --teacher_model "apple/DFN5B-CLIP-ViT-H-14-378" \
    --student_model "ViT-B-16" \
    --lr 5e-5 \
    --train_epoch 32 \
    --batch_size 256 \
    --percent 10.0 \
    | tee ./logs/imagenet/imgnet_lowshot.log
```

## Model Architecture

The DHO student model consists of:
- **Backbone:** CLIP Vision Transformer (ViT-B/16 or ViT-L/14)
- **Two parallel heads:**
  - **CE Head:** optimized with a cross-entropy loss on labeled data
  - **KD Head:** optimized with a knowledge distillation loss against the teacher's predictions

During inference, predictions from both heads are combined using the weighting parameters (alpha, beta), as sketched below.
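
The sketch below illustrates this design. It is not the repository's training code: the head shapes, the distillation temperature, and the exact way alpha and beta enter the combination are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DHOStudent(nn.Module):
    """Illustrative dual-head student: a shared backbone feeding a CE head and a KD head."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int = 1000):
        super().__init__()
        self.backbone = backbone                         # e.g., a CLIP vision encoder
        self.ce_head = nn.Linear(feat_dim, num_classes)  # supervised head
        self.kd_head = nn.Linear(feat_dim, num_classes)  # distillation head

    def forward(self, x):
        feats = self.backbone(x)
        return self.ce_head(feats), self.kd_head(feats)

def dho_losses(ce_logits, kd_logits, labels, teacher_probs, tau: float = 2.0):
    # CE head: cross-entropy on the labeled subset.
    loss_ce = F.cross_entropy(ce_logits, labels)
    # KD head: KL divergence to temperature-softened teacher predictions.
    loss_kd = F.kl_div(
        F.log_softmax(kd_logits / tau, dim=-1),
        teacher_probs,
        reduction="batchmean",
    ) * tau * tau
    return loss_ce, loss_kd

def dho_predict(ce_logits, kd_logits, alpha: float, beta: float):
    # Combine the two heads at inference: alpha interpolates the two
    # probability distributions, beta acts as a temperature on the KD head.
    p_ce = F.softmax(ce_logits, dim=-1)
    p_kd = F.softmax(kd_logits / beta, dim=-1)
    return alpha * p_ce + (1.0 - alpha) * p_kd
```

At inference time, `dho_predict` would be applied to the pair of logits returned by the student, with alpha and beta chosen on the labeled split.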

## Performance

### ImageNet Semi-supervised Learning

| Student | Teacher | Labeled Data | Top-1 Accuracy |
|:--------|:--------|:-------------|:---------------|
| ViT-B/16 | ViT-H/14 | 1% | **81.6%** |
| ViT-B/16 | ViT-H/14 | 10% | **82.8%** |
| ViT-L/14 | ViT-H/14 | 1% | **84.6%** |
| ViT-L/14 | ViT-H/14 | 10% | **85.9%** |

These results set a new state of the art for semi-supervised image classification on ImageNet-1K.

## Citation

If you use these models in your research, please cite:

```bibtex
@article{kang2025simple,
  title={Simple yet Effective Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization},
  author={Kang, Seongjae and Lee, Dong Bok and Jang, Hyungjoon and Hwang, Sung Ju},
  journal={arXiv preprint arXiv:2505.07675},
  year={2025}
}
```

## License

This project is licensed under the Apache License 2.0; see the LICENSE file for details.

## Acknowledgments

We appreciate the open-source implementations from:
- [Tip-Adapter](https://github.com/gaopengcuhk/Tip-Adapter)
- [CLIP](https://github.com/openai/CLIP)
- [OpenCLIP](https://github.com/mlfoundations/open_clip)

## Contact

For questions or issues, please open an issue on the [GitHub repository](https://github.com/yourusername/DHO) or contact the authors.