---
license: apache-2.0
datasets:
- BLIP3o/BLIP3o-Pretrain-Long-Caption
- BLIP3o/BLIP3o-Pretrain-Short-Caption
- BLIP3o/BLIP3o-Pretrain-JourneyDB
base_model:
- OpenGVLab/InternVL3-1B
---
This repository contains the model weights (**autoencoders**) presented in the paper **UniLIP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing**.
UniLIP proposes a unified, CLIP-based encoder featuring both rich semantics and fine-grained image details. Through a **two-stage training scheme with self-distillation** for reconstruction, we empower CLIP to achieve excellent reconstruction quality **without compromising its original understanding abilities**. Leveraging this powerful unified representation, UniLIP excels across understanding, generation, and editing tasks.
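
A minimal sketch for fetching the weights, assuming this repo's id is `kanashi6/UniLIP` (the actual loading and inference code lives in the GitHub repository linked below):

```python
# Minimal sketch: download the UniLIP checkpoints from the Hugging Face Hub.
# The repo id "kanashi6/UniLIP" is an assumption based on this page; see the
# GitHub repository for the code that actually loads and runs the model.
from huggingface_hub import snapshot_download

local_path = snapshot_download(repo_id="kanashi6/UniLIP", local_dir="./UniLIP")
print(f"Checkpoints downloaded to: {local_path}")
```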
For more details, please refer to the original paper and the GitHub repository:
- Paper: https://www.arxiv.org/abs/2507.23278
- GitHub: https://github.com/nnnth/UniLIP