---
license: apache-2.0
datasets:
- BLIP3o/BLIP3o-Pretrain-Long-Caption
- BLIP3o/BLIP3o-Pretrain-Short-Caption
- BLIP3o/BLIP3o-Pretrain-JourneyDB
base_model:
- OpenGVLab/InternVL3-1B
---

This repository contains the model (**autoencoders**) presented in the paper *UniLIP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing*.

UniLIP proposes a unified, CLIP-based encoder that captures both rich semantics and fine-grained image details. Through a **two-stage training scheme with self-distillation** for reconstruction, we enable CLIP to achieve excellent reconstruction quality **without compromising its original understanding abilities**. Building on this unified representation, UniLIP excels across understanding, generation, and editing tasks.
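
To make the training idea concrete, here is a minimal, *hypothetical* PyTorch sketch of a reconstruction objective combined with a self-distillation term, in which a frozen copy of the original encoder anchors the adapted features. The tiny convolutional networks, loss weighting, and variable names are illustrative stand-ins, not the CLIP encoder, decoder, or hyperparameters used in the paper; see the GitHub repository for the official implementation.

```python
# Toy sketch of reconstruction training with self-distillation.
# The networks below are illustrative stand-ins, NOT the paper's
# actual CLIP vision encoder or pixel decoder.
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in "encoder" and pixel decoder (UniLIP adapts a CLIP vision encoder).
encoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.GELU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1),
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.GELU(),
    nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),
)

# Self-distillation teacher: a frozen copy of the encoder before adaptation,
# so the adapted features stay close to the original representation.
teacher = copy.deepcopy(encoder).eval()
for p in teacher.parameters():
    p.requires_grad_(False)

opt = torch.optim.AdamW(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4
)

images = torch.randn(4, 3, 64, 64)  # dummy batch
feats = encoder(images)             # unified features (semantics + details)
recon = decoder(feats)              # reconstructed pixels
with torch.no_grad():
    target = teacher(images)        # features of the un-adapted encoder

loss = F.mse_loss(recon, images) + F.mse_loss(feats, target)  # recon + distill
opt.zero_grad()
loss.backward()
opt.step()
```

In the two-stage setting described above, the encoder might, for example, first stay frozen while only the decoder learns to reconstruct, with the distillation constraint applied once the encoder is unfrozen; consult the paper for the exact schedule.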
For more details, please refer to the original paper and the GitHub repository:

- Paper: https://www.arxiv.org/abs/2507.23278
- GitHub: https://github.com/nnnth/UniLIP