---
title: README
emoji: 📈
colorFrom: pink
colorTo: pink
sdk: static
pinned: false
license: apache-2.0
---

Welcome to the LCO-Embedding project: Scaling Language-centric Omnimodal Representation Learning.

Highlights:

  • We introduce LCO-Embedding, a language-centric omnimodal representation learning method, together with the LCO-Embedding family of models, which sets a new state of the art on MIEB (Massive Image Embedding Benchmark) while also supporting audio and video.
  • We introduce the Generation-Representation Scaling Law, which connects a model's generative capabilities to its representation upper bound.
  • We introduce SeaDoc, a challenging visual document retrieval task in Southeast Asian languages, and show that continual generative pretraining before contrastive learning raises the representation upper bound.