
zhiliang

zzliang
  • pengzhiliang

AI & ML interests

multimodal

Organizations

podcast

authored 9 papers 2 months ago

Generic-to-Specific Distillation of Masked Autoencoders

Paper • 2302.14771 • Published Feb 28, 2023

Kosmos-G: Generating Images in Context with Multimodal Large Language Models

Paper • 2310.02992 • Published Oct 4, 2023 • 4

Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection

Paper • 2205.09613 • Published May 19, 2022

BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers

Paper • 2208.06366 • Published Aug 12, 2022

Foundation Transformers

Paper • 2210.06423 • Published Oct 12, 2022

A Unified View of Masked Image Modeling

Paper • 2210.10615 • Published Oct 19, 2022

Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks

Paper • 2208.10442 • Published Aug 22, 2022

Multimodal Latent Language Modeling with Next-Token Diffusion

Paper • 2412.08635 • Published Dec 11, 2024 • 48

VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26, 2025 • 123

authored a paper over 2 years ago

Kosmos-2: Grounding Multimodal Large Language Models to the World

Paper • 2306.14824 • Published Jun 26, 2023 • 34