arXiv:2501.14818

Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models

Published on Jan 20, 2025

AI-generated summary

Post-training data strategies significantly impact the performance of open-source vision-language models, enabling competitive results with fewer parameters.

Abstract

Open-source vision-language models (VLMs) have recently made promising progress in bringing their capabilities closer to those of proprietary frontier models. However, most open-source models publish only their final model weights, leaving the critical details of their data strategies and implementation largely opaque. In this work, we address VLM post-training from a data-centric perspective, showing the key role of data strategy in developing frontier VLMs. By studying and building our post-training data strategy from scratch, we share detailed insights into the development process, aiming to benefit the open-source community in developing competitive models. Our data strategy, together with training recipes and model design, leads to a family of performant VLMs named Eagle 2. Specifically, Eagle2-9B achieves state-of-the-art results across various multimodal benchmarks, matching certain competitive models with up to 70B parameters.

Models citing this paper: 11

Datasets citing this paper: 0

Spaces citing this paper: 1

Collections including this paper: 0