prolongvid_stage1_7B

Model Summary

The ProLongVid-v1 models are 7B-parameter models trained on ProLongVid_data, built on our extended Qwen2.5 language model with a 256K-token context window.

This model, prolongvid_stage1_7B, is trained on the stage-1 short-video data of ProLongVid_data, starting from the prolongvid_image_sft_7B checkpoint.

We suggest testing this model with up to 32 frames.
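To make the 32-frame suggestion concrete, the sketch below uniformly subsamples frame indices from a decoded video before passing them to the model. The helper name, the cap constant, and the sampling strategy are illustrative assumptions, not part of the released inference code; `total_frames` would come from your video decoder of choice.

```python
# Illustrative only: uniformly sample at most MAX_FRAMES frame indices.
# The 32-frame cap follows the testing suggestion above.
MAX_FRAMES = 32

def sample_frame_indices(total_frames: int, max_frames: int = MAX_FRAMES) -> list[int]:
    """Return at most `max_frames` evenly spaced frame indices in [0, total_frames)."""
    if total_frames <= max_frames:
        # Short clip: keep every frame.
        return list(range(total_frames))
    # Long clip: pick evenly spaced indices across the whole video.
    step = total_frames / max_frames
    return [int(i * step) for i in range(max_frames)]

print(sample_frame_indices(10))   # short clip: all 10 indices
print(sample_frame_indices(640))  # long clip: 32 evenly spaced indices
```

The sampled indices can then be used to extract frames and build the image inputs for the model.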

Citation

@inproceedings{wang2025prolongvid,
  title={ProLongVid: A Simple but Strong Baseline for Long-context Video Instruction Tuning},
  author={Wang, Rui and Li, Bohao and Dai, Xiyang and Yang, Jianwei and Chen, Yi-Ling and Xing, Zhen and Yang, Yifan and Chen, Dongdong and Qiu, Xipeng and Wu, Zuxuan and others},
  booktitle={EMNLP},
  year={2025}
}