It looks like the limit on the model's context length is set incorrectly. The fp16 version, like the original model, has a context length of 131072. Updating this value resolved the errors that occurred when processing longer prompts.
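For anyone who wants to try the same change locally, here is a minimal sketch. It assumes the limit lives in the `max_position_embeddings` field of the model's `config.json`; the helper name `set_context_length` and the file path are hypothetical, and 131072 is the value quoted above.

```python
import json
from pathlib import Path


def set_context_length(config_path, new_length=131072):
    """Hypothetical helper: overwrite max_position_embeddings in a config file.

    Assumes the config is a flat JSON object, as in a typical config.json.
    Returns the previous value (or None if the field was absent).
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Remember the old limit so the change can be reported or reverted.
    old_length = config.get("max_position_embeddings")
    config["max_position_embeddings"] = new_length

    # Write the file back, leaving every other field as it was.
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return old_length
```

Calling `set_context_length("config.json")` on a local copy of the model would rewrite only that one field. Whether the INT4 model behaves correctly with the larger value is not something this sketch checks.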

#2
OpenVINO Toolkit org

This is a known issue and a current limitation of the INT4 model. When optimum-intel allows preserving the original max_position_embeddings, we will re-upload the model.

amokrov changed pull request status to closed
