It looks like there is an incorrect limit on the model context length. The fp16 model, like the original one, has a context length of 131072. Updating this value resolved errors when processing longer prompts.
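The change described above amounts to patching `max_position_embeddings` in the exported `config.json`. A minimal sketch (the helper name and file layout are illustrative, not part of this repo):

```python
import json
from pathlib import Path

def patch_context_length(config_path: str, new_length: int = 131072) -> dict:
    """Set max_position_embeddings in an exported config.json.

    131072 matches the context length of the original fp16 model.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["max_position_embeddings"] = new_length
    path.write_text(json.dumps(config, indent=2))
    return config
```

For example, running `patch_context_length("config.json")` on a config exported with a smaller limit rewrites the file in place with the full 131072-token context length.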
#2
opened by dtrawins
No description provided.
This is a known issue and a current limitation of the INT4 model. Once optimum-intel allows preserving the original max_position_embeddings, we will re-upload the model.
amokrov changed pull request status to closed