Do we have vllm implementation for embedding models?

#3
by Dibbla - opened

I can see from the docs that the Granite instruction models are supported in vLLM. Is the same true for the embedding models?

The vLLM v0.7.0 docs explicitly stated that the XLMRobertaModel architecture was supported, but the newer docs no longer say that specifically.

I do have ibm-granite/granite-embedding-278m-multilingual running in vLLM v0.9.0.1. Not a perfect match for what you're asking, but I suspect it would work.
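For reference, a minimal sketch of serving it with vLLM's OpenAI-compatible server (assuming a recent vLLM release with the `--task embed` option; the port shown is vLLM's default):

```shell
# Launch the OpenAI-compatible server in embedding mode
vllm serve ibm-granite/granite-embedding-278m-multilingual --task embed

# From another shell, request embeddings via the /v1/embeddings endpoint
curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ibm-granite/granite-embedding-278m-multilingual",
        "input": ["vLLM can serve embedding models too."]
      }'
```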

IBM Granite org

See https://github.com/bjhargrave/cog-models/blob/main/ibm-granite/granite-embedding-278m-multilingual/predict.py, where I configure vLLM 0.8.5.post1 for serving granite-embedding-278m-multilingual. It should work the same way with the latest vLLM release.
