Do we have vllm implementation for embedding models?

#3
by Dibbla - opened

I can see from the docs that the Granite instruction models are supported in vLLM. Is the same true for the embedding models?

The vLLM v0.7.0 docs explicitly stated that the XLMRobertaModel architecture was supported, but the newer docs no longer say that specifically.

I do have ibm-granite/granite-embedding-278m-multilingual running in vLLM v0.9.0.1. Not a perfect match for what you're asking, but I suspect it would work.
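For reference, a minimal sketch of serving it with vLLM's OpenAI-compatible server (assuming a recent vLLM release with the `--task embed` option; the port shown is vLLM's default):

```shell
# Launch the OpenAI-compatible server in embedding mode
vllm serve ibm-granite/granite-embedding-278m-multilingual --task embed

# From another shell, request embeddings via the /v1/embeddings endpoint
curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ibm-granite/granite-embedding-278m-multilingual",
        "input": ["vLLM can serve embedding models too."]
      }'
```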

IBM Granite org

See https://github.com/bjhargrave/cog-models/blob/main/ibm-granite/granite-embedding-278m-multilingual/predict.py, where I configure vLLM 0.8.5.post1 for serving granite-embedding-278m-multilingual. It should work the same way with the latest vLLM release.
