Add GGUF model file for llama.cpp (f16)
Thank you for the PR! Would be lovely if you could move the gguf file to gguf/mxbai-embed-large-v1-f16.gguf :)
Aamir.
Sorry, missed this earlier! Just pushed an update.
Why quantize at fp16? Being fp16 will not improve speed or memory usage in any way; besides, the original fp16 can also be used on CPU, while the quantization penalty degrades quality, even if only a little. So why take the hit without any benefit? Why not quantize to int5 or int2? (Cohere probably succeeded at 1-bit quantization, or maybe they retrained the model with 1-bit precision, I don't know.)
Well, fp16 will definitely decrease memory utilization relative to fp32, and at least on GPU will yield speed increases. The quality loss is indeed minimal. Since these models are pretty small, I haven't seen the need to go to lower quantization levels, but perhaps there are applications on small devices where this would be desirable.
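To make the memory point concrete, here is a minimal sketch using NumPy. The random matrix is only a hypothetical stand-in for one weight tensor (its shape and scale are assumptions, not values read from this model); it shows that casting fp32 to fp16 halves the storage and gives a rough sense of the rounding error involved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a single weight matrix (shape and scale are assumed,
# not taken from the actual checkpoint).
weights_fp32 = rng.normal(0.0, 0.02, size=(1024, 1024)).astype(np.float32)

# Cast the weights down to half precision.
weights_fp16 = weights_fp32.astype(np.float16)

bytes_fp32 = weights_fp32.nbytes
bytes_fp16 = weights_fp16.nbytes

# Largest absolute rounding error introduced by the cast.
max_abs_error = float(
    np.max(np.abs(weights_fp32 - weights_fp16.astype(np.float32)))
)

print(f"fp32: {bytes_fp32} bytes, fp16: {bytes_fp16} bytes")
print(f"max abs rounding error: {max_abs_error:.2e}")
```

This only measures storage and per-weight rounding error; it says nothing about downstream embedding quality or speed, which would have to be benchmarked on the real model.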
With regard to the Cohere 1-bit quantization, keep in mind that they are talking about quantizing the output vectors, while here we are talking about quantizing the model weights. Regardless of how the model is quantized, we'll still get fp32 embedding vectors out, and it can definitely be useful to quantize those to reduce storage needs. I believe the new Cohere models are specially trained to be able to yield good quality even with 1-bit quantization of embeddings. Going down to 1-bit on other models will usually result in pretty large quality losses.
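For illustration, quantizing the output vectors (as opposed to the weights) could look roughly like the sketch below. It assumes a simple sign-threshold scheme and uses random vectors of an assumed dimension of 1024 in place of real embeddings; the helper names `binarize` and `hamming_distance` are hypothetical, and this is not necessarily how Cohere does it.

```python
import numpy as np


def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Keep one bit per dimension (1 if the value is positive) and pack 8 bits per byte."""
    bits = (embeddings > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)


def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Count the bit positions in which two packed vectors differ."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


rng = np.random.default_rng(0)

# Random stand-ins for fp32 embeddings; the dimension of 1024 is an assumption.
embeddings = rng.normal(size=(3, 1024)).astype(np.float32)
packed = binarize(embeddings)

print(f"fp32 storage:  {embeddings.nbytes} bytes")
print(f"1-bit storage: {packed.nbytes} bytes")
print("distance 0 vs 1:", hamming_distance(packed[0], packed[1]))
```

Each fp32 value takes 32 bits, so keeping one bit per dimension cuts storage by a factor of 32. Similarity search then runs on Hamming distance instead of cosine similarity, which is where the quality loss mentioned above comes from.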