Instructions to use limcheekin/snowflake-arctic-embed-l-v2.0-GGUF with libraries and local apps.

Libraries

How to use limcheekin/snowflake-arctic-embed-l-v2.0-GGUF with llama-cpp-python:

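# !pip install llama-cpp-python

from llama_cpp import Llama

# This is an embedding model, so load it in embedding mode
# instead of calling it as a text generator.
llm = Llama.from_pretrained(
	repo_id="limcheekin/snowflake-arctic-embed-l-v2.0-GGUF",
	filename="snowflake-arctic-embed-l-v2.0.F16.gguf",
	embedding=True,
)

# With a pooled embedding model, embed() returns one vector
# (a list of floats) per input string.
vectors = llm.embed(["Once upon a time,"])
print(len(vectors), len(vectors[0]))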

Local Apps

llama.cpp

How to use limcheekin/snowflake-arctic-embed-l-v2.0-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with embeddings enabled:
llama-server -hf limcheekin/snowflake-arctic-embed-l-v2.0-GGUF:F16 --embedding
# Compute an embedding directly in the terminal:
llama-embedding -hf limcheekin/snowflake-arctic-embed-l-v2.0-GGUF:F16 -p "Once upon a time,"

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with embeddings enabled:
llama-server -hf limcheekin/snowflake-arctic-embed-l-v2.0-GGUF:F16 --embedding
# Compute an embedding directly in the terminal:
llama-embedding -hf limcheekin/snowflake-arctic-embed-l-v2.0-GGUF:F16 -p "Once upon a time,"

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with embeddings enabled:
./llama-server -hf limcheekin/snowflake-arctic-embed-l-v2.0-GGUF:F16 --embedding
# Compute an embedding directly in the terminal:
./llama-embedding -hf limcheekin/snowflake-arctic-embed-l-v2.0-GGUF:F16 -p "Once upon a time,"

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-embedding
# Start a local OpenAI-compatible server with embeddings enabled:
./build/bin/llama-server -hf limcheekin/snowflake-arctic-embed-l-v2.0-GGUF:F16 --embedding
# Compute an embedding directly in the terminal:
./build/bin/llama-embedding -hf limcheekin/snowflake-arctic-embed-l-v2.0-GGUF:F16 -p "Once upon a time,"

Use Docker

docker model run hf.co/limcheekin/snowflake-arctic-embed-l-v2.0-GGUF:F16
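
Once `llama-server` is running, embeddings can be requested over HTTP. The sketch below is a minimal client that uses only the Python standard library. It assumes the server was started with embeddings enabled (the `--embedding` flag in llama.cpp) and is listening on the default `http://localhost:8080`; the helper names `build_request` and `fetch_embeddings` are illustrative and not part of any library.

```python
import json
import urllib.request

# Assumption: llama-server's default address; adjust if you changed host or port.
SERVER_URL = "http://localhost:8080/v1/embeddings"


def build_request(texts, url=SERVER_URL):
    """Build an OpenAI-style embeddings request (hypothetical helper)."""
    payload = {"input": list(texts)}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embeddings(texts, url=SERVER_URL):
    """Send the request and return one vector per input text, in input order."""
    with urllib.request.urlopen(build_request(texts, url)) as response:
        body = json.load(response)
    # OpenAI-style responses carry an "index" on each item; sort to be safe.
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Usage (requires a running server):
# vectors = fetch_embeddings(["Once upon a time,", "GGUF is a model file format."])
# print(len(vectors), len(vectors[0]))
```

Building the request separately from sending it keeps the payload easy to inspect without a server running.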

Ollama
How to use limcheekin/snowflake-arctic-embed-l-v2.0-GGUF with Ollama:
```
# Embedding models do not chat; pull the model, then request vectors via Ollama's embeddings API
ollama pull hf.co/limcheekin/snowflake-arctic-embed-l-v2.0-GGUF:F16
```

Unsloth Studio

How to use limcheekin/snowflake-arctic-embed-l-v2.0-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for limcheekin/snowflake-arctic-embed-l-v2.0-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for limcheekin/snowflake-arctic-embed-l-v2.0-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for limcheekin/snowflake-arctic-embed-l-v2.0-GGUF to start chatting

Docker Model Runner
How to use limcheekin/snowflake-arctic-embed-l-v2.0-GGUF with Docker Model Runner:
```
docker model run hf.co/limcheekin/snowflake-arctic-embed-l-v2.0-GGUF:F16
```

Lemonade

How to use limcheekin/snowflake-arctic-embed-l-v2.0-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull limcheekin/snowflake-arctic-embed-l-v2.0-GGUF:F16

Run and chat with the model

lemonade run user.snowflake-arctic-embed-l-v2.0-GGUF-F16

List all available models

lemonade list

Model Card: Snowflake Arctic Embed L v2.0 (GGUF Quantized)

Model Overview

This model is a GGUF version of Snowflake's Arctic Embed L v2.0, a multilingual text embedding model designed for high-quality retrieval. Storing the weights at reduced precision lowers the model's size and computational requirements, which makes deployment more efficient without significantly compromising performance.

Model Details

Model Name: snowflake-arctic-embed-l-v2.0-GGUF
Original Model: Snowflake's Arctic Embed L v2.0
Quantization Format: GGUF
Parameters: 568 million
Embedding Dimension: 1,024
Languages Supported: Multilingual
Context Length: Up to 8,192 tokens
License: Apache 2.0
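
The parameter count and embedding dimension above translate directly into storage estimates. The following back-of-the-envelope sketch assumes dense weights with no file overhead and embeddings stored as 32-bit floats; the function names are illustrative.

```python
PARAMS = 568_000_000      # parameter count from the model details above
EMBEDDING_DIM = 1024      # embedding dimension from the model details above


def weight_bytes(num_params, bits_per_weight):
    """Approximate bytes needed for the weights alone (ignores file metadata)."""
    return num_params * bits_per_weight // 8


def index_bytes(num_vectors, dim=EMBEDDING_DIM, bytes_per_value=4):
    """Approximate bytes needed to store num_vectors embeddings."""
    return num_vectors * dim * bytes_per_value


f16_gb = weight_bytes(PARAMS, 16) / 1e9   # 16-bit weights
f32_gb = weight_bytes(PARAMS, 32) / 1e9   # 32-bit weights, for comparison
print(f"F16 weights: ~{f16_gb:.2f} GB, F32 weights: ~{f32_gb:.2f} GB")
print(f"1M float32 embeddings: ~{index_bytes(1_000_000) / 1e9:.2f} GB")
```

Under these assumptions the weights take roughly 1.14 GB at 16 bits versus 2.27 GB at 32 bits, and one million stored embeddings take about 4.10 GB.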

Quantization Details

GGUF is the binary model file format used by llama.cpp and other GGML-based runtimes, designed for efficient loading and inference. Quantization reduces the numeric precision of the model's weights, which lowers memory usage and can speed up computation, usually with minimal impact on accuracy. The file referenced on this page is the F16 (16-bit floating-point) variant.
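
To make the precision trade-off concrete, the sketch below round-trips a few values through 16-bit floats using Python's standard `struct` module. The sample weights are made up for illustration, and the integer quantization schemes available in GGUF are more involved than this simple float conversion.

```python
import struct


def to_float16_and_back(value):
    """Store a float as IEEE 754 half precision, then read it back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Made-up example weights, not taken from the model.
weights = [0.1, -0.2537, 0.000731, 1.5]

roundtrip = [to_float16_and_back(w) for w in weights]
max_abs_error = max(abs(a - b) for a, b in zip(weights, roundtrip))

bytes_f32 = len(struct.pack(f"<{len(weights)}f", *weights))  # 4 bytes per value
bytes_f16 = len(struct.pack(f"<{len(weights)}e", *weights))  # 2 bytes per value

print(f"float32: {bytes_f32} bytes, float16: {bytes_f16} bytes")
print(f"largest round-trip error: {max_abs_error:.2e}")
```

Storage halves, while the error introduced per value stays small relative to the value itself.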

Performance

The original Arctic Embed L v2.0 model is reported to achieve state-of-the-art results on retrieval benchmarks, including an NDCG@10 score of 55.98 on the MTEB Retrieval benchmark. The GGUF version aims to preserve this retrieval quality while running more efficiently; this card does not report separate benchmark results for it.
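
For readers unfamiliar with the metric, NDCG@10 scores the top ten results of a ranking against graded relevance labels, normalized by the best possible ordering. The sketch below uses the common linear-gain formulation; benchmark suites may differ in details such as the gain function, so treat it as an illustration rather than the exact MTEB implementation.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, all_relevances=None, k=10):
    """NDCG@k: DCG of the ranking divided by the DCG of the ideal ordering.

    ranked_relevances: relevance labels in the order the system returned them.
    all_relevances: labels of every judged document for the query; defaults to
    ranked_relevances when the ranking already contains all of them.
    """
    pool = ranked_relevances if all_relevances is None else all_relevances
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# Toy example: one relevant document, returned in second place.
print(ndcg_at_k([0, 1, 0]))
```

Benchmark tables typically report the mean over all queries, scaled to a 0-100 range.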

Usage

This quantized model is suited to resource-constrained environments where memory and compute are limited. It can be used for information retrieval, semantic search, and other applications that need high-quality text embeddings.
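
Semantic search with an embedding model usually comes down to ranking documents by vector similarity to the query. The sketch below uses tiny hand-written vectors in place of real 1,024-dimensional embeddings, so it runs without the model; in practice the vectors would come from the GGUF model, and `rank_by_cosine` is an illustrative helper name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_vec, doc_vecs):
    """Return (doc_index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional stand-ins for real embeddings.
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]

for index, score in rank_by_cosine(query, docs):
    print(index, round(score, 3))
```

For larger collections, a vector index or database would replace the linear scan, but the scoring idea stays the same.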

Limitations

While quantization reduces resource requirements, it may introduce slight degradation in model performance. Users should evaluate the model in their specific use cases to ensure it meets the desired performance criteria.
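
One simple way to check for degradation is to embed the same texts with a reference build and with the quantized file, then compare the retrieval results. The sketch below measures how much the top-k result lists agree; the score lists are invented for illustration, and `top_k_overlap` is a hypothetical helper, not part of any library.

```python
def top_k(scores, k):
    """Indices of the k highest-scoring documents."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def top_k_overlap(reference_scores, quantized_scores, k=3):
    """Fraction of the reference top-k that the quantized model also returns."""
    ref = set(top_k(reference_scores, k))
    quant = set(top_k(quantized_scores, k))
    return len(ref & quant) / k


# Invented similarity scores for one query over five documents.
reference = [0.82, 0.10, 0.75, 0.40, 0.68]
quantized = [0.81, 0.12, 0.74, 0.69, 0.66]

print(top_k_overlap(reference, quantized, k=3))
```

Averaging this overlap across a representative set of queries gives a quick signal; a labeled evaluation set is still the more reliable check.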

Acknowledgements

This quantized model is based on Snowflake's Arctic Embed L v2.0. For more details on the original model, refer to the official model card for Snowflake/snowflake-arctic-embed-l-v2.0.

For a visual overview of Snowflake's Arctic Embed v2.0, you may find the following video informative: https://www.youtube.com/watch?v=CmSZZkzghhU

GGUF

Model size: 0.6B params
Architecture: bert
Hardware compatibility: 16-bit

Model tree for limcheekin/snowflake-arctic-embed-l-v2.0-GGUF

Base model: Snowflake/snowflake-arctic-embed-l-v2.0
Quantized (13): this model