Giga-Embeddings-instruct (4-bit NF4 Quantized)

This is a 4-bit quantized version of the original model ai-sage/Giga-Embeddings-instruct, created using bitsandbytes with the following configuration:

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)
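To get a feel for what this configuration costs in storage, the arithmetic can be sketched as below. This is a rough estimate and not a measurement of this checkpoint: the block sizes (64 weights per quantization block, 256 blocks per second-level group) are assumed from the defaults described for NF4 double quantization, and the 3e9 parameter count is a hypothetical round number. Layers that stay unquantized are ignored.

```python
def nf4_bits_per_param(block_size: int = 64,
                       double_quant: bool = True,
                       dq_block_size: int = 256) -> float:
    """Approximate storage per quantized weight, in bits (assumed block sizes)."""
    weight_bits = 4.0
    if double_quant:
        # One 8-bit quantized scale per block, plus one fp32 constant
        # shared by every dq_block_size blocks.
        overhead = 8.0 / block_size + 32.0 / (block_size * dq_block_size)
    else:
        # One fp32 scale per block.
        overhead = 32.0 / block_size
    return weight_bits + overhead


def approx_weight_gib(n_params: float, bits_per_param: float) -> float:
    """Convert a parameter count and a per-weight cost into GiB."""
    return n_params * bits_per_param / 8 / 2**30


if __name__ == "__main__":
    hypothetical_params = 3e9  # assumption for illustration only
    for dq in (False, True):
        bits = nf4_bits_per_param(double_quant=dq)
        gib = approx_weight_gib(hypothetical_params, bits)
        print(f"double_quant={dq}: {bits:.3f} bits/param, ~{gib:.2f} GiB")
```

Under these assumptions, double quantization lowers the per-weight overhead of the scales from 0.5 bits to roughly 0.127 bits.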

Giga-Embeddings-instruct

  • Base Decoder-only LLM: GigaChat-3b
  • Pooling Type: Latent-Attention
  • Embedding Dimension: 2048

⚠️ Note: This model has not been fine-tuned. It contains the original weights, loaded in 4-bit precision with transformers and bitsandbytes. Running it requires the bitsandbytes and accelerate packages.
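Since loading fails without those packages, a quick check before downloading the weights can save time. The following is a minimal sketch using only the standard library; the helper name `missing_packages` and the `REQUIRED` tuple are hypothetical and not part of any of the libraries involved.

```python
import importlib.util

# Packages the Usage example relies on (import names).
REQUIRED = ("torch", "transformers", "bitsandbytes", "accelerate")


def missing_packages(names=REQUIRED):
    """Return the top-level packages from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Note that this only checks that the packages are importable; it does not verify versions or that a CUDA-capable GPU is available.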

Usage

from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig
import torch

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModel.from_pretrained(
    "iMiW/Giga-Embeddings-instruct-4bit-nf4", 
    quantization_config=bnb_cfg, 
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "iMiW/Giga-Embeddings-instruct-4bit-nf4", 
    trust_remote_code=True
)

def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery: {query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of Russia?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of Russia is Moscow.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

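model.eval()
# No model.cuda() call here: a bitsandbytes-quantized model is already placed
# on the GPU when it is loaded, and some transformers versions raise an error
# if .cuda() is called on a 4-bit model.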

max_length = 4096

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
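batch_dict = batch_dict.to(model.device)

# Inference only, so gradient tracking is disabled to save memory
with torch.no_grad():
    embeddings = model(**batch_dict, return_embeddings=True)

# Rows correspond to the 2 queries, columns to the 2 documents
scores = embeddings[:2] @ embeddings[2:].T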
print(scores.tolist())
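The printed score matrix has one row per query and one column per document. One way to turn it into a ranking is sketched below in plain Python, operating on the nested list that `scores.tolist()` returns. The helper `rank_documents` is hypothetical, and the numbers in the example are made up for illustration; they are not real model output.

```python
def rank_documents(scores, documents):
    """For each query row, return (document, score) pairs, best match first."""
    rankings = []
    for row in scores:
        # Sort document indices by descending similarity for this query.
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        rankings.append([(documents[j], row[j]) for j in order])
    return rankings


# Made-up scores for illustration only
example_scores = [[0.62, 0.11], [0.09, 0.58]]
example_docs = ["doc about Moscow", "doc about gravity"]

if __name__ == "__main__":
    for i, ranked in enumerate(rank_documents(example_scores, example_docs)):
        best_doc, best_score = ranked[0]
        print(f"query {i}: best match is {best_doc!r} (score {best_score})")
```

With the real model, the call would be `rank_documents(scores.tolist(), documents)`.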