Giga-Embeddings-instruct (4-bit NF4 Quantized)

This is a 4-bit quantized version of the original model ai-sage/Giga-Embeddings-instruct, created using bitsandbytes with the following configuration:

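import torch
from transformers import BitsAndBytesConfig

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit precision
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for computation at inference
)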
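
To get a rough sense of what this configuration saves, the storage cost per quantized weight can be estimated. The sketch below assumes the block sizes described for NF4 with double quantization in the QLoRA paper (64 weights per block, 256 first-level constants per second-level block) and a hypothetical count of 3 billion quantized weights. Neither figure is stated in this model card, and layers that are left unquantized are ignored, so treat the result as a ballpark only.

```python
def bits_per_weight(double_quant: bool, block_size: int = 64,
                    nested_block_size: int = 256) -> float:
    """Approximate storage bits per 4-bit weight, including scale overhead.

    block_size and nested_block_size are assumed defaults, not values
    read from this model.
    """
    if double_quant:
        # One 8-bit scale per block, plus one fp32 constant per group of scales
        overhead = 8 / block_size + 32 / (block_size * nested_block_size)
    else:
        # One fp32 scale per block
        overhead = 32 / block_size
    return 4 + overhead


def weight_gib(n_weights: float, double_quant: bool) -> float:
    """Approximate size in GiB of n_weights quantized weights."""
    return n_weights * bits_per_weight(double_quant) / 8 / 2**30


N_WEIGHTS = 3e9  # hypothetical: a round figure for a "3b" base model

print(f"{bits_per_weight(False):.3f} bits/weight without double quantization")
print(f"{bits_per_weight(True):.3f} bits/weight with double quantization")
print(f"~{weight_gib(N_WEIGHTS, True):.2f} GiB for {N_WEIGHTS:.0e} quantized weights")
```

Under these assumptions, double quantization lowers the cost from 4.5 to about 4.127 bits per weight.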

Giga-Embeddings-instruct

Base Decoder-only LLM: GigaChat-3b
Pooling Type: Latent-Attention
Embedding Dimension: 2048

⚠️ Note: This model is not fine-tuned. It is the original ai-sage/Giga-Embeddings-instruct model loaded in 4-bit precision with transformers and bitsandbytes, and it requires the bitsandbytes and accelerate packages to run.
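
Before loading the model, it can help to confirm that the required packages are installed. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the list of names is taken from the imports and requirements mentioned in this card.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# Packages used or required by this model card
REQUIRED = ["torch", "transformers", "bitsandbytes", "accelerate"]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

This only checks that the packages can be located; it does not verify versions or GPU availability.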

Usage

from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig
import torch

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModel.from_pretrained(
    "iMiW/Giga-Embeddings-instruct-4bit-nf4", 
    quantization_config=bnb_cfg, 
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "iMiW/Giga-Embeddings-instruct-4bit-nf4", 
    trust_remote_code=True
)

def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery: {query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of Russia?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of Russia is Moscow.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

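model.eval()
# The quantized weights are already placed on the GPU at load time, so
# model.cuda() is unnecessary; older transformers versions raise an error
# if it is called on a bitsandbytes-quantized model.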

max_length = 4096

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
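batch_dict = batch_dict.to(model.device)

# Inference only: disable gradient tracking to save memory
with torch.no_grad():
    embeddings = model(**batch_dict, return_embeddings=True)

# Similarity matrix: rows are the 2 queries, columns are the 2 documents
scores = (embeddings[:2] @ embeddings[2:].T)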
print(scores.tolist())
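
The nested list printed by `scores.tolist()` has one row per query and one column per document. As a minimal sketch of how it could be turned into a ranking, the hypothetical helper `rank_documents` below sorts the documents for each query by score. The numbers and document labels are placeholders for illustration only, not output from the model.

```python
def rank_documents(scores, documents):
    """For each query row in scores, return (document, score) pairs, best first."""
    ranked = []
    for row in scores:
        pairs = sorted(zip(documents, row), key=lambda pair: pair[1], reverse=True)
        ranked.append(pairs)
    return ranked


# Placeholder values for illustration only, NOT output from the model
example_scores = [[0.6, 0.1], [0.1, 0.6]]
example_documents = ["doc about Moscow", "doc about gravity"]

for query_index, pairs in enumerate(rank_documents(example_scores, example_documents)):
    best_document, best_score = pairs[0]
    print(f"query {query_index}: best match = {best_document!r} (score {best_score})")
```

In the usage example above, the same function could be called as `rank_documents(scores.tolist(), documents)`.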
