Giga-Embeddings-instruct (4-bit NF4 Quantized)
This is a 4-bit quantized version of the original model ai-sage/Giga-Embeddings-instruct, created using bitsandbytes with the following configuration:
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
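To give a feel for what this configuration means for memory, here is a rough back-of-the-envelope sketch. It assumes the usual bitsandbytes defaults (64 weights per quantization block, and double quantization storing an 8-bit scale per block plus one 32-bit constant per 256 blocks). The 3-billion parameter count is only a guess based on the base model's name. Function names are illustrative. Layers that bitsandbytes leaves unquantized (embeddings, norms) are not counted, so real usage will be higher.

```python
def nf4_bits_per_param(block_size: int = 64,
                       double_quant: bool = True,
                       dq_block_size: int = 256) -> float:
    """Approximate storage cost of one NF4-quantized weight, in bits."""
    if double_quant:
        # One 8-bit quantized scale per block, plus one fp32 constant
        # shared by every `dq_block_size` blocks (assumed defaults).
        overhead = 8 / block_size + 32 / (block_size * dq_block_size)
    else:
        # One fp32 scale per block.
        overhead = 32 / block_size
    return 4 + overhead


def estimate_quantized_gib(num_params: float, **kwargs) -> float:
    """Estimated size in GiB of `num_params` quantized weights."""
    return num_params * nf4_bits_per_param(**kwargs) / 8 / 2**30


if __name__ == "__main__":
    assumed_params = 3e9  # assumption: "3b" in the base model name
    print(f"{nf4_bits_per_param():.3f} bits per weight")
    print(f"{estimate_quantized_gib(assumed_params):.2f} GiB (quantized weights only)")
```

Under these assumptions, double quantization lowers the per-weight overhead from 0.5 bits to roughly 0.127 bits.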
Giga-Embeddings-instruct
- Base Decoder-only LLM: GigaChat-3b
- Pooling Type: Latent-Attention
- Embedding Dimension: 2048
⚠️ Note: This model is not fine-tuned. It is the original model loaded in 4-bit precision using
transformers and bitsandbytes. It requires bitsandbytes and accelerate to run.
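Since loading fails without these packages, a quick check before downloading the weights can save time. The snippet below is a small sketch, not part of the model's own tooling: the helper name `missing_packages` is made up for illustration, and it only verifies that each package is importable, not that its version is compatible.

```python
import importlib.util

# Packages the usage example relies on.
REQUIRED = ("torch", "transformers", "bitsandbytes", "accelerate")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages are importable.")
```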
Usage
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig
import torch

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModel.from_pretrained(
    "iMiW/Giga-Embeddings-instruct-4bit-nf4",
    quantization_config=bnb_cfg,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "iMiW/Giga-Embeddings-instruct-4bit-nf4",
    trust_remote_code=True,
)


def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery: {query}'


# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'
queries = [
    get_detailed_instruct(task, 'What is the capital of Russia?'),
    get_detailed_instruct(task, 'Explain gravity'),
]

# No need to add an instruction for retrieval documents
documents = [
    "The capital of Russia is Moscow.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]
input_texts = queries + documents

model.eval()
model.cuda()

max_length = 4096

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict = batch_dict.to(model.device)

# Inference only, so gradient tracking is disabled
with torch.no_grad():
    embeddings = model(**batch_dict, return_embeddings=True)

# Rows are queries, columns are documents
scores = embeddings[:2] @ embeddings[2:].T
print(scores.tolist())
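The printed scores form a matrix with one row per query and one column per document. One way to turn that into a ranking is sketched below in plain Python, operating on the nested list produced by `scores.tolist()`. The helper name `rank_documents` and the numbers in `example_scores` are made up for illustration and are not real model output.

```python
def rank_documents(scores):
    """For each query row, return document indices from best to worst match."""
    return [
        sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        for row in scores
    ]


# Hypothetical similarity values, shaped like scores.tolist() above:
# 2 queries (rows) x 2 documents (columns).
example_scores = [
    [0.58, 0.10],
    [0.08, 0.55],
]

ranking = rank_documents(example_scores)
for query_idx, doc_order in enumerate(ranking):
    print(f"query {query_idx}: best document is index {doc_order[0]}")
```

A higher dot product means a closer match, so with these illustrative values the first query pairs with the first document and the second query with the second.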