# berg-embed/Qwen3_1.7B_Hessian_FSDP_ckpt2600
A sentence-transformers embedding model based on Qwen/Qwen3-1.7B. Maps sentences & paragraphs to a 2048-dimensional dense vector space using last-token pooling and L2 normalization.
## Key Details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3-1.7B |
| Output dimensions | 2048 |
| Max sequence length | 4096 tokens |
| Pooling | Last token |
| Normalization | L2 |
| Similarity | Cosine |
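Because the output embeddings are unit-length (L2-normalized), cosine similarity reduces to a plain dot product. A minimal pure-Python sketch of that equivalence (no model required):

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length, as the model does to its embeddings.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = l2_normalize([3.0, 4.0])
b = l2_normalize([1.0, 2.0])

# For unit vectors, the dot product equals cosine similarity.
assert abs(dot(a, b) - cosine(a, b)) < 1e-12
```

This is why the AutoModel example further down can score query/document pairs with a simple matrix product after `F.normalize()`.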
## Input Format

This model uses a chat-template format for inputs. Both queries and documents must be wrapped:

```
<|im_start|>instruction
{instruction}
<|im_end|>
<|im_start|>content
{text}
<|im_end|>
```
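Rendered concretely, the wrapped string for a sample query can be built and sanity-checked with plain Python (this mirrors the `format_input` helper used in the usage examples below):

```python
instruction = "Retrieve documents that answer this question"
text = "What is photosynthesis?"

# Fill the chat template with a concrete instruction and text.
wrapped = (
    "<|im_start|>instruction\n"
    f"{instruction}\n"
    "<|im_end|>\n"
    "<|im_start|>content\n"
    f"{text}\n"
    "<|im_end|>"
)

# Both the instruction and content blocks must be present and closed.
assert wrapped.count("<|im_start|>") == 2
assert wrapped.endswith("<|im_end|>")
print(wrapped)
```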
## Usage with SentenceTransformer (recommended)

```bash
pip install -U sentence-transformers
```
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("berg-embed/Qwen3_1.7B_Hessian_FSDP_ckpt2600")

def format_input(instruction: str, text: str) -> str:
    return (
        "<|im_start|>instruction\n"
        f"{instruction}\n"
        "<|im_end|>\n"
        "<|im_start|>content\n"
        f"{text}\n"
        "<|im_end|>"
    )

# Encode queries
queries = [format_input("Retrieve documents that answer this question", "What is photosynthesis?")]
query_emb = model.encode(queries, normalize_embeddings=True)

# Encode documents
docs = [format_input("Represent this document for retrieval", "Photosynthesis is the process by which plants convert sunlight into energy.")]
doc_emb = model.encode(docs, normalize_embeddings=True)

# Compute similarity
similarity = model.similarity(query_emb, doc_emb)
print(similarity)
```
SentenceTransformer handles tokenizer setup (left-padding), last-token pooling, and normalization automatically from the model config.
## Usage with Transformers (AutoModel)
This is how the model is used in our internal evaluation pipeline (BRIGHT benchmark).
```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

checkpoint = "berg-embed/Qwen3_1.7B_Hessian_FSDP_ckpt2600"
tokenizer = AutoTokenizer.from_pretrained(checkpoint, padding_side="left", trust_remote_code=True)
model = AutoModel.from_pretrained(checkpoint, trust_remote_code=True, torch_dtype=torch.bfloat16)
model.eval().cuda()

def last_token_pool(hidden_states, attention_mask):
    # If every sequence ends with a real (non-pad) token, the batch is left-padded
    # and the last position can be taken directly.
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return hidden_states[:, -1]
    # Otherwise, index the last non-padding token per row via the attention mask.
    seq_lens = attention_mask.sum(dim=1) - 1
    return hidden_states[torch.arange(hidden_states.size(0), device=hidden_states.device), seq_lens]

def format_input(instruction: str, text: str) -> str:
    return (
        "<|im_start|>instruction\n"
        f"{instruction}\n"
        "<|im_end|>\n"
        "<|im_start|>content\n"
        f"{text}\n"
        "<|im_end|>"
    )

def encode(texts, max_length=4096):
    inputs = tokenizer(texts, padding=True, truncation=True, max_length=max_length, return_tensors="pt").to("cuda")
    with torch.no_grad():
        outputs = model(**inputs)
    emb = last_token_pool(outputs.last_hidden_state, inputs["attention_mask"])
    return F.normalize(emb, p=2, dim=1)

# Encode
queries = [format_input("Retrieve documents that answer this question", "What is photosynthesis?")]
docs = [format_input("Represent this document for retrieval", "Photosynthesis is ...")]
q_emb = encode(queries)
d_emb = encode(docs)
scores = (q_emb @ d_emb.T).tolist()
```
**Critical settings for AutoModel usage:**

- `padding_side="left"` on the tokenizer (required for last-token pooling)
- `last_token_pool()` to extract the embedding from the last non-padding token
- `F.normalize()` for L2 normalization
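To see why the padding side matters, here is a small pure-NumPy sketch of the same pooling logic (NumPy standing in for the torch tensors above):

```python
import numpy as np

# Toy batch: 2 sequences, max length 4, hidden size 3.
hidden = np.arange(2 * 4 * 3, dtype=float).reshape(2, 4, 3)

# With left padding, every sequence ends in a real token (mask == 1 at the
# last position), so pooling is simply hidden[:, -1].
left_mask = np.array([[0, 0, 1, 1],
                      [1, 1, 1, 1]])
assert (left_mask[:, -1] == 1).all()
left_pooled = hidden[:, -1]

# With right padding, the last real token differs per row and must be
# indexed through the attention mask.
right_mask = np.array([[1, 1, 0, 0],
                       [1, 1, 1, 1]])
seq_lens = right_mask.sum(axis=1) - 1          # index of last real token per row
right_pooled = hidden[np.arange(2), seq_lens]

# Row 0's last real token sits at position 1 under right padding, so a naive
# hidden[:, -1] would pool a padding token instead.
assert np.array_equal(right_pooled[0], hidden[0, 1])
assert not np.array_equal(right_pooled[0], hidden[0, -1])
```

Left padding keeps the pooled position identical for every row, which is why the tokenizer must be loaded with `padding_side="left"`.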
## Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 4096, 'do_lower_case': False, 'architecture': 'Qwen3Model'})
  (1): Pooling({'word_embedding_dimension': 2048, 'pooling_mode_lasttoken': True})
  (2): Normalize()
)
```