# berg-embed/Qwen3_1.7B_Hessian_FSDP_ckpt2600
A sentence-transformers embedding model based on Qwen/Qwen3-1.7B. Maps sentences & paragraphs to a 2048-dimensional dense vector space using last-token pooling and L2 normalization.
## Key Details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3-1.7B |
| Output dimensions | 2048 |
| Max sequence length | 4096 tokens |
| Pooling | Last token |
| Normalization | L2 |
| Similarity | Cosine |
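Because the output embeddings are unit-length (L2-normalized), cosine similarity reduces to a plain dot product. A minimal pure-Python sketch of that equivalence (no model required):

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length, as the model does to its embeddings.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = l2_normalize([3.0, 4.0])
b = l2_normalize([1.0, 2.0])

# For unit vectors, the dot product equals cosine similarity.
assert abs(dot(a, b) - cosine(a, b)) < 1e-12
```

This is why the AutoModel example further down can score query/document pairs with a simple matrix product after `F.normalize()`.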
## Input Format

This model uses a chat-template format for inputs. Both queries and documents must be wrapped:

```
<|im_start|>instruction
{instruction}
<|im_end|>
<|im_start|>content
{text}
<|im_end|>
```
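Rendered concretely, the wrapped string for a sample query can be built and sanity-checked with plain Python (this mirrors the `format_input` helper used in the usage examples below):

```python
instruction = "Retrieve documents that answer this question"
text = "What is photosynthesis?"

# Fill the chat template with a concrete instruction and text.
wrapped = (
    "<|im_start|>instruction\n"
    f"{instruction}\n"
    "<|im_end|>\n"
    "<|im_start|>content\n"
    f"{text}\n"
    "<|im_end|>"
)

# Both the instruction and content blocks must be present and closed.
assert wrapped.count("<|im_start|>") == 2
assert wrapped.endswith("<|im_end|>")
print(wrapped)
```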
## Usage with SentenceTransformer (recommended)

```bash
pip install -U sentence-transformers
```
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("berg-embed/Qwen3_1.7B_Hessian_FSDP_ckpt2600")

def format_input(instruction: str, text: str) -> str:
    return (
        "<|im_start|>instruction\n"
        f"{instruction}\n"
        "<|im_end|>\n"
        "<|im_start|>content\n"
        f"{text}\n"
        "<|im_end|>"
    )

# Encode queries
queries = [format_input("Retrieve documents that answer this question", "What is photosynthesis?")]
query_emb = model.encode(queries, normalize_embeddings=True)

# Encode documents
docs = [format_input("Represent this document for retrieval", "Photosynthesis is the process by which plants convert sunlight into energy.")]
doc_emb = model.encode(docs, normalize_embeddings=True)

# Compute similarity
similarity = model.similarity(query_emb, doc_emb)
print(similarity)
```
SentenceTransformer handles tokenizer setup (left-padding), last-token pooling, and normalization automatically from the model config.
## Usage with Transformers (AutoModel)
This is how the model is used in our internal evaluation pipeline (BRIGHT benchmark).
```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

checkpoint = "berg-embed/Qwen3_1.7B_Hessian_FSDP_ckpt2600"
tokenizer = AutoTokenizer.from_pretrained(checkpoint, padding_side="left", trust_remote_code=True)
model = AutoModel.from_pretrained(checkpoint, trust_remote_code=True, torch_dtype=torch.bfloat16)
model.eval().cuda()

def last_token_pool(hidden_states, attention_mask):
    # If every sequence ends with a real (non-pad) token, the batch is left-padded
    # and the last position can be taken directly.
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return hidden_states[:, -1]
    # Otherwise, index the last non-padding token per row via the attention mask.
    seq_lens = attention_mask.sum(dim=1) - 1
    return hidden_states[torch.arange(hidden_states.size(0), device=hidden_states.device), seq_lens]

def format_input(instruction: str, text: str) -> str:
    return (
        "<|im_start|>instruction\n"
        f"{instruction}\n"
        "<|im_end|>\n"
        "<|im_start|>content\n"
        f"{text}\n"
        "<|im_end|>"
    )

def encode(texts, max_length=4096):
    inputs = tokenizer(texts, padding=True, truncation=True, max_length=max_length, return_tensors="pt").to("cuda")
    with torch.no_grad():
        outputs = model(**inputs)
    emb = last_token_pool(outputs.last_hidden_state, inputs["attention_mask"])
    return F.normalize(emb, p=2, dim=1)

# Encode
queries = [format_input("Retrieve documents that answer this question", "What is photosynthesis?")]
docs = [format_input("Represent this document for retrieval", "Photosynthesis is ...")]
q_emb = encode(queries)
d_emb = encode(docs)
scores = (q_emb @ d_emb.T).tolist()
```
**Critical settings for AutoModel usage:**

- `padding_side="left"` on the tokenizer (required for last-token pooling)
- `last_token_pool()` to extract the embedding from the last non-padding token
- `F.normalize()` for L2 normalization
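To see why the padding side matters, here is a small pure-NumPy sketch of the same pooling logic (NumPy standing in for the torch tensors above):

```python
import numpy as np

# Toy batch: 2 sequences, max length 4, hidden size 3.
hidden = np.arange(2 * 4 * 3, dtype=float).reshape(2, 4, 3)

# With left padding, every sequence ends in a real token (mask == 1 at the
# last position), so pooling is simply hidden[:, -1].
left_mask = np.array([[0, 0, 1, 1],
                      [1, 1, 1, 1]])
assert (left_mask[:, -1] == 1).all()
left_pooled = hidden[:, -1]

# With right padding, the last real token differs per row and must be
# indexed through the attention mask.
right_mask = np.array([[1, 1, 0, 0],
                       [1, 1, 1, 1]])
seq_lens = right_mask.sum(axis=1) - 1          # index of last real token per row
right_pooled = hidden[np.arange(2), seq_lens]

# Row 0's last real token sits at position 1 under right padding, so a naive
# hidden[:, -1] would pool a padding token instead.
assert np.array_equal(right_pooled[0], hidden[0, 1])
assert not np.array_equal(right_pooled[0], hidden[0, -1])
```

Left padding keeps the pooled position identical for every row, which is why the tokenizer must be loaded with `padding_side="left"`.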
## Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 4096, 'do_lower_case': False, 'architecture': 'Qwen3Model'})
  (1): Pooling({'word_embedding_dimension': 2048, 'pooling_mode_lasttoken': True})
  (2): Normalize()
)
```