# 1M Paper SciBERT Embedding Model ONNX
This repository contains a merged ONNX export of the 1M scientific-paper embedding model.

The original adapter is PeytonT/1m-paper-embedding-model, a PEFT/LoRA adapter trained on scientific-paper metadata. This export merges that adapter into `allenai/scibert_scivocab_uncased` and packages it for browser/static-app inference.
## Files
| Path | Description |
|---|---|
| `onnx/model.onnx` | Float ONNX graph; weights are stored externally. |
| `onnx/model.onnx.data` | External float weights for `model.onnx`. |
| `onnx/model.int8.onnx` | Dynamically quantized int8 ONNX model. Recommended for browser/WASM use. |
| `tokenizer/` | SciBERT tokenizer files. |
| `manifest.json` | Export metadata used by the research library app. |
## Embedding Format
The exported graph returns one embedding per input text:

- output name: `embedding`
- shape: `[batch, 768]`
- pooling: attention-mask mean pooling over the final hidden states
- normalization: L2 normalized
- max sequence length used for export: 256 tokens

The model inputs are `input_ids`, `attention_mask`, and `token_type_ids`.
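The pooling and normalization above are baked into the exported graph, so clients do not need to reimplement them. For reference, here is a minimal NumPy sketch of the equivalent computation (the function name is illustrative, not part of this repo):

```python
import numpy as np

def mean_pool_and_normalize(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Attention-mask mean pooling over final hidden states, then L2 normalization.

    hidden_states: [batch, seq_len, hidden]; attention_mask: [batch, seq_len] of 0/1.
    """
    mask = attention_mask[:, :, None].astype(hidden_states.dtype)   # [batch, seq_len, 1]
    summed = (hidden_states * mask).sum(axis=1)                     # sum only real tokens
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                  # guard against all-pad rows
    pooled = summed / counts                                        # mean over unmasked tokens
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)   # unit-length embeddings
```

Padding tokens are excluded from the mean via the attention mask, so padded and unpadded batches produce the same embedding for a given text.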
## Intended Use
This model is intended for semantic embedding of scientific paper titles, abstracts, keywords, repository descriptions, and short research queries. It is designed for the static research-library viewer where client-side embedding can support local search, reranking, and graph navigation without calling a private server.
For browser usage, prefer `onnx/model.int8.onnx` with ONNX Runtime Web.
## Python Example
```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./tokenizer")
encoded = tokenizer(
    ["graph neural retrieval over scientific paper abstracts"],
    padding="max_length",
    truncation=True,
    max_length=256,
    return_tensors="np",
)
# SciBERT exports expect token_type_ids; supply zeros if the tokenizer omits them.
encoded.setdefault("token_type_ids", np.zeros_like(encoded["input_ids"]))

session = ort.InferenceSession("./onnx/model.int8.onnx", providers=["CPUExecutionProvider"])
embedding = session.run(
    None,
    {
        "input_ids": encoded["input_ids"].astype("int64"),
        "attention_mask": encoded["attention_mask"].astype("int64"),
        "token_type_ids": encoded["token_type_ids"].astype("int64"),
    },
)[0]

print(embedding.shape)               # (1, 768)
print(np.linalg.norm(embedding[0]))  # ~1.0, since outputs are L2 normalized
```
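Because the model's outputs are unit length, cosine similarity between two embeddings reduces to a plain dot product. A small illustrative snippet (the vectors here are toy stand-ins for real model outputs):

```python
import numpy as np

def cosine_scores(query_emb: np.ndarray, corpus_embs: np.ndarray) -> np.ndarray:
    # For L2-normalized embeddings, cosine similarity == dot product,
    # so no per-query renormalization is needed at search time.
    return corpus_embs @ query_emb

# Toy unit vectors standing in for embeddings returned by the model.
q = np.array([1.0, 0.0])
docs = np.array([[1.0, 0.0], [0.0, 1.0]])
print(cosine_scores(q, docs))  # first document matches exactly, second is orthogonal
```

This is the pattern the static research-library viewer can use for local search and reranking: embed the query once, then score every stored paper embedding with a single matrix-vector product.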
## Export Details

Created with:

```bash
conda run -n ai python scripts/export_m1_scibert_onnx.py
```

Source export script: `scripts/export_m1_scibert_onnx.py`
## Limitations
This is an int8 dynamic quantization of a merged SciBERT encoder, not a BitNet model. BitNet-style inference would require training or distilling a compatible BitNet architecture rather than converting this SciBERT checkpoint directly.
## Model tree for PeytonT/1m-paper-embedding-model-onnx

Base model: `allenai/scibert_scivocab_uncased`