1M Paper SciBERT Embedding Model ONNX

This repository contains a merged ONNX export of the 1M scientific-paper embedding model.

The original adapter is PeytonT/1m-paper-embedding-model, which is a PEFT/LoRA adapter trained on scientific paper metadata. This export merges that adapter into allenai/scibert_scivocab_uncased and packages it for browser/static-app inference.

Files

  • onnx/model.onnx: Float ONNX graph. Uses external weights.
  • onnx/model.onnx.data: External float ONNX weights.
  • onnx/model.int8.onnx: Dynamically quantized int8 ONNX model. Recommended for browser/WASM use.
  • tokenizer/: SciBERT tokenizer files.
  • manifest.json: Export metadata used by the research library app.

Embedding Format

The exported graph returns one embedding per input text:

  • output name: embedding
  • shape: [batch, 768]
  • pooling: attention-mask mean pooling over the final hidden states
  • normalization: L2 normalized
  • max sequence length used for export: 256 tokens
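The exported graph applies the pooling and normalization internally, but the same computation can be reproduced from raw hidden states. A minimal numpy sketch of attention-mask mean pooling followed by L2 normalization (the function name is illustrative, not part of the export):

```python
import numpy as np

def mean_pool_l2(hidden_states, attention_mask):
    """Attention-mask mean pooling + L2 normalization (illustrative sketch).

    hidden_states: [batch, seq, 768] final hidden states
    attention_mask: [batch, seq] with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(hidden_states.dtype)
    # Sum only the real-token positions, then divide by the token count.
    summed = (hidden_states * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    pooled = summed / counts
    # L2-normalize each embedding to unit length.
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)
```

Padded positions are masked out, so padding length does not change the resulting embedding.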

The model inputs are:

  • input_ids
  • attention_mask
  • token_type_ids

Intended Use

This model is intended for semantic embedding of scientific paper titles, abstracts, keywords, repository descriptions, and short research queries. It is designed for the static research-library viewer where client-side embedding can support local search, reranking, and graph navigation without calling a private server.
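Because the embeddings are L2-normalized, local search and reranking can score candidates with a plain dot product (which equals cosine similarity for unit vectors). A minimal ranking sketch, with an illustrative function name:

```python
import numpy as np

def rank_by_similarity(query_emb, corpus_embs):
    """Return corpus indices sorted by similarity to the query, best first.

    Assumes both the query embedding and the corpus embeddings are
    L2-normalized, as this model's output is, so the dot product
    equals cosine similarity.
    """
    scores = corpus_embs @ query_emb
    # Negate scores so argsort yields descending similarity.
    return np.argsort(-scores)
```

This is the core of a client-side search step: embed the query once, then score it against a precomputed matrix of paper embeddings.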

For browser usage, prefer onnx/model.int8.onnx with ONNX Runtime Web.

Python Example

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./tokenizer")
encoded = tokenizer(
    ["graph neural retrieval over scientific paper abstracts"],
    padding="max_length",
    truncation=True,
    max_length=256,  # matches the export's max sequence length
    return_tensors="np",
)
# Some tokenizer configs omit token_type_ids; the graph expects all three inputs.
encoded.setdefault("token_type_ids", np.zeros_like(encoded["input_ids"]))

session = ort.InferenceSession("./onnx/model.int8.onnx", providers=["CPUExecutionProvider"])
embedding = session.run(
    None,
    {
        # The graph expects int64 inputs.
        "input_ids": encoded["input_ids"].astype("int64"),
        "attention_mask": encoded["attention_mask"].astype("int64"),
        "token_type_ids": encoded["token_type_ids"].astype("int64"),
    },
)[0]

print(embedding.shape)
print(np.linalg.norm(embedding[0]))

Export Details

Created with:

conda run -n ai python scripts/export_m1_scibert_onnx.py

Source export script:

scripts/export_m1_scibert_onnx.py

Limitations

This is an int8 dynamic quantization of a merged SciBERT encoder, not a BitNet model. BitNet-style inference would require training or distilling a compatible BitNet architecture rather than converting this SciBERT checkpoint directly.
