1M Paper SciBERT Embedding Model ONNX

This repository contains a merged ONNX export of the 1M scientific-paper embedding model.

The original adapter is PeytonT/1m-paper-embedding-model, which is a PEFT/LoRA adapter trained on scientific paper metadata. This export merges that adapter into allenai/scibert_scivocab_uncased and packages it for browser/static-app inference.

Files

  • onnx/model.onnx: Float ONNX graph. Uses external weights.
  • onnx/model.onnx.data: External float ONNX weights.
  • onnx/model.int8.onnx: Dynamically quantized int8 ONNX model. Recommended for browser/WASM use.
  • tokenizer/: SciBERT tokenizer files.
  • manifest.json: Export metadata used by the research library app.

Embedding Format

The exported graph returns one embedding per input text:

  • output name: embedding
  • shape: [batch, 768]
  • pooling: attention-mask mean pooling over the final hidden states
  • normalization: L2 normalized
  • max sequence length used for export: 256 tokens
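The exported graph applies the pooling and normalization internally, but the same computation can be reproduced from raw hidden states. A minimal numpy sketch of attention-mask mean pooling followed by L2 normalization (the function name is illustrative, not part of the export):

```python
import numpy as np

def mean_pool_l2(hidden_states, attention_mask):
    """Attention-mask mean pooling + L2 normalization (illustrative sketch).

    hidden_states: [batch, seq, 768] final hidden states
    attention_mask: [batch, seq] with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(hidden_states.dtype)
    # Sum only the real-token positions, then divide by the token count.
    summed = (hidden_states * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    pooled = summed / counts
    # L2-normalize each embedding to unit length.
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)
```

Padded positions are masked out, so padding length does not change the resulting embedding.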

The model inputs are:

  • input_ids
  • attention_mask
  • token_type_ids

Intended Use

This model is intended for semantic embedding of scientific paper titles, abstracts, keywords, repository descriptions, and short research queries. It is designed for the static research-library viewer where client-side embedding can support local search, reranking, and graph navigation without calling a private server.
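Because the embeddings are L2-normalized, local search and reranking can score candidates with a plain dot product (which equals cosine similarity for unit vectors). A minimal ranking sketch, with an illustrative function name:

```python
import numpy as np

def rank_by_similarity(query_emb, corpus_embs):
    """Return corpus indices sorted by similarity to the query, best first.

    Assumes both the query embedding and the corpus embeddings are
    L2-normalized, as this model's output is, so the dot product
    equals cosine similarity.
    """
    scores = corpus_embs @ query_emb
    # Negate scores so argsort yields descending similarity.
    return np.argsort(-scores)
```

This is the core of a client-side search step: embed the query once, then score it against a precomputed matrix of paper embeddings.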

For browser usage, prefer onnx/model.int8.onnx with ONNX Runtime Web.

Python Example

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./tokenizer")
encoded = tokenizer(
    ["graph neural retrieval over scientific paper abstracts"],
    padding="max_length",
    truncation=True,
    max_length=256,  # matches the export's max sequence length
    return_tensors="np",
)
# Some tokenizer configs omit token_type_ids; the graph expects all three inputs.
encoded.setdefault("token_type_ids", np.zeros_like(encoded["input_ids"]))

session = ort.InferenceSession("./onnx/model.int8.onnx", providers=["CPUExecutionProvider"])
embedding = session.run(
    None,
    {
        # The graph expects int64 inputs.
        "input_ids": encoded["input_ids"].astype("int64"),
        "attention_mask": encoded["attention_mask"].astype("int64"),
        "token_type_ids": encoded["token_type_ids"].astype("int64"),
    },
)[0]

print(embedding.shape)
print(np.linalg.norm(embedding[0]))

Export Details

Created with:

conda run -n ai python scripts/export_m1_scibert_onnx.py

Source export script:

scripts/export_m1_scibert_onnx.py

Limitations

This is an int8 dynamic quantization of a merged SciBERT encoder, not a BitNet model. BitNet-style inference would require training or distilling a compatible BitNet architecture rather than converting this SciBERT checkpoint directly.
