michael-sigamani committed · Commit 944c418 · verified · Parent(s): 7be200c

Create README.md

Files changed (1): README.md (+137, −0)

# Nomic Embed Text V1 (ONNX)

**Tags:** `text-embedding` `onnx` `nomic-embed-text` `sentence-transformers`

---

## Model Details

- **Model Name:** Nomic Embed Text V1 (ONNX export)
- **Original HF Repo:** [nomic-ai/nomic-embed-text-v1](https://huggingface.co/nomic-ai/nomic-embed-text-v1)
- **ONNX File:** `model.onnx`
- **Export Date:** 2025-05-27

This model outputs:

1. **token_embeddings**: per-token embedding vectors (`[batch_size, seq_len, hidden_size]`)
2. **sentence_embedding**: pooled sentence-level embeddings (`[batch_size, hidden_size]`)
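
If the export is hosted on the Hugging Face Hub, `model.onnx` can be fetched with `huggingface_hub`. The repo id below is a placeholder, not the actual repository name:

```python
from huggingface_hub import hf_hub_download

# Placeholder repo id: substitute the repository that hosts this ONNX export
onnx_path = hf_hub_download(
    repo_id="your-username/nomic-embed-text-v1-onnx",
    filename="model.onnx",
)
```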

---

## Model Description

Nomic Embed Text V1 is a BERT-style encoder trained to generate high-quality dense representations of text. It is suitable for:

- Semantic search
- Text clustering
- Recommendation systems
- Downstream classification

The ONNX export ensures compatibility with inference engines like [ONNX Runtime](https://www.onnxruntime.ai/) and NVIDIA Triton Inference Server.

---

## Usage

### 1. Install Dependencies

```bash
pip install onnxruntime transformers numpy
```

### 2. Load the ONNX Model

```python
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
```
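
Input and output names can differ between exports, so it may be worth confirming what the loaded session actually expects before running inference:

```python
# Print the declared input/output names and shapes for this export
print([(i.name, i.shape) for i in session.get_inputs()])
print([(o.name, o.shape) for o in session.get_outputs()])
```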

### 3. Tokenize Inputs

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-embed-text-v1")
inputs = tokenizer(
    ["Hello world", "Another sentence"],
    padding=True,
    truncation=True,
    return_tensors="np"
)
```
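
Depending on how the model was exported, the session may also require `token_type_ids`. One defensive option (an assumption, not something this export is known to need) is to build the feed dict from the session's declared inputs:

```python
# Pass along only the tensors this particular export declares as inputs
onnx_inputs = {i.name: inputs[i.name] for i in session.get_inputs() if i.name in inputs}
```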

### 4. Run Inference

```python
outputs = session.run(
    ["token_embeddings", "sentence_embedding"],
    {
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"]
    }
)

token_embeddings, sentence_embeddings = outputs
```
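
To illustrate the semantic-search use case above, the pooled embeddings can be L2-normalized and compared with cosine similarity. A minimal sketch using only NumPy:

```python
import numpy as np

# Normalize rows so that dot products become cosine similarities
normed = sentence_embeddings / np.linalg.norm(sentence_embeddings, axis=1, keepdims=True)

# Pairwise cosine similarity between the input sentences
similarity = normed @ normed.T
print(similarity)  # diagonal entries are ~1.0
```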

## Serving with Triton

Place your model files under:

```
models/
└── nomic_embeddings/
    ├── config.pbtxt
    └── 1/
        ├── model.onnx
        └── (tokenizer files…)
```

Note that `config.pbtxt` sits at the model-directory level, next to the version folder, not inside it. Create a `config.pbtxt` that looks something like this (the input data types must match your export; Hugging Face ONNX exports often use `TYPE_INT64` for `input_ids` and `attention_mask` rather than `TYPE_INT32`):

```protobuf
name: "nomic_embeddings"
backend: "onnxruntime"
max_batch_size: 8

input [
  {
    name: "input_ids"
    data_type: TYPE_INT32
    dims: [ -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT32
    dims: [ -1 ]
  }
]

output [
  {
    name: "token_embeddings"
    data_type: TYPE_FP32
    dims: [ -1, 768 ]
  },
  {
    name: "sentence_embedding"
    data_type: TYPE_FP32
    # the batch dimension is implicit when max_batch_size > 0
    dims: [ 768 ]
  }
]

instance_group [
  {
    kind: KIND_GPU
    count: 1
  }
]
```

Start Triton:

```bash
tritonserver \
  --model-repository=/path/to/models \
  --model-control-mode=explicit \
  --load-model=nomic_embeddings
```
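
Once Triton reports the model as READY, it can be queried over HTTP with the `tritonclient` package (`pip install tritonclient[http]`). A minimal sketch, assuming the default HTTP port and the `TYPE_INT32` inputs declared in the config above (cast to int64 instead if your export uses `TYPE_INT64`):

```python
import numpy as np
import tritonclient.http as httpclient
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-embed-text-v1")
enc = tokenizer(["Hello world"], padding=True, truncation=True, return_tensors="np")

client = httpclient.InferenceServerClient(url="localhost:8000")

# Cast to int32 to match the data_type declared in config.pbtxt
ids = enc["input_ids"].astype(np.int32)
mask = enc["attention_mask"].astype(np.int32)

triton_inputs = [
    httpclient.InferInput("input_ids", list(ids.shape), "INT32"),
    httpclient.InferInput("attention_mask", list(mask.shape), "INT32"),
]
triton_inputs[0].set_data_from_numpy(ids)
triton_inputs[1].set_data_from_numpy(mask)

result = client.infer(
    model_name="nomic_embeddings",
    inputs=triton_inputs,
    outputs=[httpclient.InferRequestedOutput("sentence_embedding")],
)
print(result.as_numpy("sentence_embedding").shape)  # e.g. (1, 768)
```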