michael-sigamani committed · Commit 944c418 · verified · Parent(s): 7be200c

Create README.md

Files changed (1): README.md (+137, −0)

# Nomic Embed Text V1 (ONNX)

**Tags:** `text-embedding` `onnx` `nomic-embed-text` `sentence-transformers`

---

## Model Details

- **Model Name:** Nomic Embed Text V1 (ONNX export)
- **Original HF Repo:** [nomic-ai/nomic-embed-text-v1](https://huggingface.co/nomic-ai/nomic-embed-text-v1)
- **ONNX File:** `model.onnx`
- **Export Date:** 2025-05-27

This model outputs:

1. **token_embeddings**: per-token embedding vectors (`[batch_size, seq_len, hidden_size]`)
2. **sentence_embedding**: pooled sentence-level embeddings (`[batch_size, hidden_size]`)
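
If the export is hosted on the Hugging Face Hub, `model.onnx` can be fetched with `huggingface_hub`. The repo id below is a placeholder, not the actual repository name:

```python
from huggingface_hub import hf_hub_download

# Placeholder repo id: substitute the repository that hosts this ONNX export
onnx_path = hf_hub_download(
    repo_id="your-username/nomic-embed-text-v1-onnx",
    filename="model.onnx",
)
```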

---

## Model Description

Nomic Embed Text V1 is a BERT-style encoder trained to generate high-quality dense representations of text. It is suitable for:

- Semantic search
- Text clustering
- Recommendation systems
- Downstream classification

The ONNX export ensures compatibility with inference engines like [ONNX Runtime](https://www.onnxruntime.ai/) and NVIDIA Triton Inference Server.

---

## Usage

### 1. Install Dependencies

```bash
pip install onnxruntime transformers numpy
```

### 2. Load the ONNX Model

```python
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
```
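
Input and output names can differ between exports, so it may be worth confirming what the loaded session actually expects before running inference:

```python
# Print the declared input/output names and shapes for this export
print([(i.name, i.shape) for i in session.get_inputs()])
print([(o.name, o.shape) for o in session.get_outputs()])
```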

### 3. Tokenize Inputs

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-embed-text-v1")
inputs = tokenizer(
    ["Hello world", "Another sentence"],
    padding=True,
    truncation=True,
    return_tensors="np"
)
```
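
Depending on how the model was exported, the session may also require `token_type_ids`. One defensive option (an assumption, not something this export is known to need) is to build the feed dict from the session's declared inputs:

```python
# Pass along only the tensors this particular export declares as inputs
onnx_inputs = {i.name: inputs[i.name] for i in session.get_inputs() if i.name in inputs}
```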

### 4. Run Inference

```python
outputs = session.run(
    ["token_embeddings", "sentence_embedding"],
    {
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"]
    }
)

token_embeddings, sentence_embeddings = outputs
```
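
To illustrate the semantic-search use case above, the pooled embeddings can be L2-normalized and compared with cosine similarity. A minimal sketch using only NumPy:

```python
import numpy as np

# Normalize rows so that dot products become cosine similarities
normed = sentence_embeddings / np.linalg.norm(sentence_embeddings, axis=1, keepdims=True)

# Pairwise cosine similarity between the input sentences
similarity = normed @ normed.T
print(similarity)  # diagonal entries are ~1.0
```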

## Serving with Triton

Place your model files under:

```
models/
└── nomic_embeddings/
    ├── config.pbtxt
    └── 1/
        ├── model.onnx
        └── (tokenizer files…)
```

Note that `config.pbtxt` sits at the model-directory level, next to the version folder, not inside it. Create a `config.pbtxt` that looks something like this (the input data types must match your export; Hugging Face ONNX exports often use `TYPE_INT64` for `input_ids` and `attention_mask` rather than `TYPE_INT32`):

```protobuf
name: "nomic_embeddings"
backend: "onnxruntime"
max_batch_size: 8

input [
  {
    name: "input_ids"
    data_type: TYPE_INT32
    dims: [ -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT32
    dims: [ -1 ]
  }
]

output [
  {
    name: "token_embeddings"
    data_type: TYPE_FP32
    dims: [ -1, 768 ]
  },
  {
    name: "sentence_embedding"
    data_type: TYPE_FP32
    # the batch dimension is implicit when max_batch_size > 0
    dims: [ 768 ]
  }
]

instance_group [
  {
    kind: KIND_GPU
    count: 1
  }
]
```

Start Triton:

```bash
tritonserver \
  --model-repository=/path/to/models \
  --model-control-mode=explicit \
  --load-model=nomic_embeddings
```
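
Once Triton reports the model as READY, it can be queried over HTTP with the `tritonclient` package (`pip install tritonclient[http]`). A minimal sketch, assuming the default HTTP port and the `TYPE_INT32` inputs declared in the config above (cast to int64 instead if your export uses `TYPE_INT64`):

```python
import numpy as np
import tritonclient.http as httpclient
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-embed-text-v1")
enc = tokenizer(["Hello world"], padding=True, truncation=True, return_tensors="np")

client = httpclient.InferenceServerClient(url="localhost:8000")

# Cast to int32 to match the data_type declared in config.pbtxt
ids = enc["input_ids"].astype(np.int32)
mask = enc["attention_mask"].astype(np.int32)

triton_inputs = [
    httpclient.InferInput("input_ids", list(ids.shape), "INT32"),
    httpclient.InferInput("attention_mask", list(mask.shape), "INT32"),
]
triton_inputs[0].set_data_from_numpy(ids)
triton_inputs[1].set_data_from_numpy(mask)

result = client.infer(
    model_name="nomic_embeddings",
    inputs=triton_inputs,
    outputs=[httpclient.InferRequestedOutput("sentence_embedding")],
)
print(result.as_numpy("sentence_embedding").shape)  # e.g. (1, 768)
```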