# Bed - Int8 Quantized Static Embeddings for Semantic Search
An ultra-fast int8-quantized static-embedding model for semantic search, optimized for the gobed Go library.
## Model Details
| Property | Value |
|---|---|
| Dimensions | 512 |
| Precision | int8 + scale factors |
| Vocabulary | 30,522 tokens |
| Model Size | 15 MB |
| Format | safetensors |
## Performance
- Embedding latency: 0.16ms average
- Throughput: 6,200+ embeddings/sec
- Memory: 15 MB (7.9x smaller than float32 version)
- Compression ratio: 87.4% space reduction vs original
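The size and compression figures above follow directly from the tensor shapes listed in this card (a back-of-envelope check, using 2^20-byte MB):

```python
vocab, quant_dim, orig_dim = 30_522, 512, 1_024

orig_bytes = vocab * orig_dim * 4            # original float32 embeddings
quant_bytes = vocab * quant_dim + vocab * 4  # int8 weights + float32 scales

print(f"original:  {orig_bytes / 2**20:.1f} MB")         # ~119.2 MB
print(f"quantized: {quant_bytes / 2**20:.1f} MB")        # ~15.0 MB
print(f"reduction: {1 - quant_bytes / orig_bytes:.1%}")  # 87.4%
print(f"ratio:     {orig_bytes / quant_bytes:.1f}x")     # 7.9x
```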
## Usage with gobed (Go)

```bash
go get github.com/lee101/gobed
```
```go
package main

import (
	"fmt"
	"log"

	"github.com/lee101/gobed"
)

func main() {
	engine, err := gobed.NewAutoSearchEngine()
	if err != nil {
		log.Fatal(err)
	}
	defer engine.Close()

	docs := map[string]string{
		"doc1": "machine learning and neural networks",
		"doc2": "natural language processing",
	}
	engine.AddDocuments(docs)

	results, _, _ := engine.SearchWithMetadata("AI research", 3)
	for _, r := range results {
		fmt.Printf("[%.3f] %s\n", r.Similarity, r.Content)
	}
}
```
## Download Model Manually

```bash
# Clone the model repository
git clone https://huggingface.co/lee101/bed

# Or download specific files
wget https://huggingface.co/lee101/bed/resolve/main/modelint8_512dim.safetensors
wget https://huggingface.co/lee101/bed/resolve/main/tokenizer.json
```
## Using huggingface_hub (Python)

```python
from huggingface_hub import hf_hub_download

# Download model file
model_path = hf_hub_download(repo_id="lee101/bed", filename="modelint8_512dim.safetensors")

# Download tokenizer
tokenizer_path = hf_hub_download(repo_id="lee101/bed", filename="tokenizer.json")
```
## Model Architecture
This model uses static embeddings with int8 quantization:
- Embedding layer: 30,522 x 512 int8 weights
- Scale factors: 30,522 float32 scale values (one per token)
- Tokenizer: WordPiece tokenizer (same as BERT)
Embeddings are computed by:
1. Tokenizing the input text
2. Looking up the int8 embedding for each token
3. Multiplying by the per-token scale factor to reconstruct float values
4. Mean pooling across tokens
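The steps above can be sketched in plain Python. This is a toy illustration, not the real implementation: whitespace tokenization stands in for WordPiece, and the vocabulary, weights, and scales below are made up (3 tokens, 4 dimensions instead of 30,522 x 512):

```python
vocab = {"machine": 0, "learning": 1, "rocks": 2}
int8_weights = [           # one int8 row per vocab token (toy values)
    [127, -64, 32, 0],
    [10, 20, -30, 40],
    [-5, 5, -5, 5],
]
scales = [0.01, 0.02, 0.1]  # one float32 scale per token (toy values)

def embed(text):
    ids = [vocab[w] for w in text.lower().split()]    # 1. tokenize
    rows = [[w * scales[i] for w in int8_weights[i]]  # 2-3. look up + rescale
            for i in ids]
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]       # 4. mean pool

print(embed("machine learning"))  # ~[0.735, -0.12, -0.14, 0.4]
```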
## Quantization Details
- Original model: 30,522 x 1,024 float32 (119 MB)
- Quantized model: 30,522 x 512 int8 + 30,522 float32 scales (15 MB)
Per-vector quantization preserves relative magnitudes:

```python
import numpy as np

max_abs = np.abs(embedding_vector).max()
scale = max_abs / 127.0
quantized = np.round(embedding_vector / scale).astype(np.int8)
```
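A pure-Python round trip of that formula shows the effect of quantizing and then dequantizing a vector (a minimal sketch with a made-up 4-element vector; the real pipeline operates on 512-dimensional rows):

```python
def quantize(vec):
    scale = max(abs(x) for x in vec) / 127.0
    q = [round(x / scale) for x in vec]  # values land in -127..127
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

vec = [0.5, -1.27, 0.03, 0.9]
q, scale = quantize(vec)
approx = dequantize(q, scale)
print(q)       # the max-magnitude element maps to +/-127
print(approx)  # per-element reconstruction error is at most scale / 2
```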
## Files

- `modelint8_512dim.safetensors` - Quantized embeddings and scales
- `tokenizer.json` - HuggingFace tokenizer
## License
MIT License - see gobed repository for details.
## Citation

```bibtex
@software{gobed,
  author = {Lee Penkman},
  title  = {gobed: Ultra-Fast Semantic Search for Go},
  url    = {https://github.com/lee101/gobed},
  year   = {2024}
}
```