Bed - Int8 Quantized Static Embeddings for Semantic Search

Ultra-fast int8-quantized static embedding model for semantic search, optimized for the gobed Go library.

Model Details

Property     Value
-----------  --------------------
Dimensions   512
Precision    int8 + scale factors
Vocabulary   30,522 tokens
Model Size   15 MB
Format       safetensors

Performance

  • Embedding latency: 0.16 ms average
  • Throughput: 6,200+ embeddings/sec
  • Memory: 15 MB (7.9x smaller than the float32 version)
  • Compression: 87.4% space reduction vs the original model
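
To sanity-check the latency and throughput figures on your own hardware, a standard Go benchmark over the search API is a reasonable harness. This is a minimal sketch; it only exercises the NewAutoSearchEngine, AddDocuments, and SearchWithMetadata calls shown in the usage example below, and absolute numbers will vary by machine.

package gobed_test

import (
    "testing"

    "github.com/lee101/gobed"
)

func BenchmarkSearch(b *testing.B) {
    // Build the engine once, outside the timed loop.
    engine, err := gobed.NewAutoSearchEngine()
    if err != nil {
        b.Fatal(err)
    }
    defer engine.Close()

    engine.AddDocuments(map[string]string{
        "doc1": "machine learning and neural networks",
    })

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        // Times query embedding plus top-3 retrieval end to end.
        engine.SearchWithMetadata("AI research", 3)
    }
}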

Usage with gobed (Go)

go get github.com/lee101/gobed

package main

import (
    "fmt"
    "log"
    "github.com/lee101/gobed"
)

func main() {
    engine, err := gobed.NewAutoSearchEngine()
    if err != nil {
        log.Fatal(err)
    }
    defer engine.Close()

    // Index a few documents keyed by ID.
    docs := map[string]string{
        "doc1": "machine learning and neural networks",
        "doc2": "natural language processing",
    }
    engine.AddDocuments(docs)

    // Fetch the top 3 matches with similarity scores.
    results, _, _ := engine.SearchWithMetadata("AI research", 3)
    for _, r := range results {
        fmt.Printf("[%.3f] %s\n", r.Similarity, r.Content)
    }
}

Download Model Manually

# Clone the model repository
git clone https://huggingface.co/lee101/bed

# Or download specific files
wget https://huggingface.co/lee101/bed/resolve/main/modelint8_512dim.safetensors
wget https://huggingface.co/lee101/bed/resolve/main/tokenizer.json

Using huggingface_hub (Python)

from huggingface_hub import hf_hub_download

# Download model file
model_path = hf_hub_download(repo_id="lee101/bed", filename="modelint8_512dim.safetensors")

# Download tokenizer
tokenizer_path = hf_hub_download(repo_id="lee101/bed", filename="tokenizer.json")

Model Architecture

This model uses static embeddings with int8 quantization:

  • Embedding layer: 30,522 x 512 int8 weights
  • Scale factors: 30,522 float32 scale values (one per token)
  • Tokenizer: WordPiece tokenizer (same as BERT)

Embeddings are computed in four steps (see the sketch after this list):

  1. Tokenizing input text
  2. Looking up int8 embeddings for each token
  3. Multiplying by scale factors to reconstruct float values
  4. Mean pooling across tokens
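
The lookup-and-pool pipeline is small enough to sketch end to end. The toy Go program below illustrates steps 2-4 with made-up weights, scales, and token IDs; none of these names are the gobed API, and the real model stores a 30,522 x 512 table.

package main

import "fmt"

// embed runs steps 2-4: look up int8 rows, dequantize with the
// per-token scale factor, and mean-pool across tokens.
func embed(table [][]int8, scales []float32, tokenIDs []int) []float32 {
    dim := len(table[0])
    out := make([]float32, dim)
    for _, id := range tokenIDs {
        for d := 0; d < dim; d++ {
            // Step 3: int8 weight times its token's scale factor.
            out[d] += float32(table[id][d]) * scales[id]
        }
    }
    // Step 4: mean pooling.
    for d := range out {
        out[d] /= float32(len(tokenIDs))
    }
    return out
}

func main() {
    table := [][]int8{{127, -64}, {32, 96}} // 2 tokens x 2 dims (toy)
    scales := []float32{0.004, 0.002}       // one scale per token
    fmt.Println(embed(table, scales, []int{0, 1}))
}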

Quantization Details

Original model: 30,522 x 1024 float32 = 125,018,112 bytes (~119 MB)
Quantized model: 30,522 x 512 int8 + 30,522 float32 scales = 15,749,352 bytes (~15 MB)

Note that the quantized model also halves the embedding dimensionality (1024 -> 512), so the 87.4% size reduction comes from both the dimension cut and the drop to int8 precision.

Per-vector quantization preserves relative magnitudes:

import numpy as np

max_abs = np.max(np.abs(embedding_vector))
scale = max_abs / 127.0
quantized = np.round(embedding_vector / scale).astype(np.int8)
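
For a round-trip check, here is the same per-vector scheme as a self-contained Go sketch; quantize and the tiny example vector are illustrative names, not part of gobed.

package main

import (
    "fmt"
    "math"
)

// quantize maps a float32 vector to int8 values plus one shared
// scale factor, mirroring the scheme above. A real implementation
// would guard against an all-zero vector (scale of 0).
func quantize(v []float32) ([]int8, float32) {
    var maxAbs float64
    for _, x := range v {
        maxAbs = math.Max(maxAbs, math.Abs(float64(x)))
    }
    scale := float32(maxAbs / 127.0)
    q := make([]int8, len(v))
    for i, x := range v {
        q[i] = int8(math.Round(float64(x / scale)))
    }
    return q, scale
}

func main() {
    v := []float32{0.12, -0.5, 0.33}
    q, scale := quantize(v)
    for i, qi := range q {
        // Dequantized value = int8 weight times the shared scale.
        fmt.Printf("orig %+.3f  restored %+.3f\n", v[i], float32(qi)*scale)
    }
}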

Files

  • modelint8_512dim.safetensors - Quantized embeddings and scales
  • tokenizer.json - HuggingFace tokenizer

License

MIT License - see gobed repository for details.

Citation

@software{gobed,
  author = {Lee Penkman},
  title = {gobed: Ultra-Fast Semantic Search for Go},
  url = {https://github.com/lee101/gobed},
  year = {2024}
}