# M2V-Qwen3-Embedding-0.6B-1024d
A high-performance Model2Vec distilled embedding model based on Qwen/Qwen3-Embedding-0.6B.
## Key Features

- Ultra-fast inference (<1ms for 4 sentences on CPU)
- 1024-dimensional embeddings
- Multilingual support (inherited from Qwen3-Embedding)
- ~302MB model size
## MTEB Benchmark Results

| Task Category | Score |
|---|---|
| STS (Semantic Similarity) | 0.4845 |
| - STSBenchmark | 0.4215 |
| - SICK-R | 0.5475 |
| Classification (kNN) | 0.5949 |
| - Banking77 | 0.7402 |
| - Emotion | 0.4496 |
| Clustering | 0.1810 |
| - TwentyNewsgroups | 0.1810 |
| Overall MTEB | 0.4202 |
## Comparison with Other Models

| Model | STS | Classification | Size | Latency |
|---|---|---|---|---|
| M2V-BGE-M3-1024d | 0.5831 | 0.6564 | 499 MB | <1ms |
| M2V-Qwen3-0.6B-1024d | 0.4845 | 0.5949 | 302 MB | <1ms |
| POTION-base-8M | ~0.52 | ~0.55 | 30 MB | <1ms |
## Installation

Install model2vec using pip:

```bash
pip install model2vec
```
## Usage

### Using Model2Vec

The Model2Vec library is the fastest and most lightweight way to run Model2Vec models.

Load this model using the `from_pretrained` method:

```python
from model2vec import StaticModel

# Load a pretrained Model2Vec model
model = StaticModel.from_pretrained("m2v-qwen3-emb-0.6b-1024d")

# Compute text embeddings
embeddings = model.encode(["Example sentence"])
```
### Using Sentence Transformers

You can also use the Sentence Transformers library to load and use the model:

```python
from sentence_transformers import SentenceTransformer

# Load a pretrained Sentence Transformer model
model = SentenceTransformer("m2v-qwen3-emb-0.6b-1024d")

# Compute text embeddings
embeddings = model.encode(["Example sentence"])
```
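The returned embeddings can be compared with cosine similarity for search or deduplication. A minimal sketch, using toy low-dimensional vectors as stand-ins for the real 1024-dimensional `model.encode` output:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-d vectors standing in for real 1024-d embeddings
query = np.array([1.0, 0.0, 1.0, 0.0])
docs = {
    "doc_a": np.array([0.9, 0.1, 0.8, 0.0]),  # points in a similar direction to the query
    "doc_b": np.array([0.0, 1.0, 0.0, 1.0]),  # orthogonal to the query
}

# Rank documents by similarity to the query, most similar first
ranked = sorted(docs, key=lambda d: cosine_sim(query, docs[d]), reverse=True)
print(ranked[0])  # doc_a ranks first
```

With real embeddings, replace the toy vectors with `model.encode([...])` output; the ranking logic is unchanged.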
### Distilling a Model2Vec model

You can distill a Model2Vec model from a Sentence Transformer model using the `distill` method. First, install the distill extra with `pip install model2vec[distill]`. Then, run the following code:

```python
from model2vec.distill import distill

# Distill the base model used for this card, Qwen/Qwen3-Embedding-0.6B,
# down to 1024-dimensional static embeddings
m2v_model = distill(model_name="Qwen/Qwen3-Embedding-0.6B", pca_dims=1024)

# Save the model
m2v_model.save_pretrained("m2v_model")
```
## Model Details

- Base Model: Qwen/Qwen3-Embedding-0.6B
- Distillation Method: Model2Vec with PCA (1024 dimensions)
- Embedding Dimension: 1024
- Languages: Multilingual
- Model Size: ~302 MB
## Use Cases

- Semantic search and retrieval
- Document similarity
- Text classification (via kNN)
- Clustering
- RAG (Retrieval Augmented Generation) pipelines
- Real-time applications requiring ultra-low latency
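The kNN classification use case needs no training beyond encoding labeled examples. A hedged numpy sketch, with made-up 2-d vectors and labels standing in for real encoded texts:

```python
import numpy as np
from collections import Counter

def knn_predict(query: np.ndarray, X: np.ndarray, y: list, k: int = 3) -> str:
    """Label a query embedding by majority vote over its k nearest neighbors (cosine)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # row-normalize training embeddings
    qn = query / np.linalg.norm(query)
    sims = Xn @ qn                                     # cosine similarities to the query
    top = np.argsort(sims)[::-1][:k]                   # indices of the k most similar rows
    return Counter(y[i] for i in top).most_common(1)[0][0]

# Toy labeled embeddings (real ones would come from model.encode on labeled texts)
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.2]])
y = ["sports", "sports", "finance", "finance", "sports"]

print(knn_predict(np.array([0.95, 0.05]), X, y))  # "sports"
```

This mirrors how the MTEB classification scores above are produced: no classifier is trained, only neighbors in embedding space are counted.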
## Limitations

- Static embeddings don't capture context as well as transformer models
- Lower quality than full Qwen3-Embedding (~48% vs ~65% on STS benchmarks)
- Best suited for applications where speed is critical
## How It Works

Model2Vec distills a Sentence Transformer by:

1. Passing the tokenizer vocabulary through the base model (Qwen3-Embedding-0.6B)
2. Reducing dimensionality with PCA (to 1024)
3. Applying SIF weighting to down-weight frequent tokens
4. At inference time, mean-pooling the static token embeddings of the input

This yields embeddings that are roughly 500x faster to compute than the base transformer, with only moderate quality loss.
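The inference step above reduces to a table lookup plus a mean. A minimal sketch with a made-up 4-token vocabulary (the real model stores a ~1024-d vector per tokenizer token, with SIF weighting already baked into the table; that weighting is omitted here):

```python
import numpy as np

# Toy static embedding table: 4 tokens, 3-d vectors
embedding_table = np.array([
    [1.0, 0.0, 0.0],  # token id 0
    [0.0, 1.0, 0.0],  # token id 1
    [0.0, 0.0, 1.0],  # token id 2
    [1.0, 1.0, 0.0],  # token id 3
])

def encode(token_ids: list) -> np.ndarray:
    """Static-embedding inference: look up each token's vector and mean-pool."""
    return embedding_table[token_ids].mean(axis=0)

vec = encode([0, 1, 3])
print(vec)  # [0.6667, 0.6667, 0.0]
```

No transformer forward pass happens at inference time, which is why latency stays under a millisecond on CPU.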
## Citation

```bibtex
@article{minishlab2024model2vec,
  author = {Tulkens, Stephan and {van Dongen}, Thomas},
  title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
  year = {2024},
  url = {https://github.com/MinishLab/model2vec}
}

@misc{qwen3embedding,
  title = {Qwen3-Embedding},
  author = {Alibaba Cloud},
  year = {2024},
  publisher = {Hugging Face}
}
```
## License

MIT License.
Created by: The Seed Ship / Deposium Project