# M2V-Qwen3-Embedding-0.6B-1024d
A high-performance Model2Vec distilled embedding model based on Qwen/Qwen3-Embedding-0.6B.
## Key Features

- Ultra-fast inference (<1ms for 4 sentences on CPU)
- 1024-dimensional embeddings
- Multilingual support (inherited from Qwen3-Embedding)
- ~302MB model size
## MTEB Benchmark Results

| Task Category | Score |
|---|---|
| STS (Semantic Similarity) | 0.4845 |
| - STSBenchmark | 0.4215 |
| - SICK-R | 0.5475 |
| Classification (kNN) | 0.5949 |
| - Banking77 | 0.7402 |
| - Emotion | 0.4496 |
| Clustering | 0.1810 |
| - TwentyNewsgroups | 0.1810 |
| Overall MTEB | 0.4202 |
## Comparison with Other Models

| Model | STS | Classification | Size | Latency |
|---|---|---|---|---|
| M2V-BGE-M3-1024d | 0.5831 | 0.6564 | 499 MB | <1ms |
| M2V-Qwen3-0.6B-1024d | 0.4845 | 0.5949 | 302 MB | <1ms |
| POTION-base-8M | ~0.52 | ~0.55 | 30 MB | <1ms |
## Installation

Install model2vec using pip:

```bash
pip install model2vec
```
## Usage

### Using Model2Vec

The Model2Vec library is the fastest and most lightweight way to run Model2Vec models.

Load this model using the `from_pretrained` method:

```python
from model2vec import StaticModel

# Load a pretrained Model2Vec model
model = StaticModel.from_pretrained("m2v-qwen3-emb-0.6b-1024d")

# Compute text embeddings
embeddings = model.encode(["Example sentence"])
```
### Using Sentence Transformers

You can also use the Sentence Transformers library to load and use the model:

```python
from sentence_transformers import SentenceTransformer

# Load a pretrained Sentence Transformer model
model = SentenceTransformer("m2v-qwen3-emb-0.6b-1024d")

# Compute text embeddings
embeddings = model.encode(["Example sentence"])
```
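The returned embeddings can be compared with cosine similarity for search or deduplication. A minimal sketch, using toy low-dimensional vectors as stand-ins for the real 1024-dimensional `model.encode` output:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-d vectors standing in for real 1024-d embeddings
query = np.array([1.0, 0.0, 1.0, 0.0])
docs = {
    "doc_a": np.array([0.9, 0.1, 0.8, 0.0]),  # points in a similar direction to the query
    "doc_b": np.array([0.0, 1.0, 0.0, 1.0]),  # orthogonal to the query
}

# Rank documents by similarity to the query, most similar first
ranked = sorted(docs, key=lambda d: cosine_sim(query, docs[d]), reverse=True)
print(ranked[0])  # doc_a ranks first
```

With real embeddings, replace the toy vectors with `model.encode([...])` output; the ranking logic is unchanged.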
### Distilling a Model2Vec model

You can distill a Model2Vec model from a Sentence Transformer model using the `distill` method. First, install the distill extra with `pip install model2vec[distill]`. Then, run the following code:

```python
from model2vec.distill import distill

# Distill the base model used for this card, Qwen/Qwen3-Embedding-0.6B,
# down to 1024-dimensional static embeddings
m2v_model = distill(model_name="Qwen/Qwen3-Embedding-0.6B", pca_dims=1024)

# Save the model
m2v_model.save_pretrained("m2v_model")
```
## Model Details

- Base Model: Qwen/Qwen3-Embedding-0.6B
- Distillation Method: Model2Vec with PCA (1024 dimensions)
- Embedding Dimension: 1024
- Languages: Multilingual
- Model Size: ~302 MB
## Use Cases

- Semantic search and retrieval
- Document similarity
- Text classification (via kNN)
- Clustering
- RAG (Retrieval Augmented Generation) pipelines
- Real-time applications requiring ultra-low latency
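The kNN classification use case needs no training beyond encoding labeled examples. A hedged numpy sketch, with made-up 2-d vectors and labels standing in for real encoded texts:

```python
import numpy as np
from collections import Counter

def knn_predict(query: np.ndarray, X: np.ndarray, y: list, k: int = 3) -> str:
    """Label a query embedding by majority vote over its k nearest neighbors (cosine)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # row-normalize training embeddings
    qn = query / np.linalg.norm(query)
    sims = Xn @ qn                                     # cosine similarities to the query
    top = np.argsort(sims)[::-1][:k]                   # indices of the k most similar rows
    return Counter(y[i] for i in top).most_common(1)[0][0]

# Toy labeled embeddings (real ones would come from model.encode on labeled texts)
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.2]])
y = ["sports", "sports", "finance", "finance", "sports"]

print(knn_predict(np.array([0.95, 0.05]), X, y))  # "sports"
```

This mirrors how the MTEB classification scores above are produced: no classifier is trained, only neighbors in embedding space are counted.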
## Limitations

- Static embeddings don't capture context as well as transformer models
- Lower quality than full Qwen3-Embedding (~48% vs ~65% on STS benchmarks)
- Best suited for applications where speed is critical
## How It Works

Model2Vec distills a Sentence Transformer by:

1. Passing the tokenizer vocabulary through the base model (Qwen3-Embedding-0.6B)
2. Reducing dimensionality with PCA (to 1024)
3. Applying SIF weighting to down-weight frequent tokens
4. At inference time, mean-pooling the static token embeddings of the input

This yields embeddings that are roughly 500x faster to compute than the base transformer, with only moderate quality loss.
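The inference step above reduces to a table lookup plus a mean. A minimal sketch with a made-up 4-token vocabulary (the real model stores a ~1024-d vector per tokenizer token, with SIF weighting already baked into the table; that weighting is omitted here):

```python
import numpy as np

# Toy static embedding table: 4 tokens, 3-d vectors
embedding_table = np.array([
    [1.0, 0.0, 0.0],  # token id 0
    [0.0, 1.0, 0.0],  # token id 1
    [0.0, 0.0, 1.0],  # token id 2
    [1.0, 1.0, 0.0],  # token id 3
])

def encode(token_ids: list) -> np.ndarray:
    """Static-embedding inference: look up each token's vector and mean-pool."""
    return embedding_table[token_ids].mean(axis=0)

vec = encode([0, 1, 3])
print(vec)  # [0.6667, 0.6667, 0.0]
```

No transformer forward pass happens at inference time, which is why latency stays under a millisecond on CPU.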
## Citation

```bibtex
@article{minishlab2024model2vec,
  author = {Tulkens, Stephan and {van Dongen}, Thomas},
  title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
  year = {2024},
  url = {https://github.com/MinishLab/model2vec}
}

@misc{qwen3embedding,
  title = {Qwen3-Embedding},
  author = {Alibaba Cloud},
  year = {2024},
  publisher = {Hugging Face}
}
```
## License

MIT License.
Created by: The Seed Ship / Deposium Project