M2V-Qwen3-Embedding-0.6B-1024d

A high-performance Model2Vec distilled embedding model based on Qwen/Qwen3-Embedding-0.6B.

Key Features:

  • Ultra-fast inference (<1ms for 4 sentences on CPU)
  • 1024-dimensional embeddings
  • Multilingual support (inherited from Qwen3-Embedding)
  • ~302MB model size

MTEB Benchmark Results

Task Category                  Score
STS (Semantic Similarity)      0.4845
  - STSBenchmark               0.4215
  - SICK-R                     0.5475
Classification (kNN)           0.5949
  - Banking77                  0.7402
  - Emotion                    0.4496
Clustering                     0.1810
  - TwentyNewsgroups           0.1810
Overall MTEB                   0.4202

Comparison with Other Models

Model                  STS     Classification  Size    Latency
M2V-BGE-M3-1024d       0.5831  0.6564          499 MB  <1ms
M2V-Qwen3-0.6B-1024d   0.4845  0.5949          302 MB  <1ms
POTION-base-8M         ~0.52   ~0.55           30 MB   <1ms

Installation

Install model2vec using pip:

pip install model2vec

Usage

Using Model2Vec

The Model2Vec library is the fastest and most lightweight way to run Model2Vec models.

Load this model using the from_pretrained method:

from model2vec import StaticModel

# Load a pretrained Model2Vec model
model = StaticModel.from_pretrained("tss-deposium/m2v-qwen3-embedding-0.6b-1024d")

# Compute text embeddings
embeddings = model.encode(["Example sentence"])

Using Sentence Transformers

You can also use the Sentence Transformers library to load and use the model:

from sentence_transformers import SentenceTransformer

# Load a pretrained Sentence Transformer model
model = SentenceTransformer("tss-deposium/m2v-qwen3-embedding-0.6b-1024d")

# Compute text embeddings
embeddings = model.encode(["Example sentence"])

Distilling a Model2Vec model

You can distill a Model2Vec model from a Sentence Transformer model using the distill method. First, install the distill extra with pip install model2vec[distill]. Then, run the following code:

from model2vec.distill import distill

# Distill a Sentence Transformer model, in this case this model's base,
# Qwen/Qwen3-Embedding-0.6B, reduced to 1024 dimensions
m2v_model = distill(model_name="Qwen/Qwen3-Embedding-0.6B", pca_dims=1024)

# Save the model
m2v_model.save_pretrained("m2v_model")

Model Details

  • Base Model: Qwen/Qwen3-Embedding-0.6B
  • Distillation Method: Model2Vec with PCA (1024 dimensions)
  • Embedding Dimension: 1024
  • Languages: Multilingual
  • Model Size: ~302 MB

Use Cases

  • Semantic search and retrieval
  • Document similarity
  • Text classification (via kNN)
  • Clustering
  • RAG (Retrieval Augmented Generation) pipelines
  • Real-time applications requiring ultra-low latency
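As a concrete illustration of the semantic-search use case, here is a minimal ranking sketch. The `top_k` helper is illustrative, not part of the model2vec API; in practice the vectors would come from `model.encode(...)`, but random 1024-d vectors are used here to keep the sketch self-contained.

```python
import numpy as np

def top_k(query_emb, doc_embs, k=2):
    # Cosine similarity: L2-normalize both sides, then take dot products
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q
    idx = np.argsort(-scores)[:k]  # indices of the k most similar docs
    return idx, scores[idx]

# Stand-ins for model.encode(documents) and model.encode([query])[0]
rng = np.random.default_rng(0)
docs = rng.normal(size=(5, 1024))
query = docs[3] + 0.01 * rng.normal(size=1024)  # nearly identical to doc 3

idx, scores = top_k(query, docs, k=2)
print(idx[0])  # doc 3 should rank first
```

The same pattern scales to large corpora by replacing the brute-force `argsort` with an approximate nearest-neighbor index.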

Limitations

  • Static embeddings don't capture context as well as transformer models
  • Lower quality than full Qwen3-Embedding (~48% vs ~65% on STS benchmarks)
  • Best suited for applications where speed is critical

How It Works

Model2Vec distills a Sentence Transformer by:

  1. Passing vocabulary through the base model (Qwen3-Embedding-0.6B)
  2. Reducing dimensionality with PCA (to 1024)
  3. Applying SIF weighting
  4. During inference: mean pooling of token embeddings

This yields inference that is roughly 500x faster than running the base transformer, with only a moderate loss in quality.
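The inference step above (step 4) amounts to a table lookup followed by mean pooling. A minimal numpy sketch, with a toy vocabulary, a whitespace tokenizer, and a random 4-d table standing in for the model's real tokenizer and precomputed 1024-d embedding table (SIF weighting is baked into the table at distillation time):

```python
import numpy as np

# Toy stand-ins for the real tokenizer vocabulary and distilled table
vocab = {"fast": 0, "static": 1, "embeddings": 2}
dim = 4  # the real model uses 1024
rng = np.random.default_rng(0)
table = rng.normal(size=(len(vocab), dim))  # precomputed token vectors

def encode(text):
    # Look up each known token's static vector, then mean-pool
    ids = [vocab[t] for t in text.lower().split() if t in vocab]
    return table[ids].mean(axis=0)

emb = encode("fast static embeddings")
print(emb.shape)  # (4,)
```

Because no transformer forward pass happens at inference time, the cost per sentence is just a few array lookups and an average, which is what makes sub-millisecond CPU latency possible.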


Citation

@article{minishlab2024model2vec,
  author = {Tulkens, Stephan and {van Dongen}, Thomas},
  title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
  year = {2024},
  url = {https://github.com/MinishLab/model2vec}
}

@misc{qwen3embedding,
  title={Qwen3-Embedding},
  author={Alibaba Cloud},
  year={2024},
  publisher={Hugging Face}
}

License

MIT License - Same as base Qwen3-Embedding model.


Created by: The Seed Ship / Deposium Project
