speed all-MiniLM-L12-v2
#18
opened by kalle07
Extremely slow in comparison to nomic v1.5 and roberta (cross-en-de-es-roberta-sentence-transformer), approximately 10 times slower.
Hello!
This is a bit surprising to me. Are you sure that you're on GPU in all cases or on CPU in all cases? I wrote this quick script to compare:
import time

from sentence_transformers import SentenceTransformer
from datasets import load_dataset
from pprint import pprint

model_names = [
    "all-MiniLM-L12-v2",
    "T-Systems-onsite/cross-en-de-es-roberta-sentence-transformer",
    "nomic-ai/nomic-embed-text-v1.5",
]

# Load answer texts from the natural-questions dataset
dataset = load_dataset("sentence-transformers/natural-questions", split="train")
docs = dataset["answer"][:1_000]

results = {}
for model_name in model_names:
    # NOTE: only use bfloat16 with GPUs. If your GPU doesn't support bfloat16, use .half() for fp16,
    # or if you're on CPU, remove the .bfloat16() call
    model = SentenceTransformer(model_name, trust_remote_code=True).bfloat16()
    print(f"Loading model {model_name} on {model.device} with {model.dtype}")
    results[model_name] = {}
    for batch_size in [16, 32, 64, 128, 256]:
        # Time 5 full encoding passes and average the throughput
        texts_per_second = []
        for _ in range(5):
            start_time = time.time()
            model.encode(docs, batch_size=batch_size, convert_to_tensor=True)
            end_time = time.time()
            texts_per_second.append(len(docs) / (end_time - start_time))
        average_texts_per_second = sum(texts_per_second) / len(texts_per_second)
        print(f"Average texts per second for bfloat16 and batch_size={batch_size}: {average_texts_per_second}")
        results[model_name][batch_size] = average_texts_per_second

pprint(results)
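As an aside on the NOTE in the script: if you want a single script that runs on either GPU or CPU without manual edits, a minimal sketch of automatic dtype selection could look like the following. It assumes a PyTorch build where torch.cuda.is_bf16_supported() is available; treat it as a sketch rather than part of the benchmark itself.

import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L12-v2")
if torch.cuda.is_available():
    # Prefer bfloat16 where the GPU supports it, otherwise fall back to fp16
    model = model.bfloat16() if torch.cuda.is_bf16_supported() else model.half()
# On CPU, keep the default float32; half precision is usually slower on CPU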
Running the benchmark script results in:
Loading model all-MiniLM-L12-v2 on cuda:0 with torch.bfloat16
Average texts per second for bfloat16 and batch_size=16: 1363.3089283906697
Average texts per second for bfloat16 and batch_size=32: 2241.947166317702
Average texts per second for bfloat16 and batch_size=64: 2954.5304119462057
Average texts per second for bfloat16 and batch_size=128: 3717.7650770828236
Average texts per second for bfloat16 and batch_size=256: 4360.094024126143
No sentence-transformers model found with name T-Systems-onsite/cross-en-de-es-roberta-sentence-transformer. Creating a new one with mean pooling.
Loading model T-Systems-onsite/cross-en-de-es-roberta-sentence-transformer on cuda:0 with torch.bfloat16
Average texts per second for bfloat16 and batch_size=16: 806.5823234444125
Average texts per second for bfloat16 and batch_size=32: 1045.9944091642888
Average texts per second for bfloat16 and batch_size=64: 1211.9272079490079
Average texts per second for bfloat16 and batch_size=128: 1195.3808822219091
Average texts per second for bfloat16 and batch_size=256: 1042.000349024391
<All keys matched successfully>
Loading model nomic-ai/nomic-embed-text-v1.5 on cuda:0 with torch.bfloat16
Average texts per second for bfloat16 and batch_size=16: 551.9840145552902
Average texts per second for bfloat16 and batch_size=32: 679.5504971126851
Average texts per second for bfloat16 and batch_size=64: 778.7069051673566
Average texts per second for bfloat16 and batch_size=128: 762.8170006579953
Average texts per second for bfloat16 and batch_size=256: 643.7683778506613
On my GPU, this model is a lot faster than the others. If I use CPU only (with docs = dataset["answer"][:100], model = SentenceTransformer(model_name, trust_remote_code=True, device="cpu"), and for batch_size in [16, 32]:, as shown in the sketch after these results), then I get:
Loading model all-MiniLM-L12-v2 on cpu with torch.float32
Average texts per second for bfloat16 and batch_size=16: 121.13928721384814
Average texts per second for bfloat16 and batch_size=32: 114.38789027022051
No sentence-transformers model found with name T-Systems-onsite/cross-en-de-es-roberta-sentence-transformer. Creating a new one with mean pooling.
Loading model T-Systems-onsite/cross-en-de-es-roberta-sentence-transformer on cpu with torch.float32
Average texts per second for bfloat16 and batch_size=16: 20.074944670292656
Average texts per second for bfloat16 and batch_size=32: 14.775895888226259
<All keys matched successfully>
Loading model nomic-ai/nomic-embed-text-v1.5 on cpu with torch.float32
Average texts per second for bfloat16 and batch_size=16: 13.40306013813625
Average texts per second for bfloat16 and batch_size=32: 8.918299134217868
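(The "bfloat16" in those printed lines is just the hardcoded label in the print statement; as the "Loading model" lines show, the CPU runs use the default float32.)

As a sketch, the CPU-only run changes just these three lines of the benchmark script, with everything else unchanged:

docs = dataset["answer"][:100]  # 100 documents instead of 1,000, since CPU is much slower
model = SentenceTransformer(model_name, trust_remote_code=True, device="cpu")  # no .bfloat16() call
for batch_size in [16, 32]:  # only the two smallest batch sizes

Either way, all-MiniLM-L12-v2 comes out fastest for me in both settings.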
- Tom Aarsen