Best practice to encode huge dataset #2925
Comments
Hi!
Thank you very much! In addition to parallel processing, are there any other strategies that could accelerate the encoding process? Thanks!
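For reference, the "parallel processing" mentioned above maps to sentence-transformers' built-in multi-process encoding (`start_multi_process_pool` / `encode_multi_process`), which spreads the work over several GPUs or CPU worker processes. A minimal sketch, with an illustrative corpus and default device selection:

```python
from sentence_transformers import SentenceTransformer

if __name__ == "__main__":
    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Illustrative corpus; replace with your real sentences
    sentences = ["The weather is lovely today."] * 100_000

    # One worker process per target device (all visible GPUs by default, CPU workers otherwise)
    pool = model.start_multi_process_pool()

    # Sentences are chunked, sent to the workers, encoded in parallel, and gathered
    embeddings = model.encode_multi_process(sentences, pool, batch_size=64)
    print(embeddings.shape)

    model.stop_multi_process_pool(pool)
```

The `if __name__ == "__main__":` guard matters here because the pool is built on Python multiprocessing.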
I think there are various approaches, but speeding up with torch.compile came to mind immediately.
Thank you! I will give torch.compile a try!
Hi @shizhediao, can you share your experience with torch.compile? Did it speed up your inference?
Hi, same question here!
Sorry, the above was a mix-up between TorchScript and torchrun on my part, so please disregard it. I'm not sure whether it works with sentence_transformers since I haven't tried it, but TensorRT can sometimes give a larger speedup than torch.compile, so it might be a viable option to try.
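As a side note for readers: recent sentence-transformers releases (reportedly 3.2+) also expose export-based backends via a `backend` argument, which can be a simpler route to an optimized runtime. This is only a hedged, version-dependent sketch (not a TensorRT example):

```python
from sentence_transformers import SentenceTransformer

# Assumes sentence-transformers >= 3.2 and the optional ONNX extras
# (e.g. `pip install "sentence-transformers[onnx]"`); adjust to your setup.
model = SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")

embeddings = model.encode(["The weather is lovely today."])
print(embeddings.shape)
```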
I ran the following code on the CPU for verification, and indeed, as you say, torch.compile with the default settings did not improve things much.

```python
from sentence_transformers import SentenceTransformer
import torch

# 1. Load a pretrained Sentence Transformer model
model = SentenceTransformer("all-MiniLM-L6-v2", model_kwargs={"torch_dtype": torch.float16})

# The sentences to encode
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
] * 10000

# 2. Calculate embeddings by calling model.encode()
import time
from statistics import mean

# Baseline: no torch.compile
cpu_without_torch_compile_seconds = []
for i in range(5):
    start = time.perf_counter()
    embeddings = model.encode(sentences)
    end = time.perf_counter()
    cpu_without_torch_compile_seconds.append(end - start)
print(f'avg without torch.compile: {mean(cpu_without_torch_compile_seconds):.2f} sec')

# torch.compile with mode="reduce-overhead"
model = torch.compile(model, mode="reduce-overhead")
cpu_with_torch_compile_seconds = []
for i in range(5):
    start = time.perf_counter()
    embeddings = model.encode(sentences)
    end = time.perf_counter()
    cpu_with_torch_compile_seconds.append(end - start)
print(f'avg with torch.compile and reduce-overhead: {mean(cpu_with_torch_compile_seconds):.2f} sec')

# Reload a fresh model and compile with default settings
model = SentenceTransformer("all-MiniLM-L6-v2", model_kwargs={"torch_dtype": torch.float16})
model = torch.compile(model)
cpu_with_torch_compile_seconds = []
for i in range(5):
    start = time.perf_counter()
    embeddings = model.encode(sentences)
    end = time.perf_counter()
    cpu_with_torch_compile_seconds.append(end - start)
print(f'avg with torch.compile: {mean(cpu_with_torch_compile_seconds):.2f} sec')
```

output:
[Environment]
Hi,
I was wondering whether there are any recommended practices for encoding a very large dataset, say 10M data samples.
How can I accelerate the encoding process and, perhaps, spread it across multiple nodes?
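One common pattern for the multi-node part is plain data sharding: each node encodes a disjoint slice of the corpus and writes its own output file, which you merge afterwards. This is only a sketch under the assumption that a launcher such as torchrun or a SLURM script sets RANK and WORLD_SIZE; the corpus, batch size, and file names are placeholders.

```python
import os

import numpy as np
from sentence_transformers import SentenceTransformer

# RANK / WORLD_SIZE are assumed to be set by the launcher (e.g. torchrun or a
# SLURM wrapper); the defaults make the script runnable on a single machine.
rank = int(os.environ.get("RANK", 0))
world_size = int(os.environ.get("WORLD_SIZE", 1))

# Placeholder corpus standing in for the real 10M samples
sentences = [f"sample sentence {i}" for i in range(100_000)]

# Contiguous slice per node, so rank-ordered concatenation restores the original order
chunk = (len(sentences) + world_size - 1) // world_size
shard = sentences[rank * chunk:(rank + 1) * chunk]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(shard, batch_size=256, show_progress_bar=True)

# Each node writes its own shard; concatenate the files in rank order afterwards
np.save(f"embeddings_rank{rank}.npy", embeddings)
```

Within each node, this can be combined with `encode_multi_process` (or a larger batch size and fp16) to keep all local GPUs busy.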