
Best practice to encode huge dataset #2925

Open
shizhediao opened this issue Sep 9, 2024 · 9 comments

Comments

@shizhediao

Hi,

I was wondering whether there are any recommended practices for encoding a very large dataset, say 10M samples.
How can I accelerate the encoding process and, if possible, spread it across multiple nodes?

@pesuchin
Contributor

pesuchin commented Sep 9, 2024

Hi!
I believe SentenceTransformers doesn't have functionality to distribute across multiple nodes, so it might be good to use a framework that can perform distributed parallel processing, such as Ray.
For parallel processing within a single machine, the following documentation may be helpful:
https://sbert.net/examples/applications/computing-embeddings/README.html#multi-process-multi-gpu-encoding
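
For reference, a minimal sketch based on that page; the model name, placeholder corpus, and batch size here are just illustrative assumptions:

from sentence_transformers import SentenceTransformer

if __name__ == "__main__":
    # Placeholder corpus standing in for the real 10M samples
    sentences = ["This is an example sentence."] * 100_000

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Starts one worker process per device (all visible GPUs by default;
    # you can also pass e.g. target_devices=["cuda:0", "cuda:1"] or ["cpu"] * 4)
    pool = model.start_multi_process_pool()

    # Chunks the sentences and distributes them across the worker processes
    embeddings = model.encode_multi_process(sentences, pool, batch_size=64)
    print(embeddings.shape)

    # Shut the worker processes down when finished
    model.stop_multi_process_pool(pool)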

@shizhediao
Author

Thank you very much! In addition to parallel processing, are there any other strategies that could accelerate the encoding process?

Thanks!

@pesuchin
Contributor

pesuchin commented Sep 9, 2024

I think there are various approaches, but the one that immediately came to mind was speeding things up with torch.compile.
For torch.compile, the following documentation may be helpful:
#2755
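
As a rough sketch (not taken verbatim from the linked issue), one way to try it is to compile the underlying Hugging Face module rather than the whole SentenceTransformer wrapper; the model name is just an example:

import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Compile only the underlying transformer module; pooling/normalization stay eager
model[0].auto_model = torch.compile(model[0].auto_model)

# The first call triggers compilation and is slow; later calls reuse the compiled graph
_ = model.encode(["warm-up sentence"])
embeddings = model.encode(["The weather is lovely today."])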

@shizhediao
Author

Thank you! I will give torch.compile a try!

@hh23485

hh23485 commented Sep 11, 2024

Hi @shizhediao, can you share your experience with torch.compile? Did it speed up your inference?
In my case, it shows no difference.

@shizhediao
Author

Hi, same for me!
I also did not observe any difference after using torch.compile.

@pesuchin
Contributor

pesuchin commented Sep 11, 2024

I'm not sure what kind of comparison was conducted, but one possibility comes to mind: if both the case with torch.compile and the case without it were run using TorchScript, the TorchScript run might already give a speedup equivalent to torch.compile, so there would be no difference in execution speed.

Sorry, the above confused TorchScript with torchrun and was a mistake, so please disregard it.

I haven't tried it myself, so I'm not sure whether it works with sentence_transformers, but TensorRT can sometimes give an even larger speedup than torch.compile, so it might be a viable option to consider.
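
A rough, untested sketch of one possible route, going through ONNX Runtime's TensorRT execution provider via optimum (the model id and the hand-rolled pooling are assumptions, and it needs a GPU with TensorRT installed):

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForFeatureExtraction

model_id = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Export the model to ONNX and run it with ONNX Runtime's TensorRT execution provider
ort_model = ORTModelForFeatureExtraction.from_pretrained(
    model_id, export=True, provider="TensorrtExecutionProvider"
)

sentences = ["The weather is lovely today.", "It's so sunny outside!"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt").to(ort_model.device)
token_embeddings = ort_model(**inputs).last_hidden_state

# Mean pooling over non-padding tokens, which is what all-MiniLM-L6-v2 uses
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)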

@pesuchin
Contributor

I ran the following code on the CPU for verification, and indeed, torch.compile with the default settings did not improve things much, as you say.
However, passing mode="reduce-overhead" as an argument gave a slight improvement, bringing the average encoding time from 6.22 seconds down to 6.09 seconds in my environment.
It may not help much, but it's worth a try.

from sentence_transformers import SentenceTransformer
import torch
import time
from statistics import mean

# 1. Load a pretrained Sentence Transformer model in float16
model = SentenceTransformer("all-MiniLM-L6-v2", model_kwargs={"torch_dtype": torch.float16})

# The sentences to encode
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
] * 10000

# 2. Baseline: encode without torch.compile
cpu_without_torch_compile_seconds = []
for i in range(5):
    start = time.perf_counter()
    embeddings = model.encode(sentences)
    end = time.perf_counter()
    cpu_without_torch_compile_seconds.append(end - start)
print(f'avg without torch.compile: {mean(cpu_without_torch_compile_seconds):.2f} sec')

# 3. Compile the same model with mode="reduce-overhead" and time it again
model = torch.compile(model, mode="reduce-overhead")
cpu_with_reduce_overhead_seconds = []
for i in range(5):
    start = time.perf_counter()
    embeddings = model.encode(sentences)
    end = time.perf_counter()
    cpu_with_reduce_overhead_seconds.append(end - start)
print(f'avg with torch.compile and reduce-overhead: {mean(cpu_with_reduce_overhead_seconds):.2f} sec')

# 4. Reload the model and compile with the default settings
model = SentenceTransformer("all-MiniLM-L6-v2", model_kwargs={"torch_dtype": torch.float16})
model = torch.compile(model)
cpu_with_torch_compile_seconds = []
for i in range(5):
    start = time.perf_counter()
    embeddings = model.encode(sentences)
    end = time.perf_counter()
    cpu_with_torch_compile_seconds.append(end - start)
print(f'avg with torch.compile: {mean(cpu_with_torch_compile_seconds):.2f} sec')

output:

avg without torch.compile: 6.22 sec
avg with torch.compile and reduce-overhead: 6.09 sec
avg with torch.compile: 6.31 sec

[Environment]
CPU: Apple M3
Memory size: 64 GB
OS: Sonoma 14.5

@shizhediao
Author

Hi,
Thank you for your suggestion! I will definitely give it a try. Currently I find that using a smaller model helps a lot in reducing the time cost :)
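
For example, a sketch of that idea (the model name, placeholder corpus, and batch size are just illustrative assumptions):

from sentence_transformers import SentenceTransformer

# Placeholder corpus; replace with the real data
sentences = ["This is an example sentence."] * 100_000

# A smaller 3-layer model encodes noticeably faster than larger models
model = SentenceTransformer("paraphrase-MiniLM-L3-v2")
embeddings = model.encode(sentences, batch_size=256, show_progress_bar=True)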
