
Best practice to encode huge dataset #2925

Open
shizhediao opened this issue Sep 9, 2024 · 9 comments

Comments

@shizhediao

Hi,

I was wondering whether there are any recommended practices for encoding a very large dataset, say 10M samples.
How can I accelerate the encoding process and, if possible, spread it across multiple nodes?

@pesuchin
Contributor

pesuchin commented Sep 9, 2024

Hi!
I believe SentenceTransformers doesn't have functionality to distribute across multiple nodes, so it might be good to use a framework that can perform distributed parallel processing, such as Ray.
For parallel processing within a single machine, the following documentation may be helpful:
https://sbert.net/examples/applications/computing-embeddings/README.html#multi-process-multi-gpu-encoding
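
For reference, a minimal sketch based on that page; the model name, placeholder corpus, and batch size here are just illustrative assumptions:

from sentence_transformers import SentenceTransformer

if __name__ == "__main__":
    # Placeholder corpus standing in for the real 10M samples
    sentences = ["This is an example sentence."] * 100_000

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Starts one worker process per device (all visible GPUs by default;
    # you can also pass e.g. target_devices=["cuda:0", "cuda:1"] or ["cpu"] * 4)
    pool = model.start_multi_process_pool()

    # Chunks the sentences and distributes them across the worker processes
    embeddings = model.encode_multi_process(sentences, pool, batch_size=64)
    print(embeddings.shape)

    # Shut the worker processes down when finished
    model.stop_multi_process_pool(pool)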

@shizhediao
Author

Thank you very much! In addition to parallel processing, are there any other strategies that could accelerate the encoding process?

Thanks!

@pesuchin
Contributor

pesuchin commented Sep 9, 2024

I think there are various approaches, but the one that immediately came to mind was speeding things up with torch.compile.
For torch.compile, the following documentation may be helpful:
#2755
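
As a rough sketch (not taken verbatim from the linked issue), one way to try it is to compile the underlying Hugging Face module rather than the whole SentenceTransformer wrapper; the model name is just an example:

import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Compile only the underlying transformer module; pooling/normalization stay eager
model[0].auto_model = torch.compile(model[0].auto_model)

# The first call triggers compilation and is slow; later calls reuse the compiled graph
_ = model.encode(["warm-up sentence"])
embeddings = model.encode(["The weather is lovely today."])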

@shizhediao
Author

Thank you! I will give torch.compile a try!

@hh23485

hh23485 commented Sep 11, 2024

Hi @shizhediao, can you share your experience with torch.compile? Did it speed up your inference?
In my case, it shows no difference.

@shizhediao
Author

Hi, same for me!
I also did not observe any difference after using torch.compile.

@pesuchin
Contributor

pesuchin commented Sep 11, 2024

I'm not sure what kind of comparison was conducted, but one possibility comes to mind: if both the case with torch.compile and the case without it were run using TorchScript, the TorchScript run might already give a speedup equivalent to torch.compile, so there would be no difference in execution speed.

Sorry, the above confused TorchScript with torchrun and was a mistake, so please disregard it.

I haven't tried it myself, so I'm not sure whether it works with sentence_transformers, but TensorRT can sometimes give an even larger speedup than torch.compile, so it might be a viable option to consider.
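
A rough, untested sketch of one possible route, going through ONNX Runtime's TensorRT execution provider via optimum (the model id and the hand-rolled pooling are assumptions, and it needs a GPU with TensorRT installed):

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForFeatureExtraction

model_id = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Export the model to ONNX and run it with ONNX Runtime's TensorRT execution provider
ort_model = ORTModelForFeatureExtraction.from_pretrained(
    model_id, export=True, provider="TensorrtExecutionProvider"
)

sentences = ["The weather is lovely today.", "It's so sunny outside!"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt").to(ort_model.device)
token_embeddings = ort_model(**inputs).last_hidden_state

# Mean pooling over non-padding tokens, which is what all-MiniLM-L6-v2 uses
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)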

@pesuchin
Contributor

I ran the following code on the CPU for verification, and indeed, torch.compile with the default settings did not improve things much, as you say.
However, passing mode="reduce-overhead" as an argument gave a slight improvement, bringing the average encoding time from 6.22 seconds down to 6.09 seconds in my environment.
It may not help much, but it's worth a try.

from sentence_transformers import SentenceTransformer
import torch
import time
from statistics import mean

# 1. Load a pretrained Sentence Transformer model in float16
model = SentenceTransformer("all-MiniLM-L6-v2", model_kwargs={"torch_dtype": torch.float16})

# The sentences to encode
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
] * 10000

# 2. Baseline: encode without torch.compile
cpu_without_torch_compile_seconds = []
for i in range(5):
    start = time.perf_counter()
    embeddings = model.encode(sentences)
    end = time.perf_counter()
    cpu_without_torch_compile_seconds.append(end - start)
print(f'avg without torch.compile: {mean(cpu_without_torch_compile_seconds):.2f} sec')

# 3. Compile the same model with mode="reduce-overhead" and time it again
model = torch.compile(model, mode="reduce-overhead")
cpu_with_reduce_overhead_seconds = []
for i in range(5):
    start = time.perf_counter()
    embeddings = model.encode(sentences)
    end = time.perf_counter()
    cpu_with_reduce_overhead_seconds.append(end - start)
print(f'avg with torch.compile and reduce-overhead: {mean(cpu_with_reduce_overhead_seconds):.2f} sec')

# 4. Reload the model and compile with the default settings
model = SentenceTransformer("all-MiniLM-L6-v2", model_kwargs={"torch_dtype": torch.float16})
model = torch.compile(model)
cpu_with_torch_compile_seconds = []
for i in range(5):
    start = time.perf_counter()
    embeddings = model.encode(sentences)
    end = time.perf_counter()
    cpu_with_torch_compile_seconds.append(end - start)
print(f'avg with torch.compile: {mean(cpu_with_torch_compile_seconds):.2f} sec')

output:

avg without torch.compile: 6.22 sec
avg with torch.compile and reduce-overhead: 6.09 sec
avg with torch.compile: 6.31 sec

[Environment]
CPU: Apple M3
Memory size: 64 GB
OS: Sonoma 14.5

@shizhediao
Author

Hi,
Thank you for your suggestion! I will definitely give it a try. Currently I find that using a smaller model helps a lot in reducing the time cost :)
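
For example, a sketch of that idea (the model name, placeholder corpus, and batch size are just illustrative assumptions):

from sentence_transformers import SentenceTransformer

# Placeholder corpus; replace with the real data
sentences = ["This is an example sentence."] * 100_000

# A smaller 3-layer model encodes noticeably faster than larger models
model = SentenceTransformer("paraphrase-MiniLM-L3-v2")
embeddings = model.encode(sentences, batch_size=256, show_progress_bar=True)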
