Hi Everyone,
First of all, thank you for the amazing work on this framework! I'm currently using Sentence Transformers with the BGE Small EN model for sentence encoding, but I've encountered an issue on my server.
My server has 8 CPUs, and the transformer seems to always utilize all of them. However, there are multiple tasks running simultaneously on the server, so I would like to limit the CPU usage to just 2 cores to avoid impacting other tasks.
I’ve attempted the following settings, but they don’t seem to have the desired effect:
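Roughly, what I tried looks like this (a representative sketch; the exact model ID BAAI/bge-small-en-v1.5, thread counts, and batch size shown here are illustrative):

```python
import os

# Thread env vars must be set before torch is imported; otherwise the
# OpenMP/MKL thread pools are already sized to all available cores.
os.environ["OMP_NUM_THREADS"] = "2"
os.environ["MKL_NUM_THREADS"] = "2"

import torch

torch.set_num_threads(2)          # intra-op parallelism
torch.set_num_interop_threads(2)  # inter-op parallelism

from sentence_transformers import SentenceTransformer

# Model ID assumed here; I'm on the BGE Small EN checkpoint.
model = SentenceTransformer("BAAI/bge-small-en-v1.5")
embeddings = model.encode(["an example sentence"], batch_size=32)
```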
Could anyone provide guidance on how to effectively limit the model to use only 2 CPUs? Additionally, I would appreciate any advice on optimizing offline inference performance using Sentence Transformers on CPUs—what’s the fastest way to achieve this with just 2 cores?
Thanks in advance for your help!