
set cpu affinity and membind for better oob performance #853

Merged: 19 commits, Aug 27, 2024
7 changes: 5 additions & 2 deletions docker/Dockerfile.intel
@@ -27,6 +27,8 @@ RUN --mount=type=cache,id=apt-dev,target=/var/cache/apt \
libpng-dev \
python3 \
python3-pip \
python3-dev \
libnuma-dev \
&& rm -rf /var/lib/apt/lists/*"
RUN /usr/sbin/update-ccache-symlinks
RUN mkdir /opt/ccache && ccache --set-config=cache_dir=/opt/ccache
@@ -43,12 +45,13 @@ RUN python3 -m pip install --no-cache-dir \
torchaudio==${TORCHAUDIO_VERSION} \
-f https://download.pytorch.org/whl/torch_stable.html && \
python3 -m pip install intel-extension-for-pytorch==$IPEX_VERSION && \
-    python3 -m pip install oneccl_bind_pt --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
+    python3 -m pip install oneccl_bind_pt --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/ && \
+    python3 -m pip install --no-cache-dir numa

ARG OMP_NUM_THREADS=1
ENV OMP_NUM_THREADS=${OMP_NUM_THREADS}
ARG KMP_BLOCKTIME=1
ENV KMP_BLOCKTIME=${KMP_BLOCKTIME}
ARG KMP_HW_SUBSET=1T
ENV KMP_HW_SUBSET=${KMP_HW_SUBSET}
-ENV LD_PRELOAD="/usr/local/lib/libiomp5.so /usr/lib/x86_64-linux-gnu/libtcmalloc.so"
+ENV LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libtcmalloc.so"
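For reference, a process inside this image would pick up the threading knobs exported above roughly like this (a minimal sketch; the variable names are mine, not from the PR):

```python
import os

# Defaults mirror the ARG values baked into the Dockerfile.
omp_num_threads = int(os.environ.get("OMP_NUM_THREADS", "1"))
kmp_blocktime = int(os.environ.get("KMP_BLOCKTIME", "1"))
kmp_hw_subset = os.environ.get("KMP_HW_SUBSET", "1T")  # e.g. "1T" = one thread per core
```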
28 changes: 27 additions & 1 deletion optimum/intel/ipex/modeling_base.py
@@ -60,7 +60,6 @@
from ..utils.import_utils import is_ipex_version, is_torch_version, is_transformers_version
from ..utils.modeling_utils import MULTI_QUERY_ATTN_MODELS, recursive_to_device


logger = logging.getLogger(__name__)


@@ -129,6 +128,21 @@ def ipex_jit_trace(model, task, use_cache):

return trace_model

def get_int_from_env(env_keys, default):
    """Returns the first non-negative env value found in the `env_keys` list, or the default."""
for e in env_keys:
val = int(os.environ.get(e, -1))
if val >= 0:
return val
return default

def get_number_of_sockets():
sockets = set()
with open('/proc/cpuinfo') as f:
for line in f:
if line.startswith('physical id'):
sockets.add(line.strip().split()[-1])
return len(sockets)
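Since get_number_of_sockets reads /proc/cpuinfo directly, the parsing is awkward to exercise off a Linux box; the same logic over a canned excerpt looks like this (an illustrative sketch, not part of the diff):

```python
def count_sockets(cpuinfo_text):
    # Count distinct "physical id" values, exactly as get_number_of_sockets does.
    sockets = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("physical id"):
            sockets.add(line.strip().split()[-1])
    return len(sockets)

sample = (
    "processor\t: 0\nphysical id\t: 0\n"
    "processor\t: 1\nphysical id\t: 0\n"
    "processor\t: 2\nphysical id\t: 1\n"
)
print(count_sockets(sample))  # two distinct physical ids -> 2
```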

class IPEXModel(OptimizedModel):
auto_model_class = AutoModel
@@ -153,6 +167,18 @@ def __init__(
else:
self._device = torch.device("cpu")

import numa
Collaborator: should add a check that the numa package is available.

import psutil

n_sockets = get_number_of_sockets()
num_cpu_threads_per_process = int(psutil.cpu_count(logical=False) / n_sockets)
os.environ["OMP_NUM_THREADS"] = str(num_cpu_threads_per_process)
torch.set_num_threads(num_cpu_threads_per_process)
numa.set_affinity(0, range(num_cpu_threads_per_process))
numa.set_membind([0])
print("affinity", numa.get_affinity(0))
print("membind", numa.get_membind())
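To illustrate what this binding achieves: with one process per socket, each rank would be pinned to a contiguous range of physical cores, along these lines (a sketch; the rank-to-socket mapping here is my assumption, not the PR's code):

```python
def cores_for_rank(rank, cores_per_socket):
    # Assume rank i owns socket i, so its cores start at rank * cores_per_socket.
    start = rank * cores_per_socket
    return list(range(start, start + cores_per_socket))

print(cores_for_rank(0, 4))  # [0, 1, 2, 3]
print(cores_for_rank(1, 4))  # [4, 5, 6, 7]
```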

@sywangyi (Collaborator), Jul 30, 2024: Also, if OMP_NUM_THREADS or the membind is already set by an external user, we should not override the user's configuration.

The Tensor Parallel case should be considered as well.
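Both review points could be addressed along these lines (a hedged sketch, not the PR's final code; importlib.util.find_spec is the standard way to probe for an optional package):

```python
import importlib.util
import os

# Review point 1: only touch NUMA APIs if the optional numa package is present.
numa_available = importlib.util.find_spec("numa") is not None

def pick_thread_count(physical_cores, n_sockets):
    # Review point 2: respect a user-supplied OMP_NUM_THREADS instead of overriding it.
    user_value = os.environ.get("OMP_NUM_THREADS")
    if user_value is not None:
        return int(user_value)
    return physical_cores // n_sockets
```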


        # CPU only supports the JIT model for now.
if export:
if isinstance(model, torch.jit.RecursiveScriptModule):