Use --offload-compress linker option to compress offload sections #1961

Merged: 3 commits merged into master from use-offload-compression on Jan 14, 2025

Conversation

oleksandr-pavlyk
Collaborator

See https://www.intel.com/content/www/us/en/developer/articles/technical/sycl-compilation-device-image-compression.html

It is applicable to all SYCL targets. On Linux, this change yields a 28.4% reduction in the size of shared objects that carry offload sections.
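
For illustration, a minimal sketch of how such a flag can be attached at link time in CMake (the target name is hypothetical; this is not dpctl's actual build script):

```cmake
# Hypothetical extension-module target; dpctl's real CMakeLists differs.
add_library(_tensor_impl MODULE tensor_impl.cpp)

# --offload-compress is a DPC++ driver option that takes effect at link
# time, when the offload sections of the fat binary are assembled.
target_link_options(_tensor_impl PRIVATE --offload-compress)
```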

# in base branch
(dev_dpctl) :~/repos/dpctl$ ls -l dpctl/tensor/_tensor*_impl*
-rw-r--r-- 1 me myself 36403864 Jan  9 16:32 dpctl/tensor/_tensor_accumulation_impl.cpython-312-x86_64-linux-gnu.so
-rw-r--r-- 1 me myself 38666520 Jan  9 16:32 dpctl/tensor/_tensor_elementwise_impl.cpython-312-x86_64-linux-gnu.so
-rw-r--r-- 1 me myself 60695704 Jan  9 16:32 dpctl/tensor/_tensor_impl.cpython-312-x86_64-linux-gnu.so
-rw-r--r-- 1 me myself 16431464 Jan  9 16:32 dpctl/tensor/_tensor_linalg_impl.cpython-312-x86_64-linux-gnu.so
-rw-r--r-- 1 me myself 55497816 Jan  9 16:32 dpctl/tensor/_tensor_reductions_impl.cpython-312-x86_64-linux-gnu.so
-rw-r--r-- 1 me myself 49789576 Jan  9 16:32 dpctl/tensor/_tensor_sorting_impl.cpython-312-x86_64-linux-gnu.so
# this PR
(dev_dpctl) :~/repos/dpctl$ ls -l dpctl/tensor/_tensor*_impl*
-rw-r--r-- 1 me myself 19480664 Jan 10 12:05 dpctl/tensor/_tensor_accumulation_impl.cpython-312-x86_64-linux-gnu.so
-rw-r--r-- 1 me myself 21665576 Jan 10 12:05 dpctl/tensor/_tensor_elementwise_impl.cpython-312-x86_64-linux-gnu.so
-rw-r--r-- 1 me myself 35233512 Jan 10 12:05 dpctl/tensor/_tensor_impl.cpython-312-x86_64-linux-gnu.so
-rw-r--r-- 1 me myself 10868840 Jan 10 12:05 dpctl/tensor/_tensor_linalg_impl.cpython-312-x86_64-linux-gnu.so
-rw-r--r-- 1 me myself 33741656 Jan 10 12:05 dpctl/tensor/_tensor_reductions_impl.cpython-312-x86_64-linux-gnu.so
-rw-r--r-- 1 me myself 31597128 Jan 10 12:05 dpctl/tensor/_tensor_sorting_impl.cpython-312-x86_64-linux-gnu.so
Python 3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.24.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: sum([36403864, 38666520, 60695704, 16431464, 55497816, 49789576])
Out[1]: 257484944

In [2]: sum([19480664, 21665576, 35233512, 10868840, 33741656, 31597128, 31931848])
Out[2]: 184519224

In [3]: sum([19480664, 21665576, 35233512, 10868840, 33741656, 31597128])
Out[3]: 152587376

In [4]: Out[3]/Out[2], Out[3]/Out[1]
Out[4]: (0.8269456845320355, 0.592606983653382)

In [5]: (1 - x for x in _)
Out[5]: <generator object <genexpr> at 0x7f0153f23ac0>

In [6]: list(_)
Out[6]: [0.17305431546796446, 0.40739301634661795]

In [7]: quit
  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • Have you added documentation for your changes, if necessary?
  • Have you added your changes to the changelog?
  • If this PR is a work in progress, are you opening the PR as a draft?


github-actions bot commented Jan 10, 2025

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞


Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_426 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

@coveralls
Collaborator

coveralls commented Jan 10, 2025

Coverage Status

Coverage: 87.715%, remained the same when pulling 8b7a79b on use-offload-compression into c354cd8 on master.

See https://www.intel.com/content/www/us/en/developer/articles/technical/sycl-compilation-device-image-compression.html

It is applicable to all SYCL targets. On Linux, this change yields a 28.4% reduction in the size of shared objects that carry offload sections.

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_426 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

@oleksandr-pavlyk
Collaborator Author

The story of using --offload-compress is a little complicated. Offload shared objects did shrink, saving about 17% of the installation footprint:

(base) opavlyk@opavlyk-mobl:~/tmp/dpctl_cmp$ ls -l zipped/lib/python3.13/site-packages/dpctl/tensor/*.so
-rw-r--r-- 1 opavlyk opavlyk   346264 Jan 10 17:30 zipped/lib/python3.13/site-packages/dpctl/tensor/_dlpack.cpython-313-x86_64-linux-gnu.so
-rw-r--r-- 1 opavlyk opavlyk   217944 Jan 10 17:30 zipped/lib/python3.13/site-packages/dpctl/tensor/_flags.cpython-313-x86_64-linux-gnu.so
-rw-r--r-- 1 opavlyk opavlyk 18877104 Jan 10 17:30 zipped/lib/python3.13/site-packages/dpctl/tensor/_tensor_accumulation_impl.cpython-313-x86_64-linux-gnu.so
-rw-r--r-- 1 opavlyk opavlyk 21721072 Jan 10 17:30 zipped/lib/python3.13/site-packages/dpctl/tensor/_tensor_elementwise_impl.cpython-313-x86_64-linux-gnu.so
-rw-r--r-- 1 opavlyk opavlyk 35136368 Jan 10 17:30 zipped/lib/python3.13/site-packages/dpctl/tensor/_tensor_impl.cpython-313-x86_64-linux-gnu.so
-rw-r--r-- 1 opavlyk opavlyk 10701104 Jan 10 17:30 zipped/lib/python3.13/site-packages/dpctl/tensor/_tensor_linalg_impl.cpython-313-x86_64-linux-gnu.so
-rw-r--r-- 1 opavlyk opavlyk 33171680 Jan 10 17:30 zipped/lib/python3.13/site-packages/dpctl/tensor/_tensor_reductions_impl.cpython-313-x86_64-linux-gnu.so
-rw-r--r-- 1 opavlyk opavlyk 30454176 Jan 10 17:30 zipped/lib/python3.13/site-packages/dpctl/tensor/_tensor_sorting_impl.cpython-313-x86_64-linux-gnu.so
-rw-r--r-- 1 opavlyk opavlyk   684064 Jan 10 17:30 zipped/lib/python3.13/site-packages/dpctl/tensor/_usmarray.cpython-313-x86_64-linux-gnu.so
(base) opavlyk@opavlyk-mobl:~/tmp/dpctl_cmp$ ls -l unzipped/lib/python3.13/site-packages/dpctl/tensor/*.so
-rw-r--r-- 1 opavlyk opavlyk   346264 Jan 10 12:26 unzipped/lib/python3.13/site-packages/dpctl/tensor/_dlpack.cpython-313-x86_64-linux-gnu.so
-rw-r--r-- 1 opavlyk opavlyk   217944 Jan 10 12:26 unzipped/lib/python3.13/site-packages/dpctl/tensor/_flags.cpython-313-x86_64-linux-gnu.so
-rw-r--r-- 1 opavlyk opavlyk 35800304 Jan 10 12:26 unzipped/lib/python3.13/site-packages/dpctl/tensor/_tensor_accumulation_impl.cpython-313-x86_64-linux-gnu.so
-rw-r--r-- 1 opavlyk opavlyk 38722000 Jan 10 12:26 unzipped/lib/python3.13/site-packages/dpctl/tensor/_tensor_elementwise_impl.cpython-313-x86_64-linux-gnu.so
-rw-r--r-- 1 opavlyk opavlyk 60598544 Jan 10 12:26 unzipped/lib/python3.13/site-packages/dpctl/tensor/_tensor_impl.cpython-313-x86_64-linux-gnu.so
-rw-r--r-- 1 opavlyk opavlyk 16263712 Jan 10 12:26 unzipped/lib/python3.13/site-packages/dpctl/tensor/_tensor_linalg_impl.cpython-313-x86_64-linux-gnu.so
-rw-r--r-- 1 opavlyk opavlyk 55268608 Jan 10 12:26 unzipped/lib/python3.13/site-packages/dpctl/tensor/_tensor_reductions_impl.cpython-313-x86_64-linux-gnu.so
-rw-r--r-- 1 opavlyk opavlyk 48646624 Jan 10 12:26 unzipped/lib/python3.13/site-packages/dpctl/tensor/_tensor_sorting_impl.cpython-313-x86_64-linux-gnu.so
-rw-r--r-- 1 opavlyk opavlyk   684064 Jan 10 12:26 unzipped/lib/python3.13/site-packages/dpctl/tensor/_usmarray.cpython-313-x86_64-linux-gnu.so

but the downloadable conda tar-ball ballooned by a factor of 2, presumably because bzip2 compresses raw offload sections far better than it can recompress sections that are already compressed:

(base) opavlyk@opavlyk-mobl:~/tmp/dpctl_cmp$ ls -l zipped/dpctl-0.19.0dev0-py313h93fe807_426.tar.bz2 unzipped/dpctl-0.19.0dev0-py313h93fe807_427.tar.bz2
-rw-r--r-- 1 opavlyk opavlyk 25613868 Jan 10 18:26 unzipped/dpctl-0.19.0dev0-py313h93fe807_427.tar.bz2
-rw-r--r-- 1 opavlyk opavlyk 52435566 Jan 10  2025 zipped/dpctl-0.19.0dev0-py313h93fe807_426.tar.bz2

@oleksandr-pavlyk
Collaborator Author

Since the tar-ball balloons 2x in size, I am inclined not to enable --offload-compress by default. I would like to keep supporting this option in the CMakeLists script, however.

If -DDPCTL_OFFLOAD_COMPRESS=ON is set, the DPC++ link-time option
`--offload-compress` is used to compress offload sections.

The option is OFF by default.
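
A sketch of how such a guard might look (the option name and help string follow the commit message; the surrounding code is illustrative, not dpctl's actual CMakeLists):

```cmake
option(DPCTL_OFFLOAD_COMPRESS
    "Use DPC++ --offload-compress to compress offload sections"
    OFF
)

if(DPCTL_OFFLOAD_COMPRESS)
    # Applied globally here for brevity; the real script may scope the
    # flag to individual extension targets instead.
    add_link_options(--offload-compress)
endif()
```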
@oleksandr-pavlyk
Collaborator Author

oleksandr-pavlyk commented Jan 13, 2025

I pushed the change that makes the use of the `--offload-compress` option optional (off by default).


Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_434 ran successfully.
Passed: 893
Failed: 3
Skipped: 118

@ndgrigorian
Collaborator

ndgrigorian commented Jan 14, 2025

> I pushed the change that makes the use of the `--offload-compress` option optional (off by default).

Could a comment, or otherwise a line of documentation, be added noting why a user would want to use this option (i.e., shrinking .so sizes when building locally)?

```
$ cmake . -LAH -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx | grep -a2 DPCTL_OFFLOAD_COMPRESS

// Build using offload section compression feature of DPC++ to reduce size of shared object with offloading sections
DPCTL_OFFLOAD_COMPRESS:BOOL=OFF

// Build DPCTL to target CUDA devices
```
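
One way the rationale could be recorded next to the option itself (a sketch; the comment wording is illustrative, with figures taken from the measurements above, and the help string quoted from the cmake output):

```cmake
# Compressing offload sections shrinks locally built shared objects by
# roughly 30-40% on Linux, but makes the conda tar.bz2 about 2x larger,
# since pre-compressed sections barely compress further under bzip2;
# hence OFF by default.
option(DPCTL_OFFLOAD_COMPRESS
    "Build using offload section compression feature of DPC++ to reduce size of shared object with offloading sections"
    OFF
)
```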

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_435 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

Collaborator

@ndgrigorian ndgrigorian left a comment


Was there a specific reason the change of DPCTL_TARGET_HIP to an option was reversed?

In any case this LGTM

@oleksandr-pavlyk
Collaborator Author

@ndgrigorian CMake options are meant to be boolean only. The change to option() broke the build, so it had to be reverted.
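
For reference, CMake's option() models only ON/OFF, whereas a setting that carries an architecture string needs a cache variable. A sketch of the distinction (variable names mirror those above; the HIP value is just an example):

```cmake
# Boolean switch: option() fits.
option(DPCTL_OFFLOAD_COMPRESS "Compress offload sections" OFF)

# String-valued setting: option() cannot hold a value such as "gfx90a",
# so a typed cache variable is used instead.
set(DPCTL_TARGET_HIP "" CACHE STRING
    "HIP architecture to target, e.g. gfx90a")
```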

@oleksandr-pavlyk oleksandr-pavlyk merged commit 598082f into master Jan 14, 2025
63 checks passed
@oleksandr-pavlyk oleksandr-pavlyk deleted the use-offload-compression branch January 14, 2025 16:15