Add XCCL Backend Support for Intel GPU in TorchComms #52
Conversation
This reverts commit 2ce3df2.
Expand XPU memory info with free memory query:
* Check for sycl::aspect::ext_intel_free_memory capability
* Use Intel SYCL extension to query actual free memory when available
* Fall back to total memory with warning when extension unsupported
* Add TorchCommLogging.hpp include for warning messages

This provides more accurate memory reporting on Intel XPU devices that support the free memory extension, while maintaining backward compatibility with devices that don't.
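The fallback logic described above can be sketched as follows. This is a minimal mock, not the PR's code: `MockDevice`, `memGetInfoSketch`, and its fields are hypothetical stand-ins for the real `sycl::device` aspect check and the Intel `ext_intel_free_memory` info query.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical mock of a SYCL device. In the real backend this would be a
// sycl::device, checked with device.has(sycl::aspect::ext_intel_free_memory)
// and queried via the Intel free-memory info extension.
struct MockDevice {
  bool has_free_memory_aspect;  // stands in for the aspect check
  uint64_t free_bytes;          // stands in for the extension's free-memory query
  uint64_t total_bytes;         // stands in for global_mem_size
};

// Sketch of the fallback pattern: report actual free memory when the
// extension is supported, otherwise fall back to total memory with a warning.
void memGetInfoSketch(const MockDevice& dev, uint64_t* free_mem,
                      uint64_t* total_mem) {
  *total_mem = dev.total_bytes;
  if (dev.has_free_memory_aspect) {
    *free_mem = dev.free_bytes;  // accurate free-memory report
  } else {
    std::fprintf(stderr,
                 "warning: free-memory extension unsupported; "
                 "reporting total memory\n");
    *free_mem = dev.total_bytes;  // conservative fallback
  }
}
```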
Move the free memory extension warning from memGetInfo() to getDeviceProperties() to provide early notification during device initialization:
* Add Intel free memory extension check in getDeviceProperties()
* Remove duplicate warning from memGetInfo() to avoid log spam
* Add device properties verification call in TorchCommXCCL::init()

This ensures users are warned once during device setup rather than repeatedly during memory queries, while maintaining the same functionality and error handling.
Add the [[maybe_unused]] attribute to the device_prop variable in TorchCommXCCL::init() to prevent compiler warnings: the variable only serves to trigger device properties validation during initialization and is never read afterwards.
Add [[likely]] and [[unlikely]] attributes to optimize branch prediction for Intel SYCL free memory extension checks:
* Mark extension availability as [[likely]] in memGetInfo()
* Mark extension unavailability as [[unlikely]] in getDeviceProperties() and memGetInfo()

This optimizes the common case where Intel XPU devices support the free memory extension, improving performance in the hot path while maintaining compatibility with devices that lack the extension.
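The annotation pattern described above looks roughly like this (a standalone C++20 sketch, not the backend's actual code; `queryFreeMemory` and its parameters are hypothetical):

```cpp
#include <cassert>

// Sketch of the branch-prediction hints: most Intel XPU devices support the
// free-memory extension, so that branch is marked as the expected path.
long queryFreeMemory(bool extensionSupported, long freeBytes, long totalBytes) {
  if (extensionSupported) [[likely]] {
    return freeBytes;   // common case: accurate free-memory query
  } else [[unlikely]] {
    return totalBytes;  // rare fallback for devices lacking the extension
  }
}
```

The [[likely]]/[[unlikely]] attributes (standard since C++20) only advise the compiler's code layout; they change no observable behavior.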
Wrap the XPU memory free operation in a try-catch block within the TorchCommXCCLBootstrap destructor:
* Catch exceptions during barrier buffer deallocation
* Log errors instead of allowing exceptions to escape
* Ensure safe object destruction even if memory freeing fails

This prevents potential program termination: in C++, an exception escaping a destructor during stack unwinding calls std::terminate.
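The destructor pattern above can be illustrated with a self-contained sketch. `freeBarrierBuffer` and `BootstrapSketch` are hypothetical stand-ins for the real XPU free call and the bootstrap class:

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>

// Hypothetical stand-in for the XPU memory free call, which may throw.
void freeBarrierBuffer(bool shouldThrow) {
  if (shouldThrow) throw std::runtime_error("device free failed");
}

// Never let an exception escape a destructor: if one propagates during
// stack unwinding, the program is terminated. Catch and log instead.
struct BootstrapSketch {
  bool fail_on_free = false;
  ~BootstrapSketch() noexcept {
    try {
      freeBarrierBuffer(fail_on_free);
    } catch (const std::exception& e) {
      std::fprintf(stderr, "error freeing barrier buffer: %s\n", e.what());
    }
  }
};
```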
The AllReduce.py unit test fails when an empty tensor is provided as input. This patch adds the missing check and returns without crashing.
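The guard might look like the following sketch. This is a plain-C++ mock for illustration only: `Tensor` stands in for an at::Tensor, and `allReduceSketch` for the XCCL allreduce entry point.

```cpp
#include <cassert>
#include <vector>

// Mock "tensor" for illustration; the real check operates on an at::Tensor
// inside the XCCL allreduce path.
using Tensor = std::vector<float>;

// Sketch of the fix: an empty input returns immediately as a completed
// no-op instead of being handed to the collective, which previously crashed.
bool allReduceSketch(Tensor& t) {
  if (t.empty()) {
    return true;  // nothing to reduce; report immediate completion
  }
  // ... the real XCCL allreduce would be launched here ...
  for (auto& v : t) v += 0.0f;  // placeholder for the collective
  return true;
}
```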
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Add Intel XPU to prerequisites
- Document XCCL backend setup with Intel oneAPI CCL environment
- Add USE_XCCL flag to build configuration options
Implement getMemAllocator() in TorchCommXCCL as a placeholder that throws a runtime error. This satisfies the interface requirement from TorchCommBackend.
Replace hardcoded CUDA device detection with PyTorch's accelerator API for better hardware abstraction:
* Use torch.accelerator.current_accelerator() for device detection
* Improve variable naming (device -> device_str) for clarity
* Add return type annotation for better type safety
* Fall back to CPU when no accelerator is available or device_count is 0
Hi, @d4l3k This is our initial PR to integrate the XCCL backend into TorchComms. Could you please help review? cc @pkourdis @siju-samuel @newtdms

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
Motivation:
As illustrated in the design in RFC #51, we would like to add XCCL (Intel GPU Collective Communications Library) backend support to TorchComms.
In this PR, we enable allreduce in the XCCL backend as an entry point. Full collectives support will come later.

PR explanation:
Core Files
- TorchCommXCCL.cpp/.hpp
- TorchWorkXCCL.cpp/.hpp
- TorchWorkXCCLQueue.cpp
- TorchCommXCCLBootstrap.cpp/.hpp
- TorchCommXCCLUtils.cpp/.hpp
- TorchCommXCCLPy.cpp

API Abstraction Layers
- XcclApi.cpp/.hpp
- device/XpuApi.cpp/.hpp

Build System
- xccl/CMakeLists.txt
- setup.py

Examples: