
@DDEle DDEle commented Nov 28, 2025

Proposed changes

ck_tile version of f9bf275 (merged with #2297)

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to the REGRESSION_TESTS list defined at the top of tests/CMakeLists.txt, if the test takes more than 30 seconds to run.
  • I have added inline documentation that helps the maintainers understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged


Copilot AI left a comment


Pull request overview

This PR introduces multi-threaded random tensor value generation for the ck_tile library. It replaces the previous single-threaded or opt-in multi-threaded approach with an always-on, deterministic multi-threaded implementation. Results are reproducible across different thread counts because the tensor is split into blocks and each block's RNG state is positioned with discard().
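To make the determinism argument concrete, here is a minimal, hypothetical C++ sketch of block-based filling with `discard()`. It is not the code in fill.hpp: the function name `fill_uniform_blocked`, its parameters, the block size and the float mapping are all illustrative assumptions. Each block seeds its own `std::mt19937` identically and calls `discard()` to skip to the block's first element, so element `i` always receives the `i`-th draw of one sequential stream, no matter how many threads run.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Hypothetical sketch: fill `out` with uniform floats between lo and hi using
// `num_threads` workers. Every block starts from the same seed and skips ahead
// with discard(), so the result depends only on (seed, out.size()), never on
// the thread count or on which thread processed which block.
inline void fill_uniform_blocked(std::vector<float>& out, float lo, float hi,
                                 std::uint32_t seed, unsigned num_threads,
                                 std::size_t block_size = 4096)
{
    const std::size_t n        = out.size();
    const std::size_t n_blocks = (n + block_size - 1) / block_size;
    num_threads                = std::max(1u, num_threads);

    auto worker = [&](unsigned tid) {
        // Round-robin block assignment; it does not affect the generated values.
        for(std::size_t b = tid; b < n_blocks; b += num_threads)
        {
            const std::size_t begin = b * block_size;
            const std::size_t end   = std::min(n, begin + block_size);
            std::mt19937 gen(seed);
            gen.discard(begin); // valid because each element consumes exactly one draw
            for(std::size_t i = begin; i < end; ++i)
            {
                // Map the top 24 bits of one 32-bit draw to [0, 1).
                const float u = static_cast<float>(gen() >> 8) * (1.0f / 16777216.0f);
                out[i]        = lo + (hi - lo) * u;
            }
        }
    };

    std::vector<std::thread> pool;
    for(unsigned t = 0; t < num_threads; ++t)
        pool.emplace_back(worker, t);
    for(auto& th : pool)
        th.join();
}
```

Mapping one 32-bit draw to one float fixes the engine-calls-per-element count at exactly one, which is what makes `discard(begin)` land on the right state. Note that `std::mt19937::discard` is typically linear in the skip distance, so this sketch trades some redundant work for reproducibility.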

Key Changes

  • Refactored FillUniformDistribution to always use multi-threading with deterministic block-based random number generation
  • Added CPU core management utilities (get_available_cpu_cores() and cpu_core_guard) for testing different thread configurations
  • Updated the template parameter to allow type deduction (T = void)
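The CPU core utilities are only named above, so the following is a hypothetical, Linux-only sketch of what a core-count query plus an RAII affinity guard can look like; the real `get_available_cpu_cores()` and `cpu_core_guard` in joinable_thread.hpp may be implemented differently. It uses the glibc `sched_getaffinity`/`sched_setaffinity` calls and the `CPU_*` macros, and the `_sketch` names are illustrative.

```cpp
#include <sched.h>
#include <stdexcept>

// Hypothetical sketch: number of CPUs the calling thread is allowed to run on.
inline int get_available_cpu_cores_sketch()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if(sched_getaffinity(0, sizeof(set), &set) != 0)
        return 1; // conservative fallback
    return CPU_COUNT(&set);
}

// Hypothetical RAII guard: restrict the calling thread to at most `n` (n >= 1)
// of its currently allowed CPUs and restore the previous mask on destruction.
class cpu_core_guard_sketch
{
    public:
    explicit cpu_core_guard_sketch(int n)
    {
        if(sched_getaffinity(0, sizeof(saved_), &saved_) != 0)
            throw std::runtime_error("sched_getaffinity failed");
        cpu_set_t limited;
        CPU_ZERO(&limited);
        int kept = 0;
        for(int cpu = 0; cpu < CPU_SETSIZE && kept < n; ++cpu)
        {
            if(CPU_ISSET(cpu, &saved_))
            {
                CPU_SET(cpu, &limited);
                ++kept;
            }
        }
        if(sched_setaffinity(0, sizeof(limited), &limited) != 0)
            throw std::runtime_error("sched_setaffinity failed");
    }
    ~cpu_core_guard_sketch() { sched_setaffinity(0, sizeof(saved_), &saved_); }

    cpu_core_guard_sketch(const cpu_core_guard_sketch&)            = delete;
    cpu_core_guard_sketch& operator=(const cpu_core_guard_sketch&) = delete;

    private:
    cpu_set_t saved_;
};
```

A test can then open a guard with 1, 2, ... cores, run the fill, and compare outputs. Querying the affinity mask matters here: `std::thread::hardware_concurrency()` generally reports the machine's processor count and does not shrink when the affinity mask is narrowed.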

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

| File | Description |
|---|---|
| include/ck_tile/host/fill.hpp | Replaced opt-in multi-threading with always-on deterministic block-based multi-threaded filling; changed template parameter to support type deduction |
| include/ck_tile/host/joinable_thread.hpp | Added get_available_cpu_cores() function and cpu_core_guard class for CPU affinity management in tests |
| test/ck_tile/utility/test_fill.cpp | New comprehensive test suite validating deterministic behavior across different sizes and thread counts |
| test/ck_tile/utility/CMakeLists.txt | Registered the new test executable |
| example/ck_tile/18_flatmm/mxgemm/run_mx_flatmm.inc | Updated to use new template syntax with type deduction |
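The `T = void` default follows the same idea as `std::plus<>`: when no type is given, the functor deduces the element type from its arguments. The following is a hypothetical sketch of that pattern only. `FillUniformSketch`, its members and the default seed are illustrative assumptions, not the actual `FillUniformDistribution` interface.

```cpp
#include <iterator>
#include <random>
#include <type_traits>
#include <vector>

// Hypothetical sketch of the `T = void` pattern for a fill functor.
template <typename T = void>
struct FillUniformSketch
{
    float a       = -1.f;
    float b       = 1.f;
    unsigned seed = 42; // illustrative default

    template <typename ForwardIter>
    void operator()(ForwardIter first, ForwardIter last) const
    {
        using elem_t = typename std::iterator_traits<ForwardIter>::value_type;
        // T = void: generate directly in the iterator's value type;
        // otherwise generate as T and convert to the stored type.
        using value_t = std::conditional_t<std::is_void_v<T>, elem_t, T>;
        std::mt19937 gen(seed);
        std::uniform_real_distribution<float> dis(a, b);
        for(; first != last; ++first)
            *first = static_cast<elem_t>(static_cast<value_t>(dis(gen)));
    }
};
```

With this pattern a call site can write `FillUniformSketch<>{-1.f, 1.f}(v.begin(), v.end())` for any element type instead of repeating the type in the template argument list, while `FillUniformSketch<int>{...}` still forces generation in a specific type.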

