-
Notifications
You must be signed in to change notification settings - Fork 0
HABU - Highly Asynchronous Buffered Update Library
License
PE-Cray/habu
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
// // // // // // // // // // // // // // // // // // // // // // // // // // // // // // // // Copyright 2021 Hewlett Packard Enterprise // // MIT License // Permission is hereby granted, free of charge, to any person obtaining a copy of this software // and associated documentation files (the "Software"), to deal in the Software without restriction, // including without limitation the rights to use, copy, modify, merge, publish, distribute, // sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is // furnished to do so, subject to the following conditions: // // The above copyright notice and this permission notice shall be included in all copies or // substantial portions of the Software. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT // NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND // NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, // DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, // OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. // // author: Nathan Wichmann (wichmann@hpe.com) // // // // // // // // // // // // // // // // // // // // // // // // // // // // // // // The Heimdallr benchmark was created to test the performance of small GLOBAL references and updates using different operations, including atomics, gets and puts via the shmem library. Heimdallr is meant to approximate the reference patterns and operations common in customers that are important to HPE, even if these reference patterns and operations are not common in HPC. Random reference patterns are generated by assuming a power-of-two sized local table and a arbitrary number of PEs. A index array with random integers is contructed in such manner as allow the offset into a table to be a simple bitwise AND and the PE to be a simple shift. Heimdallr also exercises and test the habu library. This library aggregates messages, transfers them to a remote PE, and executes the update. Each test is paired with a shmem version and the results are tested for correctness when possible. If the correctness test determines the results match with 100% accuracy a "PASSED" message is given. If results match with and error rate of <1% then a PASSED is still given but the error rate is printed out. If an error rate of >1% is detected then the test is assumed to have FAILED. The 1% threshold is somewhat arbitrary. Some tests are expected to have an error rate, but that error rate is expected to be low or very low. On the other hand, when real errors happen they often cause error rates >>1%. There are three groups of tests; Atomics, Gets, and Puts. Atomics are only single word references. Gets and puts can be single word or relatively small blocks. Heimdallr sweeps through various block sizes so one can examine the performance of gets and puts when using those block sizes. There are also tests to examine the impact of where the data resides localling. The first group of tests have only random atomic references. The first test is an atomic increment that does not return any data to the processor that is initiating the operation. Other tests include fetching adds, multiple different atomics in the same iteration, and a function set up to test the recursive capabilities of habu. While atomic adds, both fetching and non-fetching, are important by themselves, these atomics are also represent many other atomics such at bitwise, min, max, and more. The second group focuses on gets, testing both blocking gets and non-blocking gets. The non-blocking gets also test 3 different methods for allocating the destination arrays; symmetric heap, heap, and stack. Testing these allocation methods are meant to inform the users as well as the developers. The third group focuses on puts, testing both blocking puts and non-blocking puts. The non-blocking puts also tests 3 different methods for allocating the source arrays; symmetric heap, heap, and stack. Testing these allocation methods are ment to inform the users as well as the developers. There is an additional subgroup that examines put-with-signal operations. The first of this subgroup is blocking, the second is non-blocking, while the third is how a user might implement a put-with-signal functionality if the the shmem implementation did not have a put-with-signal function. This benchmark is meant to be run any number of core and nodes. One can choose to run with 1 shmem PE per core, or more than one core per PE with OpenMP threading underneath. This benchmark utilizes thread-hot shmem and can be used to examine any performance differences between shmem only and shmem with threading. Heimdallr options: -t <value> Table size (as log base 2) per PE. For example, if value is 26 the size of the table on each PE will be 2^26*8 = 0.5 GiB. Default is 25. -n <value> Nupdates (as log base 2) per PE. Default is 23. -r <value> Number of Repeats completed within a timed test. Default is 4. -T <name> Name of single test to be executed. If <name> = ALL then all of the test will run. If <name> matchs the first srtlen(name) characters of tests as printed out during an ALL test then that test will run. There can be multiple -T options provided on and run. Default is ALL. -w <value> Min Message size (in number of 8B words). Default is 1. -W <value> Max Message size (in number of 8B words). Default is 4096. -G <value> Message size growth factor (integer). If <value> is 0 (zero) then the message size will increase by a rate of square root of 2. If value>1 then the message rate will increase by that rate ever iteration. If the value is less negative, then the message size will increase by 32 bytes ever iteration until the message size reaches a size of abs(value) and after that it will revert to a growth as if value was 0 (zero). -h Display option information. The total memory footprint per rank will be the table size plus r times the number of updates times 8 bytes. There are 4 arrays that are the size of the number of updates, the index array as well as the sym heap, heap, and stack versions of the local arrays. The program simple replicates the full table and the space for the index and local arrays on each PE. The spirit of the benchmark is that when running using all of the CPUs in a NUMA domain that the size of the table consumes significantly more space that is available in the last level cache. Finally, Heimdallr is meant to be a vehicle for discussion as much as static implementation. Benchmarkers should feel free to discuss the results and various implementations with the evaluators rather than just report the final numbers.
About
HABU - Highly Asynchronous Buffered Update Library
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published