Develop stream 2024-09-12 (#462)
* Fixed overflow bug for large sizes in thrust::shuffle

* Added definitions of execution space macros

* Add missing overloads for thrust::pow

* Refactors thrust::unique_by_key to use cub::DeviceSelect::UniqueByKey

* Fix a typo in thrust-config.cmake

* Check that thrust::pair is trivially copyable

* Remove double ignore in discard_iterator.h docs

* Replace deprecated _VSTD macro with std

* Update mode example to use thrust::unique_count

* Ensure that thrust fancy iterators are trivially_copy_constructible when possible

* Use checked allocators in CUB catch2 tests

* Refactors thrust::copy_if to use cub::DeviceSelect

* Refactor thrust::[stable_]partition[_copy] to use cub::DevicePartition

* Fix include of <thrust/random.h> with NVC++

* Cleanup diagnostic handling

* Rework config.h

* Bump version to 2.4.0

* Fix issues with ambiguous calls to addressof in thrust::optional

* Try harder to unwrap nested thrust::tuple_of_iterator_references, CUDA backend

* Added missing element from thrust's tuple implementation

* Ensure that we can run reduce_by_key with const inputs

* Leave definitions of __host__ and __device__

This prevents CCCL/thrust's build breakage because of v2.4.0 changes

* Patched up CI because of CCCL2.4.0 tests' build failure

* Updated tests and examples for __host__ __device__ use

* Updated CHANGELOG

* Added operator to transform_reduce benchmark

* Added mem allocator in benchmarks

* Changes for review

* ci: set up sccache

* Added helper functions for choosing between different custom reporters

* Added JSON and CSV custom reporters for benchmarks

* Changes for review

* Added hipstdpar tests

* Relocated our ParallelSTL additions

* Fixed several naming issues

* Added missing unimplemented algorithms

* Split hipstdpar_lib.hpp

* Added relevant information to README and CHANGELOG regarding HIPSTDPAR

* Clarified upstream LLVM offload support

* Emit error when HIPSTDPAR macros are not defined

* Move forwarding calls to rocPRIM to thrust's stubs

* Fix path to hipstdpar impl headers

* Prevent building hipstdpar tests when no compatible libstdc++ is present

* Disable TBB tests build

---------

Co-authored-by: Beatriz Navidad Vilches <beatriz@streamhpc.com>
Co-authored-by: Robin Voetter <robin@streamhpc.com>
3 people authored Nov 20, 2024
1 parent 2695a52 commit bc24ef2
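
Several entries in the commit message concern compile-time type guarantees, for example "Check that thrust::pair is trivially copyable". The sketch below shows the general shape of such a check using only the standard library. `simple_pair`, `tracked`, and `can_memcpy` are hypothetical names introduced for illustration; they are not part of Thrust, and the actual check added upstream may look different.

```cpp
#include <type_traits>

// Hypothetical stand-in for a pair-like aggregate; this is NOT thrust::pair.
template <typename T1, typename T2>
struct simple_pair
{
  T1 first;
  T2 second;
};

// A type with a user-provided copy constructor is not trivially copyable.
struct tracked
{
  int value;
  tracked() = default;
  tracked(const tracked& other)
      : value(other.value + 1)
  {}
};

// Hypothetical helper: true when T may be copied bytewise (e.g. with memcpy).
template <typename T>
constexpr bool can_memcpy()
{
  return std::is_trivially_copyable<T>::value;
}

// Compile-time checks of the kind the commit message describes.
static_assert(can_memcpy<simple_pair<int, float>>(), "pair of trivial types must stay trivially copyable");
static_assert(!can_memcpy<simple_pair<int, tracked>>(), "a non-trivial member makes the pair non-trivial");
```

Placing the checks in `static_assert` means a regression fails at compile time rather than showing up as a runtime error in a device copy.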
Showing 710 changed files with 13,550 additions and 13,115 deletions.
114 changes: 79 additions & 35 deletions .clang-format
@@ -1,11 +1,11 @@
# Style file for MLSE Libraries based on the modified rocBLAS style

# Common settings
BasedOnStyle: WebKit
TabWidth: 4
IndentWidth: 4
BasedOnStyle: LLVM
TabWidth: 2
IndentWidth: 2
UseTab: Never
ColumnLimit: 100
ColumnLimit: 120

# Other languages JavaScript, Proto

@@ -20,14 +20,14 @@ Language: Cpp
# void formatted_code_again;

DisableFormat: false
Standard: Cpp11

AccessModifierOffset: -4
Standard: c++14
AccessModifierOffset: -2
AlignAfterOpenBracket: true
AlignConsecutiveAssignments: true
AlignConsecutiveDeclarations: true
AlignEscapedNewlinesLeft: true
AlignOperands: true
AllowAllArgumentsOnNextLine: true
AlignTrailingComments: false
AllowAllParametersOfDeclarationOnNextLine: true
AllowShortBlocksOnASingleLine: false
@@ -39,13 +39,26 @@ AlwaysBreakAfterDefinitionReturnType: false
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: false
AlwaysBreakTemplateDeclarations: true
AttributeMacros: [
'THRUST_DEVICE',
'THRUST_FORCEINLINE',
'THRUST_HOST_DEVICE',
'THRUST_HOST',
'_CCCL_DEVICE',
'_CCCL_FORCEINLINE',
'_CCCL_HOST_DEVICE',
'_CCCL_HOST',
'THRUST_RUNTIME_FUNCTION',
'THRUST_DETAIL_KERNEL_ATTRIBUTES',
]
BinPackArguments: false
BinPackParameters: false

# Configure each individual brace in BraceWrapping
BreakBeforeBraces: Custom
# Control of individual brace wrapping cases
BraceWrapping: {
AfterCaseLabel: 'false'
AfterClass: 'true'
AfterControlStatement: 'true'
AfterEnum : 'true'
@@ -56,52 +69,69 @@ BraceWrapping: {
BeforeCatch : 'true'
BeforeElse : 'true'
IndentBraces : 'false'
# AfterExternBlock : 'true'
SplitEmptyFunction: 'false'
SplitEmptyRecord: 'false'
}

#BreakAfterJavaFieldAnnotations: true
#BreakBeforeInheritanceComma: false
#BreakBeforeBinaryOperators: None
#BreakBeforeTernaryOperators: true
#BreakConstructorInitializersBeforeComma: true
#BreakStringLiterals: true
BreakBeforeConceptDeclarations: true
BreakBeforeBinaryOperators: NonAssignment
BreakBeforeTernaryOperators: true
BreakConstructorInitializers: BeforeComma
BreakInheritanceList: BeforeComma
EmptyLineAfterAccessModifier: Never
EmptyLineBeforeAccessModifier: Always

InsertBraces: true
InsertNewlineAtEOF: true
InsertTrailingCommas: Wrapped
IndentRequires: true
IndentPPDirectives: AfterHash
PackConstructorInitializers: Never
PenaltyBreakAssignment: 30
PenaltyBreakTemplateDeclaration: 0
PenaltyIndentedWhitespace: 2
RemoveSemicolon: false
SpaceAfterLogicalNot: false
SpaceAfterTemplateKeyword: true
SpaceBeforeCtorInitializerColon: true
SpaceBeforeInheritanceColon: true
SpaceBeforeRangeBasedForLoopColon: true


CommentPragmas: '^ IWYU pragma:'
#CompactNamespaces: false
CompactNamespaces: false
ConstructorInitializerAllOnOneLineOrOnePerLine: false
ConstructorInitializerIndentWidth: 4
ContinuationIndentWidth: 4
ContinuationIndentWidth: 2
Cpp11BracedListStyle: true
#SpaceBeforeCpp11BracedList: false
DerivePointerAlignment: false
SpaceBeforeCpp11BracedList: false
ExperimentalAutoDetectBinPacking: false
ForEachMacros: [ foreach, Q_FOREACH, BOOST_FOREACH ]
IndentCaseLabels: false
#FixNamespaceComments: true
IndentCaseLabels: true
FixNamespaceComments: true
IndentWrappedFunctionNames: false
KeepEmptyLinesAtTheStartOfBlocks: true
KeepEmptyLinesAtTheStartOfBlocks: false
MacroBlockBegin: ''
MacroBlockEnd: ''
#JavaScriptQuotes: Double
MaxEmptyLinesToKeep: 1
NamespaceIndentation: Inner
NamespaceIndentation: None
ObjCBlockIndentWidth: 4
#ObjCSpaceAfterProperty: true
#ObjCSpaceBeforeProtocolList: true
PenaltyBreakBeforeFirstCallParameter: 19
PenaltyBreakComment: 300
PenaltyBreakFirstLessLess: 120
PenaltyBreakString: 1000

PenaltyExcessCharacter: 1000000
PenaltyReturnTypeOnItsOwnLine: 60
PenaltyBreakBeforeFirstCallParameter: 50
PenaltyBreakComment: 0
PenaltyBreakFirstLessLess: 0
PenaltyBreakString: 70
PenaltyExcessCharacter: 100
PenaltyReturnTypeOnItsOwnLine: 90
PointerAlignment: Left
SpaceAfterCStyleCast: false
SpaceAfterCStyleCast: true
SpaceBeforeAssignmentOperators: true
SpaceBeforeParens: Never
SpaceBeforeParens: ControlStatements
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 1
SpacesInAngles: false
SpacesInAngles: Never
SpacesInContainerLiterals: true
SpacesInCStyleCastParentheses: false
SpacesInParentheses: false
@@ -110,11 +140,25 @@ SpacesInSquareBrackets: false
#SpaceBeforeInheritanceColon: true

#SortUsingDeclarations: true
SortIncludes: true
SortIncludes: CaseInsensitive

# Comments are for developers, they should arrange them
ReflowComments: false
ReflowComments: true

#IncludeBlocks: Preserve
#IndentPPDirectives: AfterHash

StatementMacros: [
'THRUST_EXEC_CHECK_DISABLE',
'THRUST_NAMESPACE_BEGIN',
'THRUST_NAMESPACE_END',
'THRUST_EXEC_CHECK_DISABLE',
'CUB_NAMESPACE_BEGIN',
'CUB_NAMESPACE_END',
'THRUST_NAMESPACE_BEGIN',
'THRUST_NAMESPACE_END',
'_LIBCUDACXX_BEGIN_NAMESPACE_STD',
'_LIBCUDACXX_END_NAMESPACE_STD',
]
TabWidth: 2
UseTab: Never
---
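
As a rough illustration of the updated `.clang-format` settings, the hand-formatted sketch below applies the options visible in the diff: 2-space indentation, `PointerAlignment: Left`, `SpaceBeforeParens: ControlStatements`, braces on their own line after control statements, `NamespaceIndentation: None`, and a closing namespace comment. It was not produced by running clang-format, the names `example` and `sum` are hypothetical, and the brace placement after the function and namespace assumes settings in the collapsed part of `BraceWrapping` that this page does not show.

```cpp
#include <cstddef>

// Hypothetical example code; not taken from the repository.
namespace example
{
// AlwaysBreakTemplateDeclarations + SpaceAfterTemplateKeyword:
// the template header sits on its own line, with a space after `template`.
template <typename T>
constexpr T sum(const T* data, std::size_t n) // PointerAlignment: Left
{
  T total{};
  for (std::size_t i = 0; i < n; ++i) // space before '(' of control statements only
  {
    total += data[i]; // InsertBraces keeps braces even around a single statement
  }
  return total;
}
} // namespace example
```

Running clang-format with the repository's actual file remains the authoritative way to see the resulting layout.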
32 changes: 25 additions & 7 deletions .gitlab-ci.yml
@@ -12,6 +12,7 @@ include:
- /deps-rocm.yaml
- /deps-windows.yaml
- /deps-nvcc.yaml
- /deps-compiler-acceleration.yaml
- /gpus-rocm.yaml
- /gpus-nvcc.yaml
- /rules.yaml
@@ -46,17 +47,21 @@ copyright-date:
extends:
- .deps:rocm
- .deps:cmake-latest
- .deps:compiler-acceleration
before_script:
- !reference [".deps:rocm", before_script]
- !reference [".deps:cmake-latest", before_script]
- !reference [".deps:compiler-acceleration", before_script]

.cmake-minimum:
extends:
- .deps:rocm
- .deps:cmake-minimum
- .deps:compiler-acceleration
before_script:
- !reference [".deps:rocm", before_script]
- !reference [".deps:cmake-minimum", before_script]
- !reference [".deps:compiler-acceleration", before_script]

.install-rocprim:
script:
@@ -69,8 +74,11 @@ copyright-date:
-D CMAKE_CXX_COMPILER=hipcc
-D CMAKE_BUILD_TYPE=Release
-D BUILD_TEST=OFF
-D BUILD_HIPSTDPAR_TEST=OFF
-D BUILD_EXAMPLE=OFF
-D ROCM_DEP_ROCMCORE=OFF
-D CMAKE_C_COMPILER_LAUNCHER=phc_sccache_c
-D CMAKE_CXX_COMPILER_LAUNCHER=phc_sccache_cxx
-S $ROCPRIM_DIR
-B $ROCPRIM_DIR/build
- cd $ROCPRIM_DIR/build
@@ -91,7 +99,7 @@ copyright-date:
- !reference [.install-rocprim, script]
- | # Setup env vars for testing
rng_seed_count=0; prng_seeds="0";
if [[ $CI_COMMIT_BRANCH == "develop_stream" ]]; then
if [[ $CI_COMMIT_BRANCH == "develop_stream" ]]; then
rng_seed_count=3
prng_seeds="0, 1000"
fi
@@ -111,6 +119,9 @@ copyright-date:
-D AMDGPU_TEST_TARGETS=$GPU_TARGETS
-D RNG_SEED_COUNT=$rng_seed_count
-D PRNG_SEEDS=$prng_seeds
-D CMAKE_C_COMPILER_LAUNCHER=phc_sccache_c
-D CMAKE_CXX_COMPILER_LAUNCHER=phc_sccache_cxx
-D CMAKE_CUDA_COMPILER_LAUNCHER=phc_sccache_cuda
-S $CI_PROJECT_DIR
-B $CI_PROJECT_DIR/build
- cmake --build $CI_PROJECT_DIR/build
@@ -198,10 +209,10 @@ build:windows:
-D CMAKE_INSTALL_PREFIX:PATH="$ROCPRIM_DIR/build/install" *>&1
- \& cmake --build "$ROCPRIM_DIR/build" --target install *>&1
# Configure and build rocThrust
- \& cmake
-S "$CI_PROJECT_DIR"
-B "$CI_PROJECT_DIR/build"
-G Ninja
- \& cmake
-S "$CI_PROJECT_DIR"
-B "$CI_PROJECT_DIR/build"
-G Ninja
-D CMAKE_BUILD_TYPE=Release
-D GPU_TARGETS=$GPU_TARGET
-D BUILD_TEST=ON
@@ -327,10 +338,12 @@ test:rocm-windows-install:
- .deps:nvcc
- .gpus:nvcc-gpus
- .deps:cmake-latest
- .deps:compiler-acceleration
- .rules:manual
before_script:
- !reference [".deps:nvcc", before_script]
- !reference [".deps:cmake-latest", before_script]
- !reference [".deps:compiler-acceleration", before_script]

build:cuda-and-omp:
stage: build
@@ -340,7 +353,7 @@ build:cuda-and-omp:
tags:
- build
variables:
CCCL_GIT_BRANCH: v2.3.2
CCCL_GIT_BRANCH: v2.4.0
CCCL_DIR: ${CI_PROJECT_DIR}/cccl
needs: []
script:
@@ -349,16 +362,21 @@ build:cuda-and-omp:
- rm -R $CCCL_DIR/thrust/thrust
- cp -r $CI_PROJECT_DIR/thrust $CCCL_DIR/thrust
# Build tests and examples from CCCL Thrust
# CCCL 2.4.0 breaks compilation of tests. Compile examples only until we
# match v2.5.0.
- cmake
-G Ninja
-D CMAKE_BUILD_TYPE=Release
-D CMAKE_CUDA_ARCHITECTURES="$GPU_TARGETS"
-D THRUST_ENABLE_TESTING=ON
-D THRUST_ENABLE_TESTING=OFF
-D THRUST_ENABLE_EXAMPLES=ON
-D THRUST_ENABLE_BENCHMARKS=OFF
-D THRUST_ENABLE_MULTICONFIG=ON
-D THRUST_MULTICONFIG_ENABLE_SYSTEM_OMP=ON
-D THRUST_MULTICONFIG_ENABLE_SYSTEM_CUDA=ON
-D CMAKE_C_COMPILER_LAUNCHER=phc_sccache_c
-D CMAKE_CXX_COMPILER_LAUNCHER=phc_sccache_cxx
-D CMAKE_CUDA_COMPILER_LAUNCHER=phc_sccache_cuda
-B $CI_PROJECT_DIR/build
-S $CCCL_DIR/thrust
- cmake --build $CI_PROJECT_DIR/build
5 changes: 4 additions & 1 deletion CHANGELOG.md
@@ -3,16 +3,18 @@
Documentation for rocThrust available at
[https://rocm.docs.amd.com/projects/rocThrust/en/latest/](https://rocm.docs.amd.com/projects/rocThrust/en/latest/).

## (Unreleased) rocThrust 3.2.0 for ROCm 6.4
## (Unreleased) rocThrust 3.3.0 for ROCm 6.4

### Added
* Added extended tests to `rtest.py`. These tests are extra tests that did not fit the criteria of smoke and regression tests. These tests will take much longer to run relative to smoke and regression tests. Use `python rtest.py [--emulation|-e|--test|-t]=extended` to run these tests.
* Added regression tests to `rtest.py`. These tests recreate scenarios that have caused hardware problems in past emulation environments. Use `python rtest.py [--emulation|-e|--test|-t]=regression` to run these tests.
* Added smoke test options, which runs a subset of the unit tests and ensures that less than 2gb of VRAM will be used. Use `python rtest.py [--emulation|-e|--test|-t]=smoke` to run these tests.
* Added `--emulation` option for `rtest.py`
* Merged changes from upstream CCCL/thrust 2.4.0

### Changed
* `--test|-t` is no longer a required flag for `rtest.py`. Instead, the user can use either `--emulation|-e` or `--test|-t`, but not both.
* Split the contents of HIPSTDPAR's forwarding header into several implementation headers.

## (Unreleased) rocThrust 3.2.0 for ROCm 6.3

@@ -38,6 +40,7 @@ Documentation for rocThrust available at

* Merged changes from upstream CCCL/thrust 2.2.0
* Updated the contents of `system/hip` and `test` with the upstream changes to `system/cuda` and `testing`
* Added HIPSTDPAR library as part of rocThrust.

### Changes

5 changes: 3 additions & 2 deletions CMakeLists.txt
@@ -80,6 +80,7 @@ endif()
# Disable -Werror
option(DISABLE_WERROR "Disable building with Werror" ON)
option(BUILD_TEST "Build tests" OFF)
option(BUILD_HIPSTDPAR_TEST "Build hipstdpar tests" OFF)
option(BUILD_EXAMPLES "Build examples" OFF)
option(BUILD_BENCHMARKS "Build benchmarks" OFF)
option(DOWNLOAD_ROCPRIM "Download rocPRIM and do not search for rocPRIM package" OFF)
@@ -143,14 +144,14 @@ if(BUILD_TEST OR BUILD_BENCHMARKS)
endif()

# Tests
if(BUILD_TEST)
if(BUILD_TEST OR BUILD_HIPSTDPAR_TEST)
rocm_package_setup_client_component(tests)
if (ENABLE_UPSTREAM_TESTS)
enable_testing()
endif()
# We still want the testing to be compiled to catch some errors
#TODO: Get testing folder working with HIP on Windows
if (NOT WIN32)
if (NOT WIN32 AND BUILD_TEST)
add_subdirectory(testing)
endif()
enable_testing()