Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase private segment limit for dynamic scratch kernels #80

Open
wants to merge 91 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
91 commits
Select commit Hold shift + click to select a range
1239309
SWDEV-449558 - Update barrier's logic
gandryey Mar 12, 2024
be2bdab
SWDEV-430437,SWDEV-434702 - Split the streamset per device
iassiour Feb 6, 2024
67473ba
SWDEV-1 - Do not pass daz_opt to comgr link options
arsenm Nov 17, 2023
ad32e60
SWDEV-449146 - Call hipDeviceSynchronize before removeFatBinary
iassiour Mar 20, 2024
0be92b8
SWDEV-452299 - Pass dst pitch while capturing hipMemcpyParam2DAsync &…
jaydeeppatel1111 Mar 21, 2024
f44ca70
Revert "SWDEV-444098 - remove rocm-ocl-icd packaging"
jujiang-del Mar 22, 2024
51926b6
SWDEV-451736 - Revert "SWDEV-444670 - Register the Runtime::tearDown …
iassiour Mar 20, 2024
f296159
SWDEV-353281 - Change pool type for graphs
gandryey Mar 18, 2024
7f84df9
SWDEV-301667 - Disable HostBlit copy for HIP correct if check
Mar 19, 2024
5b28e38
SWDEV-448288 - Remove references to deprecated llvm references.
kjayapra-amd Mar 22, 2024
5cbd74b
SWDEV-413997 - Save hsa_handle as ptr in hipMemCreate path.
kjayapra-amd Mar 19, 2024
09328f4
SWDEV-446992 - Request can be for bytes OR dptr OR both.
jaydeeppatel1111 Feb 29, 2024
d1fff7c
SWDEV-445096 - Fix -O0 crash in OpenCL tests
rakesroy Mar 20, 2024
c157bfb
SWDEV-301667 - Create TS for each node recorded in graph
saleelk Mar 21, 2024
4761ecb
SWDEV-450636 - Hostcall Listen thread cause seg fault due to main thr…
jaydeeppatel1111 Mar 20, 2024
95e3958
SWDEV-449922 - Remove per queue residency update
gandryey Mar 26, 2024
2f3ad43
SWDEV-449922 - Remove a global lock for queues on wait
gandryey Mar 26, 2024
19b4660
SWDEV-443567 - SWDEV-436126 - Fix Prohibited and Unhandled Operations…
Jan 25, 2024
ea4f09e
SWDEV-452787 - correct hipDrvGraphAddMemcpyNode check
Mar 27, 2024
411960a
SWDEV-451687 - Fix alloc message values in AMD_LOG_LEVEL for 32 bit
Mar 15, 2024
f7b1398
SWDEV-443537 - fix make build warning message
Feb 9, 2024
bc80802
SWDEV-446726 - Disable large bar for 32 bit windows
Mar 15, 2024
880f1f0
SWDEV-450361 - Add nullptr validation for waitStream
rakesroy Apr 1, 2024
d7f352d
SWDEV-453301 - Remove the option to write multiple packets in dispatc…
iassiour Mar 25, 2024
dbac297
SWDEV-451964 - Limit gpu single allocation percentage for gfx940 only
Mar 19, 2024
3f0bcf7
SWDEV-301667 - Fix SDMA mask reuse
saleelk Apr 5, 2024
481912a
SWDEV-379007 - add __hip_bfloat16_raw types
cjatin Feb 9, 2024
2b8634b
SWDEV-446298 - Adding error code to the logs on p2p hsa api failure.
kjayapra-amd Feb 22, 2024
f0c7ecf
SWDEV-455254 - Add kernel arg optimization
gandryey Apr 9, 2024
d52168b
SWDEV-436754 - Use glbctx instead so that ref count increments for mu…
jaydeeppatel1111 Apr 4, 2024
35c80dd
SWDEV-424956 - Fix half vector printf issue
tomsang Apr 3, 2024
7de7da4
SWDEV-455254 - Reduce blit kernels signature
gandryey Apr 9, 2024
d52d16c
SWDEV-413997 - Fixing multiple device cases.
kjayapra-amd Mar 15, 2024
52db98e
SWDEV-453180 - Add UUID support for HIP_VISIBLE_DEVICES on Linux
rakesroy Apr 8, 2024
815e450
SWDEV-413997 - Read Access can be valid now that ROCr takes care of a…
kjayapra-amd Mar 29, 2024
00ddc3e
SWDEV-413997 - Fixing alignment validation check for power of 2 inste…
kjayapra-amd Mar 29, 2024
fcfe2ec
SWDEV-453577 - Fixes to account for right CU count based on WGP or CU…
Apr 3, 2024
ca07f59
SWDEV-379007 initial implementation of fp8 header
cjatin Feb 13, 2024
a1e0970
SWDEV-422580 - Adding back the pcie.function to PCI address string in…
kjayapra-amd Apr 3, 2024
c95a75a
SWDEV-444670 - Enable teardown class
gandryey Apr 12, 2024
49349f1
SWDEV-379007 - use avx instruction for bf16 cvt
cjatin Apr 11, 2024
5ddca58
SWDEV-455699 - removing HW_REG_TRAPSTS for gfx12
pghafari Apr 9, 2024
03562a2
SWDEV-454959 - ignore the upper half of the mask in wave32 mode
ssahasra Apr 17, 2024
d7b0d78
SWDEV-379007 - fix bool check for fp8_fnuz
cjatin Apr 16, 2024
d511e57
SWDEV-441603 - Correct dst device
shadidashmiz Mar 25, 2024
62559a6
SWDEV-440746 - Fix the hostcall buffer creation
gandryey Apr 16, 2024
8942939
SWDEV-455346 - End wait if HostcallListener terminates.
jaydeeppatel1111 Apr 17, 2024
12e0bdc
SWDEV-453535 - Capture hipMemset3DAsync.
jaydeeppatel1111 Mar 28, 2024
8179fa9
SWDEV-450053 - Handle MemcpyNodeSetParamsTo/FromSymbol negative param…
Mar 26, 2024
56ebf51
SWDEV-413997 - VMM IPC implementation for Linux.
kjayapra-amd Mar 20, 2024
e829ef6
SWDEV-455869 - Revert "SWDEV-410751 - Consider null amd::memory is in…
Apr 10, 2024
bf74ef4
SWDEV-451594 - Implement Readback and Avoid HDP Flush workaround for …
iassiour Mar 15, 2024
8f7acbd
SWDEV-446610 - Attribute HIP_POINTER_ATTRIBUTE_SYNC_MEMOPS should ret…
satyanveshd Mar 6, 2024
fd81490
SWDEV-440746 - Don't set CL_SUBMITTED twice
gandryey Apr 19, 2024
ffb516d
SWDEV-353281 - Reuse timestamp on memory reuse
gandryey Apr 19, 2024
329ba27
SWDEV-440746 - Wait for signal before release
gandryey Apr 19, 2024
fb217fa
SWDEV-453180 - Add UUID support for HIP_VISIBLE_DEVICES on Windows
rakesroy Apr 17, 2024
863c562
SWDEV-455041 - Continue processing fat binary even if other code obje…
kjayapra-amd Apr 3, 2024
7448113
SWDEV-440746 - Remove obsolete code
gandryey Apr 19, 2024
0ccdb3e
SWDEV-440746 - Release last command on terminate
gandryey Apr 19, 2024
74ffc5f
SWDEV-413997 - Cleanup fixes for Virtual Memory Management.
kjayapra-amd Apr 20, 2024
5a715ed
Switch luxmark to lightning compiler for all ASICs
kzhuravl Apr 18, 2024
8809633
[SWDEV-454661][SWDEV-454653] - GraphExecMemcpyNodeSetParam to return …
Apr 9, 2024
1761f1b
457619 - Fixed the broken link to build HIP instructions
jujiang-del Apr 19, 2024
5c23440
SWDEV-353281 - Align VA size
gandryey Apr 23, 2024
2841aab
SWDEV-451099 - Added include for __half type definitions for non-HIP …
iassiour Mar 21, 2024
9fdddb7
SWDEV-447691 - Correct handle type for DX12 semaphore
gandryey Apr 24, 2024
f2b0178
SWDEV-420016 - Add more driver side graph APIs
shadidashmiz Jan 15, 2024
cffff4e
SWDEV-457859 - Initialize isWGPMode_ in WorkGroupInfo
iassiour Apr 23, 2024
5c1804a
SWDEV-353281 - Corret VA unmap
gandryey Apr 23, 2024
49b4aef
SWDEV-413997 - Handling cases where multiple phys_mem is mapped into …
kjayapra-amd Apr 20, 2024
9e74f6d
SWDEV-451004 - Remove .bat files from hip-dev/devel package
raramakr Apr 25, 2024
d7d1e01
SWDEV-458516 - Add support for external CI builds using Azure Pipelines
amd-jmacaran Apr 24, 2024
0e1a057
SWDEV-413997 - Changes to use GlobalContext in views.
kjayapra-amd Apr 24, 2024
7a37150
SWDEV-311271 - Enable mempools under Linux
gandryey Mar 15, 2024
08889f4
SWDEV-459583 - Update codeownder in clr repos
jujiang-del Apr 29, 2024
1d48f2a
SWDEV-456279 - Adding new hip flag to access contiguous memory and pa…
jaydeeppatel1111 Apr 12, 2024
948ca5a
SWDEV-301667 - Add LOG_TS mask
saleelk Apr 25, 2024
f1ef05b
SWDEV 301667 - Fix Debug dot print for graphs
saleelk Apr 30, 2024
feaef7f
SWDEV-455586 - create ocltst ASAN test instrumented pkg
jujiang-del Apr 17, 2024
59051ed
[SWDEV-442583] - Fix hipLaunchCooperativeKernel API error codes
manocharahul Apr 16, 2024
996c16a
SWDEV-449327 - Adding CONTRIBUTING.md in clr repos
jujiang-del Apr 8, 2024
e53df57
SWDEV-433371 - use comgr to unbundle code objects
tomsang Apr 11, 2024
6cb7b6e
SWDEV-451594 - Change device kernel args to use HDP flush by default
iassiour May 2, 2024
1bd8b4e
SWDEV-433371 - Build needs comgr >= 2.8
mangupta May 3, 2024
68ddd8f
SWDEV-460091 - Handle cases where inline variables are included in mu…
kjayapra-amd May 2, 2024
4a9d24a
SWDEV-301667 - Pass reference to kernel name
saleelk Apr 25, 2024
2eb3037
SWDEV-451945 - Remove ShouldLoadPlatform function
AlexXAmd May 3, 2024
7ad1416
SWDEV-460151 - Enabling gfx12 in HIP runtime.
kjayapra-amd Sep 30, 2024
933aa1d
SWDEV-460242 - Add system memory suballocator
gandryey May 3, 2024
c76223c
Increase private segment size limit to match HW limit
publixsubfan May 3, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
56 changes: 56 additions & 0 deletions .azuredevops/rocm-ci.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
resources:
repositories:
- repository: pipelines_repo
type: github
endpoint: ROCm
name: ROCm/ROCm
- repository: matching_repo
type: github
endpoint: ROCm
name: ROCm/HIP
ref: develop
pipelines:
- pipeline: hip_pipeline
source: HIP
trigger:
branches:
include:
- develop

variables:
- group: common
- template: /.azuredevops/variables-global.yml@pipelines_repo

trigger:
batch: true
branches:
include:
- develop
paths:
exclude:
- CODEOWNERS
- LICENCE
- '**/*.md'

pr:
autoCancel: true
branches:
include:
- develop
paths:
exclude:
- CODEOWNERS
- LICENCE
- '**/*.md'
drafts: false

jobs:
# if the build reason is a resource trigger, it means trigger is HIP repo build
# HIP repo build would have just built runtime, just copy their build products
# this is to ensure clr has latest good package for combined-packaging jobs
# combined-packaging jobs only have to look at clr pipeline for latest runtime
# to remove logic of comparing build products from both clr and hip triggers
- ${{ if eq(variables['Build.Reason'], 'ResourceTrigger') }}:
- template: ${{ variables.CI_COMPONENT_PATH }}/copyHIP.yml@pipelines_repo
- ${{ if ne(variables['Build.Reason'], 'ResourceTrigger') }}:
- template: ${{ variables.CI_COMPONENT_PATH }}/HIP.yml@pipelines_repo
2 changes: 1 addition & 1 deletion CODEOWNERS
Original file line number Diff line number Diff line change
@@ -1 +1 @@
* @gargrahul @mangupta @rakesroy
* @cpaquot @gandryey @skudchad @mangupta @rakesroy
142 changes: 142 additions & 0 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,142 @@
# Contributing to HIP/CLR #

We welcome contributions to the HIP project.
CLR is a part of HIP runtime for the AMD platform.Please follow these details to help ensure your contributions will be successfully accepted.

## Issue Discussion ##

Please use the [GitHub Issue](https://github.com/ROCm/clr/issues) tab to notify us of issues.

* Use your best judgement for issue creation. If your issue is already listed, upvote the issue and
comment or post to provide additional details, such as how you reproduced this issue.
* If you're not sure if your issue is the same, err on the side of caution and file your issue.
You can add a comment to include the issue number (and link) for the similar issue. If we evaluate
your issue as being the same as the existing issue, we'll close the duplicate.
* If your issue doesn't exist, use the issue template to file a new issue.
* When filing an issue, be sure to provide as much information as possible, including script output so
we can collect information about your configuration. This helps reduce the time required to
reproduce your issue.
* Check your issue regularly, as we may require additional information to successfully reproduce the
issue.
* You may also open an issue to ask questions to the maintainers about whether a proposed change
meets the acceptance criteria, or to discuss an idea pertaining to the library.

## Acceptance Criteria ##

clr is Common Language Runtime contains C++ codes for the implementation of HIP runtime APIs on the AMD platform.
Bug fixes and performance are both important goals in clr. Because of this, when a pull request is created, the owner of the repository will review, and put it in automated testing to make sure,
* The change will build on various OS platforms (Ubuntu, RHEL, etc.)
* The build package will install and run the code on different GPU architectures (MI-series, Radeon series cards, etc.),
* And the test results will achieve the goal as expected.

## Code Structure ##

clr contains three parts of codes,
- `hipamd` - contains implementation for HIP runtime on the AMD platform, which includes
- `include/hip/amd_detail` for headers
- `/src` for all types of functionality implementation such as hip event, memory, module and texture, etc.

- `opencl` - contains implementation of OpenCL on the AMD platform.

- `rocclr` - contains common runtime used in HIP and OpenCL, which includes
- `include`, header files,
- `device`, implementation of GPU device related interfaces to the backend support,
- `cimpiler`, implementation of interfaces with compiler,
- `utils`, implementation of some useful utilities,
- `os`, implementation of OS related interfaces.


## Coding Style ##

clr is a C++ runtime API implementation on the AMD platform. It allows codeing in C++ programming language, and follows styles as below,
- Code Indentation:
- Tabs should be expanded to spaces.
- Use 4 spaces indentation.
- Capitalization and Naming
- Prefer camelCase for HIP interfaces and internal symbols. Note HCC uses _ for separator.
This guideline is not yet consistently followed in HIP code - eventual compliance is aspirational.
- Member variables should begin with a leading "_". This allows them to be easily distinguished from other variables or functions.

- `{}` placement
- namespace should be on same line as `{` and separated by a space.
- Single-line if statement should still use `{/}` pair (even though C++ does not require).
- For functions, the opening `{` should be placed on a new line.
- For if/else blocks, the opening `{` is placed on same line as the if/else. Use a space to separate `{` from if/else. For example,
```console
if (foo) {
doFoo()
} else {
doFooElse();
}
```

- Miscellaneous
- All references in function parameter lists should be const.
- "ihip" means internal hip structures. These should not be exposed through the HIP API.
- Keyword TODO refers to a note that should be addressed in long-term. Could be style issue, software architecture, or known bugs.
- FIXME refers to a short-term bug that needs to be addressed.

- `HIP_INIT_API()` should be placed at the start of each top-level HIP API. This function will make sure the HIP runtime is initialized, and also constructs an appropriate API string for tracing and CodeXL marker tracing. The arguments to HIP_INIT_API should match those of the parent function.
- `hipExtGetLastError()` can be called as the AMD platform specific API, to return error code from last HIP API called from the active host thread. `hipGetLastError()` and `hipPeekAtLastError()` can also return the last error that was returned by any of the HIP runtime calls in the same host thread.
- All HIP environment variables should begin with the keyword HIP_
Environment variables should be long enough to describe their purpose but short enough so they can be remembered - perhaps 10-20 characters, with 3-4 parts separated by underscores.
To see the list of current environment variables, along with their values, set HIP_PRINT_ENV and run any hip applications on ROCm platform.
HIPCC or other tools may support additional environment variables which should follow the above convention.

## Pull Request Guidelines ##

By creating a pull request, you agree to the statements made in the code license section. Your pull request should target the default branch. Our current default branch is the develop branch, which serves as our integration branch.

Follow existing best practice for writing a good Git commit message.

Some tips:
http://chris.beams.io/posts/git-commit/
https://robots.thoughtbot.com/5-useful-tips-for-a-better-commit-message

In particular :
- Use imperative voice, ie "Fix this bug", "Refactor the XYZ routine", "Update the doc".
Not : "Fixing the bug", "Fixed the bug", "Bug fix", etc.
- Subject should summarize the commit. Do not end subject with a period. Use a blank line
after the subject.

### Deliverables ###

HIP is an open source library. Because of this, we include the following license description at the top of every source file.
If you create new source files in the repository, please include this text in them as well (replacing "xx" with the digits for the current year):
```
// Copyright (c) 20xx Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.
```

### Process ###

After you create a PR, you can take a look at a diff of the changes you made using the PR's "Files" tab.

PRs must pass through the checks and the code review described in the [Acceptance Criteria](#acceptance-criteria) section before they can be merged.

Checks may take some time to complete. You can view their progress in the table near the bottom of the pull request page. You may also be able to use the links in the table
to view logs associated with a check if it fails.

During code reviews, another developer will take a look through your proposed change. If any modifications are requested (or further discussion about anything is
needed), they may leave a comment. You can follow up and respond to the comment, and/or create comments of your own if you have questions or ideas.
When a modification request has been completed, the conversation thread about it will be marked as resolved.

To update the code in your PR (eg. in response to a code review discussion), you can simply push another commit to the branch used in your pull request.

2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ Building clr requires `rocm-hip-libraries` meta package, which provides the pre-

Users can also build `OCL` and `HIP` at the same time by passing `-DCLR_BUILD_HIP=ON -DCLR_BUILD_OCL=ON` to configure command.

For detail instructions, please refer to [how to build HIP](https://rocm.docs.amd.com/projects/HIP/en/latest/developer_guide/build.html)
For detail instructions, please refer to [how to build HIP](https://rocm.docs.amd.com/projects/HIP/en/latest/install/build.html)

## Tests

Expand Down
22 changes: 2 additions & 20 deletions hipamd/include/hip/amd_detail/amd_hip_atomic.h
Original file line number Diff line number Diff line change
Expand Up @@ -973,22 +973,13 @@ inline
unsigned int atomicInc(unsigned int* address, unsigned int val)
{
#if defined(__gfx941__)
__device__
extern
unsigned int __builtin_amdgcn_atomic_inc(
unsigned int*,
unsigned int,
unsigned int,
unsigned int,
bool) __asm("llvm.amdgcn.atomic.inc.i32.p0i32");

return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>(
address,
val,
[](unsigned int& x, unsigned int y) { x = (x >= y) ? 0 : (x + 1); },
[=]() {
return
__builtin_amdgcn_atomic_inc(address, val, __ATOMIC_RELAXED, 1, false);
__builtin_amdgcn_atomic_inc32(address, val, __ATOMIC_RELAXED, "agent");
});
#else
return __builtin_amdgcn_atomic_inc32(address, val, __ATOMIC_RELAXED, "agent");
Expand All @@ -1001,22 +992,13 @@ inline
unsigned int atomicDec(unsigned int* address, unsigned int val)
{
#if defined(__gfx941__)
__device__
extern
unsigned int __builtin_amdgcn_atomic_dec(
unsigned int*,
unsigned int,
unsigned int,
unsigned int,
bool) __asm("llvm.amdgcn.atomic.dec.i32.p0i32");

return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>(
address,
val,
[](unsigned int& x, unsigned int y) { x = (!x || x > y) ? y : (x - 1); },
[=]() {
return
__builtin_amdgcn_atomic_dec(address, val, __ATOMIC_RELAXED, 1, false);
__builtin_amdgcn_atomic_dec32(address, val, __ATOMIC_RELAXED, "agent");
});
#else
return __builtin_amdgcn_atomic_dec32(address, val, __ATOMIC_RELAXED, "agent");
Expand Down
Loading