[SYCL-MLIR] Merge from intel/llvm sycl branch #12142

keryell · 2023-12-12T01:53:09Z

Please do not squash and merge this PR.

Specify their restrictions w.r.t. `align` attribute.

…4423) The explicit 'align 4' caused the pointers to be underaligned on RV64.

…408) This PR creates the wrapper class AffineForOp and adds a testcase for it. A testcase for the AffineLoadOp is also added.

… macro with parentheses. (#68618) Error message: ``` In file included from ../src/amd/addrlib/src/core/addrobject.h:21: ../src/amd/addrlib/src/core/addrcommon.h:343:13: error: expected unqualified-id out = ::_tzcnt_u32(mask); ^ /usr/lib/llvm-15/lib/clang/15.0.6/include/bmiintrin.h:74:27: note: expanded from macro '_tzcnt_u32' #define _tzcnt_u32(a) (__tzcnt_u32((a))) ``` This is because both GCC/Clang doesn't support compiling the following code: ``` #ifdef _MSC_VER #include <intrin.h> #else #include <x86intrin.h> #endif int f(int a) { return ::(_tzcnt_u32)(a); } ``` This is because the return statement expects an expression or braced init list: http://eel.is/c++draft/stmt.jump#general-1 but we really only need to care about the expression form (there's no braces in sight). Grammatically, the leading :: will parse as a qualified-id because it matches the production for nested-name-specifier: http://eel.is/c++draft/expr.prim.id.qual#nt:qualified-id That needs to be followed by an unqualified-id (http://eel.is/c++draft/expr.prim.id.unqual#nt:unqualified-id), but the open paren does not match any of the grammar productions, so this is a syntax error. Closes: llvm/llvm-project#64467 Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>

Also eliminate an unused variable while here.

Follow up on a post-commit review of 9468de4 (TargetInstrInfo: make getOperandLatency return optional (NFC)) by Bjorn Pettersson to fix a couple of things that are not NFC: - std::optional<T>::operator<= returns true if the first operand is a std::nullopt and second operand is T. Fix a couple of places where we assumed it would return false. - In TargetSchedule, computeInstrCost could take another codepath, returning InstrLatency instead of DefaultDefLatency. Fix one instance not accounting for this behavior.

…#73523) Updates patterns for flattening `vector.transfer_read` by relaxing the requirement that the "collapsed" indices are all zero. This enables collapsing cases like this one: ```mlir %2 = vector.transfer_read %arg4[%c0, %arg0, %arg1, %c0] ... : memref<1x43x4x6xi32>, vector<1x2x6xi32> ``` Previously only the following case would be consider for collapsing (all indices are 0): ```mlir %2 = vector.transfer_read %arg4[%c0, %c0, %c0, %c0] ... : memref<1x43x4x6xi32>, vector<1x2x6xi32> ``` Also adds some new comments and renames the `firstContiguousInnerDim` parameter as `firstDimToCollapse` (the latter better matches the actual meaning). Similar updates for `vector.transfer_write` will be implemented in a follow-up patch.

… (#73501) Count how often the value is with signext/zeroext calls when determining the preferred extension type.

This commit refactors `verifyDominanceOfContainedRegions` to iterative algorithms similar to https://reviews.llvm.org/D154925 to fix stack overflow for deeply nested regions (e.g. llvm/circt#5316). There should be no functional change except that this could result in slightly different order of verification.

Reported at: llvm/llvm-project#69931 (comment)

Because AA does not support vectors of pointers, we have to treat pointers that are inserted into a vector as captures. We mostly already do so, but missed the case where getelementptr is used to produce a vector.

AsmParser creates dummy values when value identifiers are not going in ascending order and tries to use those dummy values when/if they are being referenced. We need to postpone this until all required data is read.

@banach-space

Now that tbaa tags pass is enabled by default, I would like to remove these flags. `-fno-alias-analysis` was originally intended to be useful for debugging, but as it also disables tbaa tag generation in codegen, it turned out to be too noisy. @banach-space expressed that these flags felt too non-standard. The tbaa tags pass can be toggled using `-mllvm -disable-fir-alias-tags=0`

Translate debug emission kind into LLVM (the importer already supports this).

…uctions (#74305)

…l fcmp (#66505)"" This reverts commit d55692d. See discussion in #66505: assertion fires in OSS build of TensorFlow.

…d (#73939)

…h instead" (#74444) Reverts llvm/llvm-project#73939 This broke libc-aarch64-ubuntu build bot https://lab.llvm.org/buildbot/#/builders/138/builds/56186

…h instead (#73939) (#74446) Same as #73939 but also fix `libc/src/string/memory_utils/op_aarch64.h` that was still using `deferred_static_assert`.

This is a minor improvement to #73491.

Add support for frame pointers in MLIR. --------- Co-authored-by: Markus Böck <markus.boeck02@gmail.com> Co-authored-by: Christian Ulmann <christianulmann@gmail.com>

ARM64EC needs to handle both ARM and x86_64 exception tables. This is achieved by separating their chunks and sorting them separately. EXCEPTION_TABLE directory references x86_64 variant, while ARM variant is exposed using CHPE metadata, which references __arm64x_extra_rfe_table and __arm64x_extra_rfe_table_size symbols.

…… (#74336) …ng member pointers. This initially landed with a broken test due to a mid-air collision with a new requirement for Environment initialization before field modeling. Have added that initialization in the test. From first landing: getMethodDecl does not handle pointers to members and returns nullptr for them. getMethodDecl contains a decade-plus-old FIXME to handle pointers to members, but two approaches I looked at for fixing it are more invasive or complex than simply swapping to getCalleeDecl. The first, have getMethodDecl call getCalleeDecl, creates a large tree of const-ness mismatches due to getMethodDecl returning a non-const value while being a const member function and getCalleeDecl only being a const member function when it returns a const value. The second, implementing an AST walk to match how CXXMemberCallExpr::getImplicitObjectArgument grabs the LHS of the binary operator, is basically reimplementing Expr::getReferencedDeclOfCallee, which is used by Expr::getCalleeDecl. We don't need another copy of that code.

…te (#74093) Avoid editing a range of DPValues and then remapping them. This occurs when we try to de-duplicate dbg.values, but then re-use the same iterator range. We can instead remap them, and then erase any duplicates. At the same time refactor the computation of seen-intrinsic hashes, and account for a peculiarity of loop-rotates existing behaviour: it will only deduplicate dbg.values that are immediately before the preheaders branch instruction, not just any dbg.value in the preheader.

…2384) This includes some commonly needed information like how to add reviewers. This is implemented as a job before the labeler, so that on a new PR the comment is added before there are any subscribers and only the author gets a nofitication. The labeler job depends on the greeter having run or having been skipped. So if the PR wasn't just opened, or it's from a regular contributor, the labeling still happens. But we can be sure that when a greeting comment is left, it's the very first thing we do.

@example

This patch adds a target_features (TargetFeaturesAttr) to the LLVM dialect to allow setting and querying the features in use on a function. The motivation for this comes from the Arm SME dialect where we would like a convenient way to check what variants of an operation are available based on the CPU features. Intended usage: The target_features attribute is populated manually or by a pass: ```mlir func.func @example() attributes { target_features = #llvm.target_features<["+sme", "+sve", "+sme-f64f64"]> } { // ... } ``` Then within a later rewrite the attribute can be checked, and used to make lowering decisions. ```c++ // Finds the "target_features" attribute on the parent // FunctionOpInterface. auto targetFeatures = LLVM::TargetFeaturesAttr::featuresAt(op); // Check a feature. // Returns false if targetFeatures is null or the feature is not in // the list. if (!targetFeatures.contains("+sme-f64f64")) return failure(); ``` For now, this is rather simple just checks if the exact feature is in the list, though it could be possible to extend with implied features using information from LLVM.

@main

Continuation of llvm/llvm-project#74247 to fix llvm/llvm-project#56962. Fixes verifier for (Integer Attr): ```mlir llvm.mlir.constant(1 : index) : f32 ``` and (Dense Attr): ```mlir llvm.mlir.constant(dense<100.0> : vector<1xf64>) : f32 ``` ## Integer Attr The addition that this PR makes to `LLVM::ConstantOp::verify` is meant to be exactly verifying the code in `mlir::LLVM::detail::getLLVMConstant`: https://github.com/llvm/llvm-project/blob/9f78edbd20ed922cced9482f7791deb9899a6d82/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp#L350-L353 One failure mode is when the `type` (`llvm.mlir.constant(<value>) : <type>`) is not an `Integer`, because then the `cast` in `getIntegerBitWidth` will crash: https://github.com/llvm/llvm-project/blob/dca432cb7b1c282f5dc861095813c4f40f109619/llvm/include/llvm/IR/DerivedTypes.h#L97-L99 So that's now caught in the verifier. Apart from that, I don't see anything we could check for. `sextOrTrunc` means "Sign extend or truncate to width" and that one is quite permissive. For example, the following doesn't have to be caught in the verifier as it doesn't crash during `mlir-translate -mlir-to-llvmir`: ```mlir llvm.func @main() -> f32 { %cst = llvm.mlir.constant(100 : i64) : f32 llvm.return %cst : f32 } ``` ## Dense Attr Crash if not either a MLIR Vector type or one of these: https://github.com/llvm/llvm-project/blob/9f78edbd20ed922cced9482f7791deb9899a6d82/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp#L375-L391

First draft of an extension allowing opaque types to be passed to kernels via their raw byte representation. --------- Signed-off-by: John Pennycook <john.pennycook@intel.com> Co-authored-by: Greg Lueck <gregory.m.lueck@intel.com>

Allows the use of sycl_ext_oneapi_kernel_properties extension in Sycl-Graph. Kernel properties passed to the backend using the sycl kernel object. Therefore, the graph implementation does not prevent the backend from accessing these properties. Removes exception throwing. Adds tests.

… pass (intel#12108) This PR removes some remnants of the opaque pointer transition from the internalization pass. For private internalization, we insert an `alloca` instruction to model a work-item's private storage. Currently, we have to infer the accessor's element type by looking at its users. Now, we simply pass the element size as metadata to the pass, and insert an `i8`-`alloca` with the desired size in bytes. I'm using separate `LocalSize` (= number of buffer elements associated with each work-item resp. work-group) and `ElemSize` here for now as the remapping of GEP instructions still works in terms of number-of-elements. If in the future these are rewritten to perform `i8*` arithmetic as well, we could go back to only passing a single size (then in bytes instead of number of elements). --------- Signed-off-by: Julian Oppermann <julian.oppermann@codeplay.com>

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

It is cascade removal of some unused code after removing esimd/detail/host_util.hpp and removing __ESIMD_EMU_DNS namespace. Even with this change ESIMD headers still have lots of unused HOST-only code. Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>

CreateProgram and CreateKernel APIs implicitly retain program and kernels. Extra call to retain is incorrect.

…intel#12119) The changes which prevent coverage and profile options from being used for device compilations had a problem with -fsanitize=address logic. We were allowing _any_ option that happened to have 'address' as an argument as a legal pass through. Adjust this exception to be specific to -fsanitize.

Extend intel#11674 by modifying the globaloffset optimization pass to always replace uses of Loads from the `llvm.nvvm.implicit.offset` and `llvm.amdgcn.implicit.offset intrinsics` with constant zeros in the original non-offset kernel. Hence, perform the optimization even when `-enable-global-offset=true` (default). Duplicate recursively functions containing calls to the implicit offset intrinsic and let the implicit offset kernel entry point only call the original functions (i.e. do not call the functions with added offset arguments). Remove zero allocations for the original kernel entry points.

…ntel#11541) When event list is null, a barrier is still needed for all previous commands, so fix it. --------- Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com> Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>

It passes with a newer driver so just check for that. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

The CI E2E tests are failing because these tests are passing rather than expectedly failing * UR nightly CI - https://github.com/oneapi-src/unified-runtime/actions/runs/7160854862/job/19495692574 * intel/llvm CI - https://github.com/intel/llvm/actions/runs/7168248741/job/19516014849 Change the tests to be skipped rather than XFAIL while we investigate

Use Builder::createItem help function to create `item` object instead of adding one helper function in reduction namespace.

…cs. (intel#12005) Also handle translating these properties in pi2ur. oneapi-src/unified-runtime#1123 --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>

Educated guess merge conflict resolution before even trying to compile.

The API of mlir::InlinerInterface::handleTerminator() changed in mlir/include/mlir/Transforms/InliningUtils.h with llvm/llvm-project#72765 llvm/llvm-project@26a0b27

keryell · 2023-12-12T02:01:23Z

There are still a few problems related to libstdc++-13-dev I have on my system and which are probably not tested on Intel CI.
Just curious by the result of the CI. :-)

victor-eds · 2023-12-12T15:33:35Z

CI failed because PR was missing disable-lint tag. Re-running. Changes LGTM. Will merge if CI passes.

Thanks a lot for the contribution @keryell!

victor-eds · 2023-12-12T22:16:25Z

Failing tests:

check-polygeist: polygeist-opt/cpuifyhotspot.mlir
check-cgeist: Verification/sycl/raising/host_raising_nd_range.cpp
check-sycl-e2e:

SYCL :: Basic/float_division_precise.cpp
  SYCL :: Basic/vector/bool.cpp
  SYCL :: Basic/vector/int-convert.cpp
  SYCL :: Basic/vector/operators.cpp
  SYCL :: Basic/vector/scalar_access.cpp
  SYCL :: ESIMD/fp_controls.cpp
  SYCL :: ESIMD/noinline_call_from_func.cpp
  SYCL :: ESIMD/private_memory/private_memory.cpp
  SYCL :: ESIMD/unified_memory_api/atomic_update_acc.cpp
  SYCL :: ESIMD/unified_memory_api/atomic_update_acc_64.cpp
  SYCL :: ESIMD/unified_memory_api/atomic_update_acc_cmpxchg.cpp
  SYCL :: ESIMD/unified_memory_api/atomic_update_slm.cpp
  SYCL :: ESIMD/unified_memory_api/atomic_update_slm_acc.cpp
  SYCL :: ESIMD/unified_memory_api/atomic_update_slm_acc_cmpxchg.cpp
  SYCL :: ESIMD/unified_memory_api/atomic_update_slm_cmpxchg.cpp
  SYCL :: ESIMD/unified_memory_api/block_store_slm.cpp
  SYCL :: ESIMD/unified_memory_api/block_store_slm_acc.cpp
  SYCL :: Graph/Explicit/sub_group_prop.cpp
  SYCL :: Graph/RecordReplay/sub_group_prop.cpp
  SYCL :: InvokeSimd/Regression/address_space_cast.cpp
  SYCL :: KernelAndProgram/disable-caching.cpp
  SYCL :: Regression/vec_logical_ops.cpp
  SYCL :: Regression/vec_rel_swizzle_ops.cpp

keryell · 2023-12-15T17:54:01Z

I do not have the right to set tags on this repository.
The doc talks about ignore-lint with gh script but this is disable-lint I guess.
More generally, since I do not the details of this project, it is unclear for me how to fix most of the test cases.

sommerlukas · 2023-12-18T12:27:16Z

I do not have the right to set tags on this repository. The doc talks about ignore-lint with gh script but this is disable-lint I guess.

ignore-lint was renamed to disable-lint, I'll fix the docs in that point.

More generally, since I do not the details of this project, it is unclear for me how to fix most of the test cases.

Explicit SIMD and SYCL Graphs are currently not yet supported with SYCL-MLIR, so you can just add the failing tests to the list of expectedly failing tests here, as described in the documentation.

For the others, a quick investigation to see what causes the error would be useful.

victor-eds · 2024-04-15T08:29:13Z

I'll close this PR now as it's outdated.

keryell · 2024-04-15T09:50:07Z

Yes, you are right!

danilaml and others added 30 commits December 5, 2023 10:33

[llvm][docs] Expand section on non-integral pointers (#73242)

6b4b6d9

Specify their restrictions w.r.t. `align` attribute.

[RISCV] Use ABI align in varargs tests in push-pop-popret.ll. NFC (#7…

d504824

…4423) The explicit 'align 4' caused the pointers to be underaligned on RV64.

[mlir][python] python binding wrapper for the affine.AffineForOp (#74…

543589a

…408) This PR creates the wrapper class AffineForOp and adds a testcase for it. A testcase for the AffineLoadOp is also added.

[lldb] Sync OpenBSD GetOSVersion() closer to NetBSD function (#74396)

5c70c18

Also eliminate an unused variable while here.

[NFC] Remove duplicated message resulted from a bad merge I think

0ca80eb

[bazel] port c33e5d5

43455a2

[SDAG] Count call argument attributes to reduce unnecessary extension…

eaba81f

… (#73501) Count how often the value is with signext/zeroext calls when determining the preferred extension type.

[GVN] Add test for capture via vector GEP (NFC)

292ecb8

Reported at: llvm/llvm-project#69931 (comment)

[CaptureTracking] Treat vector GEPs as captures

383e350

Because AA does not support vectors of pointers, we have to treat pointers that are inserted into a vector as captures. We mostly already do so, but missed the case where getelementptr is used to produce a vector.

Fix parsing out-of-order ValueInfos (#73239)

432bb52

AsmParser creates dummy values when value identifiers are not going in ascending order and tries to use those dummy values when/if they are being referenced. We need to postpone this until all required data is read.

[MLIR][LLVM] Translate Debug EmissionKind (#74376)

fd870c6

Translate debug emission kind into LLVM (the importer already supports this).

[AMDGPU] Add GFX12 encoding and aliases for existing SOP (SALU) instr…

1c55b22

…uctions (#74305)

Revert "Reapply "ValueTracking: Identify implied fp classes by genera…

0d0c229

…l fcmp (#66505)"" This reverts commit d55692d. See discussion in #66505: assertion fires in OSS build of TensorFlow.

[libc][NFC] Remove __support/bit.h and use __support/CPP/bit.h instea…

b140948

…d (#73939)

Revert "[libc][NFC] Remove __support/bit.h and use __support/CPP/bit.…

de7fdc5

…h instead" (#74444) Reverts llvm/llvm-project#73939 This broke libc-aarch64-ubuntu build bot https://lab.llvm.org/buildbot/#/builders/138/builds/56186

[reland][libc][NFC] Remove __support/bit.h and use __support/CPP/bit.…

1d89478

…h instead (#73939) (#74446) Same as #73939 but also fix `libc/src/string/memory_utils/op_aarch64.h` that was still using `deferred_static_assert`.

[clang-format][NFC] Use prog in clang-format-diff.py (#74399)

900bb31

This is a minor improvement to #73491.

[MLIR] Add support for frame pointers in MLIR (#72145)

3257e4c

Add support for frame pointers in MLIR. --------- Co-authored-by: Markus Böck <markus.boeck02@gmail.com> Co-authored-by: Christian Ulmann <christianulmann@gmail.com>

[AArch64][SME2] Remove IsPreservesZA from ldr_zt builtin (#74303)

7788ef4

Pennycook and others added 15 commits December 8, 2023 14:36

[SYCL][NFC] Fix post-commit failure (intel#12126)

9f9a1d5

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

[SYCL] Fix kernel_compiler leak. (intel#12120)

74bf30a

CreateProgram and CreateKernel APIs implicitly retain program and kernels. Extra call to retain is incorrect.

[SYCL][E2E] Move XFAIL to driver check in get_backend.cpp (intel#12130)

e82769a

It passes with a newer driver so just check for that. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

[SYCL][NFC] Remove item dependency on reduction. (intel#12135)

8074617

Use Builder::createItem help function to create `item` object instead of adding one helper function in reduction namespace.

[UR] Pull in fix that allows handling location properties in USM allo…

7bd51c6

…cs. (intel#12005) Also handle translating these properties in pi2ur. oneapi-src/unified-runtime#1123 --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>

Merge branch 'sycl/intel' into sycl-mlir-merge

a2cc7d0

Educated guess merge conflict resolution before even trying to compile.

[SYCL-MLIR][Polygeist] Update API of handleTerminator()

3542a62

The API of mlir::InlinerInterface::handleTerminator() changed in mlir/include/mlir/Transforms/InliningUtils.h with llvm/llvm-project#72765 llvm/llvm-project@26a0b27

keryell requested a review from a team as a code owner December 12, 2023 01:53

victor-eds added the disable-lint label Dec 12, 2023

victor-eds added the sycl-mlir label Dec 12, 2023

victor-eds self-assigned this Dec 12, 2023

victor-eds closed this Dec 12, 2023

victor-eds reopened this Dec 12, 2023

victor-eds removed their assignment Dec 12, 2023

victor-eds closed this Apr 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SYCL-MLIR] Merge from intel/llvm sycl branch #12142

[SYCL-MLIR] Merge from intel/llvm sycl branch #12142

keryell commented Dec 12, 2023

keryell commented Dec 12, 2023

victor-eds commented Dec 12, 2023 •

edited

Loading

victor-eds commented Dec 12, 2023

keryell commented Dec 15, 2023

sommerlukas commented Dec 18, 2023

victor-eds commented Apr 15, 2024

keryell commented Apr 15, 2024

[SYCL-MLIR] Merge from intel/llvm sycl branch #12142

[SYCL-MLIR] Merge from intel/llvm sycl branch #12142

Conversation

keryell commented Dec 12, 2023

keryell commented Dec 12, 2023

victor-eds commented Dec 12, 2023 • edited Loading

victor-eds commented Dec 12, 2023

keryell commented Dec 15, 2023

sommerlukas commented Dec 18, 2023

victor-eds commented Apr 15, 2024

keryell commented Apr 15, 2024

victor-eds commented Dec 12, 2023 •

edited

Loading