Skip to content

[SYCL-MLIR] Merge from intel/llvm sycl branch #12142

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3,011 commits into from

Conversation

keryell
Copy link
Contributor

@keryell keryell commented Dec 12, 2023

Please do not squash and merge this PR.

danilaml and others added 30 commits December 5, 2023 10:33
Specify their restrictions w.r.t. `align` attribute.
…4423)

The explicit 'align 4' caused the pointers to be underaligned on RV64.
…408)

This PR creates the wrapper class AffineForOp and adds a testcase for
it. A testcase for the AffineLoadOp is also added.
… macro with parentheses. (#68618)

Error message:
```
In file included from ../src/amd/addrlib/src/core/addrobject.h:21:
../src/amd/addrlib/src/core/addrcommon.h:343:13: error: expected unqualified-id
    out = ::_tzcnt_u32(mask);
            ^
/usr/lib/llvm-15/lib/clang/15.0.6/include/bmiintrin.h:74:27: note: expanded from macro '_tzcnt_u32'
#define _tzcnt_u32(a)     (__tzcnt_u32((a)))
```

This is because both GCC/Clang doesn't support compiling the following
code:
```
#ifdef _MSC_VER
#include <intrin.h>
#else
#include <x86intrin.h>
#endif

int f(int a) {
    return ::(_tzcnt_u32)(a);
}
```

This is because the return statement expects an expression or braced
init list: http://eel.is/c++draft/stmt.jump#general-1 but we really only
need to care about the expression form (there's no braces in sight).
Grammatically, the leading :: will parse as a qualified-id because it
matches the production for nested-name-specifier:
http://eel.is/c++draft/expr.prim.id.qual#nt:qualified-id That needs to
be followed by an unqualified-id
(http://eel.is/c++draft/expr.prim.id.unqual#nt:unqualified-id), but the
open paren does not match any of the grammar productions, so this is a
syntax error.

Closes: llvm/llvm-project#64467

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Also eliminate an unused variable while here.
Follow up on a post-commit review of 9468de4 (TargetInstrInfo: make
getOperandLatency return optional (NFC)) by Bjorn Pettersson to fix a
couple of things that are not NFC:

- std::optional<T>::operator<= returns true if the first operand is a
std::nullopt and second operand is T. Fix a couple of places where we
assumed it would return false.
- In TargetSchedule, computeInstrCost could take another codepath,
returning InstrLatency instead of DefaultDefLatency. Fix one instance
not accounting for this behavior.
…#73523)

Updates patterns for flattening `vector.transfer_read` by relaxing the
requirement that the "collapsed" indices are all zero. This enables
collapsing cases like this one:

```mlir
  %2 = vector.transfer_read %arg4[%c0, %arg0, %arg1, %c0] ... :
    memref<1x43x4x6xi32>, vector<1x2x6xi32>
```

Previously only the following case would be consider for collapsing
(all indices are 0):

```mlir
  %2 = vector.transfer_read %arg4[%c0, %c0, %c0, %c0] ... :
    memref<1x43x4x6xi32>, vector<1x2x6xi32>
```

Also adds some new comments and renames the `firstContiguousInnerDim`
parameter as `firstDimToCollapse` (the latter better matches the actual
meaning).

Similar updates for `vector.transfer_write` will be implemented in a
follow-up patch.
… (#73501)

Count how often the value is with signext/zeroext calls
when determining the preferred extension type.
This commit refactors `verifyDominanceOfContainedRegions` to iterative
algorithms similar to https://reviews.llvm.org/D154925 to fix stack
overflow for deeply nested regions (e.g.
llvm/circt#5316). There should be no
functional change except that this could result in slightly different
order of verification.
Because AA does not support vectors of pointers, we have to
treat pointers that are inserted into a vector as captures. We
mostly already do so, but missed the case where getelementptr
is used to produce a vector.
AsmParser creates dummy values when value identifiers are not going in ascending order and tries to use those dummy values when/if they are being referenced. We need to postpone this until all required data is read.
Now that tbaa tags pass is enabled by default, I would like to remove
these flags. `-fno-alias-analysis` was originally intended to be useful
for debugging, but as it also disables tbaa tag generation in codegen,
it turned out to be too noisy.

@banach-space expressed that these flags felt too non-standard.

The tbaa tags pass can be toggled using `-mllvm
-disable-fir-alias-tags=0`
Translate debug emission kind into LLVM (the importer already supports
this).
…l fcmp (#66505)""

This reverts commit d55692d.

See discussion in #66505: assertion fires in OSS build of TensorFlow.
…h instead (#73939) (#74446)

Same as #73939 but also fix `libc/src/string/memory_utils/op_aarch64.h`
that was still using `deferred_static_assert`.
Add support for frame pointers in MLIR.

---------

Co-authored-by: Markus Böck <markus.boeck02@gmail.com>
Co-authored-by: Christian Ulmann <christianulmann@gmail.com>
ARM64EC needs to handle both ARM and x86_64 exception tables. This is
achieved by separating their chunks and sorting them separately.
EXCEPTION_TABLE directory references x86_64 variant, while ARM variant
is exposed using CHPE metadata, which references
__arm64x_extra_rfe_table and __arm64x_extra_rfe_table_size symbols.
…… (#74336)

…ng member pointers.

This initially landed with a broken test due to a mid-air collision with
a new requirement for Environment initialization before field modeling.
Have added that initialization in the test.

From first landing:

getMethodDecl does not handle pointers to members and returns nullptr
for them. getMethodDecl contains a decade-plus-old FIXME to handle
pointers to members, but two approaches I looked at for fixing it are
more invasive or complex than simply swapping to getCalleeDecl.

The first, have getMethodDecl call getCalleeDecl, creates a large tree
of const-ness mismatches due to getMethodDecl returning a non-const
value while being a const member function and getCalleeDecl only being a
const member function when it returns a const value.

The second, implementing an AST walk to match how
CXXMemberCallExpr::getImplicitObjectArgument grabs the LHS of the binary
operator, is basically reimplementing Expr::getReferencedDeclOfCallee,
which is used by Expr::getCalleeDecl. We don't need another copy of that
code.
…te (#74093)

Avoid editing a range of DPValues and then remapping them. This occurs
when we try to de-duplicate dbg.values, but then re-use the same
iterator range. We can instead remap them, and then erase any
duplicates.

At the same time refactor the computation of seen-intrinsic hashes, and
account for a peculiarity of loop-rotates existing behaviour: it will
only deduplicate dbg.values that are immediately before the preheaders
branch instruction, not just any dbg.value in the preheader.
…2384)

This includes some commonly needed information like how to add
reviewers.

This is implemented as a job before the labeler, so that on a new PR the
comment is added before there are any subscribers and only the author
gets a nofitication.

The labeler job depends on the greeter having run or having been
skipped. So if the PR wasn't just opened, or it's from a regular
contributor, the labeling still happens.

But we can be sure that when a greeting comment is left, it's the very
first thing we do.
This patch adds a target_features (TargetFeaturesAttr) to the LLVM
dialect to allow setting and querying the features in use on a function.

The motivation for this comes from the Arm SME dialect where we would
like a convenient way to check what variants of an operation are
available based on the CPU features.

Intended usage:

The target_features attribute is populated manually or by a pass:

```mlir
func.func @example() attributes {
   target_features = #llvm.target_features<["+sme", "+sve", "+sme-f64f64"]>
} {
 // ...
}
```

Then within a later rewrite the attribute can be checked, and used to
make lowering decisions.

```c++
// Finds the "target_features" attribute on the parent
// FunctionOpInterface.
auto targetFeatures = LLVM::TargetFeaturesAttr::featuresAt(op);

// Check a feature.
// Returns false if targetFeatures is null or the feature is not in
// the list.
if (!targetFeatures.contains("+sme-f64f64"))
    return failure();
```

For now, this is rather simple just checks if the exact feature is in
the list, though it could be possible to extend with implied features
using information from LLVM.
Continuation of llvm/llvm-project#74247 to fix
llvm/llvm-project#56962. Fixes verifier for
(Integer Attr):
```mlir
llvm.mlir.constant(1 : index) : f32
```
and (Dense Attr):
```mlir
llvm.mlir.constant(dense<100.0> : vector<1xf64>) : f32
```

## Integer Attr

The addition that this PR makes to `LLVM::ConstantOp::verify` is meant
to be exactly verifying the code in
`mlir::LLVM::detail::getLLVMConstant`:


https://github.com/llvm/llvm-project/blob/9f78edbd20ed922cced9482f7791deb9899a6d82/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp#L350-L353

One failure mode is when the `type` (`llvm.mlir.constant(<value>) :
<type>`) is not an `Integer`, because then the `cast` in
`getIntegerBitWidth` will crash:


https://github.com/llvm/llvm-project/blob/dca432cb7b1c282f5dc861095813c4f40f109619/llvm/include/llvm/IR/DerivedTypes.h#L97-L99

So that's now caught in the verifier.

Apart from that, I don't see anything we could check for. `sextOrTrunc`
means "Sign extend or truncate to width" and that one is quite
permissive. For example, the following doesn't have to be caught in the
verifier as it doesn't crash during `mlir-translate -mlir-to-llvmir`:

```mlir
llvm.func @main() -> f32 {
  %cst = llvm.mlir.constant(100 : i64) : f32
  llvm.return %cst : f32
}
```

## Dense Attr

Crash if not either a MLIR Vector type or one of these:


https://github.com/llvm/llvm-project/blob/9f78edbd20ed922cced9482f7791deb9899a6d82/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp#L375-L391
Pennycook and others added 15 commits December 8, 2023 14:36
First draft of an extension allowing opaque types to be passed to
kernels via their raw byte representation.

---------

Signed-off-by: John Pennycook <john.pennycook@intel.com>
Co-authored-by: Greg Lueck <gregory.m.lueck@intel.com>
Allows the use of sycl_ext_oneapi_kernel_properties extension in
Sycl-Graph.
Kernel properties passed to the backend using the sycl kernel object. 
Therefore, the graph implementation does not prevent the backend from
accessing these properties.
Removes exception throwing.
Adds tests.
… pass (intel#12108)

This PR removes some remnants of the opaque pointer transition from the
internalization pass. For private internalization, we insert an `alloca`
instruction to model a work-item's private storage. Currently, we have
to infer the accessor's element type by looking at its users. Now, we
simply pass the element size as metadata to the pass, and insert an
`i8`-`alloca` with the desired size in bytes. I'm using separate
`LocalSize` (= number of buffer elements associated with each work-item
resp. work-group) and `ElemSize` here for now as the remapping of GEP
instructions still works in terms of number-of-elements. If in the
future these are rewritten to perform `i8*` arithmetic as well, we could
go back to only passing a single size (then in bytes instead of number
of elements).

---------

Signed-off-by: Julian Oppermann <julian.oppermann@codeplay.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
It is cascade removal of some unused code after removing
esimd/detail/host_util.hpp and removing __ESIMD_EMU_DNS namespace.
Even with this change ESIMD headers still have lots of unused HOST-only
code.

Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>
CreateProgram and CreateKernel APIs implicitly retain program and
kernels. Extra call to retain is incorrect.
…intel#12119)

The changes which prevent coverage and profile options from being used
for device compilations had a problem with -fsanitize=address logic. We
were allowing _any_ option that happened to have 'address' as an
argument as a legal pass through. Adjust this exception to be specific
to -fsanitize.
Extend intel#11674 by modifying the globaloffset optimization pass to always
replace uses of Loads from the `llvm.nvvm.implicit.offset` and
`llvm.amdgcn.implicit.offset intrinsics` with constant zeros in the
original non-offset kernel.
Hence, perform the optimization even when
`-enable-global-offset=true` (default).
Duplicate recursively functions containing calls to the implicit offset
intrinsic and let the implicit offset kernel entry point only call
the original functions (i.e. do not call the functions with added offset
arguments).
Remove zero allocations for the original kernel entry points.
…ntel#11541)

When event list is null, a barrier is still needed for all previous
commands, so fix it.

---------

Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
It passes with a newer driver so just check for that.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
The CI E2E tests are failing because these tests are passing rather than
expectedly failing

* UR nightly CI -
https://github.com/oneapi-src/unified-runtime/actions/runs/7160854862/job/19495692574
* intel/llvm CI -
https://github.com/intel/llvm/actions/runs/7168248741/job/19516014849

Change the tests to be skipped rather than XFAIL while we investigate
Use Builder::createItem help function to create `item` object instead of
adding one helper function in reduction namespace.
…cs. (intel#12005)

Also handle translating these properties in pi2ur.

oneapi-src/unified-runtime#1123

---------

Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
Educated guess merge conflict resolution before even trying to
compile.
The API of mlir::InlinerInterface::handleTerminator() changed in
mlir/include/mlir/Transforms/InliningUtils.h
with
llvm/llvm-project#72765
llvm/llvm-project@26a0b27
@keryell keryell requested a review from a team as a code owner December 12, 2023 01:53
@keryell
Copy link
Contributor Author

keryell commented Dec 12, 2023

There are still a few problems related to libstdc++-13-dev I have on my system and which are probably not tested on Intel CI.
Just curious by the result of the CI. :-)

@victor-eds victor-eds added the disable-lint Skip linter check step and proceed with build jobs label Dec 12, 2023
@victor-eds
Copy link
Contributor

victor-eds commented Dec 12, 2023

CI failed because PR was missing disable-lint tag. Re-running. Changes LGTM. Will merge if CI passes.

Thanks a lot for the contribution @keryell!

@victor-eds victor-eds added the sycl-mlir Pull requests or issues for sycl-mlir branch label Dec 12, 2023
@victor-eds victor-eds self-assigned this Dec 12, 2023
@victor-eds victor-eds closed this Dec 12, 2023
@victor-eds victor-eds reopened this Dec 12, 2023
@victor-eds victor-eds removed their assignment Dec 12, 2023
@victor-eds
Copy link
Contributor

Failing tests:

  • check-polygeist: polygeist-opt/cpuifyhotspot.mlir
  • check-cgeist: Verification/sycl/raising/host_raising_nd_range.cpp
  • check-sycl-e2e:
SYCL :: Basic/float_division_precise.cpp
  SYCL :: Basic/vector/bool.cpp
  SYCL :: Basic/vector/int-convert.cpp
  SYCL :: Basic/vector/operators.cpp
  SYCL :: Basic/vector/scalar_access.cpp
  SYCL :: ESIMD/fp_controls.cpp
  SYCL :: ESIMD/noinline_call_from_func.cpp
  SYCL :: ESIMD/private_memory/private_memory.cpp
  SYCL :: ESIMD/unified_memory_api/atomic_update_acc.cpp
  SYCL :: ESIMD/unified_memory_api/atomic_update_acc_64.cpp
  SYCL :: ESIMD/unified_memory_api/atomic_update_acc_cmpxchg.cpp
  SYCL :: ESIMD/unified_memory_api/atomic_update_slm.cpp
  SYCL :: ESIMD/unified_memory_api/atomic_update_slm_acc.cpp
  SYCL :: ESIMD/unified_memory_api/atomic_update_slm_acc_cmpxchg.cpp
  SYCL :: ESIMD/unified_memory_api/atomic_update_slm_cmpxchg.cpp
  SYCL :: ESIMD/unified_memory_api/block_store_slm.cpp
  SYCL :: ESIMD/unified_memory_api/block_store_slm_acc.cpp
  SYCL :: Graph/Explicit/sub_group_prop.cpp
  SYCL :: Graph/RecordReplay/sub_group_prop.cpp
  SYCL :: InvokeSimd/Regression/address_space_cast.cpp
  SYCL :: KernelAndProgram/disable-caching.cpp
  SYCL :: Regression/vec_logical_ops.cpp
  SYCL :: Regression/vec_rel_swizzle_ops.cpp

@keryell
Copy link
Contributor Author

keryell commented Dec 15, 2023

I do not have the right to set tags on this repository.
The doc talks about ignore-lint with gh script but this is disable-lint I guess.
More generally, since I do not the details of this project, it is unclear for me how to fix most of the test cases.

@sommerlukas
Copy link
Contributor

I do not have the right to set tags on this repository. The doc talks about ignore-lint with gh script but this is disable-lint I guess.

ignore-lint was renamed to disable-lint, I'll fix the docs in that point.

More generally, since I do not the details of this project, it is unclear for me how to fix most of the test cases.

Explicit SIMD and SYCL Graphs are currently not yet supported with SYCL-MLIR, so you can just add the failing tests to the list of expectedly failing tests here, as described in the documentation.

For the others, a quick investigation to see what causes the error would be useful.

@victor-eds
Copy link
Contributor

I'll close this PR now as it's outdated.

@victor-eds victor-eds closed this Apr 15, 2024
@keryell
Copy link
Contributor Author

keryell commented Apr 15, 2024

Yes, you are right!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disable-lint Skip linter check step and proceed with build jobs sycl-mlir Pull requests or issues for sycl-mlir branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.