Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GEN] Update GENX branch to LLVM 3a83162 #13773

Merged
merged 3,172 commits into from
May 14, 2024
Merged

Conversation

whitneywhtsang
Copy link
Contributor

No description provided.

tbaederr and others added 30 commits May 6, 2024 10:37
We need to use InitField here, not SetField.
Referring to RISC-V, adding an MI level pass to optimize *W instructions
for LoongArch.

First it removes unneeded sext(addi.w rd, rs, 0) instructions. Either
because the sign extended bits aren't consumed or because the input was
already sign extended by an earlier instruction.

Then:
1. Unless explicit disabled or the target prefers instructions with W
suffix, it removes the -w suffix from opw instructions whenever all
users are dependent only on the lower word of the result of the
instruction. The cases handled are:
* addi.w because it helps reduce test differences between LA32 and LA64
w/o being a pessimization.

2. Or if explicit enabled or the target prefers instructions with W
suffix, it adds the W suffix to the instruction whenever all users are
dependent only on the lower word of the result of the instruction. The
cases handled are:
   * add.d/addi.d/sub.d/mul.d.
   * slli.d with imm < 32.
   * ld.d/ld.wu.
Noticed while investigating GFNI per-element vector shifts (we can form SHL but not SRL/SRA)

Alive2: https://alive2.llvm.org/ce/z/fSH-rf
Change definition of expandBitCastI128ToF128 and expandBitCastF128ToI128
to allow for simplified use in atomic load/store.

Update logic to split 128-bit loads and stores in DAGCombine to also
handle the f128 case where appropriate. This fixes the regressions
introduced by recent atomic load/store patches.
If we don't demand the same element from both single source shuffles (permutes), then attempt to blend the sources together first and then perform a merged permute.

For vXi16 blends we have to be careful as these are much more likely to involve byte/word vector shuffles that will result in the creation of additional shuffle instructions.

This fold might be worth it for VSELECT with constant masks on AVX512 targets, but I haven't investigated this yet, but I've tried to write combineBlendOfPermutes so to be prepared for this.

The PR34592 -O0 regression is an unfortunate failure to cleanup with a later pass that calls SimplifyDemandedElts like the -O3 does - I'm not sure how worried we should be tbh.
… with 1 or 2 args (#89624)

The destructors generated by the legacy IBM `xlclang++` compiler can
take 1 or 2 arguments and the differences were handled by type `cast`
where it is needed. Clang now treats the `cast` here as an error after
llvm/llvm-project@999d4f8
landed with `-Xextra -Werror`. The issue had been worked around by using
`#pragma GCC diagnostic push/pop`. This patch defines 2 separate
destructor types for 1 argument and 2 arguments respectively so `cast`
is not needed.
llvm-project/llvm/lib/Target/X86/X86ISelLowering.cpp:40081:21:
error: comparison of integers of different signs: 'int' and 'unsigned int' [-Werror,-Wsign-compare]
  for (int I = 0; I != NumElts; ++I) {
                  ~ ^  ~~~~~~~
1 error generated.
llvm-project/llvm/lib/Target/X86/X86ISelLowering.cpp:3582:13:
error: unused function 'isBlendOrUndef' [-Werror,-Wunused-function]
static bool isBlendOrUndef(ArrayRef<int> Mask) {
            ^
1 error generated.
Handle implicit firstprivate DSAs on task generating constructs.

Fixes llvm/llvm-project#64480
Plugins are not loaded without the -cc1 phase. Do not report them when
running on an assembly file or when linking. Many build tools add these
options to all driver invocations, including LLVM's build system.

Fixes #88173
An `emitc.expression` can only yield a single result, but some
operations which have the `CExpression` trait can have multiple results,
which can result in a crash when applying the `fold-expressions` pass.
This change adds a check for the single-result condition and a simple
test.
Pre-commit tests for an upcoming patch.
…xts (reland #90438) (#91172)

This relands #90348 with a fix for a [buildbot
failure](https://lab.llvm.org/buildbot/#/builders/216/builds/38446)
caused by the test being run with `-fno-rtti`.
… macros (#88816)

Adds more FP test macros for the upcoming test adds for #61092 and the
issues opened from it: #88768, #88769, #88770, #88771, #88772.

Fix bug in `{EXPECT,ASSERT}_FP_EXCEPTION`. `EXPECT_FP_EXCEPTION(0)`
seems to be used to test that an exception did not happen, but it always
does `EXPECT_GE(... & 0, 0)` which never fails.

Update and refactor tests that break after the above bug fix. An
interesting way things broke after the above change is that
`ForceRoundingMode` and `quick_get_round()` were raising the inexact
exception, breaking a lot of the `atan*` tests.

The changes for all files other than `FPMatcher.h` and
`libc/test/src/math/smoke/RoundToIntegerTest.h` should have the same
semantics as before. For `RoundToIntegerTest.h`, lines 56-58 before the
changes do not always hold since this test is used for functions with
different exception and errno behavior like `lrint` and `lround`. I've
deleted those lines for now, but tests for those cases should be added
for the different nearest int functions to account for this.

Adding @nickdesaulniers for review.
…n ThinLTO summaries" (#90610)" (#91194)

Reverts llvm/llvm-project#90692

Breaking PPC buildbots. The bots are not meant to test LLD, but are
running a test that is using an old version of LLD without the change
(so is incompatible). Revert until a fix is found.
…or. (#87649)

`SBProcess::GetMemoryRegionInfo` uses `qMemoryRegionInfo` packet to get
memory region info, but this is not supported in gdb-server and causing
downstream lldb test failures. This change ignores the the error from
`SBProcess::GetMemoryRegionInfo` .

Reported by @tedwoodward @jerinphilip.
… (#91019)

This reverts commit 1106644.

As noted in the original patch, this was designed to reverted once
https://reviews.llvm.org/D142479 and https://reviews.llvm.org/D142660
landed, which has long since happened.
Fix the issue that `char` constants are converted to `uint64_t` in the
wrong way when doing the inlining.
Need to check that the signed operand has an extra sign bit to be sure
that we do not skip signedness, when trying to minimize bitwidth for
smin/smax intrinsics.
…1168)

Because LiveVariables has been run, we no longer need to lookup the
users in MachineRegisterInfo anymore and can instead just check for the
dead flag.
NagyDonat and others added 24 commits May 8, 2024 12:38
This commit explicitly specifies the matching mode (C library function,
any non-method function, or C++ method) for the `CallDescription`s
constructed in the checker `osx.MIG`.

The code was simplified to use a `CallDescriptionMap` instead of a raw
vector of pairs.

This change won't cause major functional changes, but isn't NFC because
it ensures that e.g. call descriptions for a non-method function won't
accidentally match a method that has the same name.

Separate commits have already performed this change in other checkers:
- easy cases: e2f1cba,
    6d64f8e
- MallocChecker: d6d84b5
- iterator checkers: 06eedff
- InvalidPtr checker: 024281d
- apiModeling.llvm.ReturnValue: 97dd8e3

... and follow-up commits will handle the remaining few checkers.

My goal is to ensure that the call description mode is always explicitly
specified and eliminate (or strongly restrict) the vague "may be either
a method or a simple function" mode that's the current default.
Add HLSLPackOffsetAttr to save packoffset in AST.

Since we have to parse the attribute manually in ParseHLSLAnnotations,
we could create the ParsedAttribute with a integer offset parameter
instead of string. This approach avoids parsing the string if the offset
is saved as a string in HLSLPackOffsetAttr.

For #57914.
Adding myself to linalg dialect
…(#90217)

This patch fixes an integer overflow in the SampleProfileLoader pass.
The issue occurs when weights are saturated and Profi isn't being used.

This patch also adds a newline to a debug message to make it more
readable.
Debug-info metadata does not have a strictly defined order. Check that
elements are linked to each other correctly, not that metadata appears
in a particular order.
Emit named metadata "dx.version" for DXIL version.

Default to DXIL 1.0
…tered. (#91329)

PR #89664 introduced a regression that it unregistered llvm-tblgen
option `-D` for macros. The test `TestOps.cpp` failed due to passing a
macros to llvm-tblgen.

It caused our internal build to fail because we append `-DLOCAL_NAME`
into `LLVM_TABLEGEN_FLANGS` in `llvm/lib/cmake/llvm/TableGen.cmake` as

```
list(APPEND LLVM_TABLEGEN_FLAGS "-DLOCAL_NAME")
```

And in `./llvm/lib/Target/PowerPC/PPC.td`, we check it for some
downstream code as:

```
...
#ifdef LOCAL_NAME
...
#endif
```
Now we got error message from mlir-src-sharder as
```
mlir-src-sharder -op-shard-index=1 -DLOCAL_NAME llvm-project/mlir/test/lib/Dialect/Test/TestOps.cpp --write-if-changed -o tools/mlir/test/lib/Dialect/Test/TestOps.1.cpp -d tools/mlir/test/lib/Dialect/Test/TestOps.1.cpp.d
mlir-src-sharder: Unknown command line argument '-DLOCAL_NAME'.  Try: 'llvm-project/build/bin/mlir-src-sharder --help'
mlir-src-sharder: Did you mean '-I'?
```

This PR is to fix the regression.
This commit ensures that Mem2Reg reuses the `DominanceInfo` as well as
block index maps to avoid expensive recomputations. Due to the recent
migration to `OpBuilder`, the promotion of a slot does no longer replace
blocks. Having stable blocks makes the `DominanceInfo` preservable and
additionally allows to cache block index maps between different
promotions.

Performance measurements on very large functions show an up to 4x
speedup by these changes.
…ic index lists (#90897)

This patch is a first pass at making consistent syntax across the
`LinalgTransformOp`s that use dynamic index lists for size parameters.
Previously, there were two different forms: inline types in the list, or
place them in the functional style tuple. This patch goes for the
latter.

In order to do this, the `printPackedOrDynamicIndexList`,
`printDynamicIndexList` and their `parse` counterparts were modified so
that the types can be optionally provided to the corresponding custom
directives.

All affected ops now use tablegen `assemblyFormat`, so custom
`parse`/`print` functions have been removed. There are a couple ops that
will likely add dynamic size support, and once that happens it should be
made sure that the assembly remains consistent with the changes in this
patch.

The affected ops are as follows: `pack`, `pack_greedily`,
`tile_using_forall`. The `tile_using_for` and `vectorize` ops already
used this syntax, but their custom assembly was removed.

---------

Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>
This allows the tests to be run against any implementation of `Solver`
instead
of begin specific to `WatchedLiteralsSolver` as they currently are.
The binary version is four times faster than current implementation in
my setup, and generally considered a better implementation.

Code inspired by https://en.algorithmica.org/hpc/algorithms/gcd/ which
itself is inspired by
https://lemire.me/blog/2013/12/26/fastest-way-to-compute-the-greatest-common-divisor/

Fix #77648
When a child process is forked with OpenMP already initialized, the
child process resets its affinity mask and sets proc-bind-var to false
so that the entire original affinity mask is used. This patch corrects
an issue with the affinity initialization code setting affinity to
compact instead of none for this special case of forked children.

The test trying to catch this only testing explicit setting of
KMP_AFFINITY=none. Add test run for no KMP_AFFINITY setting.

Fixes: #91098
…hape involving dynamic dims (#89093)

`fold-memref-alias-ops` bails out in presence of dynamic shapes in
`memref.expand_shape` op. Handle this case.
…other type.

Need to look through the SExt/ZExt scalars to be gathered, when trying
to reduce their width after minbitwidth analysis to prevent permanent
attempts to revectorize such gathered instructions.
`clang/tools/scan-build` is implemented in `perl`. However given `perl`
is not mentioned as a required dependency in `GettingStarted.rst` we
should make this optional.

This adds a `find_package(Perl)` check to cmake and disables the
`scan-build` tests when no perl executable is found.

Ideally we would also check if dependent perl modules like `Hash::Util`
are present on the system, but I don't see any pre-existing cmake macros
to easily test this. So for now I go with a plain check for the `perl`
package, at least this allows to use `cmake
-DCMAKE_DISABLE_FIND_PACKAGE_Perl=ON` to manually disable `perl` and the
tests.
Signed-off-by: Whitney Tsang <whitney.tsang@intel.com>
@whitneywhtsang whitneywhtsang added the genx Pull requests or issues for genx branch label May 13, 2024
@whitneywhtsang whitneywhtsang requested a review from a team May 13, 2024 18:41
@whitneywhtsang whitneywhtsang self-assigned this May 13, 2024
@whitneywhtsang whitneywhtsang merged commit 25f6fcb into intel:genx May 14, 2024
11 checks passed
@whitneywhtsang whitneywhtsang deleted the merge branch May 14, 2024 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment