[GENX] Update GENX branch to LLVM `4017f04` #12711

whitneywhtsang · 2024-02-14T04:23:38Z

No description provided.

A distinction that doesn't _usually_ matter is that MachO::SymbolKind is really a mapping of entries in TBD files, not of symbols. To make this clearer, rename the enum so it represents an encoding mapped to TBDs as opposed to symbols alone. For example, it can be a bit confusing that "GlobalSymbol" is an enum value when all of those values can represent a GlobalSymbol.

Use definitions from `<linux/mman.h>` to dispatch arch-specific flag values. For example, `MCL_CURRENT/MCL_FUTURE/MCL_ONFAULT` are different on different architectures.

This patch refactors the instantiation of BenchmarkMeasure within all the unit tests to use BenchmarkMeasure::Create rather than direct struct initialization. This allows us to change what values are stored in BenchmarkMeasure without getting compiler warnings on every instantiation in the unit tests, and is also just a cleanup in general, as the Create function didn't seem to exist at the time the unit tests were originally written.
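The benefit of routing construction through a factory can be sketched as follows. This is only an illustration under assumptions: `Measure`, its fields, and the `Create` signature here are hypothetical stand-ins, not the llvm-exegesis definitions.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a measurement record (not the real type).
struct Measure {
  std::string Key;
  double PerInstructionValue;
  double PerSnippetValue;
  // A field added later; callers that go through Create() do not change.
  double RawValue;

  // Single place that decides how every field is initialized.
  static Measure Create(std::string Key, double Value) {
    return Measure{std::move(Key), Value, Value, Value};
  }
};

// Test-style helper that never lists the fields positionally.
inline Measure makeCyclesMeasure() { return Measure::Create("cycles", 12.5); }

// Small self-check in the style of a unit test.
inline void checkCyclesMeasure() {
  Measure M = makeCyclesMeasure();
  assert(M.Key == "cycles" && M.PerInstructionValue == 12.5);
}
```

With brace initialization spread across many tests, adding `RawValue` would touch every call site; with the factory, only `Create` changes.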

The use of SmallDenseSet saves 0.39% of heap allocations during the compilation of a large preprocessed file, namely X86ISelLowering.cpp, for the X86 target. During the experiment, WL.size() was 2 or less 99.9% of the time. The inline size of 4 should accommodate up to 2 entries at the 3/4 occupancy rate.
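The "2 entries at the 3/4 occupancy rate" arithmetic can be checked with a small model. This is a simplified sketch, not LLVM's implementation: it assumes the table grows as soon as the entry count after an insertion would reach 3/4 of the bucket count, and the function names are made up for illustration.

```cpp
// Simplified growth rule (an assumption): grow when the entry count after
// the insertion would reach 3/4 of the number of buckets.
constexpr bool wouldGrow(unsigned EntriesAfterInsert, unsigned Buckets) {
  return EntriesAfterInsert * 4 >= Buckets * 3;
}

// Largest number of entries that fit without triggering growth.
constexpr unsigned maxEntriesWithoutGrowth(unsigned Buckets) {
  unsigned N = 0;
  while (!wouldGrow(N + 1, Buckets))
    ++N;
  return N;
}

// With 4 inline buckets the third insertion reaches 3/4, so 2 entries fit.
static_assert(maxEntriesWithoutGrowth(4) == 2, "inline size 4 holds 2 entries");
static_assert(maxEntriesWithoutGrowth(8) == 5, "inline size 8 holds 5 entries");
```

Under this model, an inline size of 4 covers the reported case where `WL.size()` was 2 or less 99.9% of the time.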

…aryFunctionsChecker (#78895)

Fixes #78965.

Rushing this one out before vacation starts. Refactoring on top of #66505

On Darwin, the Makefile already (ad-hoc) signs everything it builds. There's also no need to use lldb_codesign for this.

In trying to set up python headers in an out-of-tree bazel MLIR project, I encountered the `pybind11_bazel` project, and found that the `@python_runtime` target used here is not defined by it.

Instead, it seems that `@python_runtime` is an alias used in some projects like Tensorflow (see https://github.com/tensorflow/tensorflow/blob/322936ffdd96ee59e27d028467fe458859cf3855/third_party/python_runtime/BUILD#L7-L7), where it is aliased to `@local_config_python`. In fact, `@local_config_python` is defined by `@pybind11_bazel`, and so it seems that this layer of indirection no longer serves a purpose, and instead just prevents anyone who doesn't clone Tensorflow's config from using the python bindings here.

This commit updates the dependent targets to their canonical de-aliased equivalents, and I suspect this will not even break any downstream users since the new target is defined in those projects already.

Without this change, running, for example

```
bazel build @llvm-project//mlir:MLIRBindingsPythonCore
```

gives the error

```
no such package '@python_runtime//': The repository '@python_runtime' could not be resolved: Repository '@python_runtime' is not defined and referenced by '@llvm-project//mlir:MLIRBindingsPythonCore'
```

Minimal reproduction in https://github.com/j2kun/test_mlir_bazel_pybind, which, when pointing to a local LLVM repository that has this change (see `bazel/import_llvm.bzl` in that repository), results in that build succeeding.

Hat tip to Maksim Levental for going on an hours-long investigation with me to figure this out.

See #78920. This reverts commit ce3e767.

On Gentoo, libc++ is indeed in /usr/include/c++/*, but libstdc++ is at e.g. /usr/lib/gcc/x86_64-pc-linux-gnu/14/include/g++-v14. Use '/include/g++' as it should be unique enough. Note that the omission of a trailing slash is intentional to match g++-*.

See llvm/llvm-project#78534 (comment).

Reviewed by: mgorny
Closes: llvm/llvm-project#79264
Signed-off-by: Sam James <sam@gentoo.org>

…nside a constraint scope (#79568)

We preserve the trailing requires-expression during the lambda expression transformation. In order to get the parameters referenced inside a requires-expression properly resolved to the instantiated decls, we intended to inject these 'original' `ParmVarDecls` into the current instantiation scope, at `Sema::SetupConstraintScope`.

The previous approach seems to overlook nested instantiation chains, leading to a crash within a nested lambda followed by a requires clause.

This fixes llvm/llvm-project#73418.

classifyComplexElementType() doesn't return a std::optional anymore.

This patch bumps the mlgo-utils version to 19.0.0 as 18.0.0 got branched recently.

Implements https://isocpp.org/files/papers/P2662R3.pdf The feature is exposed as an extension in older language modes. Mangling is not yet supported and that is something we will have to do before release.

…s (#79371)

This pull request would solve llvm/llvm-project#78449. There is also a discussion about this on stackoverflow: https://stackoverflow.com/questions/77832658/stdtype-identity-to-support-several-variadic-argument-lists.

The following program is well formed:

```cpp
#include <type_traits>

template <typename... T>
struct args_tag
{
  using type = std::common_type_t<T...>;
};

template <typename... T>
void bar(args_tag<T...>, std::type_identity_t<T>..., int, std::type_identity_t<T>...) {}

// example
int main() {
  bar(args_tag<int, int>{}, 4, 8, 15, 16, 23);
}
```

but Clang rejects it, while GCC and MSVC don't. The reason for this is that, in `Sema::DeduceTemplateArguments`, we are not prepared for this case.

# Substitution/deduction of parameter packs

The logic that handles substitution when we have explicit template arguments (`SubstituteExplicitTemplateArguments`) does not work here, since the types of the pack are not pushed to `ParamTypes` before the loop that does the deduction starts. The other "candidate" that could have handled this case would be the loop that does the deduction for trailing packs, but we are not dealing with trailing packs here.

# Solution proposed in this PR

The solution proposed in this PR works similarly to the trailing pack deduction. The main difference here is the end of the deduction cycle. When a non-trailing template pack argument is found, whose type is not explicitly specified and the next type is not a pack type, the length of the previously deduced pack is retrieved (let that length be `s`). After that, the next `s` arguments are processed in the same way as in the case of non-trailing packs.

# Another possible solution

There is another possible approach that would be less efficient. In the loop, when we get to an element of `ParamTypes` that is a pack and could be substituted because the type is deduced from a previous argument, `s` arg types would be inserted before the current element of `ParamTypes`. Then we would "cancel" the processing of the current element, first process the previously inserted elements, and after that re-process the current element. Basically we would do what `SubstituteExplicitTemplateArguments` does, but during deduction.

# Adjusted test cases

In `clang/test/CXX/temp/temp.fct.spec/temp.deduct/temp.deduct.call/p1-0x.cpp` there is a test case named `test_pack_not_at_end` that should work, but still does not. This test case is relevant because the note for the error message has changed. This is what the test case looks like currently:

```cpp
template<typename ...Types>
void pack_not_at_end(tuple<Types...>, Types... values, int); // expected-note {{<int *, double *> vs. <int, int>}}

void test_pack_not_at_end(tuple<int*, double*> t2) {
  pack_not_at_end(t2, 0, 0, 0); // expected-error {{no match}}
  // FIXME: Should the "original argument type must match deduced parameter
  // type" rule apply here?
  pack_not_at_end<int*, double*>(t2, 0, 0, 0); // ok
}
```

The previous note said (before my changes):

```
deduced conflicting types for parameter 'Types' (<int *, double *> vs. <>)
```

The current note says (after my changes, and also clang 14 would say this if the pack was not trailing):

```
deduced conflicting types for parameter 'Types' (<int *, double *> vs. <int, int>)
```

GCC says:

```
error: no matching function for call to ‘pack_not_at_end(std::tuple<int*, double*>&, int, int, int)’
   70 |     pack_not_at_end(t2, 0, 0, 9); // expected-error {{no match}}
```

---------

Co-authored-by: cor3ntin <corentinjabot@gmail.com>
Co-authored-by: Erich Keane <ekeane@nvidia.com>

As it breaks buildkite CI

This patch aims to resolve the missed-optimization case below.

### Code

```
define <8 x i64> @vwadd_mask_v8i32(<8 x i32> %x, <8 x i64> %y) {
    %mask = icmp slt <8 x i32> %x, <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>
    %a = select <8 x i1> %mask, <8 x i32> %x, <8 x i32> zeroinitializer
    %sa = sext <8 x i32> %a to <8 x i64>
    %ret = add <8 x i64> %sa, %y
    ret <8 x i64> %ret
}
```

### Before this patch

[Compiler Explorer](https://godbolt.org/z/cd1bKTrx6)

```
vwadd_mask_v8i32:
        li      a0, 42
        vsetivli        zero, 8, e32, m2, ta, ma
        vmslt.vx        v0, v8, a0
        vmv.v.i v10, 0
        vmerge.vvm      v16, v10, v8, v0
        vwadd.wv        v8, v12, v16
        ret
```

### After this patch

```
vwadd_mask_v8i32:
        li      a0, 42
        vsetivli        zero, 8, e32, m2, ta, ma
        vmslt.vx        v0, v8, a0
        vsetvli zero, zero, e32, m2, tu, mu
        vwadd.wv        v12, v12, v8, v0.t
        vmv4r.v v8, v12
        ret
```

This pattern could be found in a reduction with a widening destination.

Specifically, we first do a fold like `(vwadd.wv y, (vmerge cond, x, 0)) -> (vwadd.wv y, x, y, cond)`, then do pattern matching on it.
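Why the fold preserves the result can be seen from a per-lane scalar model. This is a sketch under assumptions, not code from the patch: the function names are hypothetical, and one lane of the vector operation is modelled with plain integers.

```cpp
#include <cstdint>

// Unfolded form: merge x with 0 under the mask, sign-extend, then add y.
constexpr std::int64_t addOfMerged(bool Cond, std::int32_t X, std::int64_t Y) {
  return static_cast<std::int64_t>(Cond ? X : 0) + Y;
}

// Folded form: a masked widening add whose inactive lanes pass y through.
constexpr std::int64_t maskedWideningAdd(bool Cond, std::int32_t X,
                                         std::int64_t Y) {
  return Cond ? Y + static_cast<std::int64_t>(X) : Y;
}

// Active lane: both forms compute y + sext(x).
static_assert(addOfMerged(true, -7, 100) == maskedWideningAdd(true, -7, 100),
              "active lanes agree");
// Inactive lane: adding a merged-in zero is the same as keeping y.
static_assert(addOfMerged(false, -7, 100) == maskedWideningAdd(false, -7, 100),
              "inactive lanes agree");
```

Because adding the merged-in zero leaves `y` unchanged, the `vmerge` can be replaced by the mask on the widening add with `y` as the pass-through operand.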

…(#79657) The `map` clause in OpenMP allows structure components to be specified (unlike other clauses). Structure components do get their own symbols, but these are not meant to be instantiated. When a component reference is passed as an argument to the omp.target op, it gets a corresponding parameter in the target op's entry block. The original symbols are then bound to the same kind of an extended value as before, but the value is now based on the parameters. To handle structure components more gracefully, put their symbols on the list of mapped objects, but skip them when creating extended values. Fixes llvm/llvm-project#79478.

llvm-project/llvm/lib/Target/RISCV/RISCVISelLowering.cpp:13754:12: error: unused variable 'Opc' [-Werror,-Wunused-variable] unsigned Opc = N->getOpcode(); ^ 1 error generated.

This patch implements cloning for VPlans and recipes. Cloning is used in the epilogue vectorization path, to clone the VPlan for the main vector loop. This means we won't re-use a VPlan when executing the VPlan for the epilogue vector loop, which in turn will enable us to perform optimizations based on UF & VF.

Annotating tokens can invalidate the stack of peeked tokens.

when possible.

Reverts llvm/llvm-project#78120 Buildbot is broken: llvm/lib/Support/RISCVISAInfo.cpp:910:18: error: call to deleted constructor of 'llvm::Error' return E; ^

… LLVMIR (#79828)

There is no `SHL` used in canonicalization in `arith`

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>

reserveRegisterTuples is slow because it uses MCRegAliasIterator and hence ends up reserving the same aliased registers many times. This patch changes getReservedRegs not to use it for reserving SGPRs, VGPRs and AGPRs. Instead it iterates through base register classes, which should come closer to reserving each register once only. Overall this speeds up the time to run check-llvm-codegen-amdgpu in my Release build from 18.4 seconds to 16.9 seconds (all timings +/- 0.2).

…mparison (#79698) This is a follow-up for the comparison of constraints on out-of-line function template definitions. We require the instantiation of a ParmVarDecl while transforming the expression if that Decl gets referenced by a DeclRefExpr. However, we're not actually performing the class or function template instantiation at the time of such comparison. Therefore, let's map these parameters to themselves so that they get preserved after the substitution. Fixes llvm/llvm-project#74447.

…ressed' switch code. NFC. Stop clang-format trying to expand manually compressed lookup switch() code - if it still fits into 80col, then keep it to a single line instead of expanding across multiple lines each.

…tructions Minor correction for #79775 - noticed in EXPENSIVE_CHECKS builds

`OpaqueValueExpr` doesn't necessarily contain a source expression. Particularly, after #78041, it is used to carry the type and the value kind of a non-type template argument of floating-point type or referring to a subobject (those are so called `StructuralValue` arguments). This fixes #79575.

Those were deprecated and basically not used anymore after we renamed them in batch. This patch removes the macros entirely.

…U (#79322) This patch tries to better explain the differences between the `IsTargetDevice` and `IsGPU` flags of the `OpenMPIRBuilderConfig`.

Another round of additional tests for llvm/llvm-project#7863 with different sext/zext and use variants.

…ass template explicit specializations (#78720)

According to [[dcl.type.elab] p2](http://eel.is/c++draft/dcl.type.elab#2):

> If an [elaborated-type-specifier](http://eel.is/c++draft/dcl.type.elab#nt:elaborated-type-specifier) is the sole constituent of a declaration, the declaration is ill-formed unless it is an explicit specialization, an explicit instantiation or it has one of the following forms [...]

Consider the following:

```cpp
template<typename T>
struct A {
  template<typename U>
  struct B;
};

template<>
template<typename U>
struct A<int>::B; // #1
```

The _elaborated-type-specifier_ at `#1` declares an explicit specialization (which is itself a template). We currently (incorrectly) reject this, and this PR fixes that.

I moved the point at which _elaborated-type-specifiers_ with _nested-name-specifiers_ are diagnosed from `ParsedFreeStandingDeclSpec` to `ActOnTag` for two reasons: `ActOnTag` isn't called for explicit instantiations and partial/explicit specializations, and because it's where we determine if a member specialization is being declared.

With respect to diagnostics, I am currently issuing the diagnostic without marking the declaration as invalid or returning early, which results in more diagnostics than I think are necessary. I would like feedback regarding what the "correct" behavior should be here.

This prevents having to use double parentheses in common cases.

The <__threading_support> header is a huge beast and it's really difficult to navigate. I find myself struggling to find what I want every time I have to open it, and I've been considering splitting it up for years for that reason. This patch aims not to contain any functional change. The various implementations of the threading base are simply moved to separate headers and then the individual headers are simplified in mechanical ways. For example, we used to have redundant declarations of all the functions at the top of `__threading_support`, and those are removed since they are not needed anymore. The various #ifdefs are also simplified and removed when they become unnecessary. Finally, this patch adds documentation for the API we expect from any threading implementation.

…(#79871) Some of the checks in sfinae_helpers.h were not used anymore since we refactored the std::tuple implementation and were now dead code. This patch removes the code.

This macro is unnecessary with `basic_string& operator=(value_type __c)`.

Signed-off-by: Whitney Tsang <whitney.tsang@intel.com>

cyndyishida and others added 30 commits January 26, 2024 16:12

[libc] adjust linux's mman.h definitions (#79652)

2e1e27c

Use definitions from `<linux/mman.h>` to dispatch arch-specific flag values. For example, `MCL_CURRENT/MCL_FUTURE/MCL_ONFAULT` are different on different architectures.

[clang][analyzer] Improve modeling of 'popen' and 'pclose' in StdLibr…

ff05c30

…aryFunctionsChecker (#78895)

[clang-format] Fix a bug in AnnotatingParser::rParenEndsCast() (#79549)

f826f55

Fixes #78965.

ValueTracking: Merge fcmpImpliesClass and fcmpToClassTest (#66522)

e44d3b3

Rushing this one out before vacation starts. Refactoring on top of #66505

[mlir][complex] Prevent underflow in complex.abs (#76316)

69f99cd

[InstCombine] Fix a comment. (#79422)

701ec45

[lldb] Remove obsolete signBinary helper (#79656)

7595287

On Darwin, the Makefile already (ad-hoc) signs everything it builds. There's also no need to use lldb_codesign for this.

Revert "[Coverage] Map regions from system headers (#76950)"

faef68b

See #78920. This reverts commit ce3e767.

[clang-tools-extra] Use SmallString::operator std::string (NFC)

2b00d44

[Analysis] Use llvm::succ_empty and llvm::successors (NFC)

ac0b601

[CodeGen] Use a range-based for loop (NFC)

f2e69d2

[Driver] Use StringRef::consume_back (NFC)

fe35d72

[clang][Interp][NFC] Remove unused function

499507f

[clang][Interp][NFC] Don't unnecessarily use std::optional

ce75cbe

classifyComplexElementType() doesn't return a std::optional anymore.

[MLGO] Bump mlgo-utils version to 19.0.0

1f13203

This patch bumps the mlgo-utils version to 19.0.0 as 18.0.0 got branched recently.

[Clang][C++26] Implement Pack Indexing (P2662R3). (#72644)

ad1a65f

Implements https://isocpp.org/files/papers/P2662R3.pdf The feature is exposed as an extension in older language modes. Mangling is not yet supported and that is something we will have to do before release.

Disable gdb_pretty_printer_test.sh.cpp for clang 19

bc5c151

As it breaks buildkite CI

[DevPolicy] Add guidance on bans (#69701)

608d602

[RISCV] Fix -Wunused-variable in RISCVISelLowering.cpp (NFC)

3e29e52

llvm-project/llvm/lib/Target/RISCV/RISCVISelLowering.cpp:13754:12: error: unused variable 'Opc' [-Werror,-Wunused-variable] unsigned Opc = N->getOpcode(); ^ 1 error generated.

[Clang] Fix asan error after ad1a65f

143b510

Annotating tokens can invalidate the stack of peeked tokens.

tbaederr and others added 21 commits January 30, 2024 11:25

[clang][NFC] Use no-param version of skipRValueSubobjectAdjustments

c61686e

when possible.

Revert "[RISCV] Relax march string order constraint" (#79976)

5a00cb1

Reverts llvm/llvm-project#78120 Buildbot is broken: llvm/lib/Support/RISCVISAInfo.cpp:910:18: error: call to deleted constructor of 'llvm::Error' return E; ^

[X86][test] Update CodeGen/X86/popcnt.ll after #78545

e5054fb

[mlir] [arith] add shl overflow flag in Arith and lower to SPIR-V and…

f7ef73e

… LLVMIR (#79828)

There is no `SHL` used in canonicalization in `arith`

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>

[X86] X86CompressEVEX.cpp - ensure we tie the operands on MOVBErr ins…

0c623b5

…tructions Minor correction for #79775 - noticed in EXPENSIVE_CHECKS builds

[X86][NFC] X86CompressEVEX.cpp - Simplify code after 0c623b5

2acf302

[libc++] Officially remove _VSTD and _LIBCPP_INLINE_VISIBILITY (#79885)

683bc94

Those were deprecated and basically not used anymore after we renamed them in batch. This patch removes the macros entirely.

[OpenMPIRBuilder] NFC: Improve description of IsTargetDevice and IsGP…

fdac7d0

…U (#79322) This patch tries to better explain the differences between the `IsTargetDevice` and `IsGPU` flags of the `OpenMPIRBuilderConfig`.

[X86][CodeGen] Add entries for TB_BCAST_SH in getBroadcastOpcode

02a275c

[AArch64] Add tests with sext of vec3 loads.

6251b6b

Another round of additional tests for llvm/llvm-project#7863 with different sext/zext and use variants.

[libc++] Accept __VA_ARGS__ in conditional _NOEXCEPT_ macro (#79877)

f89d707

This prevents having to use double parentheses in common cases.
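The double-parentheses issue arises whenever the condition contains a top-level comma, such as a template argument list. A minimal sketch follows; the `DEMO_` macros are hypothetical illustrations and are not libc++'s actual definitions.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical macros for illustration only.
#define DEMO_NOEXCEPT_ONE_ARG(x) noexcept(x)
#define DEMO_NOEXCEPT_VARIADIC(...) noexcept(__VA_ARGS__)

// The comma in std::pair<int, int> would split the condition into two macro
// arguments, so the single-parameter form needs an extra pair of parentheses.
inline void swapOld(std::pair<int, int> &A, std::pair<int, int> &B)
    DEMO_NOEXCEPT_ONE_ARG(
        (std::is_nothrow_move_constructible<std::pair<int, int>>::value)) {
  std::swap(A, B);
}

// The variadic form rejoins the comma-separated pieces via __VA_ARGS__.
inline void swapNew(std::pair<int, int> &A, std::pair<int, int> &B)
    DEMO_NOEXCEPT_VARIADIC(
        std::is_nothrow_move_constructible<std::pair<int, int>>::value) {
  std::swap(A, B);
}

using IntPair = std::pair<int, int>;
// Both spellings produce the same exception specification.
static_assert(noexcept(swapOld(std::declval<IntPair &>(),
                               std::declval<IntPair &>())),
              "old spelling is noexcept");
static_assert(noexcept(swapNew(std::declval<IntPair &>(),
                               std::declval<IntPair &>())),
              "new spelling is noexcept");
```

Both functions end up with the same `noexcept` condition; only the call-site spelling of the macro differs.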

[libc++][NFC] Remove dead code implementing some tuple SFINAE checks …

e37a600

…(#79871) Some of the checks in sfinae_helpers.h were not used anymore since we refactored the std::tuple implementation and were now dead code. This patch removes the code.

Remove unnecessary _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS (#79574)

4017f04

This macro is unnecessary with `basic_string& operator=(value_type __c)`.

Merge commit '4017f04e310454ccced4c404a23f7698eec735ca'

b7a8dce

[GENX] Update libGenISAIntrinsics

23e2542

Signed-off-by: Whitney Tsang <whitney.tsang@intel.com>

whitneywhtsang changed the title ~~[GENX] Update GENX branch to LLVM 0784b1e~~ [GENX] Update GENX branch to LLVM 4017f04 Feb 14, 2024

whitneywhtsang requested a review from a team February 14, 2024 04:24

whitneywhtsang self-assigned this Feb 14, 2024

whitneywhtsang added the genx label Feb 14, 2024

whitneywhtsang mentioned this pull request Feb 14, 2024

Merge OpenAI Triton commit 075701a intel/intel-xpu-backend-for-triton#489

Merged

etiotto approved these changes Feb 14, 2024


whitneywhtsang merged commit 23e2542 into intel:genx Feb 14, 2024
11 checks passed

whitneywhtsang deleted the merge branch February 14, 2024 14:55

whitneywhtsang mentioned this pull request Feb 16, 2024

Merge OpenAI Triton till Feb 11 intel/intel-xpu-backend-for-triton#208

Closed
