[GENX] Update GENX branch to LLVM 0784b1e
#12600
Merged
R16-R31 were added to the GPRs in llvm/llvm-project#70958. This patch supports lowering of the promoted BMI instructions in EVEX space; encoding/decoding is already supported as of llvm/llvm-project#73899. RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
…/sign extension." (#76785)

This patch was originally introduced in PR #72340, but was reverted due to a bug that produced an invalid extension combine. Specifically, we resolve the case in the llvm/llvm-project#72340 (comment)

```
define <vscale x 1 x i32> @foo(<vscale x 1 x i1> %x, <vscale x 1 x i2> %y) {
  %a = zext <vscale x 1 x i1> %x to <vscale x 1 x i32>
  %b = zext <vscale x 1 x i2> %y to <vscale x 1 x i32>
  %c = add <vscale x 1 x i32> %a, %b
  ret <vscale x 1 x i32> %c
}
```

The previous patch didn't check whether the semantics of `ISD::SIGN_EXTEND` and `ISD::ZERO_EXTEND` are equivalent to `vsext.vf2` or `vzext.vf2` (it did not ensure the SEW condition on widening Vector Arithmetic Instructions). Thanks to @topperc for pointing out this bug.

## The original description

This PR mainly aims at resolving the missed-optimization case below, while it could also be considered an extension of the previous patch https://reviews.llvm.org/D133739?id=

### Missed-Optimization Case

Compiler Explorer: https://godbolt.org/z/GzWzP7Pfh

### Source Code:

```
define <vscale x 2 x i16> @multiple_users(ptr %x, ptr %y, ptr %z) {
  %a = load <vscale x 2 x i8>, ptr %x
  %b = load <vscale x 2 x i8>, ptr %y
  %b2 = load <vscale x 2 x i8>, ptr %z
  %c = sext <vscale x 2 x i8> %a to <vscale x 2 x i16>
  %d = sext <vscale x 2 x i8> %b to <vscale x 2 x i16>
  %d2 = sext <vscale x 2 x i8> %b2 to <vscale x 2 x i16>
  %e = mul <vscale x 2 x i16> %c, %d
  %f = add <vscale x 2 x i16> %c, %d2
  %g = sub <vscale x 2 x i16> %c, %d2
  %h = or <vscale x 2 x i16> %e, %f
  %i = or <vscale x 2 x i16> %h, %g
  ret <vscale x 2 x i16> %i
}
```

### Before This Patch

```
# %bb.0:
  vsetvli a3, zero, e16, mf2, ta, ma
  vle8.v v8, (a0)
  vle8.v v9, (a1)
  vle8.v v10, (a2)
  vsext.vf2 v11, v8
  vsext.vf2 v8, v9
  vsext.vf2 v9, v10
  vmul.vv v8, v11, v8
  vadd.vv v10, v11, v9
  vsub.vv v9, v11, v9
  vor.vv v8, v8, v10
  vor.vv v8, v8, v9
  ret
```

### After This Patch

```
# %bb.0:
  vsetvli a3, zero, e8, mf4, ta, ma
  vle8.v v8, (a0)
  vle8.v v9, (a1)
  vle8.v v10, (a2)
  vwmul.vv v11, v8, v9
  vwadd.vv v9, v8, v10
  vwsub.vv v12, v8, v10
  vsetvli zero, zero, e16, mf2, ta, ma
  vor.vv v8, v11, v9
  vor.vv v8, v8, v12
  ret
```

We can see that Add/Sub/Mul are combined with the Sign Extension.

### Relation to the Patch D133739

The patch D133739 introduced an optimization for folding `ADD_VL` / `SUB_VL` / `MUL_VL` with `VSEXT_VL` / `VZEXT_VL`. However, that patch did not consider non-fixed-length vectors, so this PR could also be considered an extension of D133739.
…n. (#77883)

Previously there were two ways to override the verbose abort function which gets called when a hardening assertion is triggered:
- compile-time: define the `_LIBCPP_VERBOSE_ABORT` macro;
- link-time: provide a definition of the `__libcpp_verbose_abort` function.

This patch adds a new configure-time approach: the vendor can provide a path to a custom header file which will get copied into the build by CMake and included by the library. The header must provide a definition of the `_LIBCPP_ASSERTION_HANDLER` macro, which is what will get called should a hardening assertion fail.

As of this patch, overriding `_LIBCPP_VERBOSE_ABORT` will still work, but the previous mechanisms will be effectively removed in a follow-up patch, making the configure-time mechanism the sole way of overriding the default handler.

Note that `_LIBCPP_ASSERTION_HANDLER` only gets invoked when a hardening assertion fails. It does not affect other cases where `_LIBCPP_VERBOSE_ABORT` is currently used (e.g. when an exception is thrown in the `-fno-exceptions` mode).

The library provides a default version of the custom header file that will get used if it's not overridden by the vendor. That allows us to always test the override mechanism and reduces the difference in configuration between the pristine version of the library and a platform-specific version.
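The handler-macro idea can be sketched in isolation. The snippet below is a minimal illustration, not libc++ code: the `MYLIB_*` macros, `vendor_report`, `g_last_failure` and `checked_at` are all hypothetical names chosen for this example, and the handler records the message instead of aborting so the behaviour is easy to observe.

```cpp
#include <string>

// Stand-in for the vendor-provided header: it defines the handler macro that
// the library invokes when a hardening check fails.
// (All MYLIB_* names are hypothetical; they are not libc++ identifiers.)
std::string g_last_failure;  // records the message instead of aborting

void vendor_report(const char* message) { g_last_failure = message; }

#define MYLIB_ASSERTION_HANDLER(message) vendor_report(message)

// Library side: a hardening check forwards to whatever handler was configured.
#define MYLIB_HARDENING_ASSERT(cond, message) \
  ((cond) ? (void)0 : MYLIB_ASSERTION_HANDLER(message))

// Example of a checked accessor built on top of the macro.
int checked_at(const int* data, int size, int index) {
  MYLIB_HARDENING_ASSERT(index >= 0 && index < size, "index out of bounds");
  return (index >= 0 && index < size) ? data[index] : 0;
}
```

Swapping the definition of `MYLIB_ASSERTION_HANDLER` changes what every check does without touching the call sites, which is the property the configure-time header relies on.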
We can convert `complex.tan` op to [ctan/ctanf](https://sourceware.org/newlib/libm.html#ctan) function in libm in the complex to libm conversion.
This was emitting non-strict casts in ABI contexts for illegal types.
…(#78541) Reverts llvm/llvm-project#78387. The added tests are failing on several build bots.
This commit turns on ASan annotations in `std::basic_string` for short strings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here: llvm/llvm-project#72677

Required to pass CI without failures:
- llvm/llvm-project#75845
- llvm/llvm-project#75858

Annotating `std::basic_string` with the default allocator is implemented in llvm/llvm-project#72677, but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. It also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning short string annotations on and off.

Support in the ASan API has existed since llvm/llvm-project@dd1b7b7. You can turn off annotations for a specific allocator based on changes from llvm/llvm-project@2fa1bec.

This PR is part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in the `std::vector` and `std::deque` collections. These enhancements let ASan detect cases where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds.

The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equal` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. With these annotations enabled, container sanitization would identify and flag this potential vulnerability.
If you have any questions, please email: advenam.tacet@trailofbits.com disconnect3d@trailofbits.com
Upstream XROS support in the clang frontend and driver.
Fixing https://buildkite.com/llvm-project/github-pull-requests/builds/30321#018d1a4a-bf72-449e-a70a-444ded875255 Co-authored-by: XinWang10 <108658776+XinWang10@users.noreply.github.com>
except when `GRND_NONBLOCK` is present in the flags.
Co-authored-by: Nikolas Klauser <nikolasklauser@berlin.de>
`SourceLocExpr`s that may produce a function name are marked dependent so that the non-instantiated name of a function does not get evaluated. In GH78128, the name('s size) is used as a template argument to a `DeclRef` that is not otherwise dependent, and is therefore cached and not transformed when the function is instantiated, leading to 2 different values existing at the same time for the same function. Fixes #78128
…in TokenBuffer::spelledForExpanded() (#78092) Such ranges can legitimately arise in the case of invalid code, such as a declaration missing an ending brace. Fixes clangd/clangd#1559
Add `-start/stop-before/after` support for CodeGenPassBuilder. Part of #69879.
This fixes a crash where `path::parent_path` causes an invalid access on a string upon receiving a path that consists of a single colon. On a Windows machine, with a build that has runtime checks enabled, `clang -I: test.cc` produces:

```
Assertion failed: Index < Length && "Invalid index!", file llvm\include\llvm/ADT/StringRef.h, line 232
...
#6 0x00007ff7816201eb `anonymous namespace'::parent_path_end llvm\lib\Support\Path.cpp:144:0
#7 0x00007ff781620135 llvm::sys::path::parent_path(class llvm::StringRef, enum llvm::sys::path::Style) llvm\lib\Support\Path.cpp:470:0
```

Ideally, we can look for the last colon starting from the last character, but we can instead start from second to last, and handle empty paths by abusing `0 - 1 == npos`.
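The `0 - 1 == npos` trick can be sketched with the standard library. This is an illustration only, assuming a reverse search whose start bound is exclusive; `rfind_before` and `last_colon_not_at_end` are hypothetical helpers written for this example, not the functions in `Path.cpp`.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

constexpr std::size_t npos = std::string_view::npos;

// Hypothetical helper: last index of `c` within s[0, from), so `from` is
// exclusive. (Assumed here to resemble the search the fix relies on.)
std::size_t rfind_before(std::string_view s, char c, std::size_t from) {
  std::size_t i = std::min(from, s.size());
  while (i != 0) {
    --i;
    if (s[i] == c) return i;
  }
  return npos;
}

// Look for a colon anywhere except the final character.
// For an empty path, s.size() - 1 wraps around to npos (unsigned arithmetic),
// which is then clamped to the string size, 0, so nothing is searched.
std::size_t last_colon_not_at_end(std::string_view s) {
  return rfind_before(s, ':', s.size() - 1);
}
```

With this bound a path consisting of a single colon yields no match, and the empty path needs no special case because the wrapped value is clamped before any character is read.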
…78424) This decouples the Arm type attributes from other bits, which means the data will only be allocated when a function uses these Arm attributes. The first patch adds the bit `HasArmTypeAttributes` to `FunctionTypeBitfields`, which grows from 62 bits to 63 bits. In the second patch, I've moved this bit (`HasArmTypeAttributes`) to `FunctionTypeExtraBitfields`, because it looks like the bits in `FunctionTypeBitfields` are precious and we really don't want that struct to grow beyond 64 bits. I've split this out into two patches to explain the rationale, but those can be squashed before merging.
Without this, gcc warned:

../lib/CodeGen/AsmPrinter/DwarfDebug.cpp:3585:70: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
 3584 | ((&Current == &AccelDebugNames) &&
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 3585 | (Unit.getUnitDie().getTag() != dwarf::DW_TAG_type_unit)) &&
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
 3586 | "Kind is CU but TU is being processed.");
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../lib/CodeGen/AsmPrinter/DwarfDebug.cpp:3589:70: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
 3588 | ((&Current == &AccelTypeUnitsDebugNames) &&
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 3589 | (Unit.getUnitDie().getTag() == dwarf::DW_TAG_type_unit)) &&
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
 3590 | "Kind is TU but CU is being processed.");
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
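The warning comes from `&&` binding tighter than `||`, which matters for the `condition && "message"` assert idiom. The sketch below uses hypothetical function names and a simplified condition to show the two groupings; it is not the code from `DwarfDebug.cpp`.

```cpp
#include <cassert>

// Hypothetical predicate, used only to illustrate the grouping.
// Without the outer parentheses, the trailing `&& "message"` would attach to
// `is_type_unit` alone, because `&&` binds tighter than `||`.
bool kind_matches(bool is_compile_unit, bool is_type_unit) {
  assert((is_compile_unit || is_type_unit) && "unit must be a CU or a TU");
  return is_compile_unit || is_type_unit;
}

// The two possible groupings of `a || b && c`, written out explicitly.
bool and_first(bool a, bool b, bool c) { return a || (b && c); }  // how C++ parses it
bool or_first(bool a, bool b, bool c) { return (a || b) && c; }   // a different expression
```

For `a = true, b = false, c = false` the first grouping is true and the second is false, which is why the compiler asks for the intent to be spelled out.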
LDS DMA loads increase VMCNT, and a load from the LDS location that was stored must wait on this counter so that it only reads memory after it is written. The wait count insertion pass does not track memory dependencies; it tracks register dependencies. To model the LDS dependency, a pseudo register is used in the scoreboard, acting as if LDS DMA writes it and an LDS load reads it. This patch adds 8 more pseudo registers to use for independent LDS locations if we can prove they are disjoint using alias analysis. Fixes: SWDEV-433427
…cord ctor". (#78423) The CFG doesn't contain a CFGElement for the `CXXDefaultInitExpr::getInit()`, so it makes sense to consider the `CXXDefaultInitExpr` to be the expression that originally constructs the object.
…perators. (#72242) Operators that are overloadable may be parsed as `CXXOperatorCallExpr` or as `UnaryOperator` (or `BinaryOperator`). This depends on the context and can be different if a similar construct is imported into an existing AST. The two "forms" of the operator call AST nodes should be detected as equivalent to allow AST import of these cases. This fix has probably other consequences because if a structure is imported that has `CXXOperatorCallExpr` into an AST with an existing similar structure that has `UnaryOperator` (or binary), the additional data in the `CXXOperatorCallExpr` node is lost at the import (because the existing node will be used). I am not sure if this can cause problems.
LoopVectorizer is aware when a target can replace a scalable frem instruction with a vector library call for a given VF, and it returns the relevant cost. Otherwise, it returns an invalid cost (as previously). Add tests that check costs on AArch64, when there is no vector library available and when there is (with and without tail-folding). NOTE: Invoking CostModel directly (not through LV) would still return invalid costs.
Add an `-allow-incomplete-ir` flag to the IR parser, which allows reading IR with missing declarations. This is intended to produce a best-effort interpretation of the IR, along the same lines of what we would manually do when taking, for example, a function from `-print-after-all` output and fixing it up to be valid IR. This patch only supports dropping references to undeclared metadata, either by dropping metadata attachments from instructions/functions, or by dropping calls to certain intrinsics (like debug intrinsics). I will implement support for inserting missing function/global declarations in a followup patch. We don't have real use lists for metadata, so the approach here is to iterate over the whole IR and identify metadata that needs to be dropped. This does not support all possible cases, but should handle anything that's relevant for the function-only IR use case.
Fixes #70221 Fix a bug in FileCheck that corrects the error message when multiple prefixes are provided through --check-prefixes and one of them is a PREFIX-NOT. Earlier, only the first of the provided prefixes was displayed as the erroneous prefix, while the actual error might be on the prefix that occurred at the end of the prefix list in the input file. Now, the right NOT prefix is shown in the error message.
… (#70822) Fixes #68654 Depends on llvm/llvm-project#70790
GFX12 has subword scalar loads so there is no need to do this.
In most cases the hazards no longer apply, so just assert that we are not on GFX12.
This was already done for LLVM. This patch just updates the Clang builtin handling to match.
Provides some context for failing to generate LLVM IR for `target enter|exit|update` directives when `nowait` is provided. This is directly helpful for flang users since they would get this error message if they tried to use `nowait`. Before that we had a very generic message. This is a follow-up to llvm/llvm-project#78269, please only review the latest commit (the one with the same commit message as the PR title).
For GlobalISel this was already done in AMDGPUInstructionSelector::selectBufferLoadLds.
The arm_sme.td file was still using `IsSharedZA` and `IsPreservesZA`, which should be changed to match the new state attributes added in #76971. This patch adds `IsInZA`, `IsOutZA` and `IsInOutZA` as the state for the Clang builtins and fixes up the code in SemaChecking and SveEmitter to match. Note that the code is written in such a way that it can be easily extended with ZT0 state (to follow in a future patch).
… (#78703) Having it return a `std::optional<bool>` is unnecessarily confusing. This patch changes it to a simple 'bool'. This patch also removes the 'BodyOverridesInterface' operand because there is only a single use for this which is easily rewritten.
Summary: We use `add_libc_test` now because it works for both hermetic and unit tests. If the test needs to be unit-test only, pass `UNIT_TEST_ONLY` as an argument.
…ests (#73067)

The @expectedFailureAll and @skipIf decorators will mark the test case as xfail/skip if _all_ conditions passed in match, including debug_info.
* If debug_info is not one of the matching conditions, we can immediately evaluate the check and decide if it should be decorated.
* If debug_info *is* present as a match condition, we need to defer whether or not to decorate until when the `LLDBTestCaseFactory` metaclass expands the test case into its potential variants. This is still early enough that the standard `unittest` framework will recognize the test as xfail/skip by the time the test actually runs.

TestDecorators exhibits the edge cases more thoroughly. With the exception of `@expectedFailureIf` (added by this commit), all those test cases pass prior to this commit.

This is a followup to 212a60e.
This implements the ideas discussed in [1].

To summarize, this commit changes AsmPrinter so that it outputs DW_IDX_parent information for debug_names entries. It will enable debuggers to speed up queries for fully qualified types (based on a DWARFDeclContext) significantly, as debuggers will no longer need to parse the entire CU in order to inspect the parent chain of a DIE. Instead, a debugger can simply take the parent DIE offset from the accelerator table and peek at its name in the debug_info/debug_str sections.

The implementation uses two types of DW_FORM for the DW_IDX_parent attribute:
1. DW_FORM_ref4, which points to the accelerator table entry for the parent.
2. DW_FORM_flag_present, when the entry has a parent that is not in the table (that is, the parent doesn't have a name, or isn't allowed to be in the table as per the DWARF spec). This is space-efficient, since it takes 0 bytes.

The implementation works by:
1. Changing how abbreviations are encoded (so that they encode which form, if any, was used to encode IDX_Parent)
2. Creating an MCLabel per accelerator table entry, so that they may be referred to by IDX_parent references.

When all patches related to this are merged, we are able to show that evaluating an expression such as:

```
lldb --batch -o 'b CodeGenFunction::GenerateCode' -o run -o 'expr Fn' -- \
  clang++ -c -g test.cpp -o /dev/null
```

is far faster: from ~5000 ms to ~1500 ms.

Building llvm-project + clang with and without this patch, and looking at its impact on object file size:

```
ls -la $(find build_stage2_Debug_idx_parent_assert_dwarf5 -name \*.cpp.o) | awk '{s+=$5} END {printf "%\047d\n", s}'
11,507,327,592
ls -la $(find build_stage2_Debug_no_idx_parent_assert_dwarf5 -name \*.cpp.o) | awk '{s+=$5} END {printf "%\047d\n", s}'
11,436,446,616
```

That is, an increase of 0.62% in total object file size.

Looking only at debug_names:

```
$stage1_build/bin/llvm-objdump --section-headers $(find build_stage2_Debug_idx_parent_assert_dwarf5 -name \*.cpp.o) | grep __debug_names | awk '{s+="0x"$3} END {printf "%\047d\n", s}'
440,772,348
$stage1_build/bin/llvm-objdump --section-headers $(find build_stage2_Debug_no_idx_parent_assert_dwarf5 -name \*.cpp.o) | grep __debug_names | awk '{s+="0x"$3} END {printf "%\047d\n", s}'
369,867,920
```

That is an increase of 19%.

DWARF Linkers need to be changed in order to support this. This commit already brings support to the "base" linker, but it does not attempt to modify the parallel linker. Accelerator entries refer to the corresponding DIE offset, and this patch also requires the parent DIE offset -- it's not clear how the parallel linker can access this. It may be obvious to someone familiar with it, but it would be nice to get help from its authors.

[1]: https://discourse.llvm.org/t/rfc-improve-dwarf-5-debug-names-type-lookup-parsing-speed/74151/
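The two percentages quoted above follow directly from the byte totals. A small sketch of the arithmetic, where `percent_increase` is a hypothetical helper written for this note:

```cpp
#include <cstdint>

// Relative growth from `before` to `after`, expressed in percent.
// Assumes after >= before and before != 0.
double percent_increase(std::uint64_t before, std::uint64_t after) {
  return 100.0 * static_cast<double>(after - before) /
         static_cast<double>(before);
}
```

Calling `percent_increase(11436446616ULL, 11507327592ULL)` gives roughly 0.62 for the total object size, and `percent_increase(369867920ULL, 440772348ULL)` gives roughly 19.2 for the debug_names sections alone. The absolute growth is about 70.9 MB in both cases, which suggests the new index data accounts for essentially all of the size increase.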
…-symbols (#78643) When undefined functions exist in the final link we need to create stub functions (otherwise direct calls to those functions could not be generated). We were creating those stubs when `--unresolved-symbols=ignore-all` was passed but overlooked the fact that `--warn-unresolved-symbols` essentially has the same effect (i.e. undefined functions can exist in the final link). Fixes: #53987
…#78351)

TSan's shadow mappings only support 30 bits of ASLR entropy on x86 Linux, and it is not practical to support the maximum of 32 bits (due to pointer compression and the overhead of shadow mappings). Instead, this patch changes TSan to re-exec without ASLR if it encounters an incompatible memory layout, as suggested by Dmitry in google/sanitizers#1716. If ASLR is already disabled but the memory layout is still incompatible, it will abort.

This patch involves a bit of refactoring, because the old code is:
1. InitializePlatformEarly()
2. InitializeAllocator()
3. InitializePlatform(): CheckAndProtect()

but it may already segfault during InitializeAllocator() if the memory layout is incompatible, before we get a chance to check in CheckAndProtect(). This patch adds CheckAndProtect() during InitializePlatformEarly(), before the allocator is initialized. Naturally, it is necessary to ensure that CheckAndProtect() does *not* allow the heap regions to be occupied here, hence we generalize CheckAndProtect() to optionally check the heap regions. We keep the original behavior of CheckAndProtect() in InitializePlatform() as a last line of defense.

We need to be careful not to prematurely abort if ASLR is disabled but TSan was going to re-exec for other reasons (e.g., unlimited stack size); we implement this by moving all the re-exec logic into ReExecIfNeeded().
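The decision described above can be written as a small pure function. This is one reading of the commit message, not TSan's code: `StartupAction`, `decide_startup_action` and its parameters are hypothetical names, and the real logic lives in `ReExecIfNeeded()`.

```cpp
// Hypothetical model of the startup decision; names are illustrative only.
enum class StartupAction { Continue, ReExec, Abort };

StartupAction decide_startup_action(bool layout_compatible, bool aslr_enabled,
                                    bool other_reexec_reason) {
  // A re-exec may already be planned for unrelated reasons
  // (the message mentions unlimited stack size as one).
  bool reexec = other_reexec_reason;

  // An incompatible layout with ASLR on: re-exec with ASLR disabled.
  if (!layout_compatible && aslr_enabled) reexec = true;

  if (reexec) return StartupAction::ReExec;

  // ASLR is already off and nothing else triggers a re-exec: give up.
  if (!layout_compatible) return StartupAction::Abort;

  return StartupAction::Continue;
}
```

Collecting every re-exec reason before deciding is what avoids the premature abort: an incompatible layout with ASLR already disabled only aborts when no other re-exec is pending.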
Signed-off-by: Whitney Tsang <whitney.tsang@intel.com>
Signed-off-by: Whitney Tsang <whitney.tsang@intel.com>
etiotto
approved these changes
Feb 5, 2024
No description provided.