Merge remote-tracking branch 'upstream/sycl' into 'sycl-mlir' #12293

victor-eds · 2024-01-04T11:24:49Z

Review only files with conflict:

.github/workflows/sycl_mlir_linux_precommit.yml
mlir/lib/Dialect/LLVMIR/IR/LLVMInlining.cpp
mlir/test/lib/Transforms/TestIntRangeInference.cpp
sycl/include/sycl/handler.hpp
sycl/test-e2e/Basic/built-ins/vec_common.cpp
sycl/test-e2e/Basic/built-ins/vec_geometric.cpp
sycl/test-e2e/Graph/RecordReplay/buffer_ordering.cpp
sycl/test-e2e/Matrix/XMX8/element_wise_all_sizes.cpp
sycl/test-e2e/format.py

And commits:

Add four extra option resets in `disableUnsupportedOptions` when parsing SYCL code. All four are needed to make clangd happily parse code when `-fsycl` is specified in compile commands. Without each of them, the following happens: * without `SYCLIsDevice=false`: crash on !nullptr assert * without `DeclareSPIRVBuiltins=false`: crash on !nullptr assert * without `SYCLUnnamedLambda=false`: `ref_non_value` error, `'T' does not refer to a value` * without `Triple=HostTriple`: `pp_file_not_found` error, `'gnu/stubs-32.h' file not found` Fixes intel#11088 and clangd/clangd#1097 Also add a LIT test which parses the following code: ```cpp #include <sycl/sycl.hpp> sycl::queue q{}; ``` and checks `symbolInfo` for the symbol `q` against a reference.

llvm/llvm-project#73077 added -Wswitch-default diagnostic but it produced false positives in templates. This PR will address that. llvm/llvm-project#75943

We can't distinguish UAR and UAS, but by definition UAR is already UAS.

Similar to `vector.transfer_read`/`vector.transfer_write`, allow 0-D vectors. This commit fixes `mlir/test/Dialect/Vector/vector-transfer-to-vector-load-store.mlir` when verifying the IR after each pattern (#74270). That test produces a temporary 0-D load/store op.

The runners-32 group is broken, for reasons... The easiest fix is to move the jobs to runners-8. (which needs to be renamed, because they're all actually 30 core machines)

`VEX_4V` does not look simpler than `VEX, VVVV`. It's kind of confusing b/c classes like `VEX_L`, `VEX_LIG` do not imply `VEX` but it does. For APX, we have promote EVEX, NDD, NF and NDD_NF instructions. All of the 4 variants are in EVEX space and NDD/NDD_NF set the VVVV fields. To extract the common fields (e.g EVEX) into a class and set VVVV conditionally, we need VVVV to not imply other prefixes.

This change adds (runtime) bounds checks for `memref` ops using the existing `RuntimeVerifiableOpInterface`. For `memref.load` and `memref.store`, we check that the indices are in-bounds of the memref's index space. For `memref.reinterpret_cast` and `memref.subview` we check that the resulting address space is in-bounds of the input memref's address space.

@progbits

…74686) `clang -g -gpubnames -fdebug-types-section` now emits .debug_names section with references to local type unit entries defined in COMDAT .debug_info sections. ``` .section .debug_info,"G",@progbits,5657452045627120676,comdat .Ltu_begin0: ... .section .debug_names,"",@progbits ... // DWARF32 .long .Ltu_begin0 # Type unit 0 // DWARF64 // .long .Ltu_begin0 # Type unit 0 ``` When `.Ltu_begin0` is relative to a non-prevailing .debug_info section, the relocation resolves to 0, which is a valid offset within the .debug_info section. ``` cat > a.cc <<e struct A { int x; }; inline A foo() { return {1}; } int main() { foo(); } e cat > b.cc <<e struct A { int x; }; inline A foo() { return {1}; } void use() { foo(); } e clang++ -g -gpubnames -fdebug-types-section -fuse-ld=lld a.cc b.cc -o old ``` ``` % llvm-dwarfdump old ... Local Type Unit offsets [ LocalTU[0]: 0x00000000 ] ... Local Type Unit offsets [ LocalTU[0]: 0x00000000 // indistinguishable from a valid offset within .debug_info ] ``` https://dwarfstd.org/issues/231013.1.html proposes that we use a tombstone value instead to inform consumers. This patch implements the idea. The second LocalTU entry will now use 0xffffffff. https://reviews.llvm.org/D84825 has a TODO that we should switch the tombstone value for most `.debug_*` sections to UINT64_MAX. We have postponed the change for more than three years for consumers to migrate. At some point we shall make the change, so that .debug_names is no long different from other debug section that is not .debug_loc/.debug_ranges. Co-authored-by: Alexander Yermolovich <ayermolo@meta.com>

This test demonstrates how the PhysicalStorageBuffer extension can be used end-2-end in a spir-v module. This module has been verified to pass serialization, deserialization, and validation with spirv-val.

The legalize action for uint_to_fp/sint_to_fp uses the source integer type, not the result FP type so setting an action on an FP type does nothing.

Issue #76202 This reverts commit f568763.

https://lab.llvm.org/buildbot/#/builders/77/builds/33152/steps/21/logs/stdio

This patch removes the explicit llvm-objdump man page. By enabling sphinx man page output with `-DLLVM_ENABLE_SPHINX=ON` and `-DSPHINX_OUTPUT_MAN=ON`, we can generate man pages for all the llvm binary utilities from the restructured text documentation. Having an additional man page upstream increases fragementation and maintenance.

After c718336 the output of print-before/print-after was changed. A semicolon was added before the banner "*** IR Dump ..." and this change broke chunk-print-before-all script. This change makes it more resilient to small output format variations.

This avoids a regression in a future commit

CONFLICT (content): Merge conflict in openmp/libomptarget/test/lit.cfg

R16-R31 was added into GPRs in llvm/llvm-project#70958, This patch supports the encoding/decoding for promoted CET instruction in EVEX space. RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4

And fusion features are added to processor definition.

add sycl-rel-** to the pre-commit/post-commit workflow supported branches

We split `TuneShiftedZExtFusion` into three fusions to make them reusable and match the GCC implementation[1]. The zexth/zextw fusions can be reused by XiangShan[2] and other commercial processors, but shifted zero extension is not so common. `macro-fusions-veyron-v1.mir` is renamed so it's not relevant to specific processor. References: [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637303.html [2] https://xiangshan-doc.readthedocs.io/zh_CN/latest/frontend/decode

Before this patch, there was a regression in comment formatting due to some code formatting in bd3e8eb. This was fixed in 428660c. Github interprets a tab before a string as starting code formatting. The message that indicted the code formatting in a PR had been fixed was refactored to a python multi-line string, but with a tab in front, causing these messages to be rendered as code blocks in Github, instead of as intended. This patch builds upon the original fix to reformat the strings so that they fit within ~80 character lines and are simpler to modify in the future, hopefully removing traps like the one that caused the original issue.

… (#76125) R16-R31 was added into GPRs in llvm/llvm-project#70958, This patch supports the encoding/decoding for promoted CMPCCXADD instruction in EVEX space. RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4

LLVM: llvm/llvm-project@bc8c4bb SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@e63fdb8

intel#12219) Co-authored-by: Artur Gainullin <artur.gainullin@intel.com>

…2211) Apply intel#11623 to `annotated_arg` class

…erent types (intel#12244)

Signed-off-by: Victor Perez <victor.perez@codeplay.com>

…onAnalysis` Latest upstream changes made data-flow analyses interprocedural. Our `setToEntryState()` function is not working as expected, which is causing issues. Overriding this function and using previous behavior is thus needed for our analysis to work. Signed-off-by: Victor Perez <victor.perez@codeplay.com>

Update tests to use `scf.reduce` instead of `scf.yield` in `scf.parallel` operations. Signed-off-by: Victor Perez <victor.perez@codeplay.com>

…s()` Define shortcut to get alias analysis being used in accessor analysis in order to query alias between accessors.

Drop following passing tests from list: - `Basic/half_type.cpp` - `Basic/image/image_accessor_readwrite_half.cpp` - `Basic/image/image_write_fp16.cpp` - `ESIMD/accessor_load_store_stateless.cpp` - `ESIMD/api/svm_gather_scatter_64.cpp` - `ESIMD/regression/globals.cpp` - `InvokeSimd/Spec/ImplicitSubgroup/uniform_retval.cpp` - `NonUniformGroups/fixed_size_group_algorithms.cpp` Signed-off-by: Victor Perez <victor.perez@codeplay.com>

Add address space to global constants so that they can be translated to SPIR-V. Also add alignment in the process. Signed-off-by: Victor Perez <victor.perez@codeplay.com>

cgeist `-O0` E2E tests are failing. `XFAIL` them till further investigation. Signed-off-by: Victor Perez <victor.perez@codeplay.com>

- `ESIMD/fp_controls.cpp` - `ESIMD/noinline_call_from_func.cpp` - `ESIMD/private_memory/private_memory.cpp` - `ESIMD/regression/bfloat16_half_vector_plus_eq_scalar.cpp` - `ESIMD/unified_memory_api/atomic_update_acc.cpp` - `ESIMD/unified_memory_api/atomic_update_acc_64.cpp` - `ESIMD/unified_memory_api/atomic_update_acc_cmpxchg.cpp` - `ESIMD/unified_memory_api/atomic_update_slm.cpp` - `ESIMD/unified_memory_api/atomic_update_slm_acc.cpp` - `ESIMD/unified_memory_api/atomic_update_slm_acc_cmpxchg.cpp` - `ESIMD/unified_memory_api/atomic_update_slm_cmpxchg.cpp` - `ESIMD/unified_memory_api/block_store_slm.cpp` - `ESIMD/unified_memory_api/block_store_slm_acc.cpp` - `Graph/Explicit/sub_group_prop.cpp` - `Graph/Explicit/work_group_size_prop.cpp` - `Graph/RecordReplay/sub_group_prop.cpp` - `Graph/RecordReplay/work_group_size_prop.cpp` - `InvokeSimd/Regression/address_space_cast.cpp` - `KernelAndProgram/disable-caching.cpp` - `Plugin/level_zero_queue_profiling.cpp` - `Plugin/level_zero_usm_residency.cpp` Signed-off-by: Victor Perez <victor.perez@codeplay.com>

victor-eds · 2024-01-04T11:36:13Z

Diff is too big to be rendered by browser, so here's the diff for each conflicting file:

diff --git a/.github/workflows/sycl_mlir_linux_precommit.yml b/.github/workflows/sycl_mlir_linux_precommit.yml
index 23e3a20e6627..f8f64cd7a8bb 100644
--- a/.github/workflows/sycl_mlir_linux_precommit.yml
+++ b/.github/workflows/sycl_mlir_linux_precommit.yml
@@ -20,12 +22,18 @@ on:
     - 'devops/containers/**'
     - 'devops/actions/build_container/**'
 
+concurrency:
+  #  Cancel a currently running workflow from the same PR, branch or tag.
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
+  cancel-in-progress: true
+
 jobs:
   detect_changes:
     uses: ./.github/workflows/sycl_detect_changes.yml
 
   lint:
     runs-on: [Linux, build]
+    if: ${{ always() && !contains(github.event.pull_request.labels.*.name, 'disable-lint') }}
     container:
       image: ghcr.io/intel/llvm/sycl_ubuntu2204_nightly:no-drivers
       options: -u 1001:1001
@@ -56,7 +64,7 @@ jobs:
     needs: [lint, detect_changes]
     if: |
       always()
-      && (success() || contains(github.event.pull_request.labels.*.name, 'disable-lint'))
+      && (success() || needs.lint.result == 'skipped')
     uses: ./.github/workflows/sycl_linux_build.yml
     with:
       build_ref: ${{ github.sha }}
@@ -64,6 +72,8 @@ jobs:
       build_cache_root: "/__w/"
       build_artifact_suffix: "default"
       build_cache_suffix: "default"
+      # FIXME: UR can't find /opt/rocm/hsa with rocm-6.0.0. Previous rocm-5.7.0 had it.
+      build_image: "ghcr.io/intel/llvm/ubuntu2204_build:7ed894ab0acc8ff09262113fdb08940d22654a30"
       changes: ${{ needs.detect_changes.outputs.filters }}
 
   test:
diff --git a/mlir/lib/Dialect/LLVMIR/IR/LLVMInlining.cpp b/mlir/lib/Dialect/LLVMIR/IR/LLVMInlining.cpp
index 0cc8ea886093..65c1daee6711 100644
--- a/mlir/lib/Dialect/LLVMIR/IR/LLVMInlining.cpp
+++ b/mlir/lib/Dialect/LLVMIR/IR/LLVMInlining.cpp
@@ -50,11 +50,31 @@ static bool hasLifetimeMarkers(LLVM::AllocaOp allocaOp) {
 static void
 handleInlinedAllocas(Operation *call,
                      iterator_range<Region::iterator> inlinedBlocks) {
+  // Locate the entry block of the closest callsite ancestor that has either the
+  // IsolatedFromAbove or AutomaticAllocationScope trait. In pure LLVM dialect
+  // programs, this is the LLVMFuncOp containing the call site. However, in
+  // mixed-dialect programs, the callsite might be nested in another operation
+  // that carries one of these traits. In such scenarios, this traversal stops
+  // at the closest ancestor with either trait, ensuring visibility post
+  // relocation and respecting allocation scopes.
+  Block *callerEntryBlock = nullptr;
+  Operation *currentOp = call;
+  while (Operation *parentOp = currentOp->getParentOp()) {
+    if (parentOp->mightHaveTrait<OpTrait::IsIsolatedFromAbove>() ||
+        parentOp->mightHaveTrait<OpTrait::AutomaticAllocationScope>()) {
+      callerEntryBlock = &currentOp->getParentRegion()->front();
+      break;
+    }
+    currentOp = parentOp;
+  }
+
+  // Avoid relocating the alloca operations if the call has been inlined into
+  // the entry block already, which is typically the encompassing
+  // LLVM function, or if the relevant entry block cannot be identified.
   Block *calleeEntryBlock = &(*inlinedBlocks.begin());
-  Block *callerEntryBlock = &(*calleeEntryBlock->getParent()->begin());
-  if (calleeEntryBlock == callerEntryBlock)
-    // Nothing to do.
+  if (!callerEntryBlock || callerEntryBlock == calleeEntryBlock)
     return;
+
   SmallVector<std::tuple<LLVM::AllocaOp, IntegerAttr, bool>> allocasToMove;
   bool shouldInsertLifetimes = false;
   bool hasDynamicAlloca = false;
@@ -496,14 +516,14 @@ static void handleAccessGroups(Operation *call,
 /// If `requestedAlignment` is higher than the alignment specified on `alloca`,
 /// realigns `alloca` if this does not exceed the natural stack alignment.
 /// Returns the post-alignment of `alloca`, whether it was realigned or not.
-static unsigned tryToEnforceAllocaAlignment(LLVM::AllocaOp alloca,
-                                            unsigned requestedAlignment,
+static uint64_t tryToEnforceAllocaAlignment(LLVM::AllocaOp alloca,
+                                            uint64_t requestedAlignment,
                                             DataLayout const &dataLayout) {
-  unsigned allocaAlignment = alloca.getAlignment().value_or(1);
+  uint64_t allocaAlignment = alloca.getAlignment().value_or(1);
   if (requestedAlignment <= allocaAlignment)
     // No realignment necessary.
     return allocaAlignment;
-  unsigned naturalStackAlignmentBits = dataLayout.getStackAlignment();
+  uint64_t naturalStackAlignmentBits = dataLayout.getStackAlignment();
   // If the natural stack alignment is not specified, the data layout returns
   // zero. Optimistically allow realignment in this case.
   if (naturalStackAlignmentBits == 0 ||
@@ -525,7 +545,7 @@ static unsigned tryToEnforceAllocaAlignment(LLVM::AllocaOp alloca,
 /// the pointer, then returns the resulting post-alignment, regardless of
 /// whether it was realigned or not. If no existing alignment attribute is
 /// found, returns 1 (i.e., assume that no alignment is guaranteed).
-static unsigned tryToEnforceAlignment(Value value, unsigned requestedAlignment,
+static uint64_t tryToEnforceAlignment(Value value, uint64_t requestedAlignment,
                                       DataLayout const &dataLayout) {
   if (Operation *definingOp = value.getDefiningOp()) {
     if (auto alloca = dyn_cast<LLVM::AllocaOp>(definingOp))
@@ -557,8 +577,8 @@ static unsigned tryToEnforceAlignment(Value value, unsigned requestedAlignment,
 /// the address of the new alloca, then returns the value of the new alloca.
 static Value handleByValArgumentInit(OpBuilder &builder, Location loc,
                                      Value argument, Type elementType,
-                                     unsigned elementTypeSize,
-                                     unsigned targetAlignment) {
+                                     uint64_t elementTypeSize,
+                                     uint64_t targetAlignment) {
   // Allocate the new value on the stack.
   Value allocaOp;
   {
@@ -587,7 +607,7 @@ static Value handleByValArgumentInit(OpBuilder &builder, Location loc,
 /// attribute (or 1 if no align attribute was set).
 static Value handleByValArgument(OpBuilder &builder, Operation *callable,
                                  Value argument, Type elementType,
-                                 unsigned requestedAlignment) {
+                                 uint64_t requestedAlignment) {
   auto func = cast<LLVM::LLVMFuncOp>(callable);
   LLVM::MemoryEffectsAttr memoryEffects = func.getMemoryAttr();
   // If there is no memory effects attribute, assume that the function is
@@ -597,16 +617,16 @@ static Value handleByValArgument(OpBuilder &builder, Operation *callable,
                     memoryEffects.getArgMem() != LLVM::ModRefInfo::Mod;
   // Check if there's an alignment mismatch requiring us to copy.
   DataLayout dataLayout = DataLayout::closest(callable);
-  unsigned minimumAlignment = dataLayout.getTypeABIAlignment(elementType);
+  uint64_t minimumAlignment = dataLayout.getTypeABIAlignment(elementType);
   if (isReadOnly) {
     if (requestedAlignment <= minimumAlignment)
       return argument;
-    unsigned currentAlignment =
+    uint64_t currentAlignment =
         tryToEnforceAlignment(argument, requestedAlignment, dataLayout);
     if (currentAlignment >= requestedAlignment)
       return argument;
   }
-  unsigned targetAlignment = std::max(requestedAlignment, minimumAlignment);
+  uint64_t targetAlignment = std::max(requestedAlignment, minimumAlignment);
   return handleByValArgumentInit(builder, func.getLoc(), argument, elementType,
                                  dataLayout.getTypeSize(elementType),
                                  targetAlignment);
@@ -632,8 +652,7 @@ struct LLVMInlinerInterface : public DialectInlinerInterface {
                        bool wouldBeCloned) const final {
     if (!wouldBeCloned)
       return false;
-    auto callOp = dyn_cast<LLVM::CallOp>(call);
-    if (!callOp) {
+    if (!isa<LLVM::CallOp>(call)) {
       LLVM_DEBUG(llvm::dbgs()
                  << "Cannot inline: call is not an LLVM::CallOp\n");
       return false;
@@ -684,7 +703,6 @@ struct LLVMInlinerInterface : public DialectInlinerInterface {
     return true;
   }
 
-  /// All operations in the LLVM dialect are legal to inline.
   bool isLegalToInline(Operation *op, Region *, bool, IRMapping &) const final {
     return true;
   }
@@ -706,15 +724,13 @@ struct LLVMInlinerInterface : public DialectInlinerInterface {
   /// Handle the given inlined return by replacing the uses of the call with the
   /// operands of the return. This overload is called when the inlined region
   /// only contains one block.
-  void handleTerminator(Operation *op,
-                        ArrayRef<Value> valuesToRepl) const final {
+  void handleTerminator(Operation *op, ValueRange valuesToRepl) const final {
     // Return will be the only terminator present.
     auto returnOp = cast<LLVM::ReturnOp>(op);
 
     // Replace the values directly with the return operands.
     assert(returnOp.getNumOperands() == valuesToRepl.size());
-    for (const auto &[dst, src] :
-         llvm::zip(valuesToRepl, returnOp.getOperands()))
+    for (auto [dst, src] : llvm::zip(valuesToRepl, returnOp.getOperands()))
       dst.replaceAllUsesWith(src);
   }
 
@@ -724,7 +740,7 @@ struct LLVMInlinerInterface : public DialectInlinerInterface {
     if (std::optional<NamedAttribute> attr =
             argumentAttrs.getNamed(LLVM::LLVMDialect::getByValAttrName())) {
       Type elementType = cast<TypeAttr>(attr->getValue()).getValue();
-      unsigned requestedAlignment = 1;
+      uint64_t requestedAlignment = 1;
       if (std::optional<NamedAttribute> alignAttr =
               argumentAttrs.getNamed(LLVM::LLVMDialect::getAlignAttrName())) {
         requestedAlignment = cast<IntegerAttr>(alignAttr->getValue())
diff --git a/mlir/test/lib/Transforms/TestIntRangeInference.cpp b/mlir/test/lib/Transforms/TestIntRangeInference.cpp
index 7a0c97913af4..5758f6acf2f0 100644
--- a/mlir/test/lib/Transforms/TestIntRangeInference.cpp
+++ b/mlir/test/lib/Transforms/TestIntRangeInference.cpp
@@ -40,9 +40,8 @@ static LogicalResult replaceWithConstant(DataFlowSolver &solver, OpBuilder &b,
       maybeDefiningOp ? maybeDefiningOp->getDialect()
                       : value.getParentRegion()->getParentOp()->getDialect();
   Attribute constAttr = b.getIntegerAttr(value.getType(), *maybeConstValue);
-  Value constant =
-      folder.getOrCreateConstant(b.getInsertionBlock(), valueDialect, constAttr,
-                                 value.getType(), value.getLoc());
+  Value constant = folder.getOrCreateConstant(
+      b.getInsertionBlock(), valueDialect, constAttr, value.getType());
   if (!constant)
     return failure();
 
@@ -108,7 +107,7 @@ struct TestIntRangeInference
     Operation *op = getOperation();
     DataFlowSolver solver;
     solver.load<DeadCodeAnalysis>();
-    solver.load<dataflow::SparseConstantPropagation>();
+    solver.load<SparseConstantPropagation>();
     solver.load<IntegerRangeAnalysis>();
     if (failed(solver.initializeAndRun(op)))
       return signalPassFailure();
diff --git a/sycl/include/sycl/handler.hpp b/sycl/include/sycl/handler.hpp
index 858f056d0196..5fb9d234322a 100644
--- a/sycl/include/sycl/handler.hpp
+++ b/sycl/include/sycl/handler.hpp
@@ -29,6 +29,7 @@
 #include <sycl/event.hpp>
 #include <sycl/exception.hpp>
 #include <sycl/exception_list.hpp>
+#include <sycl/ext/intel/experimental/fp_control_kernel_properties.hpp>
 #include <sycl/ext/intel/experimental/kernel_execution_properties.hpp>
 #include <sycl/ext/oneapi/bindless_images_descriptor.hpp>
 #include <sycl/ext/oneapi/bindless_images_interop.hpp>
@@ -913,11 +914,20 @@ private:
   ///
   /// Stores information about kernel properties into the handler.
   template <
+      typename KernelName,
       typename PropertiesT = ext::oneapi::experimental::empty_properties_t>
   void processProperties(PropertiesT Props) {
+    using KI = detail::KernelInfo<KernelName>;
     static_assert(
         ext::oneapi::experimental::is_property_list<PropertiesT>::value,
         "Template type is not a property list.");
+    static_assert(
+        !PropertiesT::template has_property<
+            sycl::ext::intel::experimental::fp_control_key>() ||
+            (PropertiesT::template has_property<
+                 sycl::ext::intel::experimental::fp_control_key>() &&
+             KI::isESIMD()),
+        "Floating point control property is supported for ESIMD kernels only.");
     if constexpr (PropertiesT::template has_property<
                       sycl::ext::intel::experimental::cache_config_key>()) {
       auto Config = Props.template get_property<
@@ -1111,9 +1121,12 @@ private:
   };
 
   std::optional<std::array<size_t, 3>> getMaxWorkGroups();
+  // We need to use this version to support gcc 7.5.0. Remove when minimal
+  // supported gcc version is bumped.
+  std::tuple<std::array<size_t, 3>, bool> getMaxWorkGroups_v2();
 
   template <int Dims>
-  std::optional<range<Dims>> getRoundedRange(range<Dims> UserRange) {
+  std::tuple<range<Dims>, bool> getRoundedRange(range<Dims> UserRange) {
     range<Dims> RoundedRange = UserRange;
     // Disable the rounding-up optimizations under these conditions:
     // 1. The env var SYCL_DISABLE_PARALLEL_FOR_RANGE_ROUNDING is set.
@@ -1133,7 +1146,7 @@ private:
 
     // Perform range rounding if rounding-up is enabled.
     if (this->DisableRangeRounding())
-      return {};
+      return {range<Dims>{}, false};
 
     // Range should be a multiple of this for reasonable performance.
     size_t MinFactorX = 16;
@@ -1157,8 +1170,8 @@ private:
     // kernel in a 32-bit global range.
     auto Dev = detail::getSyclObjImpl(detail::getDeviceFromHandler(*this));
     id<Dims> MaxNWGs = [&] {
-      auto PiResult = getMaxWorkGroups();
-      if (!PiResult.has_value()) {
+      auto [MaxWGs, HasMaxWGs] = getMaxWorkGroups_v2();
+      if (!HasMaxWGs) {
         id<Dims> Default;
         for (int i = 0; i < Dims; ++i)
           Default[i] = (std::numeric_limits<int32_t>::max)();
@@ -1168,7 +1181,7 @@ private:
       id<Dims> IdResult;
       size_t Limit = (std::numeric_limits<int>::max)();
       for (int i = 0; i < Dims; ++i)
-        IdResult[i] = (std::min)(Limit, (*PiResult)[Dims - i - 1]);
+        IdResult[i] = (std::min)(Limit, MaxWGs[Dims - i - 1]);
       return IdResult;
     }();
     auto M = (std::numeric_limits<uint32_t>::max)();
@@ -1204,8 +1217,8 @@ private:
         Adjust(i, MaxRange[i]);
 
     if (!DidAdjust)
-      return {};
-    return RoundedRange;
+      return {range<Dims>{}, false};
+    return {RoundedRange, true};
   }
 
   /// Defines and invokes a SYCL kernel function for the specified range.
@@ -1253,9 +1266,10 @@ private:
                   "first argument of sycl::item type, or of a type which is "
                   "implicitly convertible from sycl::item");
 
+    using RefLambdaArgType = std::add_lvalue_reference_t<LambdaArgType>;
     static_assert(
-        (std::is_invocable_v<KernelType, LambdaArgType> ||
-         std::is_invocable_v<KernelType, LambdaArgType, kernel_handler>),
+        (std::is_invocable_v<KernelType, RefLambdaArgType> ||
+         std::is_invocable_v<KernelType, RefLambdaArgType, kernel_handler>),
         "SYCL kernel lambda/functor has an unexpected signature, it should be "
         "invocable with sycl::item and optionally sycl::kernel_handler");
 #endif
@@ -1273,7 +1287,8 @@ private:
 #if !defined(__SYCL_DISABLE_PARALLEL_FOR_RANGE_ROUNDING__) &&                  \
     !defined(DPCPP_HOST_DEVICE_OPENMP) &&                                      \
     !defined(DPCPP_HOST_DEVICE_PERF_NATIVE) && SYCL_LANGUAGE_VERSION >= 202001
-    if (auto RoundedRange = getRoundedRange(UserRange)) {
+    auto [RoundedRange, HasRoundedRange] = getRoundedRange(UserRange);
+    if (HasRoundedRange) {
       using NameWT = typename detail::get_kernel_wrapper_name_t<NameT>::name;
       auto Wrapper =
           getRangeRoundedKernelLambda<NameWT, TransformedArgType, Dims>(
@@ -1291,7 +1306,7 @@ private:
       // __SYCL_ASSUME_INT can still be violated. So check the bounds
       // of the user range, instead of the rounded range.
       detail::checkValueRange<Dims>(UserRange);
-      MNDRDesc.set(*RoundedRange);
+      MNDRDesc.set(RoundedRange);
       StoreLambda<KName, decltype(Wrapper), Dims, TransformedArgType>(
           std::move(Wrapper));
       setType(detail::CG::Kernel);
@@ -1306,7 +1321,7 @@ private:
       kernel_parallel_for_wrapper<NameT, TransformedArgType, KernelType,
                                   PropertiesT>(KernelFunc);
 #ifndef __SYCL_DEVICE_ONLY__
-      processProperties<PropertiesT>(Props);
+      processProperties<NameT, PropertiesT>(Props);
       detail::checkValueRange<Dims>(UserRange);
       MNDRDesc.set(std::move(UserRange));
       StoreLambda<NameT, KernelType, Dims, TransformedArgType>(
@@ -1361,7 +1376,7 @@ private:
     kernel_parallel_for_wrapper<NameT, TransformedArgType, KernelType,
                                 PropertiesT>(KernelFunc);
 #ifndef __SYCL_DEVICE_ONLY__
-    processProperties<PropertiesT>(Props);
+    processProperties<NameT, PropertiesT>(Props);
     detail::checkValueRange<Dims>(ExecutionRange);
     MNDRDesc.set(std::move(ExecutionRange));
     StoreLambda<NameT, KernelType, Dims, TransformedArgType>(
@@ -1418,7 +1433,7 @@ private:
     kernel_parallel_for_work_group_wrapper<NameT, LambdaArgType, KernelType,
                                            PropertiesT>(KernelFunc);
 #ifndef __SYCL_DEVICE_ONLY__
-    processProperties<PropertiesT>(Props);
+    processProperties<NameT, PropertiesT>(Props);
     detail::checkValueRange<Dims>(NumWorkGroups);
     MNDRDesc.setNumWorkGroups(NumWorkGroups);
     StoreLambda<NameT, KernelType, Dims, LambdaArgType>(std::move(KernelFunc));
@@ -1458,7 +1473,7 @@ private:
     kernel_parallel_for_work_group_wrapper<NameT, LambdaArgType, KernelType,
                                            PropertiesT>(KernelFunc);
 #ifndef __SYCL_DEVICE_ONLY__
-    processProperties<PropertiesT>(Props);
+    processProperties<NameT, PropertiesT>(Props);
     nd_range<Dims> ExecRange =
         nd_range<Dims>(NumWorkGroups * WorkGroupSize, WorkGroupSize);
     detail::checkValueRange<Dims>(ExecRange);
@@ -1638,8 +1653,8 @@ private:
   //   * Provide explicit template type parameters for the call
   //
   // Couldn't think of a better way to achieve both.
-  template <typename KernelType, typename PropertiesT, bool HasKernelHandlerArg,
-            typename FuncTy>
+  template <typename KernelName, typename KernelType, typename PropertiesT,
+            bool HasKernelHandlerArg, typename FuncTy>
   void unpack(_KERNELFUNCPARAM(KernelFunc), FuncTy Lambda) {
 #ifdef __SYCL_DEVICE_ONLY__
     detail::CheckDeviceCopyable<KernelType>();
@@ -1648,13 +1663,15 @@ private:
         typename detail::GetMergedKernelProperties<KernelType,
                                                    PropertiesT>::type;
     using Unpacker = KernelPropertiesUnpacker<MergedPropertiesT>;
+#ifndef __SYCL_DEVICE_ONLY__
     // If there are properties provided by get method then process them.
     if constexpr (ext::oneapi::experimental::detail::
                       HasKernelPropertiesGetMethod<
                           _KERNELFUNCPARAMTYPE>::value) {
-      processProperties(
+      processProperties<KernelName>(
           KernelFunc.get(ext::oneapi::experimental::properties_tag{}));
     }
+#endif
     if constexpr (HasKernelHandlerArg) {
       kernel_handler KH;
       Lambda(Unpacker{}, this, KernelFunc, KH);
@@ -1670,7 +1687,7 @@ private:
       typename KernelName, typename KernelType,
       typename PropertiesT = ext::oneapi::experimental::empty_properties_t>
   void kernel_single_task_wrapper(_KERNELFUNCPARAM(KernelFunc)) {
-    unpack<KernelType, PropertiesT,
+    unpack<KernelName, KernelType, PropertiesT,
            detail::KernelLambdaHasKernelHandlerArgT<KernelType>::value>(
         KernelFunc, [&](auto Unpacker, auto... args) {
           Unpacker.template kernel_single_task_unpack<KernelName, KernelType>(
@@ -1682,7 +1699,7 @@ private:
       typename KernelName, typename ElementType, typename KernelType,
       typename PropertiesT = ext::oneapi::experimental::empty_properties_t>
   void kernel_parallel_for_wrapper(_KERNELFUNCPARAM(KernelFunc)) {
-    unpack<KernelType, PropertiesT,
+    unpack<KernelName, KernelType, PropertiesT,
            detail::KernelLambdaHasKernelHandlerArgT<KernelType,
                                                     ElementType>::value>(
         KernelFunc, [&](auto Unpacker, auto... args) {
@@ -1695,7 +1712,7 @@ private:
       typename KernelName, typename ElementType, typename KernelType,
       typename PropertiesT = ext::oneapi::experimental::empty_properties_t>
   void kernel_parallel_for_work_group_wrapper(_KERNELFUNCPARAM(KernelFunc)) {
-    unpack<KernelType, PropertiesT,
+    unpack<KernelName, KernelType, PropertiesT,
            detail::KernelLambdaHasKernelHandlerArgT<KernelType,
                                                     ElementType>::value>(
         KernelFunc, [&](auto Unpacker, auto... args) {
@@ -1729,7 +1746,7 @@ private:
     // No need to check if range is out of INT_MAX limits as it's compile-time
     // known constant.
     MNDRDesc.set(range<1>{1});
-    processProperties<PropertiesT>(Props);
+    processProperties<NameT, PropertiesT>(Props);
     StoreLambda<NameT, KernelType, /*Dims*/ 1, void>(KernelFunc);
     setType(detail::CG::Kernel);
 #endif
@@ -2332,7 +2349,6 @@ public:
   std::enable_if_t<
       ext::oneapi::experimental::is_property_list<PropertiesT>::value>
   single_task(PropertiesT Props, _KERNELFUNCPARAM(KernelFunc)) {
-    throwIfGraphAssociatedAndKernelProperties<PropertiesT>();
     single_task_lambda_impl<KernelName, KernelType, PropertiesT>(Props,
                                                                  KernelFunc);
   }
@@ -2343,7 +2359,6 @@ public:
       ext::oneapi::experimental::is_property_list<PropertiesT>::value>
   parallel_for(range<1> NumWorkItems __SYCL_ANNOTATE(range), PropertiesT Props,
                _KERNELFUNCPARAM(KernelFunc) __SYCL_ANNOTATE(kernel)) {
-    throwIfGraphAssociatedAndKernelProperties<PropertiesT>();
     parallel_for_lambda_impl<KernelName, KernelType, 1, PropertiesT>(
         NumWorkItems, Props, std::move(KernelFunc));
   }
@@ -2354,7 +2369,6 @@ public:
       ext::oneapi::experimental::is_property_list<PropertiesT>::value>
   parallel_for(range<2> NumWorkItems __SYCL_ANNOTATE(range), PropertiesT Props,
                _KERNELFUNCPARAM(KernelFunc) __SYCL_ANNOTATE(kernel)) {
-    throwIfGraphAssociatedAndKernelProperties<PropertiesT>();
     parallel_for_lambda_impl<KernelName, KernelType, 2, PropertiesT>(
         NumWorkItems, Props, std::move(KernelFunc));
   }
@@ -2365,7 +2379,6 @@ public:
       ext::oneapi::experimental::is_property_list<PropertiesT>::value>
   parallel_for(range<3> NumWorkItems __SYCL_ANNOTATE(range), PropertiesT Props,
                _KERNELFUNCPARAM(KernelFunc) __SYCL_ANNOTATE(kernel)) {
-    throwIfGraphAssociatedAndKernelProperties<PropertiesT>();
     parallel_for_lambda_impl<KernelName, KernelType, 3, PropertiesT>(
         NumWorkItems, Props, std::move(KernelFunc));
   }
@@ -2377,7 +2390,6 @@ public:
   parallel_for(nd_range<Dims> Range __SYCL_ANNOTATE(nd_range),
                PropertiesT Properties,
                _KERNELFUNCPARAM(KernelFunc) __SYCL_ANNOTATE(kernel)) {
-    throwIfGraphAssociatedAndKernelProperties<PropertiesT>();
     parallel_for_impl<KernelName>(Range, Properties, std::move(KernelFunc));
   }
 
@@ -2392,7 +2404,6 @@ public:
   parallel_for(range<1> Range, PropertiesT Properties, RestT &&...Rest) {
     throwIfGraphAssociated<ext::oneapi::experimental::detail::
                                UnsupportedGraphFeatures::sycl_reductions>();
-    throwIfGraphAssociatedAndKernelProperties<PropertiesT>();
     detail::reduction_parallel_for<KernelName>(*this, Range, Properties,
                                                std::forward<RestT>(Rest)...);
   }
@@ -2406,7 +2417,6 @@ public:
   parallel_for(range<2> Range, PropertiesT Properties, RestT &&...Rest) {
     throwIfGraphAssociated<ext::oneapi::experimental::detail::
                                UnsupportedGraphFeatures::sycl_reductions>();
-    throwIfGraphAssociatedAndKernelProperties<PropertiesT>();
     detail::reduction_parallel_for<KernelName>(*this, Range, Properties,
                                                std::forward<RestT>(Rest)...);
   }
@@ -2420,7 +2430,6 @@ public:
   parallel_for(range<3> Range, PropertiesT Properties, RestT &&...Rest) {
     throwIfGraphAssociated<ext::oneapi::experimental::detail::
                                UnsupportedGraphFeatures::sycl_reductions>();
-    throwIfGraphAssociatedAndKernelProperties<PropertiesT>();
     detail::reduction_parallel_for<KernelName>(*this, Range, Properties,
                                                std::forward<RestT>(Rest)...);
   }
@@ -2481,7 +2490,6 @@ public:
             int Dims, typename PropertiesT>
   void parallel_for_work_group(range<Dims> NumWorkGroups, PropertiesT Props,
                                _KERNELFUNCPARAM(KernelFunc)) {
-    throwIfGraphAssociatedAndKernelProperties<PropertiesT>();
     parallel_for_work_group_lambda_impl<KernelName, KernelType, Dims,
                                         PropertiesT>(NumWorkGroups, Props,
                                                      KernelFunc);
@@ -2492,7 +2500,6 @@ public:
   void parallel_for_work_group(range<Dims> NumWorkGroups,
                                range<Dims> WorkGroupSize, PropertiesT Props,
                                _KERNELFUNCPARAM(KernelFunc)) {
-    throwIfGraphAssociatedAndKernelProperties<PropertiesT>();
     parallel_for_work_group_lambda_impl<KernelName, KernelType, Dims,
                                         PropertiesT>(
         NumWorkGroups, WorkGroupSize, Props, KernelFunc);
@@ -3641,17 +3648,6 @@ private:
                             "handler::require() before it can be used.");
   }
 
-  template <typename PropertiesT>
-  std::enable_if_t<
-      ext::oneapi::experimental::is_property_list<PropertiesT>::value>
-  throwIfGraphAssociatedAndKernelProperties() const {
-    if (!std::is_same_v<PropertiesT,
-                        ext::oneapi::experimental::empty_properties_t>)
-      throwIfGraphAssociated<
-          ext::oneapi::experimental::detail::UnsupportedGraphFeatures::
-              sycl_ext_oneapi_kernel_properties>();
-  }
-
   // Set value of the gpu cache configuration for the kernel.
   void setKernelCacheConfig(sycl::detail::pi::PiKernelCacheConfig);
 
diff --git a/sycl/test-e2e/Basic/built-ins/vec_common.cpp b/sycl/test-e2e/Basic/built-ins/vec_common.cpp
index 4ddb88883090..5cbc1efe98e9 100644
--- a/sycl/test-e2e/Basic/built-ins/vec_common.cpp
+++ b/sycl/test-e2e/Basic/built-ins/vec_common.cpp
@@ -1,7 +1,7 @@
 // RUN: %{build} -o %t.out
 // RUN: %{run} %t.out
-// RUN: %if preview-breaking-changes-supported %{ %clangxx -fsycl -fpreview-breaking-changes %s -o %t2.out %}
-// RUN: %if preview-breaking-changes-supported %{  %{run} %t2.out %}
+// RUN: %if preview-breaking-changes-supported %{ %{build} -fpreview-breaking-changes -o %t2.out %}
+// RUN: %if preview-breaking-changes-supported %{ %{run} %t2.out %}
 
 // https://github.com/intel/llvm/issues/7585 to fix the failure:
 // XFAIL: gpu
@@ -60,6 +60,88 @@ int main() {
 
   // sycl::clamp
   TEST(sycl::clamp, float, 2, EXPECTED(float, 1.0f, 2.0f), 0, va1, va2, va3);
+  TEST(sycl::clamp, float, 2, EXPECTED(float, 1.0f, 2.0f), 0, va1, 1.0f, 3.0f);
+  if (dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::clamp, double, 2, EXPECTED(double, 1.0, 2.0), 0, va4, 1.0, 3.0);
+  }
+  // sycl::degrees
+  TEST(sycl::degrees, float, 3, EXPECTED(float, 180, 180, 180), 0, va5);
+  if (dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::degrees, double, 3, EXPECTED(double, 180, 180, 180), 0, va6);
+  }
+  if (dev.has(sycl::aspect::fp16)) {
+    TEST(sycl::degrees, sycl::half, 3, EXPECTED(sycl::half, 180, 180, 180), 0.2,
+         va7);
+  }
+  // sycl::max
+  TEST(sycl::max, float, 2, EXPECTED(float, 3.0f, 2.0f), 0, va1, va3);
+  TEST(sycl::max, float, 2, EXPECTED(float, 1.5f, 2.0f), 0, va1, 1.5f);
+  if (dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::max, double, 2, EXPECTED(double, 1.5, 2.0), 0, va4, 1.5);
+  }
+  // sycl::min
+  TEST(sycl::min, float, 2, EXPECTED(float, 1.0f, 2.0f), 0, va1, va3);
+  TEST(sycl::min, float, 2, EXPECTED(float, 1.0f, 1.5f), 0, va1, 1.5f);
+  if (dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::min, double, 2, EXPECTED(double, 1.0, 1.5), 0, va4, 1.5);
+  }
+  // sycl::mix
+  TEST(sycl::mix, float, 2, EXPECTED(float, 1.6f, 2.0f), 0, va1, va3, va8);
+  TEST(sycl::mix, float, 2, EXPECTED(float, 1.4f, 2.0f), 0, va1, va3, 0.2);
+  if (dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::mix, double, 2, EXPECTED(double, 3.0, 5.0), 0, va4, va9, 0.5);
+  }
+  // sycl::radians
+  TEST(sycl::radians, float, 3, EXPECTED(float, M_PI, M_PI, M_PI), 0, va10);
+  if (dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::radians, double, 3, EXPECTED(double, M_PI, M_PI, M_PI), 0, va11);
+  }
+  if (dev.has(sycl::aspect::fp16)) {
+    TEST(sycl::radians, sycl::half, 3, EXPECTED(sycl::half, M_PI, M_PI, M_PI),
+         0.002, va12);
+  }
+  // sycl::step
+  TEST(sycl::step, float, 2, EXPECTED(float, 1.0f, 1.0f), 0, va1, va3);
+  if (dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::step, double, 2, EXPECTED(double, 1.0, 1.0), 0, va4, va9);
+  }
+  if (dev.has(sycl::aspect::fp16)) {
+    TEST(sycl::step, sycl::half, 3, EXPECTED(sycl::half, 1.0, 0.0, 1.0), 0,
+         va12, va13);
+  }
+  TEST(sycl::step, float, 2, EXPECTED(float, 1.0f, 0.0f), 0, 2.5f, va3);
+  if (dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::step, double, 2, EXPECTED(double, 0.0f, 1.0f), 0, 6.0f, va9);
+  }
+  // sycl::smoothstep
+  TEST(sycl::smoothstep, float, 2, EXPECTED(float, 1.0f, 1.0f), 0, va8, va1,
+       va2);
+  if (dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::smoothstep, double, 2, EXPECTED(double, 1.0, 1.0f), 0.00000001,
+         va4, va9, va9);
+  }
+  if (dev.has(sycl::aspect::fp16)) {
+    TEST(sycl::smoothstep, sycl::half, 3, EXPECTED(sycl::half, 1.0, 1.0, 1.0),
+         0, va7, va12, va13);
+  }
+  TEST(sycl::smoothstep, float, 2, EXPECTED(float, 0.0553936f, 0.0f), 0.0000001,
+       2.5f, 6.0f, va3);
+  if (dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::smoothstep, double, 2, EXPECTED(double, 0.0f, 1.0f), 0, 6.0f,
+         8.0f, va9);
+  }
+  // sign
+  TEST(sycl::sign, float, 2, EXPECTED(float, +0.0f, -1.0f), 0, va14);
+  if (dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::sign, double, 2, EXPECTED(double, -0.0, 1.0), 0, va15);
+  }
+  if (dev.has(sycl::aspect::fp16)) {
+    TEST(sycl::sign, sycl::half, 3, EXPECTED(sycl::half, 1.0, 1.0, 1.0), 0,
+         va12);
+  }
+
+#ifdef __INTEL_PREVIEW_BREAKING_CHANGES
+  // sycl::clamp swizzled
   TEST(sycl::clamp, float, 2, EXPECTED(float, 3.0f, 2.0f), 0,
        va16.swizzle<1, 0>(), va2, va3);
   TEST(sycl::clamp, float, 2, EXPECTED(float, 1.0f, 2.0f), 0, va1, va2,
@@ -70,59 +152,46 @@ int main() {
        va16.swizzle<1, 0>(), va16.swizzle<1, 0>());
   TEST(sycl::clamp, float, 2, EXPECTED(float, 360.0f, 180.0f), 0,
        va16.swizzle<1, 0>(), va16.swizzle<1, 0>(), va16.swizzle<1, 0>());
-  TEST(sycl::clamp, float, 2, EXPECTED(float, 1.0f, 2.0f), 0, va1, 1.0f, 3.0f);
   TEST(sycl::clamp, float, 2, EXPECTED(float, 3.0f, 3.0f), 0,
        va16.swizzle<1, 0>(), 1.0f, 3.0f);
   if (dev.has(sycl::aspect::fp64)) {
-    TEST(sycl::clamp, double, 2, EXPECTED(double, 1.0, 2.0), 0, va4, 1.0, 3.0);
     TEST(sycl::clamp, double, 2, EXPECTED(double, 3.0, 3.0), 0,
          va11.swizzle<1, 0>(), 1.0, 3.0);
   }
-  // sycl::degrees
-  TEST(sycl::degrees, float, 3, EXPECTED(float, 180, 180, 180), 0, va5);
+  // sycl::degrees swizzled
   TEST(sycl::degrees, float, 2, EXPECTED(float, 180, 180), 0,
        va5.swizzle<1, 0>());
   if (dev.has(sycl::aspect::fp64)) {
-    TEST(sycl::degrees, double, 3, EXPECTED(double, 180, 180, 180), 0, va6);
     TEST(sycl::degrees, double, 2, EXPECTED(double, 180, 180), 0,
          va6.swizzle<1, 0>());
   }
   if (dev.has(sycl::aspect::fp16)) {
-    TEST(sycl::degrees, sycl::half, 3, EXPECTED(sycl::half, 180, 180, 180), 0.2,
-         va7);
     TEST(sycl::degrees, sycl::half, 2, EXPECTED(sycl::half, 180, 180), 0.2,
          va7.swizzle<1, 0>());
   }
-  // sycl::max
-  TEST(sycl::max, float, 2, EXPECTED(float, 3.0f, 2.0f), 0, va1, va3);
+  // sycl::max swizzled
   TEST(sycl::max, float, 2, EXPECTED(float, 360.0f, 180.0f), 0,
        va16.swizzle<1, 0>(), va3);
   TEST(sycl::max, float, 2, EXPECTED(float, 360.0f, 180.0f), 0, va1,
        va16.swizzle<1, 0>());
-  TEST(sycl::max, float, 2, EXPECTED(float, 1.5f, 2.0f), 0, va1, 1.5f);
   TEST(sycl::max, float, 2, EXPECTED(float, 360.0f, 190.0f), 0,
        va16.swizzle<1, 0>(), 190.0f);
   if (dev.has(sycl::aspect::fp64)) {
-    TEST(sycl::max, double, 2, EXPECTED(double, 1.5, 2.0), 0, va4, 1.5);
     TEST(sycl::max, double, 2, EXPECTED(double, 360.0, 190.0), 0,
          va17.swizzle<1, 0>(), 190.0);
   }
-  // sycl::min
-  TEST(sycl::min, float, 2, EXPECTED(float, 1.0f, 2.0f), 0, va1, va3);
+  // sycl::min swizzled
   TEST(sycl::min, float, 2, EXPECTED(float, 3.0f, 2.0f), 0,
        va16.swizzle<1, 0>(), va3);
   TEST(sycl::min, float, 2, EXPECTED(float, 1.0f, 2.0f), 0, va1,
        va16.swizzle<1, 0>());
-  TEST(sycl::min, float, 2, EXPECTED(float, 1.0f, 1.5f), 0, va1, 1.5f);
   TEST(sycl::min, float, 2, EXPECTED(float, 190.0f, 180.0f), 0,
        va16.swizzle<1, 0>(), 190.0f);
   if (dev.has(sycl::aspect::fp64)) {
-    TEST(sycl::min, double, 2, EXPECTED(double, 1.0, 1.5), 0, va4, 1.5);
     TEST(sycl::min, double, 2, EXPECTED(double, 190.0f, 180.0f), 0,
          va17.swizzle<1, 0>(), 190.0);
   }
-  // sycl::mix
-  TEST(sycl::mix, float, 2, EXPECTED(float, 1.6f, 2.0f), 0, va1, va3, va8);
+  // sycl::mix swizzled
   TEST(sycl::mix, float, 2, EXPECTED(float, 252.9f, 73.2f), 0,
        va16.swizzle<1, 0>(), va3, va8);
   TEST(sycl::mix, float, 2, EXPECTED(float, 252.9f, 73.2f), 0,
@@ -137,9 +206,7 @@ int main() {
        va16.swizzle<1, 0>(), va16.swizzle<1, 0>(), va18.swizzle<0, 1>());
   TEST(sycl::mix, float, 2, EXPECTED(float, 1.6f, 2.0f), 0, va1, va3,
        va18.swizzle<0, 1>());
-  TEST(sycl::mix, float, 2, EXPECTED(float, 1.4f, 2.0f), 0, va1, va3, 0.2);
   if (dev.has(sycl::aspect::fp64)) {
-    TEST(sycl::mix, double, 2, EXPECTED(double, 3.0, 5.0), 0, va4, va9, 0.5);
     TEST(sycl::mix, double, 2, EXPECTED(double, 182.5, 94.0), 0,
          va17.swizzle<1, 0>(), va9, 0.5);
     TEST(sycl::mix, double, 2, EXPECTED(double, 180.5, 91.0), 0, va4,
@@ -147,23 +214,18 @@ int main() {
     TEST(sycl::mix, double, 2, EXPECTED(double, 360.0, 180.0), 0,
          va17.swizzle<1, 0>(), va17.swizzle<1, 0>(), 0.5);
   }
-  // sycl::radians
-  TEST(sycl::radians, float, 3, EXPECTED(float, M_PI, M_PI, M_PI), 0, va10);
+  // sycl::radians swizzled
   TEST(sycl::radians, float, 2, EXPECTED(float, M_PI, M_PI), 0,
        va10.swizzle<1, 0>());
   if (dev.has(sycl::aspect::fp64)) {
-    TEST(sycl::radians, double, 3, EXPECTED(double, M_PI, M_PI, M_PI), 0, va11);
     TEST(sycl::radians, double, 2, EXPECTED(double, M_PI, M_PI), 0,
          va11.swizzle<1, 0>());
   }
   if (dev.has(sycl::aspect::fp16)) {
-    TEST(sycl::radians, sycl::half, 3, EXPECTED(sycl::half, M_PI, M_PI, M_PI),
-         0.002, va12);
     TEST(sycl::radians, sycl::half, 2, EXPECTED(sycl::half, M_PI, M_PI), 0.002,
          va12.swizzle<1, 0>());
   }
-  // sycl::step
-  TEST(sycl::step, float, 2, EXPECTED(float, 1.0f, 1.0f), 0, va1, va3);
+  // sycl::step swizzled
   TEST(sycl::step, float, 2, EXPECTED(float, 0.0f, 0.0f), 0,
        va16.swizzle<1, 0>(), va3);
   TEST(sycl::step, float, 2, EXPECTED(float, 1.0f, 1.0f), 0, va1,
@@ -171,7 +233,6 @@ int main() {
   TEST(sycl::step, float, 2, EXPECTED(float, 1.0f, 1.0f), 0,
        va16.swizzle<1, 0>(), va16.swizzle<1, 0>());
   if (dev.has(sycl::aspect::fp64)) {
-    TEST(sycl::step, double, 2, EXPECTED(double, 1.0, 1.0), 0, va4, va9);
     TEST(sycl::step, double, 2, EXPECTED(double, 0.0, 0.0), 0,
          va17.swizzle<1, 0>(), va9);
     TEST(sycl::step, double, 2, EXPECTED(double, 1.0, 1.0), 0, va4,
@@ -180,8 +241,6 @@ int main() {
          va17.swizzle<1, 0>(), va17.swizzle<1, 0>());
   }
   if (dev.has(sycl::aspect::fp16)) {
-    TEST(sycl::step, sycl::half, 3, EXPECTED(sycl::half, 1.0, 0.0, 1.0), 0,
-         va12, va13);
     TEST(sycl::step, sycl::half, 3, EXPECTED(sycl::half, 1.0, 0.0, 1.0), 0,
          va12.swizzle<0, 1, 2>(), va13);
     TEST(sycl::step, sycl::half, 3, EXPECTED(sycl::half, 1.0, 0.0, 1.0), 0,
@@ -189,17 +248,13 @@ int main() {
     TEST(sycl::step, sycl::half, 3, EXPECTED(sycl::half, 1.0, 0.0, 1.0), 0,
          va12.swizzle<0, 1, 2>(), va13.swizzle<0, 1, 2>());
   }
-  TEST(sycl::step, float, 2, EXPECTED(float, 1.0f, 0.0f), 0, 2.5f, va3);
   TEST(sycl::step, float, 2, EXPECTED(float, 0.0f, 1.0f), 0, 2.5f,
        va3.swizzle<1, 0>());
   if (dev.has(sycl::aspect::fp64)) {
-    TEST(sycl::step, double, 2, EXPECTED(double, 0.0f, 1.0f), 0, 6.0f, va9);
     TEST(sycl::step, double, 2, EXPECTED(double, 1.0f, 0.0f), 0, 6.0f,
          va9.swizzle<1, 0>());
   }
-  // sycl::smoothstep
-  TEST(sycl::smoothstep, float, 2, EXPECTED(float, 1.0f, 1.0f), 0, va8, va1,
-       va2);
+  // sycl::smoothstep swizzled
   TEST(sycl::smoothstep, float, 2, EXPECTED(float, 1.0f, 1.0f), 0,
        va8.swizzle<0, 1>(), va1, va2);
   TEST(sycl::smoothstep, float, 2, EXPECTED(float, 1.0f, 1.0f), 0, va8,
@@ -215,8 +270,6 @@ int main() {
   TEST(sycl::smoothstep, float, 2, EXPECTED(float, 1.0f, 1.0f), 0,
        va8.swizzle<0, 1>(), va1.swizzle<0, 1>(), va2.swizzle<0, 1>());
   if (dev.has(sycl::aspect::fp64)) {
-    TEST(sycl::smoothstep, double, 2, EXPECTED(double, 1.0, 1.0f), 0.00000001,
-         va4, va9, va9);
     TEST(sycl::smoothstep, double, 2, EXPECTED(double, 1.0, 1.0f), 0.00000001,
          va4.swizzle<0, 1>(), va9, va9);
     TEST(sycl::smoothstep, double, 2, EXPECTED(double, 1.0, 1.0f), 0.00000001,
@@ -233,8 +286,6 @@ int main() {
          va4.swizzle<0, 1>(), va9.swizzle<0, 1>(), va9.swizzle<0, 1>());
   }
   if (dev.has(sycl::aspect::fp16)) {
-    TEST(sycl::smoothstep, sycl::half, 3, EXPECTED(sycl::half, 1.0, 1.0, 1.0),
-         0, va7, va12, va13);
     TEST(sycl::smoothstep, sycl::half, 3, EXPECTED(sycl::half, 1.0, 1.0, 1.0),
          0, va7.swizzle<0, 1, 2>(), va12, va13);
     TEST(sycl::smoothstep, sycl::half, 3, EXPECTED(sycl::half, 1.0, 1.0, 1.0),
@@ -251,8 +302,6 @@ int main() {
          0, va7.swizzle<0, 1, 2>(), va12.swizzle<0, 1, 2>(),
          va13.swizzle<0, 1, 2>());
   }
-  TEST(sycl::smoothstep, float, 2, EXPECTED(float, 0.0553936f, 0.0f), 0.0000001,
-       2.5f, 6.0f, va3);
   TEST(sycl::smoothstep, float, 2, EXPECTED(float, 0.0f, 0.0553936f), 0.0000001,
        2.5f, 6.0f, va3.swizzle<1, 0>());
   if (dev.has(sycl::aspect::fp64)) {
@@ -261,21 +310,18 @@ int main() {
     TEST(sycl::smoothstep, double, 2, EXPECTED(double, 1.0f, 0.0f), 0, 6.0f,
          8.0f, va9.swizzle<1, 0>());
   }
-  // sign
-  TEST(sycl::sign, float, 2, EXPECTED(float, +0.0f, -1.0f), 0, va14);
+  // sign swizzled
   TEST(sycl::sign, float, 2, EXPECTED(float, -1.0f, +0.0f), 0,
        va14.swizzle<1, 0>());
   if (dev.has(sycl::aspect::fp64)) {
-    TEST(sycl::sign, double, 2, EXPECTED(double, -0.0, 1.0), 0, va15);
     TEST(sycl::sign, double, 2, EXPECTED(double, 1.0, -0.0), 0,
          va15.swizzle<1, 0>());
   }
   if (dev.has(sycl::aspect::fp16)) {
-    TEST(sycl::sign, sycl::half, 3, EXPECTED(sycl::half, 1.0, 1.0, 1.0), 0,
-         va12);
     TEST(sycl::sign, sycl::half, 3, EXPECTED(sycl::half, 1.0, 1.0, 1.0), 0,
          va12.swizzle<2, 1, 0>());
   }
+#endif // __INTEL_PREVIEW_BREAKING_CHANGES
 
   return 0;
 }
diff --git a/sycl/test-e2e/Basic/built-ins/vec_geometric.cpp b/sycl/test-e2e/Basic/built-ins/vec_geometric.cpp
index 0d0acdee0c82..5b9e6fdad06e 100644
--- a/sycl/test-e2e/Basic/built-ins/vec_geometric.cpp
+++ b/sycl/test-e2e/Basic/built-ins/vec_geometric.cpp
@@ -1,7 +1,7 @@
 // RUN: %{build} -o %t.out
 // RUN: %{run} %t.out
-// RUN: %if preview-breaking-changes-supported %{ %clangxx -fsycl -fpreview-breaking-changes %s -o %t2.out %}
-// RUN: %if preview-breaking-changes-supported %{  %{run} %t2.out %}
+// RUN: %if preview-breaking-changes-supported %{ %{build} -fpreview-breaking-changes -o %t2.out %}
+// RUN: %if preview-breaking-changes-supported %{ %{run} %t2.out %}
 
 // https://github.com/intel/llvm/issues/7585 to fix the failure:
 // XFAIL: gpu
@@ -69,70 +69,38 @@ int main() {
        VFloatD3_2);
   TEST(sycl::cross, float, 4, EXPECTED(float, -1.f, -4.f, 3.f, 0.f), 0,
        VFloatD4, VFloatD4_2);
-  TEST(sycl::cross, float, 3, EXPECTED(float, -1.f, -4.f, 3.f), 0,
-       VFloatD4.swizzle<0, 1, 2>(), VFloatD3_2);
-  TEST(sycl::cross, float, 3, EXPECTED(float, -1.f, -4.f, 3.f), 0, VFloatD3,
-       VFloatD4_2.swizzle<0, 1, 2>());
-  TEST(sycl::cross, float, 3, EXPECTED(float, -1.f, -4.f, 3.f), 0,
-       VFloatD4.swizzle<0, 1, 2>(), VFloatD4_2.swizzle<0, 1, 2>());
   if (Dev.has(sycl::aspect::fp64)) {
     TEST(sycl::cross, double, 3, EXPECTED(double, -1.f, -4.f, 3.f), 0,
          VDoubleD3, VDoubleD3_2);
     TEST(sycl::cross, double, 4, EXPECTED(double, -1.f, -4.f, 3.f, 0.f), 0,
          VDoubleD4, VDoubleD4_2);
-    TEST(sycl::cross, double, 3, EXPECTED(double, -1.f, -4.f, 3.f), 0,
-         VDoubleD3, VDoubleD4_2.swizzle<0, 1, 2>());
-    TEST(sycl::cross, double, 3, EXPECTED(double, -1.f, -4.f, 3.f), 0,
-         VDoubleD4.swizzle<0, 1, 2>(), VDoubleD3_2);
-    TEST(sycl::cross, double, 3, EXPECTED(double, -1.f, -4.f, 3.f), 0,
-         VDoubleD4.swizzle<0, 1, 2>(), VDoubleD4_2.swizzle<0, 1, 2>());
   }
 
   TEST2(sycl::dot, float, 13.f, 0, VFloatD2, VFloatD2_2);
   TEST2(sycl::dot, float, 32.f, 0, VFloatD3, VFloatD3_2);
   TEST2(sycl::dot, float, 48.f, 0, VFloatD4, VFloatD4_2);
-  TEST2(sycl::dot, float, 32.f, 0, VFloatD4.swizzle<0, 1, 2>(), VFloatD3_2);
-  TEST2(sycl::dot, float, 32.f, 0, VFloatD3, VFloatD4_2.swizzle<0, 1, 2>());
-  TEST2(sycl::dot, float, 32.f, 0, VFloatD4.swizzle<0, 1, 2>(),
-        VFloatD4_2.swizzle<0, 1, 2>());
   if (Dev.has(sycl::aspect::fp64)) {
     TEST2(sycl::dot, double, 13, 0, VDoubleD2, VDoubleD2_2);
     TEST2(sycl::dot, double, 32, 0, VDoubleD3, VDoubleD3_2);
     TEST2(sycl::dot, double, 48, 0, VDoubleD4, VDoubleD4_2);
-    TEST2(sycl::dot, double, 32, 0, VDoubleD4.swizzle<0, 1, 2>(), VDoubleD3_2);
-    TEST2(sycl::dot, double, 32, 0, VDoubleD3, VDoubleD4_2.swizzle<0, 1, 2>());
-    TEST2(sycl::dot, double, 32, 0, VDoubleD4.swizzle<0, 1, 2>(),
-          VDoubleD4_2.swizzle<0, 1, 2>());
   }
 
   TEST2(sycl::length, float, 2.236068f, 1e-6, VFloatD2);
   TEST2(sycl::length, float, 3.741657f, 1e-6, VFloatD3);
   TEST2(sycl::length, float, 5.477225f, 1e-6, VFloatD4);
-  TEST2(sycl::length, float, 3.741657f, 1e-6, VFloatD4.swizzle<0, 1, 2>());
   if (Dev.has(sycl::aspect::fp64)) {
     TEST2(sycl::length, double, 2.236068, 1e-6, VDoubleD2);
     TEST2(sycl::length, double, 3.741657, 1e-6, VDoubleD3);
     TEST2(sycl::length, double, 5.477225, 1e-6, VDoubleD4);
-    TEST2(sycl::length, double, 3.741657, 1e-6, VDoubleD4.swizzle<0, 1, 2>());
   }
 
   TEST2(sycl::distance, float, 3.605551f, 1e-6, VFloatD2, VFloatD2_2);
   TEST2(sycl::distance, float, 5.f, 0, VFloatD3, VFloatD3_2);
   TEST2(sycl::distance, float, 5.f, 0, VFloatD4, VFloatD4_2);
-  TEST2(sycl::distance, float, 5.f, 0, VFloatD4.swizzle<0, 1, 2>(), VFloatD3_2);
-  TEST2(sycl::distance, float, 5.f, 0, VFloatD3, VFloatD4_2.swizzle<0, 1, 2>());
-  TEST2(sycl::distance, float, 5.f, 0, VFloatD4.swizzle<0, 1, 2>(),
-        VFloatD4_2.swizzle<0, 1, 2>());
   if (Dev.has(sycl::aspect::fp64)) {
     TEST2(sycl::distance, double, 3.605551, 1e-6, VDoubleD2, VDoubleD2_2);
     TEST2(sycl::distance, double, 5.0, 0, VDoubleD3, VDoubleD3_2);
     TEST2(sycl::distance, double, 5.0, 0, VDoubleD4, VDoubleD4_2);
-    TEST2(sycl::distance, double, 5.0, 0, VDoubleD4.swizzle<0, 1, 2>(),
-          VDoubleD3_2);
-    TEST2(sycl::distance, double, 5.0, 0, VDoubleD3,
-          VDoubleD4_2.swizzle<0, 1, 2>());
-    TEST2(sycl::distance, double, 5.0, 0, VDoubleD4.swizzle<0, 1, 2>(),
-          VDoubleD4_2.swizzle<0, 1, 2>());
   }
 
   TEST(sycl::normalize, float, 2, EXPECTED(float, 0.447213f, 0.894427f), 1e-6,
@@ -142,8 +110,6 @@ int main() {
   TEST(sycl::normalize, float, 4,
        EXPECTED(float, 0.182574f, 0.365148f, 0.547723f, 0.730297f), 1e-6,
        VFloatD4);
-  TEST(sycl::normalize, float, 3, EXPECTED(float, 0.267261, 0.534522, 0.801784),
-       1e-6, VFloatD4.swizzle<0, 1, 2>());
   if (Dev.has(sycl::aspect::fp64)) {
     TEST(sycl::normalize, double, 2, EXPECTED(double, 0.447213, 0.894427), 1e-6,
          VDoubleD2);
@@ -152,25 +118,15 @@ int main() {
     TEST(sycl::normalize, double, 4,
          EXPECTED(double, 0.182574, 0.365148, 0.547723, 0.730297), 1e-6,
          VDoubleD4);
-    TEST(sycl::normalize, double, 3,
-         EXPECTED(double, 0.267261, 0.534522, 0.801784), 1e-6,
-         VDoubleD4.swizzle<0, 1, 2>());
   }
 
   TEST2(sycl::fast_distance, float, 3.605551f, 1e-6, VFloatD2, VFloatD2_2);
   TEST2(sycl::fast_distance, float, 5.f, 0, VFloatD3, VFloatD3_2);
   TEST2(sycl::fast_distance, float, 5.f, 0, VFloatD4, VFloatD4_2);
-  TEST2(sycl::fast_distance, float, 5.f, 0, VFloatD4.swizzle<0, 1, 2>(),
-        VFloatD3_2);
-  TEST2(sycl::fast_distance, float, 5.f, 0, VFloatD3,
-        VFloatD4_2.swizzle<0, 1, 2>());
-  TEST2(sycl::fast_distance, float, 5.f, 0, VFloatD4.swizzle<0, 1, 2>(),
-        VFloatD4_2.swizzle<0, 1, 2>());
 
   TEST2(sycl::fast_length, float, 2.236068f, 1e-6, VFloatD2);
   TEST2(sycl::fast_length, float, 3.741657f, 1e-6, VFloatD3);
   TEST2(sycl::fast_length, float, 5.477225f, 1e-6, VFloatD4);
-  TEST2(sycl::fast_length, float, 3.741657f, 1e-6, VFloatD4.swizzle<0, 1, 2>());
 
   TEST(sycl::fast_normalize, float, 2, EXPECTED(float, 0.447213f, 0.894427f),
        1e-3, VFloatD2);
@@ -179,9 +135,73 @@ int main() {
   TEST(sycl::fast_normalize, float, 4,
        EXPECTED(float, 0.182574f, 0.365148f, 0.547723f, 0.730297f), 1e-3,
        VFloatD4);
+
+#ifdef __INTEL_PREVIEW_BREAKING_CHANGES
+  TEST(sycl::cross, float, 3, EXPECTED(float, -1.f, -4.f, 3.f), 0,
+       VFloatD4.swizzle<0, 1, 2>(), VFloatD3_2);
+  TEST(sycl::cross, float, 3, EXPECTED(float, -1.f, -4.f, 3.f), 0, VFloatD3,
+       VFloatD4_2.swizzle<0, 1, 2>());
+  TEST(sycl::cross, float, 3, EXPECTED(float, -1.f, -4.f, 3.f), 0,
+       VFloatD4.swizzle<0, 1, 2>(), VFloatD4_2.swizzle<0, 1, 2>());
+  if (Dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::cross, double, 3, EXPECTED(double, -1.f, -4.f, 3.f), 0,
+         VDoubleD3, VDoubleD4_2.swizzle<0, 1, 2>());
+    TEST(sycl::cross, double, 3, EXPECTED(double, -1.f, -4.f, 3.f), 0,
+         VDoubleD4.swizzle<0, 1, 2>(), VDoubleD3_2);
+    TEST(sycl::cross, double, 3, EXPECTED(double, -1.f, -4.f, 3.f), 0,
+         VDoubleD4.swizzle<0, 1, 2>(), VDoubleD4_2.swizzle<0, 1, 2>());
+  }
+
+  TEST2(sycl::dot, float, 32.f, 0, VFloatD4.swizzle<0, 1, 2>(), VFloatD3_2);
+  TEST2(sycl::dot, float, 32.f, 0, VFloatD3, VFloatD4_2.swizzle<0, 1, 2>());
+  TEST2(sycl::dot, float, 32.f, 0, VFloatD4.swizzle<0, 1, 2>(),
+        VFloatD4_2.swizzle<0, 1, 2>());
+  if (Dev.has(sycl::aspect::fp64)) {
+    TEST2(sycl::dot, double, 32, 0, VDoubleD4.swizzle<0, 1, 2>(), VDoubleD3_2);
+    TEST2(sycl::dot, double, 32, 0, VDoubleD3, VDoubleD4_2.swizzle<0, 1, 2>());
+    TEST2(sycl::dot, double, 32, 0, VDoubleD4.swizzle<0, 1, 2>(),
+          VDoubleD4_2.swizzle<0, 1, 2>());
+  }
+
+  TEST2(sycl::length, float, 3.741657f, 1e-6, VFloatD4.swizzle<0, 1, 2>());
+  if (Dev.has(sycl::aspect::fp64)) {
+    TEST2(sycl::length, double, 3.741657, 1e-6, VDoubleD4.swizzle<0, 1, 2>());
+  }
+
+  TEST2(sycl::distance, float, 5.f, 0, VFloatD4.swizzle<0, 1, 2>(), VFloatD3_2);
+  TEST2(sycl::distance, float, 5.f, 0, VFloatD3, VFloatD4_2.swizzle<0, 1, 2>());
+  TEST2(sycl::distance, float, 5.f, 0, VFloatD4.swizzle<0, 1, 2>(),
+        VFloatD4_2.swizzle<0, 1, 2>());
+  if (Dev.has(sycl::aspect::fp64)) {
+    TEST2(sycl::distance, double, 5.0, 0, VDoubleD4.swizzle<0, 1, 2>(),
+          VDoubleD3_2);
+    TEST2(sycl::distance, double, 5.0, 0, VDoubleD3,
+          VDoubleD4_2.swizzle<0, 1, 2>());
+    TEST2(sycl::distance, double, 5.0, 0, VDoubleD4.swizzle<0, 1, 2>(),
+          VDoubleD4_2.swizzle<0, 1, 2>());
+  }
+
+  TEST(sycl::normalize, float, 3, EXPECTED(float, 0.267261, 0.534522, 0.801784),
+       1e-6, VFloatD4.swizzle<0, 1, 2>());
+  if (Dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::normalize, double, 3,
+         EXPECTED(double, 0.267261, 0.534522, 0.801784), 1e-6,
+         VDoubleD4.swizzle<0, 1, 2>());
+  }
+
+  TEST2(sycl::fast_distance, float, 5.f, 0, VFloatD4.swizzle<0, 1, 2>(),
+        VFloatD3_2);
+  TEST2(sycl::fast_distance, float, 5.f, 0, VFloatD3,
+        VFloatD4_2.swizzle<0, 1, 2>());
+  TEST2(sycl::fast_distance, float, 5.f, 0, VFloatD4.swizzle<0, 1, 2>(),
+        VFloatD4_2.swizzle<0, 1, 2>());
+
+  TEST2(sycl::fast_length, float, 3.741657f, 1e-6, VFloatD4.swizzle<0, 1, 2>());
+
   TEST(sycl::fast_normalize, float, 3,
        EXPECTED(float, 0.267261f, 0.534522f, 0.801784f), 1e-3,
        VFloatD4.swizzle<0, 1, 2>());
+#endif // __INTEL_PREVIEW_BREAKING_CHANGES
 
   return 0;
 }
diff --git a/sycl/test-e2e/Graph/RecordReplay/buffer_ordering.cpp b/sycl/test-e2e/Graph/RecordReplay/buffer_ordering.cpp
index f12bce4091a2..1e668d47356a 100644
--- a/sycl/test-e2e/Graph/RecordReplay/buffer_ordering.cpp
+++ b/sycl/test-e2e/Graph/RecordReplay/buffer_ordering.cpp
@@ -5,8 +5,8 @@
 
 // RUN: %{build} -o %t.out
 // RUN: %{run} %t.out
-// Extra run to check for leaks in Level Zero using ZE_DEBUG
-// RUN: %if ext_oneapi_level_zero %{env ZE_DEBUG=4 %{run} %t.out 2>&1 | FileCheck %s %}
+// Extra run to check for leaks in Level Zero using UR_L0_LEAKS_DEBUG
+// RUN: %if level_zero %{env UR_L0_LEAKS_DEBUG=1 %{run} %t.out 2>&1 | FileCheck %s %}
 //
 // CHECK-NOT: LEAK
 
diff --git a/sycl/test-e2e/Matrix/XMX8/element_wise_all_sizes.cpp b/sycl/test-e2e/Matrix/XMX8/element_wise_all_sizes.cpp
index eef91f0258e4..7548bf71f24d 100644
--- a/sycl/test-e2e/Matrix/XMX8/element_wise_all_sizes.cpp
+++ b/sycl/test-e2e/Matrix/XMX8/element_wise_all_sizes.cpp
@@ -7,6 +7,10 @@
 //===----------------------------------------------------------------------===//
 // REQUIRES: matrix-xmx8
 
+// TODO: Currently fails and regularly times out on DG2. Re-enable when this has
+//       been addressed.
+// UNSUPPORTED: gpu-intel-dg2
+
 // RUN: %{build} -o %t.out
 // RUN: %{run} %t.out
 
diff --git a/sycl/test-e2e/format.py b/sycl/test-e2e/format.py
index c44fadde2680..4060e0c99ab2 100644
--- a/sycl/test-e2e/format.py
+++ b/sycl/test-e2e/format.py
@@ -12,9 +12,9 @@ import os
 import re
 
 def get_triple(test, backend):
-    if backend == 'ext_oneapi_cuda':
+    if backend == 'cuda':
         return 'nvptx64-nvidia-cuda-syclmlir'
-    if backend == 'ext_oneapi_hip':
+    if backend == 'hip':
         if test.config.hip_platform == 'NVIDIA':
             return 'nvptx64-nvidia-cuda-syclmlir'
         else:
@@ -164,10 +164,13 @@ class SYCLEndToEndTest(lit.formats.ShTest):
             # so that device might still be accessible to some of the tests yet
             # we won't set the environment variable below for such scenario.
             extra_env = []
-            if 'ext_oneapi_level_zero:gpu' in sycl_devices and litConfig.params.get('ze_debug'):
-                extra_env.append('ZE_DEBUG={}'.format(test.config.ze_debug))
+            if 'level_zero:gpu' in sycl_devices and litConfig.params.get('ur_l0_debug'):
+                extra_env.append('UR_L0_DEBUG={}'.format(test.config.ur_l0_debug))
 
-            if 'ext_oneapi_cuda:gpu' in sycl_devices:
+            if 'level_zero:gpu' in sycl_devices and litConfig.params.get('ur_l0_leaks_debug'):
+                extra_env.append('UR_L0_LEAKS_DEBUG={}'.format(test.config.ur_l0_leaks_debug))
+
+            if 'cuda:gpu' in sycl_devices:
                 extra_env.append('SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT=1')
 
             return extra_env
@@ -203,9 +206,9 @@ class SYCLEndToEndTest(lit.formats.ShTest):
                 # Expand device-specific condtions (%if ... %{ ... %}).
                 tmp_script = [ cmd ]
                 conditions = {x: True for x in sycl_device.split(':')}
-                for op_sys in ['linux', 'windows']:
-                    if op_sys in test.config.available_features:
-                        conditions[op_sys] = True
+                for cond_features in ['linux', 'windows', 'preview-breaking-changes-supported']:
+                    if cond_features in test.config.available_features:
+                        conditions[cond_features] = True
 
                 tmp_script = lit.TestRunner.applySubstitutions(
                     tmp_script, [], conditions, recursion_limit=test.config.recursiveExpansionLimit)

69e5edd broke CI as it was not running on `sycl-mlir` PRs. Signed-off-by: Victor Perez <victor.perez@codeplay.com>

Add new: - `Basic/image/image_accessor_readwrite.cpp` - `Basic/image/image_write_fp16.cpp` - `Basic/swizzle_op.cpp` - `Basic/vector/int-convert.cpp` - `InvokeSimd/Spec/ImplicitSubgroup/uniform_retval.cpp` - `Regression/vec_logical_ops.cpp` - `Regression/vec_rel_swizzle_ops.cpp` Drop: - `Plugin/level_zero_queue_profiling.cpp` - `Plugin/level_zero_usm_residency.cpp` Rename: - `sycl/test-e2e/Basic/vector/vec_bool.cpp` -> `sycl/test-e2e/Basic/vector/bool.cpp` - `sycl/test-e2e/Basic/vector/vector_operators.cpp` -> `sycl/test-e2e/Basic/vector/operators.cpp` - `sycl/test-e2e/Basic/vector/scalar_vec_access.cpp` -> `sycl/test-e2e/Basic/vector/scalar_access.cpp` Signed-off-by: Victor Perez <victor.perez@codeplay.com>

victor-eds · 2024-01-05T14:58:09Z

Opened issues arising from this rebase:

victor-eds · 2024-01-05T15:02:18Z

Hi @keryell!
It was a big one, but we have merged DPC++ commits until 0913045 in SYCL-MLIR. We have a couple of regressions I mention in this PR. Only the -O0 one is user-facing. Thanks for your efforts trying to rebase. Feel free to close that PR now 😄

rafbiels and others added 30 commits December 21, 2023 16:36

[Clang][Sema] Fix Wswitch-default bad warning in template (#76007)

033ec09

llvm/llvm-project#73077 added -Wswitch-default diagnostic but it produced false positives in templates. This PR will address that. llvm/llvm-project#75943

[hwasan] Classify stack overflow, and use after scope (#76133)

7db28dd

We can't distinguish UAR and UAS, but by definition UAR is already UAS.

libc++-infa: Hotfix runner group

c03745d

The runners-32 group is broken, for reasons... The easiest fix is to move the jobs to runners-8. (which needs to be renamed, because they're all actually 30 core machines)

[mlir][spirv] Add physical storage buffer extension test. NFC. (#76196)

a03c53c

This test demonstrates how the PhysicalStorageBuffer extension can be used end-2-end in a spir-v module. This module has been verified to pass serialization, deserialization, and validation with spirv-val.

AMDGPU: Remove pointless setOperationAction for xint_to_fp

248fba0

The legalize action for uint_to_fp/sint_to_fp uses the source integer type, not the result FP type so setting an action on an FP type does nothing.

Revert "[AArch64] Fold more load.x into load.i with large offset"

0ccc1e7

Issue #76202 This reverts commit f568763.

[test][hwasan] Try to fix Android bot

beffa1e

https://lab.llvm.org/buildbot/#/builders/77/builds/33152/steps/21/logs/stdio

AMDGPU: Workaround a divergent return value bug in test

50ed3b1

AMDGPU: Add a few more bfloat codegen tests

c7952d8

[test][hwasan] Update another test after #76133

bb7f8f7

Fix chunk-print-before-all script

782cf12

After c718336 the output of print-before/print-after was changed. A semicolon was added before the banner "*** IR Dump ..." and this change broke chunk-print-before-all script. This change makes it more resilient to small output format variations.

DAG: Handle promotion of fcanonicalize

4d1cd38

This avoids a regression in a future commit

Merge from 'sycl' to 'sycl-web'

b296dfd

Merge from 'main' to 'sycl-web' (269 commits)

297704c

CONFLICT (content): Merge conflict in openmp/libomptarget/test/lit.cfg

[test][hwasan] XFAIL new test which fails for unknown reason

f25bcfb

[RISCV] Rename TuneVeyronFusions to TuneVentanaVeyron

90f816e

And fusion features are added to processor definition.

[CI] Add support for sycl-rel-** branches to CI (intel#12237)

37af029

add sycl-rel-** to the pre-commit/post-commit workflow supported branches

[RISCV] Fix macro-fusions.mir

59eebb4

Omited resolution change (intel#12238)

33707fe

[clang][NFC] Remove trailing whitespace characters

e2d0f50

bb-sycl and others added 12 commits December 28, 2023 18:27

LLVM and SPIRV-LLVM-Translator pulldown (WW52)

9bfc3ae

LLVM: llvm/llvm-project@bc8c4bb SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@e63fdb8

[SYCL] add dependent-load flag to exclude CWD from default search path (

9c504a5

intel#12219) Co-authored-by: Artur Gainullin <artur.gainullin@intel.com>

[SYCL] Refine annotated_arg addrspace with USM kind property (intel#1…

865092f

…2211) Apply intel#11623 to `annotated_arg` class

[SYCL][ESIMD] Fix lsc_load_2d API issue that prevented usage for diff…

0913045

…erent types (intel#12244)

Merge remote-tracking branch 'upstream/sycl' into 'sycl-mlir'

69e5edd

Signed-off-by: Victor Perez <victor.perez@codeplay.com>

[SYCL-MLIR][cgeist][test] Update Verification/omp*.c tests

eae8006

Update tests to use `scf.reduce` instead of `scf.yield` in `scf.parallel` operations. Signed-off-by: Victor Perez <victor.perez@codeplay.com>

[SYCL-MLIR][sycl][test] Define `SYCLAccessorAnalysis::getAliasAnalysi…

721512a

…s()` Define shortcut to get alias analysis being used in accessor analysis in order to query alias between accessors.

[SYCL-MLIR][cgeist] Add alignment and address space to string constants

d8bdf7d

Add address space to global constants so that they can be translated to SPIR-V. Also add alignment in the process. Signed-off-by: Victor Perez <victor.perez@codeplay.com>

[SYCL-MLIR][test-e2e] XFAIL cgeist -O0 tests

dc69fb6

cgeist `-O0` E2E tests are failing. `XFAIL` them till further investigation. Signed-off-by: Victor Perez <victor.perez@codeplay.com>

victor-eds added disable-lint Skip linter check step and proceed with build jobs sycl-mlir Pull requests or issues for sycl-mlir branch labels Jan 4, 2024

victor-eds self-assigned this Jan 4, 2024

[SYCL-MLIR][CI] Run CI on sycl-mlir pull requests

dd77b31

69e5edd broke CI as it was not running on `sycl-mlir` PRs. Signed-off-by: Victor Perez <victor.perez@codeplay.com>

victor-eds force-pushed the rebase branch from a098575 to a7088b3 Compare January 4, 2024 16:01

victor-eds closed this Jan 4, 2024

victor-eds reopened this Jan 4, 2024

victor-eds force-pushed the rebase branch from a7088b3 to e0df215 Compare January 5, 2024 09:49

victor-eds marked this pull request as ready for review January 5, 2024 11:18

victor-eds requested a review from a team as a code owner January 5, 2024 11:18

whitneywhtsang approved these changes Jan 5, 2024

View reviewed changes

sommerlukas approved these changes Jan 5, 2024

View reviewed changes

victor-eds merged commit e0df215 into intel:sycl-mlir Jan 5, 2024
15 checks passed

victor-eds deleted the rebase branch January 5, 2024 14:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge remote-tracking branch 'upstream/sycl' into 'sycl-mlir' #12293

Merge remote-tracking branch 'upstream/sycl' into 'sycl-mlir' #12293

victor-eds commented Jan 4, 2024 •

edited

Loading

victor-eds commented Jan 4, 2024 •

edited

Loading

victor-eds commented Jan 5, 2024

victor-eds commented Jan 5, 2024

Merge remote-tracking branch 'upstream/sycl' into 'sycl-mlir' #12293

Merge remote-tracking branch 'upstream/sycl' into 'sycl-mlir' #12293

Conversation

victor-eds commented Jan 4, 2024 • edited Loading

victor-eds commented Jan 4, 2024 • edited Loading

victor-eds commented Jan 5, 2024

victor-eds commented Jan 5, 2024

victor-eds commented Jan 4, 2024 •

edited

Loading

victor-eds commented Jan 4, 2024 •

edited

Loading