Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge remote-tracking branch 'upstream/sycl' into 'sycl-mlir' #12293

Merged
merged 5,026 commits into from
Jan 5, 2024

Conversation

victor-eds
Copy link
Contributor

@victor-eds victor-eds commented Jan 4, 2024

Review only files with conflict:

  • .github/workflows/sycl_mlir_linux_precommit.yml
  • mlir/lib/Dialect/LLVMIR/IR/LLVMInlining.cpp
  • mlir/test/lib/Transforms/TestIntRangeInference.cpp
  • sycl/include/sycl/handler.hpp
  • sycl/test-e2e/Basic/built-ins/vec_common.cpp
  • sycl/test-e2e/Basic/built-ins/vec_geometric.cpp
  • sycl/test-e2e/Graph/RecordReplay/buffer_ordering.cpp
  • sycl/test-e2e/Matrix/XMX8/element_wise_all_sizes.cpp
  • sycl/test-e2e/format.py

And commits:

rafbiels and others added 30 commits December 21, 2023 16:36
Add four extra option resets in `disableUnsupportedOptions` when parsing
SYCL code. All four are needed to make clangd happily parse code when
`-fsycl` is specified in compile commands. Without each of them, the
following happens:
* without `SYCLIsDevice=false`: crash on !nullptr assert
* without `DeclareSPIRVBuiltins=false`: crash on !nullptr assert
* without `SYCLUnnamedLambda=false`: `ref_non_value` error, `'T' does
not refer to a value`
* without `Triple=HostTriple`: `pp_file_not_found` error,
`'gnu/stubs-32.h' file not found`

Fixes intel#11088 and clangd/clangd#1097


Also add a LIT test which parses the following code:
```cpp
#include <sycl/sycl.hpp>
sycl::queue q{};
```
and checks `symbolInfo` for the symbol `q` against a reference.
llvm/llvm-project#73077 added -Wswitch-default
diagnostic but it produced false positives in templates. This PR will
address that. llvm/llvm-project#75943
We can't distinguish UAR and UAS, but by definition
UAR is already UAS.
Similar to `vector.transfer_read`/`vector.transfer_write`, allow 0-D
vectors.

This commit fixes
`mlir/test/Dialect/Vector/vector-transfer-to-vector-load-store.mlir`
when verifying the IR after each pattern (#74270). That test produces a
temporary 0-D load/store op.
The runners-32 group is broken, for reasons...
The easiest fix is to move the jobs to runners-8.
(which needs to be renamed, because they're all actually 30 core
machines)
`VEX_4V` does not look simpler than `VEX, VVVV`. It's kind of confusing
b/c classes like `VEX_L`, `VEX_LIG` do not imply `VEX` but it does.

For APX, we have promote EVEX, NDD, NF and NDD_NF instructions. All of
the 4 variants are in EVEX space and NDD/NDD_NF set the VVVV fields.
To extract the common fields (e.g EVEX) into a class and set VVVV
conditionally, we need VVVV to not imply other prefixes.
This change adds (runtime) bounds checks for `memref` ops using the
existing `RuntimeVerifiableOpInterface`. For `memref.load` and
`memref.store`, we check that the indices are in-bounds of the memref's
index space. For `memref.reinterpret_cast` and `memref.subview` we check
that the resulting address space is in-bounds of the input memref's
address space.
…74686)

`clang -g -gpubnames -fdebug-types-section` now emits .debug_names
section with references to local type unit entries defined in COMDAT
.debug_info sections.

```
.section        .debug_info,"G",@progbits,5657452045627120676,comdat
.Ltu_begin0:
...

.section        .debug_names,"",@progbits
...
// DWARF32
.long   .Ltu_begin0                     # Type unit 0
// DWARF64
// .long   .Ltu_begin0                     # Type unit 0
```

When `.Ltu_begin0` is relative to a non-prevailing .debug_info section,
the relocation resolves to 0, which is a valid offset within the
.debug_info section.

```
cat > a.cc <<e
struct A { int x; };
inline A foo() { return {1}; }
int main() { foo(); }
e
cat > b.cc <<e
struct A { int x; };
inline A foo() { return {1}; }
void use() { foo(); }
e
clang++ -g -gpubnames -fdebug-types-section -fuse-ld=lld a.cc b.cc -o old
```
```
% llvm-dwarfdump old
...
  Local Type Unit offsets [
    LocalTU[0]: 0x00000000
  ]
...
  Local Type Unit offsets [
    LocalTU[0]: 0x00000000  // indistinguishable from a valid offset within .debug_info
  ]
```

https://dwarfstd.org/issues/231013.1.html proposes that we use a
tombstone value instead to inform consumers. This patch implements the
idea. The second LocalTU entry will now use 0xffffffff.

https://reviews.llvm.org/D84825 has a TODO that we should switch the
tombstone value for most `.debug_*` sections to UINT64_MAX. We have
postponed the change for more than three years for consumers to migrate.
At some point we shall make the change, so that .debug_names is no long
different from other debug section that is not .debug_loc/.debug_ranges.

Co-authored-by: Alexander Yermolovich <ayermolo@meta.com>
This test demonstrates how the PhysicalStorageBuffer extension can be
used end-2-end in a spir-v module.

This module has been verified to pass serialization, deserialization,
and validation with spirv-val.
The legalize action for uint_to_fp/sint_to_fp uses the source integer
type, not the result FP type so setting an action on an FP type does
nothing.
This patch removes the explicit llvm-objdump man page. By enabling
sphinx man page output with `-DLLVM_ENABLE_SPHINX=ON` and
`-DSPHINX_OUTPUT_MAN=ON`, we can generate man pages for all the llvm
binary utilities from the restructured text documentation. Having an
additional man page upstream increases fragementation and maintenance.
After c718336 the output of print-before/print-after was changed.
A semicolon was added before the banner "*** IR Dump ..." and this
change broke chunk-print-before-all script.

This change makes it more resilient to small output format variations.
This avoids a regression in a future commit
  CONFLICT (content): Merge conflict in openmp/libomptarget/test/lit.cfg
R16-R31 was added into GPRs in
llvm/llvm-project#70958,
This patch supports the encoding/decoding for promoted CET instruction
in EVEX space.

RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
And fusion features are added to processor definition.
add sycl-rel-** to the pre-commit/post-commit workflow supported
branches
We split `TuneShiftedZExtFusion` into three fusions to make them
reusable and match the GCC implementation[1].

The zexth/zextw fusions can be reused by XiangShan[2] and other
commercial processors, but shifted zero extension is not so common.

`macro-fusions-veyron-v1.mir` is renamed so it's not relevant to
specific processor.

References:
[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637303.html
[2] https://xiangshan-doc.readthedocs.io/zh_CN/latest/frontend/decode
Before this patch, there was a regression in comment formatting due to
some code formatting in bd3e8eb. This
was fixed in 428660c. Github interprets
a tab before a string as starting code formatting. The message that
indicted the code formatting in a PR had been fixed was refactored to a
python multi-line string, but with a tab in front, causing these
messages to be rendered as code blocks in Github, instead of as
intended. This patch builds upon the original fix to reformat the
strings so that they fit within ~80 character lines and are simpler to
modify in the future, hopefully removing traps like the one that caused
the original issue.
… (#76125)

R16-R31 was added into GPRs in
llvm/llvm-project#70958,
This patch supports the encoding/decoding for promoted CMPCCXADD
instruction in EVEX space.

RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
bb-sycl and others added 12 commits December 28, 2023 18:27
intel#12219)

Co-authored-by: Artur Gainullin <artur.gainullin@intel.com>
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
…onAnalysis`

Latest upstream changes made data-flow analyses interprocedural. Our
`setToEntryState()` function is not working as expected, which is
causing issues. Overriding this function and using previous behavior
is thus needed for our analysis to work.

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
Update tests to use `scf.reduce` instead of `scf.yield` in
`scf.parallel` operations.

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
…s()`

Define shortcut to get alias analysis being used in accessor analysis
in order to query alias between accessors.
Drop following passing tests from list:

- `Basic/half_type.cpp`
- `Basic/image/image_accessor_readwrite_half.cpp`
- `Basic/image/image_write_fp16.cpp`
- `ESIMD/accessor_load_store_stateless.cpp`
- `ESIMD/api/svm_gather_scatter_64.cpp`
- `ESIMD/regression/globals.cpp`
- `InvokeSimd/Spec/ImplicitSubgroup/uniform_retval.cpp`
- `NonUniformGroups/fixed_size_group_algorithms.cpp`

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
Add address space to global constants so that they can be translated
to SPIR-V. Also add alignment in the process.

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
cgeist `-O0` E2E tests are failing. `XFAIL` them till further
investigation.

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
- `ESIMD/fp_controls.cpp`
- `ESIMD/noinline_call_from_func.cpp`
- `ESIMD/private_memory/private_memory.cpp`
- `ESIMD/regression/bfloat16_half_vector_plus_eq_scalar.cpp`
- `ESIMD/unified_memory_api/atomic_update_acc.cpp`
- `ESIMD/unified_memory_api/atomic_update_acc_64.cpp`
- `ESIMD/unified_memory_api/atomic_update_acc_cmpxchg.cpp`
- `ESIMD/unified_memory_api/atomic_update_slm.cpp`
- `ESIMD/unified_memory_api/atomic_update_slm_acc.cpp`
- `ESIMD/unified_memory_api/atomic_update_slm_acc_cmpxchg.cpp`
- `ESIMD/unified_memory_api/atomic_update_slm_cmpxchg.cpp`
- `ESIMD/unified_memory_api/block_store_slm.cpp`
- `ESIMD/unified_memory_api/block_store_slm_acc.cpp`
- `Graph/Explicit/sub_group_prop.cpp`
- `Graph/Explicit/work_group_size_prop.cpp`
- `Graph/RecordReplay/sub_group_prop.cpp`
- `Graph/RecordReplay/work_group_size_prop.cpp`
- `InvokeSimd/Regression/address_space_cast.cpp`
- `KernelAndProgram/disable-caching.cpp`
- `Plugin/level_zero_queue_profiling.cpp`
- `Plugin/level_zero_usm_residency.cpp`

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
@victor-eds victor-eds added disable-lint Skip linter check step and proceed with build jobs sycl-mlir Pull requests or issues for sycl-mlir branch labels Jan 4, 2024
@victor-eds victor-eds self-assigned this Jan 4, 2024
@victor-eds
Copy link
Contributor Author

victor-eds commented Jan 4, 2024

Diff is too big to be rendered by browser, so here's the diff for each conflicting file:

diff --git a/.github/workflows/sycl_mlir_linux_precommit.yml b/.github/workflows/sycl_mlir_linux_precommit.yml
index 23e3a20e6627..f8f64cd7a8bb 100644
--- a/.github/workflows/sycl_mlir_linux_precommit.yml
+++ b/.github/workflows/sycl_mlir_linux_precommit.yml
@@ -20,12 +22,18 @@ on:
     - 'devops/containers/**'
     - 'devops/actions/build_container/**'
 
+concurrency:
+  #  Cancel a currently running workflow from the same PR, branch or tag.
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
+  cancel-in-progress: true
+
 jobs:
   detect_changes:
     uses: ./.github/workflows/sycl_detect_changes.yml
 
   lint:
     runs-on: [Linux, build]
+    if: ${{ always() && !contains(github.event.pull_request.labels.*.name, 'disable-lint') }}
     container:
       image: ghcr.io/intel/llvm/sycl_ubuntu2204_nightly:no-drivers
       options: -u 1001:1001
@@ -56,7 +64,7 @@ jobs:
     needs: [lint, detect_changes]
     if: |
       always()
-      && (success() || contains(github.event.pull_request.labels.*.name, 'disable-lint'))
+      && (success() || needs.lint.result == 'skipped')
     uses: ./.github/workflows/sycl_linux_build.yml
     with:
       build_ref: ${{ github.sha }}
@@ -64,6 +72,8 @@ jobs:
       build_cache_root: "/__w/"
       build_artifact_suffix: "default"
       build_cache_suffix: "default"
+      # FIXME: UR can't find /opt/rocm/hsa with rocm-6.0.0. Previous rocm-5.7.0 had it.
+      build_image: "ghcr.io/intel/llvm/ubuntu2204_build:7ed894ab0acc8ff09262113fdb08940d22654a30"
       changes: ${{ needs.detect_changes.outputs.filters }}
 
   test:
diff --git a/mlir/lib/Dialect/LLVMIR/IR/LLVMInlining.cpp b/mlir/lib/Dialect/LLVMIR/IR/LLVMInlining.cpp
index 0cc8ea886093..65c1daee6711 100644
--- a/mlir/lib/Dialect/LLVMIR/IR/LLVMInlining.cpp
+++ b/mlir/lib/Dialect/LLVMIR/IR/LLVMInlining.cpp
@@ -50,11 +50,31 @@ static bool hasLifetimeMarkers(LLVM::AllocaOp allocaOp) {
 static void
 handleInlinedAllocas(Operation *call,
                      iterator_range<Region::iterator> inlinedBlocks) {
+  // Locate the entry block of the closest callsite ancestor that has either the
+  // IsolatedFromAbove or AutomaticAllocationScope trait. In pure LLVM dialect
+  // programs, this is the LLVMFuncOp containing the call site. However, in
+  // mixed-dialect programs, the callsite might be nested in another operation
+  // that carries one of these traits. In such scenarios, this traversal stops
+  // at the closest ancestor with either trait, ensuring visibility post
+  // relocation and respecting allocation scopes.
+  Block *callerEntryBlock = nullptr;
+  Operation *currentOp = call;
+  while (Operation *parentOp = currentOp->getParentOp()) {
+    if (parentOp->mightHaveTrait<OpTrait::IsIsolatedFromAbove>() ||
+        parentOp->mightHaveTrait<OpTrait::AutomaticAllocationScope>()) {
+      callerEntryBlock = &currentOp->getParentRegion()->front();
+      break;
+    }
+    currentOp = parentOp;
+  }
+
+  // Avoid relocating the alloca operations if the call has been inlined into
+  // the entry block already, which is typically the encompassing
+  // LLVM function, or if the relevant entry block cannot be identified.
   Block *calleeEntryBlock = &(*inlinedBlocks.begin());
-  Block *callerEntryBlock = &(*calleeEntryBlock->getParent()->begin());
-  if (calleeEntryBlock == callerEntryBlock)
-    // Nothing to do.
+  if (!callerEntryBlock || callerEntryBlock == calleeEntryBlock)
     return;
+
   SmallVector<std::tuple<LLVM::AllocaOp, IntegerAttr, bool>> allocasToMove;
   bool shouldInsertLifetimes = false;
   bool hasDynamicAlloca = false;
@@ -496,14 +516,14 @@ static void handleAccessGroups(Operation *call,
 /// If `requestedAlignment` is higher than the alignment specified on `alloca`,
 /// realigns `alloca` if this does not exceed the natural stack alignment.
 /// Returns the post-alignment of `alloca`, whether it was realigned or not.
-static unsigned tryToEnforceAllocaAlignment(LLVM::AllocaOp alloca,
-                                            unsigned requestedAlignment,
+static uint64_t tryToEnforceAllocaAlignment(LLVM::AllocaOp alloca,
+                                            uint64_t requestedAlignment,
                                             DataLayout const &dataLayout) {
-  unsigned allocaAlignment = alloca.getAlignment().value_or(1);
+  uint64_t allocaAlignment = alloca.getAlignment().value_or(1);
   if (requestedAlignment <= allocaAlignment)
     // No realignment necessary.
     return allocaAlignment;
-  unsigned naturalStackAlignmentBits = dataLayout.getStackAlignment();
+  uint64_t naturalStackAlignmentBits = dataLayout.getStackAlignment();
   // If the natural stack alignment is not specified, the data layout returns
   // zero. Optimistically allow realignment in this case.
   if (naturalStackAlignmentBits == 0 ||
@@ -525,7 +545,7 @@ static unsigned tryToEnforceAllocaAlignment(LLVM::AllocaOp alloca,
 /// the pointer, then returns the resulting post-alignment, regardless of
 /// whether it was realigned or not. If no existing alignment attribute is
 /// found, returns 1 (i.e., assume that no alignment is guaranteed).
-static unsigned tryToEnforceAlignment(Value value, unsigned requestedAlignment,
+static uint64_t tryToEnforceAlignment(Value value, uint64_t requestedAlignment,
                                       DataLayout const &dataLayout) {
   if (Operation *definingOp = value.getDefiningOp()) {
     if (auto alloca = dyn_cast<LLVM::AllocaOp>(definingOp))
@@ -557,8 +577,8 @@ static unsigned tryToEnforceAlignment(Value value, unsigned requestedAlignment,
 /// the address of the new alloca, then returns the value of the new alloca.
 static Value handleByValArgumentInit(OpBuilder &builder, Location loc,
                                      Value argument, Type elementType,
-                                     unsigned elementTypeSize,
-                                     unsigned targetAlignment) {
+                                     uint64_t elementTypeSize,
+                                     uint64_t targetAlignment) {
   // Allocate the new value on the stack.
   Value allocaOp;
   {
@@ -587,7 +607,7 @@ static Value handleByValArgumentInit(OpBuilder &builder, Location loc,
 /// attribute (or 1 if no align attribute was set).
 static Value handleByValArgument(OpBuilder &builder, Operation *callable,
                                  Value argument, Type elementType,
-                                 unsigned requestedAlignment) {
+                                 uint64_t requestedAlignment) {
   auto func = cast<LLVM::LLVMFuncOp>(callable);
   LLVM::MemoryEffectsAttr memoryEffects = func.getMemoryAttr();
   // If there is no memory effects attribute, assume that the function is
@@ -597,16 +617,16 @@ static Value handleByValArgument(OpBuilder &builder, Operation *callable,
                     memoryEffects.getArgMem() != LLVM::ModRefInfo::Mod;
   // Check if there's an alignment mismatch requiring us to copy.
   DataLayout dataLayout = DataLayout::closest(callable);
-  unsigned minimumAlignment = dataLayout.getTypeABIAlignment(elementType);
+  uint64_t minimumAlignment = dataLayout.getTypeABIAlignment(elementType);
   if (isReadOnly) {
     if (requestedAlignment <= minimumAlignment)
       return argument;
-    unsigned currentAlignment =
+    uint64_t currentAlignment =
         tryToEnforceAlignment(argument, requestedAlignment, dataLayout);
     if (currentAlignment >= requestedAlignment)
       return argument;
   }
-  unsigned targetAlignment = std::max(requestedAlignment, minimumAlignment);
+  uint64_t targetAlignment = std::max(requestedAlignment, minimumAlignment);
   return handleByValArgumentInit(builder, func.getLoc(), argument, elementType,
                                  dataLayout.getTypeSize(elementType),
                                  targetAlignment);
@@ -632,8 +652,7 @@ struct LLVMInlinerInterface : public DialectInlinerInterface {
                        bool wouldBeCloned) const final {
     if (!wouldBeCloned)
       return false;
-    auto callOp = dyn_cast<LLVM::CallOp>(call);
-    if (!callOp) {
+    if (!isa<LLVM::CallOp>(call)) {
       LLVM_DEBUG(llvm::dbgs()
                  << "Cannot inline: call is not an LLVM::CallOp\n");
       return false;
@@ -684,7 +703,6 @@ struct LLVMInlinerInterface : public DialectInlinerInterface {
     return true;
   }
 
-  /// All operations in the LLVM dialect are legal to inline.
   bool isLegalToInline(Operation *op, Region *, bool, IRMapping &) const final {
     return true;
   }
@@ -706,15 +724,13 @@ struct LLVMInlinerInterface : public DialectInlinerInterface {
   /// Handle the given inlined return by replacing the uses of the call with the
   /// operands of the return. This overload is called when the inlined region
   /// only contains one block.
-  void handleTerminator(Operation *op,
-                        ArrayRef<Value> valuesToRepl) const final {
+  void handleTerminator(Operation *op, ValueRange valuesToRepl) const final {
     // Return will be the only terminator present.
     auto returnOp = cast<LLVM::ReturnOp>(op);
 
     // Replace the values directly with the return operands.
     assert(returnOp.getNumOperands() == valuesToRepl.size());
-    for (const auto &[dst, src] :
-         llvm::zip(valuesToRepl, returnOp.getOperands()))
+    for (auto [dst, src] : llvm::zip(valuesToRepl, returnOp.getOperands()))
       dst.replaceAllUsesWith(src);
   }
 
@@ -724,7 +740,7 @@ struct LLVMInlinerInterface : public DialectInlinerInterface {
     if (std::optional<NamedAttribute> attr =
             argumentAttrs.getNamed(LLVM::LLVMDialect::getByValAttrName())) {
       Type elementType = cast<TypeAttr>(attr->getValue()).getValue();
-      unsigned requestedAlignment = 1;
+      uint64_t requestedAlignment = 1;
       if (std::optional<NamedAttribute> alignAttr =
               argumentAttrs.getNamed(LLVM::LLVMDialect::getAlignAttrName())) {
         requestedAlignment = cast<IntegerAttr>(alignAttr->getValue())
diff --git a/mlir/test/lib/Transforms/TestIntRangeInference.cpp b/mlir/test/lib/Transforms/TestIntRangeInference.cpp
index 7a0c97913af4..5758f6acf2f0 100644
--- a/mlir/test/lib/Transforms/TestIntRangeInference.cpp
+++ b/mlir/test/lib/Transforms/TestIntRangeInference.cpp
@@ -40,9 +40,8 @@ static LogicalResult replaceWithConstant(DataFlowSolver &solver, OpBuilder &b,
       maybeDefiningOp ? maybeDefiningOp->getDialect()
                       : value.getParentRegion()->getParentOp()->getDialect();
   Attribute constAttr = b.getIntegerAttr(value.getType(), *maybeConstValue);
-  Value constant =
-      folder.getOrCreateConstant(b.getInsertionBlock(), valueDialect, constAttr,
-                                 value.getType(), value.getLoc());
+  Value constant = folder.getOrCreateConstant(
+      b.getInsertionBlock(), valueDialect, constAttr, value.getType());
   if (!constant)
     return failure();
 
@@ -108,7 +107,7 @@ struct TestIntRangeInference
     Operation *op = getOperation();
     DataFlowSolver solver;
     solver.load<DeadCodeAnalysis>();
-    solver.load<dataflow::SparseConstantPropagation>();
+    solver.load<SparseConstantPropagation>();
     solver.load<IntegerRangeAnalysis>();
     if (failed(solver.initializeAndRun(op)))
       return signalPassFailure();
diff --git a/sycl/include/sycl/handler.hpp b/sycl/include/sycl/handler.hpp
index 858f056d0196..5fb9d234322a 100644
--- a/sycl/include/sycl/handler.hpp
+++ b/sycl/include/sycl/handler.hpp
@@ -29,6 +29,7 @@
 #include <sycl/event.hpp>
 #include <sycl/exception.hpp>
 #include <sycl/exception_list.hpp>
+#include <sycl/ext/intel/experimental/fp_control_kernel_properties.hpp>
 #include <sycl/ext/intel/experimental/kernel_execution_properties.hpp>
 #include <sycl/ext/oneapi/bindless_images_descriptor.hpp>
 #include <sycl/ext/oneapi/bindless_images_interop.hpp>
@@ -913,11 +914,20 @@ private:
   ///
   /// Stores information about kernel properties into the handler.
   template <
+      typename KernelName,
       typename PropertiesT = ext::oneapi::experimental::empty_properties_t>
   void processProperties(PropertiesT Props) {
+    using KI = detail::KernelInfo<KernelName>;
     static_assert(
         ext::oneapi::experimental::is_property_list<PropertiesT>::value,
         "Template type is not a property list.");
+    static_assert(
+        !PropertiesT::template has_property<
+            sycl::ext::intel::experimental::fp_control_key>() ||
+            (PropertiesT::template has_property<
+                 sycl::ext::intel::experimental::fp_control_key>() &&
+             KI::isESIMD()),
+        "Floating point control property is supported for ESIMD kernels only.");
     if constexpr (PropertiesT::template has_property<
                       sycl::ext::intel::experimental::cache_config_key>()) {
       auto Config = Props.template get_property<
@@ -1111,9 +1121,12 @@ private:
   };
 
   std::optional<std::array<size_t, 3>> getMaxWorkGroups();
+  // We need to use this version to support gcc 7.5.0. Remove when minimal
+  // supported gcc version is bumped.
+  std::tuple<std::array<size_t, 3>, bool> getMaxWorkGroups_v2();
 
   template <int Dims>
-  std::optional<range<Dims>> getRoundedRange(range<Dims> UserRange) {
+  std::tuple<range<Dims>, bool> getRoundedRange(range<Dims> UserRange) {
     range<Dims> RoundedRange = UserRange;
     // Disable the rounding-up optimizations under these conditions:
     // 1. The env var SYCL_DISABLE_PARALLEL_FOR_RANGE_ROUNDING is set.
@@ -1133,7 +1146,7 @@ private:
 
     // Perform range rounding if rounding-up is enabled.
     if (this->DisableRangeRounding())
-      return {};
+      return {range<Dims>{}, false};
 
     // Range should be a multiple of this for reasonable performance.
     size_t MinFactorX = 16;
@@ -1157,8 +1170,8 @@ private:
     // kernel in a 32-bit global range.
     auto Dev = detail::getSyclObjImpl(detail::getDeviceFromHandler(*this));
     id<Dims> MaxNWGs = [&] {
-      auto PiResult = getMaxWorkGroups();
-      if (!PiResult.has_value()) {
+      auto [MaxWGs, HasMaxWGs] = getMaxWorkGroups_v2();
+      if (!HasMaxWGs) {
         id<Dims> Default;
         for (int i = 0; i < Dims; ++i)
           Default[i] = (std::numeric_limits<int32_t>::max)();
@@ -1168,7 +1181,7 @@ private:
       id<Dims> IdResult;
       size_t Limit = (std::numeric_limits<int>::max)();
       for (int i = 0; i < Dims; ++i)
-        IdResult[i] = (std::min)(Limit, (*PiResult)[Dims - i - 1]);
+        IdResult[i] = (std::min)(Limit, MaxWGs[Dims - i - 1]);
       return IdResult;
     }();
     auto M = (std::numeric_limits<uint32_t>::max)();
@@ -1204,8 +1217,8 @@ private:
         Adjust(i, MaxRange[i]);
 
     if (!DidAdjust)
-      return {};
-    return RoundedRange;
+      return {range<Dims>{}, false};
+    return {RoundedRange, true};
   }
 
   /// Defines and invokes a SYCL kernel function for the specified range.
@@ -1253,9 +1266,10 @@ private:
                   "first argument of sycl::item type, or of a type which is "
                   "implicitly convertible from sycl::item");
 
+    using RefLambdaArgType = std::add_lvalue_reference_t<LambdaArgType>;
     static_assert(
-        (std::is_invocable_v<KernelType, LambdaArgType> ||
-         std::is_invocable_v<KernelType, LambdaArgType, kernel_handler>),
+        (std::is_invocable_v<KernelType, RefLambdaArgType> ||
+         std::is_invocable_v<KernelType, RefLambdaArgType, kernel_handler>),
         "SYCL kernel lambda/functor has an unexpected signature, it should be "
         "invocable with sycl::item and optionally sycl::kernel_handler");
 #endif
@@ -1273,7 +1287,8 @@ private:
 #if !defined(__SYCL_DISABLE_PARALLEL_FOR_RANGE_ROUNDING__) &&                  \
     !defined(DPCPP_HOST_DEVICE_OPENMP) &&                                      \
     !defined(DPCPP_HOST_DEVICE_PERF_NATIVE) && SYCL_LANGUAGE_VERSION >= 202001
-    if (auto RoundedRange = getRoundedRange(UserRange)) {
+    auto [RoundedRange, HasRoundedRange] = getRoundedRange(UserRange);
+    if (HasRoundedRange) {
       using NameWT = typename detail::get_kernel_wrapper_name_t<NameT>::name;
       auto Wrapper =
           getRangeRoundedKernelLambda<NameWT, TransformedArgType, Dims>(
@@ -1291,7 +1306,7 @@ private:
       // __SYCL_ASSUME_INT can still be violated. So check the bounds
       // of the user range, instead of the rounded range.
       detail::checkValueRange<Dims>(UserRange);
-      MNDRDesc.set(*RoundedRange);
+      MNDRDesc.set(RoundedRange);
       StoreLambda<KName, decltype(Wrapper), Dims, TransformedArgType>(
           std::move(Wrapper));
       setType(detail::CG::Kernel);
@@ -1306,7 +1321,7 @@ private:
       kernel_parallel_for_wrapper<NameT, TransformedArgType, KernelType,
                                   PropertiesT>(KernelFunc);
 #ifndef __SYCL_DEVICE_ONLY__
-      processProperties<PropertiesT>(Props);
+      processProperties<NameT, PropertiesT>(Props);
       detail::checkValueRange<Dims>(UserRange);
       MNDRDesc.set(std::move(UserRange));
       StoreLambda<NameT, KernelType, Dims, TransformedArgType>(
@@ -1361,7 +1376,7 @@ private:
     kernel_parallel_for_wrapper<NameT, TransformedArgType, KernelType,
                                 PropertiesT>(KernelFunc);
 #ifndef __SYCL_DEVICE_ONLY__
-    processProperties<PropertiesT>(Props);
+    processProperties<NameT, PropertiesT>(Props);
     detail::checkValueRange<Dims>(ExecutionRange);
     MNDRDesc.set(std::move(ExecutionRange));
     StoreLambda<NameT, KernelType, Dims, TransformedArgType>(
@@ -1418,7 +1433,7 @@ private:
     kernel_parallel_for_work_group_wrapper<NameT, LambdaArgType, KernelType,
                                            PropertiesT>(KernelFunc);
 #ifndef __SYCL_DEVICE_ONLY__
-    processProperties<PropertiesT>(Props);
+    processProperties<NameT, PropertiesT>(Props);
     detail::checkValueRange<Dims>(NumWorkGroups);
     MNDRDesc.setNumWorkGroups(NumWorkGroups);
     StoreLambda<NameT, KernelType, Dims, LambdaArgType>(std::move(KernelFunc));
@@ -1458,7 +1473,7 @@ private:
     kernel_parallel_for_work_group_wrapper<NameT, LambdaArgType, KernelType,
                                            PropertiesT>(KernelFunc);
 #ifndef __SYCL_DEVICE_ONLY__
-    processProperties<PropertiesT>(Props);
+    processProperties<NameT, PropertiesT>(Props);
     nd_range<Dims> ExecRange =
         nd_range<Dims>(NumWorkGroups * WorkGroupSize, WorkGroupSize);
     detail::checkValueRange<Dims>(ExecRange);
@@ -1638,8 +1653,8 @@ private:
   //   * Provide explicit template type parameters for the call
   //
   // Couldn't think of a better way to achieve both.
-  template <typename KernelType, typename PropertiesT, bool HasKernelHandlerArg,
-            typename FuncTy>
+  template <typename KernelName, typename KernelType, typename PropertiesT,
+            bool HasKernelHandlerArg, typename FuncTy>
   void unpack(_KERNELFUNCPARAM(KernelFunc), FuncTy Lambda) {
 #ifdef __SYCL_DEVICE_ONLY__
     detail::CheckDeviceCopyable<KernelType>();
@@ -1648,13 +1663,15 @@ private:
         typename detail::GetMergedKernelProperties<KernelType,
                                                    PropertiesT>::type;
     using Unpacker = KernelPropertiesUnpacker<MergedPropertiesT>;
+#ifndef __SYCL_DEVICE_ONLY__
     // If there are properties provided by get method then process them.
     if constexpr (ext::oneapi::experimental::detail::
                       HasKernelPropertiesGetMethod<
                           _KERNELFUNCPARAMTYPE>::value) {
-      processProperties(
+      processProperties<KernelName>(
           KernelFunc.get(ext::oneapi::experimental::properties_tag{}));
     }
+#endif
     if constexpr (HasKernelHandlerArg) {
       kernel_handler KH;
       Lambda(Unpacker{}, this, KernelFunc, KH);
@@ -1670,7 +1687,7 @@ private:
       typename KernelName, typename KernelType,
       typename PropertiesT = ext::oneapi::experimental::empty_properties_t>
   void kernel_single_task_wrapper(_KERNELFUNCPARAM(KernelFunc)) {
-    unpack<KernelType, PropertiesT,
+    unpack<KernelName, KernelType, PropertiesT,
            detail::KernelLambdaHasKernelHandlerArgT<KernelType>::value>(
         KernelFunc, [&](auto Unpacker, auto... args) {
           Unpacker.template kernel_single_task_unpack<KernelName, KernelType>(
@@ -1682,7 +1699,7 @@ private:
       typename KernelName, typename ElementType, typename KernelType,
       typename PropertiesT = ext::oneapi::experimental::empty_properties_t>
   void kernel_parallel_for_wrapper(_KERNELFUNCPARAM(KernelFunc)) {
-    unpack<KernelType, PropertiesT,
+    unpack<KernelName, KernelType, PropertiesT,
            detail::KernelLambdaHasKernelHandlerArgT<KernelType,
                                                     ElementType>::value>(
         KernelFunc, [&](auto Unpacker, auto... args) {
@@ -1695,7 +1712,7 @@ private:
       typename KernelName, typename ElementType, typename KernelType,
       typename PropertiesT = ext::oneapi::experimental::empty_properties_t>
   void kernel_parallel_for_work_group_wrapper(_KERNELFUNCPARAM(KernelFunc)) {
-    unpack<KernelType, PropertiesT,
+    unpack<KernelName, KernelType, PropertiesT,
            detail::KernelLambdaHasKernelHandlerArgT<KernelType,
                                                     ElementType>::value>(
         KernelFunc, [&](auto Unpacker, auto... args) {
@@ -1729,7 +1746,7 @@ private:
     // No need to check if range is out of INT_MAX limits as it's compile-time
     // known constant.
     MNDRDesc.set(range<1>{1});
-    processProperties<PropertiesT>(Props);
+    processProperties<NameT, PropertiesT>(Props);
     StoreLambda<NameT, KernelType, /*Dims*/ 1, void>(KernelFunc);
     setType(detail::CG::Kernel);
 #endif
@@ -2332,7 +2349,6 @@ public:
   std::enable_if_t<
       ext::oneapi::experimental::is_property_list<PropertiesT>::value>
   single_task(PropertiesT Props, _KERNELFUNCPARAM(KernelFunc)) {
-    throwIfGraphAssociatedAndKernelProperties<PropertiesT>();
     single_task_lambda_impl<KernelName, KernelType, PropertiesT>(Props,
                                                                  KernelFunc);
   }
@@ -2343,7 +2359,6 @@ public:
       ext::oneapi::experimental::is_property_list<PropertiesT>::value>
   parallel_for(range<1> NumWorkItems __SYCL_ANNOTATE(range), PropertiesT Props,
                _KERNELFUNCPARAM(KernelFunc) __SYCL_ANNOTATE(kernel)) {
-    throwIfGraphAssociatedAndKernelProperties<PropertiesT>();
     parallel_for_lambda_impl<KernelName, KernelType, 1, PropertiesT>(
         NumWorkItems, Props, std::move(KernelFunc));
   }
@@ -2354,7 +2369,6 @@ public:
       ext::oneapi::experimental::is_property_list<PropertiesT>::value>
   parallel_for(range<2> NumWorkItems __SYCL_ANNOTATE(range), PropertiesT Props,
                _KERNELFUNCPARAM(KernelFunc) __SYCL_ANNOTATE(kernel)) {
-    throwIfGraphAssociatedAndKernelProperties<PropertiesT>();
     parallel_for_lambda_impl<KernelName, KernelType, 2, PropertiesT>(
         NumWorkItems, Props, std::move(KernelFunc));
   }
@@ -2365,7 +2379,6 @@ public:
       ext::oneapi::experimental::is_property_list<PropertiesT>::value>
   parallel_for(range<3> NumWorkItems __SYCL_ANNOTATE(range), PropertiesT Props,
                _KERNELFUNCPARAM(KernelFunc) __SYCL_ANNOTATE(kernel)) {
-    throwIfGraphAssociatedAndKernelProperties<PropertiesT>();
     parallel_for_lambda_impl<KernelName, KernelType, 3, PropertiesT>(
         NumWorkItems, Props, std::move(KernelFunc));
   }
@@ -2377,7 +2390,6 @@ public:
   parallel_for(nd_range<Dims> Range __SYCL_ANNOTATE(nd_range),
                PropertiesT Properties,
                _KERNELFUNCPARAM(KernelFunc) __SYCL_ANNOTATE(kernel)) {
-    throwIfGraphAssociatedAndKernelProperties<PropertiesT>();
     parallel_for_impl<KernelName>(Range, Properties, std::move(KernelFunc));
   }
 
@@ -2392,7 +2404,6 @@ public:
   parallel_for(range<1> Range, PropertiesT Properties, RestT &&...Rest) {
     throwIfGraphAssociated<ext::oneapi::experimental::detail::
                                UnsupportedGraphFeatures::sycl_reductions>();
-    throwIfGraphAssociatedAndKernelProperties<PropertiesT>();
     detail::reduction_parallel_for<KernelName>(*this, Range, Properties,
                                                std::forward<RestT>(Rest)...);
   }
@@ -2406,7 +2417,6 @@ public:
   parallel_for(range<2> Range, PropertiesT Properties, RestT &&...Rest) {
     throwIfGraphAssociated<ext::oneapi::experimental::detail::
                                UnsupportedGraphFeatures::sycl_reductions>();
-    throwIfGraphAssociatedAndKernelProperties<PropertiesT>();
     detail::reduction_parallel_for<KernelName>(*this, Range, Properties,
                                                std::forward<RestT>(Rest)...);
   }
@@ -2420,7 +2430,6 @@ public:
   parallel_for(range<3> Range, PropertiesT Properties, RestT &&...Rest) {
     throwIfGraphAssociated<ext::oneapi::experimental::detail::
                                UnsupportedGraphFeatures::sycl_reductions>();
-    throwIfGraphAssociatedAndKernelProperties<PropertiesT>();
     detail::reduction_parallel_for<KernelName>(*this, Range, Properties,
                                                std::forward<RestT>(Rest)...);
   }
@@ -2481,7 +2490,6 @@ public:
             int Dims, typename PropertiesT>
   void parallel_for_work_group(range<Dims> NumWorkGroups, PropertiesT Props,
                                _KERNELFUNCPARAM(KernelFunc)) {
-    throwIfGraphAssociatedAndKernelProperties<PropertiesT>();
     parallel_for_work_group_lambda_impl<KernelName, KernelType, Dims,
                                         PropertiesT>(NumWorkGroups, Props,
                                                      KernelFunc);
@@ -2492,7 +2500,6 @@ public:
   void parallel_for_work_group(range<Dims> NumWorkGroups,
                                range<Dims> WorkGroupSize, PropertiesT Props,
                                _KERNELFUNCPARAM(KernelFunc)) {
-    throwIfGraphAssociatedAndKernelProperties<PropertiesT>();
     parallel_for_work_group_lambda_impl<KernelName, KernelType, Dims,
                                         PropertiesT>(
         NumWorkGroups, WorkGroupSize, Props, KernelFunc);
@@ -3641,17 +3648,6 @@ private:
                             "handler::require() before it can be used.");
   }
 
-  template <typename PropertiesT>
-  std::enable_if_t<
-      ext::oneapi::experimental::is_property_list<PropertiesT>::value>
-  throwIfGraphAssociatedAndKernelProperties() const {
-    if (!std::is_same_v<PropertiesT,
-                        ext::oneapi::experimental::empty_properties_t>)
-      throwIfGraphAssociated<
-          ext::oneapi::experimental::detail::UnsupportedGraphFeatures::
-              sycl_ext_oneapi_kernel_properties>();
-  }
-
   // Set value of the gpu cache configuration for the kernel.
   void setKernelCacheConfig(sycl::detail::pi::PiKernelCacheConfig);
 
diff --git a/sycl/test-e2e/Basic/built-ins/vec_common.cpp b/sycl/test-e2e/Basic/built-ins/vec_common.cpp
index 4ddb88883090..5cbc1efe98e9 100644
--- a/sycl/test-e2e/Basic/built-ins/vec_common.cpp
+++ b/sycl/test-e2e/Basic/built-ins/vec_common.cpp
@@ -1,7 +1,7 @@
 // RUN: %{build} -o %t.out
 // RUN: %{run} %t.out
-// RUN: %if preview-breaking-changes-supported %{ %clangxx -fsycl -fpreview-breaking-changes %s -o %t2.out %}
-// RUN: %if preview-breaking-changes-supported %{  %{run} %t2.out %}
+// RUN: %if preview-breaking-changes-supported %{ %{build} -fpreview-breaking-changes -o %t2.out %}
+// RUN: %if preview-breaking-changes-supported %{ %{run} %t2.out %}
 
 // https://github.com/intel/llvm/issues/7585 to fix the failure:
 // XFAIL: gpu
@@ -60,6 +60,88 @@ int main() {
 
   // sycl::clamp
   TEST(sycl::clamp, float, 2, EXPECTED(float, 1.0f, 2.0f), 0, va1, va2, va3);
+  TEST(sycl::clamp, float, 2, EXPECTED(float, 1.0f, 2.0f), 0, va1, 1.0f, 3.0f);
+  if (dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::clamp, double, 2, EXPECTED(double, 1.0, 2.0), 0, va4, 1.0, 3.0);
+  }
+  // sycl::degrees
+  TEST(sycl::degrees, float, 3, EXPECTED(float, 180, 180, 180), 0, va5);
+  if (dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::degrees, double, 3, EXPECTED(double, 180, 180, 180), 0, va6);
+  }
+  if (dev.has(sycl::aspect::fp16)) {
+    TEST(sycl::degrees, sycl::half, 3, EXPECTED(sycl::half, 180, 180, 180), 0.2,
+         va7);
+  }
+  // sycl::max
+  TEST(sycl::max, float, 2, EXPECTED(float, 3.0f, 2.0f), 0, va1, va3);
+  TEST(sycl::max, float, 2, EXPECTED(float, 1.5f, 2.0f), 0, va1, 1.5f);
+  if (dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::max, double, 2, EXPECTED(double, 1.5, 2.0), 0, va4, 1.5);
+  }
+  // sycl::min
+  TEST(sycl::min, float, 2, EXPECTED(float, 1.0f, 2.0f), 0, va1, va3);
+  TEST(sycl::min, float, 2, EXPECTED(float, 1.0f, 1.5f), 0, va1, 1.5f);
+  if (dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::min, double, 2, EXPECTED(double, 1.0, 1.5), 0, va4, 1.5);
+  }
+  // sycl::mix
+  TEST(sycl::mix, float, 2, EXPECTED(float, 1.6f, 2.0f), 0, va1, va3, va8);
+  TEST(sycl::mix, float, 2, EXPECTED(float, 1.4f, 2.0f), 0, va1, va3, 0.2);
+  if (dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::mix, double, 2, EXPECTED(double, 3.0, 5.0), 0, va4, va9, 0.5);
+  }
+  // sycl::radians
+  TEST(sycl::radians, float, 3, EXPECTED(float, M_PI, M_PI, M_PI), 0, va10);
+  if (dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::radians, double, 3, EXPECTED(double, M_PI, M_PI, M_PI), 0, va11);
+  }
+  if (dev.has(sycl::aspect::fp16)) {
+    TEST(sycl::radians, sycl::half, 3, EXPECTED(sycl::half, M_PI, M_PI, M_PI),
+         0.002, va12);
+  }
+  // sycl::step
+  TEST(sycl::step, float, 2, EXPECTED(float, 1.0f, 1.0f), 0, va1, va3);
+  if (dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::step, double, 2, EXPECTED(double, 1.0, 1.0), 0, va4, va9);
+  }
+  if (dev.has(sycl::aspect::fp16)) {
+    TEST(sycl::step, sycl::half, 3, EXPECTED(sycl::half, 1.0, 0.0, 1.0), 0,
+         va12, va13);
+  }
+  TEST(sycl::step, float, 2, EXPECTED(float, 1.0f, 0.0f), 0, 2.5f, va3);
+  if (dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::step, double, 2, EXPECTED(double, 0.0f, 1.0f), 0, 6.0f, va9);
+  }
+  // sycl::smoothstep
+  TEST(sycl::smoothstep, float, 2, EXPECTED(float, 1.0f, 1.0f), 0, va8, va1,
+       va2);
+  if (dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::smoothstep, double, 2, EXPECTED(double, 1.0, 1.0f), 0.00000001,
+         va4, va9, va9);
+  }
+  if (dev.has(sycl::aspect::fp16)) {
+    TEST(sycl::smoothstep, sycl::half, 3, EXPECTED(sycl::half, 1.0, 1.0, 1.0),
+         0, va7, va12, va13);
+  }
+  TEST(sycl::smoothstep, float, 2, EXPECTED(float, 0.0553936f, 0.0f), 0.0000001,
+       2.5f, 6.0f, va3);
+  if (dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::smoothstep, double, 2, EXPECTED(double, 0.0f, 1.0f), 0, 6.0f,
+         8.0f, va9);
+  }
+  // sign
+  TEST(sycl::sign, float, 2, EXPECTED(float, +0.0f, -1.0f), 0, va14);
+  if (dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::sign, double, 2, EXPECTED(double, -0.0, 1.0), 0, va15);
+  }
+  if (dev.has(sycl::aspect::fp16)) {
+    TEST(sycl::sign, sycl::half, 3, EXPECTED(sycl::half, 1.0, 1.0, 1.0), 0,
+         va12);
+  }
+
+#ifdef __INTEL_PREVIEW_BREAKING_CHANGES
+  // sycl::clamp swizzled
   TEST(sycl::clamp, float, 2, EXPECTED(float, 3.0f, 2.0f), 0,
        va16.swizzle<1, 0>(), va2, va3);
   TEST(sycl::clamp, float, 2, EXPECTED(float, 1.0f, 2.0f), 0, va1, va2,
@@ -70,59 +152,46 @@ int main() {
        va16.swizzle<1, 0>(), va16.swizzle<1, 0>());
   TEST(sycl::clamp, float, 2, EXPECTED(float, 360.0f, 180.0f), 0,
        va16.swizzle<1, 0>(), va16.swizzle<1, 0>(), va16.swizzle<1, 0>());
-  TEST(sycl::clamp, float, 2, EXPECTED(float, 1.0f, 2.0f), 0, va1, 1.0f, 3.0f);
   TEST(sycl::clamp, float, 2, EXPECTED(float, 3.0f, 3.0f), 0,
        va16.swizzle<1, 0>(), 1.0f, 3.0f);
   if (dev.has(sycl::aspect::fp64)) {
-    TEST(sycl::clamp, double, 2, EXPECTED(double, 1.0, 2.0), 0, va4, 1.0, 3.0);
     TEST(sycl::clamp, double, 2, EXPECTED(double, 3.0, 3.0), 0,
          va11.swizzle<1, 0>(), 1.0, 3.0);
   }
-  // sycl::degrees
-  TEST(sycl::degrees, float, 3, EXPECTED(float, 180, 180, 180), 0, va5);
+  // sycl::degrees swizzled
   TEST(sycl::degrees, float, 2, EXPECTED(float, 180, 180), 0,
        va5.swizzle<1, 0>());
   if (dev.has(sycl::aspect::fp64)) {
-    TEST(sycl::degrees, double, 3, EXPECTED(double, 180, 180, 180), 0, va6);
     TEST(sycl::degrees, double, 2, EXPECTED(double, 180, 180), 0,
          va6.swizzle<1, 0>());
   }
   if (dev.has(sycl::aspect::fp16)) {
-    TEST(sycl::degrees, sycl::half, 3, EXPECTED(sycl::half, 180, 180, 180), 0.2,
-         va7);
     TEST(sycl::degrees, sycl::half, 2, EXPECTED(sycl::half, 180, 180), 0.2,
          va7.swizzle<1, 0>());
   }
-  // sycl::max
-  TEST(sycl::max, float, 2, EXPECTED(float, 3.0f, 2.0f), 0, va1, va3);
+  // sycl::max swizzled
   TEST(sycl::max, float, 2, EXPECTED(float, 360.0f, 180.0f), 0,
        va16.swizzle<1, 0>(), va3);
   TEST(sycl::max, float, 2, EXPECTED(float, 360.0f, 180.0f), 0, va1,
        va16.swizzle<1, 0>());
-  TEST(sycl::max, float, 2, EXPECTED(float, 1.5f, 2.0f), 0, va1, 1.5f);
   TEST(sycl::max, float, 2, EXPECTED(float, 360.0f, 190.0f), 0,
        va16.swizzle<1, 0>(), 190.0f);
   if (dev.has(sycl::aspect::fp64)) {
-    TEST(sycl::max, double, 2, EXPECTED(double, 1.5, 2.0), 0, va4, 1.5);
     TEST(sycl::max, double, 2, EXPECTED(double, 360.0, 190.0), 0,
          va17.swizzle<1, 0>(), 190.0);
   }
-  // sycl::min
-  TEST(sycl::min, float, 2, EXPECTED(float, 1.0f, 2.0f), 0, va1, va3);
+  // sycl::min swizzled
   TEST(sycl::min, float, 2, EXPECTED(float, 3.0f, 2.0f), 0,
        va16.swizzle<1, 0>(), va3);
   TEST(sycl::min, float, 2, EXPECTED(float, 1.0f, 2.0f), 0, va1,
        va16.swizzle<1, 0>());
-  TEST(sycl::min, float, 2, EXPECTED(float, 1.0f, 1.5f), 0, va1, 1.5f);
   TEST(sycl::min, float, 2, EXPECTED(float, 190.0f, 180.0f), 0,
        va16.swizzle<1, 0>(), 190.0f);
   if (dev.has(sycl::aspect::fp64)) {
-    TEST(sycl::min, double, 2, EXPECTED(double, 1.0, 1.5), 0, va4, 1.5);
     TEST(sycl::min, double, 2, EXPECTED(double, 190.0f, 180.0f), 0,
          va17.swizzle<1, 0>(), 190.0);
   }
-  // sycl::mix
-  TEST(sycl::mix, float, 2, EXPECTED(float, 1.6f, 2.0f), 0, va1, va3, va8);
+  // sycl::mix swizzled
   TEST(sycl::mix, float, 2, EXPECTED(float, 252.9f, 73.2f), 0,
        va16.swizzle<1, 0>(), va3, va8);
   TEST(sycl::mix, float, 2, EXPECTED(float, 252.9f, 73.2f), 0,
@@ -137,9 +206,7 @@ int main() {
        va16.swizzle<1, 0>(), va16.swizzle<1, 0>(), va18.swizzle<0, 1>());
   TEST(sycl::mix, float, 2, EXPECTED(float, 1.6f, 2.0f), 0, va1, va3,
        va18.swizzle<0, 1>());
-  TEST(sycl::mix, float, 2, EXPECTED(float, 1.4f, 2.0f), 0, va1, va3, 0.2);
   if (dev.has(sycl::aspect::fp64)) {
-    TEST(sycl::mix, double, 2, EXPECTED(double, 3.0, 5.0), 0, va4, va9, 0.5);
     TEST(sycl::mix, double, 2, EXPECTED(double, 182.5, 94.0), 0,
          va17.swizzle<1, 0>(), va9, 0.5);
     TEST(sycl::mix, double, 2, EXPECTED(double, 180.5, 91.0), 0, va4,
@@ -147,23 +214,18 @@ int main() {
     TEST(sycl::mix, double, 2, EXPECTED(double, 360.0, 180.0), 0,
          va17.swizzle<1, 0>(), va17.swizzle<1, 0>(), 0.5);
   }
-  // sycl::radians
-  TEST(sycl::radians, float, 3, EXPECTED(float, M_PI, M_PI, M_PI), 0, va10);
+  // sycl::radians swizzled
   TEST(sycl::radians, float, 2, EXPECTED(float, M_PI, M_PI), 0,
        va10.swizzle<1, 0>());
   if (dev.has(sycl::aspect::fp64)) {
-    TEST(sycl::radians, double, 3, EXPECTED(double, M_PI, M_PI, M_PI), 0, va11);
     TEST(sycl::radians, double, 2, EXPECTED(double, M_PI, M_PI), 0,
          va11.swizzle<1, 0>());
   }
   if (dev.has(sycl::aspect::fp16)) {
-    TEST(sycl::radians, sycl::half, 3, EXPECTED(sycl::half, M_PI, M_PI, M_PI),
-         0.002, va12);
     TEST(sycl::radians, sycl::half, 2, EXPECTED(sycl::half, M_PI, M_PI), 0.002,
          va12.swizzle<1, 0>());
   }
-  // sycl::step
-  TEST(sycl::step, float, 2, EXPECTED(float, 1.0f, 1.0f), 0, va1, va3);
+  // sycl::step swizzled
   TEST(sycl::step, float, 2, EXPECTED(float, 0.0f, 0.0f), 0,
        va16.swizzle<1, 0>(), va3);
   TEST(sycl::step, float, 2, EXPECTED(float, 1.0f, 1.0f), 0, va1,
@@ -171,7 +233,6 @@ int main() {
   TEST(sycl::step, float, 2, EXPECTED(float, 1.0f, 1.0f), 0,
        va16.swizzle<1, 0>(), va16.swizzle<1, 0>());
   if (dev.has(sycl::aspect::fp64)) {
-    TEST(sycl::step, double, 2, EXPECTED(double, 1.0, 1.0), 0, va4, va9);
     TEST(sycl::step, double, 2, EXPECTED(double, 0.0, 0.0), 0,
          va17.swizzle<1, 0>(), va9);
     TEST(sycl::step, double, 2, EXPECTED(double, 1.0, 1.0), 0, va4,
@@ -180,8 +241,6 @@ int main() {
          va17.swizzle<1, 0>(), va17.swizzle<1, 0>());
   }
   if (dev.has(sycl::aspect::fp16)) {
-    TEST(sycl::step, sycl::half, 3, EXPECTED(sycl::half, 1.0, 0.0, 1.0), 0,
-         va12, va13);
     TEST(sycl::step, sycl::half, 3, EXPECTED(sycl::half, 1.0, 0.0, 1.0), 0,
          va12.swizzle<0, 1, 2>(), va13);
     TEST(sycl::step, sycl::half, 3, EXPECTED(sycl::half, 1.0, 0.0, 1.0), 0,
@@ -189,17 +248,13 @@ int main() {
     TEST(sycl::step, sycl::half, 3, EXPECTED(sycl::half, 1.0, 0.0, 1.0), 0,
          va12.swizzle<0, 1, 2>(), va13.swizzle<0, 1, 2>());
   }
-  TEST(sycl::step, float, 2, EXPECTED(float, 1.0f, 0.0f), 0, 2.5f, va3);
   TEST(sycl::step, float, 2, EXPECTED(float, 0.0f, 1.0f), 0, 2.5f,
        va3.swizzle<1, 0>());
   if (dev.has(sycl::aspect::fp64)) {
-    TEST(sycl::step, double, 2, EXPECTED(double, 0.0f, 1.0f), 0, 6.0f, va9);
     TEST(sycl::step, double, 2, EXPECTED(double, 1.0f, 0.0f), 0, 6.0f,
          va9.swizzle<1, 0>());
   }
-  // sycl::smoothstep
-  TEST(sycl::smoothstep, float, 2, EXPECTED(float, 1.0f, 1.0f), 0, va8, va1,
-       va2);
+  // sycl::smoothstep swizzled
   TEST(sycl::smoothstep, float, 2, EXPECTED(float, 1.0f, 1.0f), 0,
        va8.swizzle<0, 1>(), va1, va2);
   TEST(sycl::smoothstep, float, 2, EXPECTED(float, 1.0f, 1.0f), 0, va8,
@@ -215,8 +270,6 @@ int main() {
   TEST(sycl::smoothstep, float, 2, EXPECTED(float, 1.0f, 1.0f), 0,
        va8.swizzle<0, 1>(), va1.swizzle<0, 1>(), va2.swizzle<0, 1>());
   if (dev.has(sycl::aspect::fp64)) {
-    TEST(sycl::smoothstep, double, 2, EXPECTED(double, 1.0, 1.0f), 0.00000001,
-         va4, va9, va9);
     TEST(sycl::smoothstep, double, 2, EXPECTED(double, 1.0, 1.0f), 0.00000001,
          va4.swizzle<0, 1>(), va9, va9);
     TEST(sycl::smoothstep, double, 2, EXPECTED(double, 1.0, 1.0f), 0.00000001,
@@ -233,8 +286,6 @@ int main() {
          va4.swizzle<0, 1>(), va9.swizzle<0, 1>(), va9.swizzle<0, 1>());
   }
   if (dev.has(sycl::aspect::fp16)) {
-    TEST(sycl::smoothstep, sycl::half, 3, EXPECTED(sycl::half, 1.0, 1.0, 1.0),
-         0, va7, va12, va13);
     TEST(sycl::smoothstep, sycl::half, 3, EXPECTED(sycl::half, 1.0, 1.0, 1.0),
          0, va7.swizzle<0, 1, 2>(), va12, va13);
     TEST(sycl::smoothstep, sycl::half, 3, EXPECTED(sycl::half, 1.0, 1.0, 1.0),
@@ -251,8 +302,6 @@ int main() {
          0, va7.swizzle<0, 1, 2>(), va12.swizzle<0, 1, 2>(),
          va13.swizzle<0, 1, 2>());
   }
-  TEST(sycl::smoothstep, float, 2, EXPECTED(float, 0.0553936f, 0.0f), 0.0000001,
-       2.5f, 6.0f, va3);
   TEST(sycl::smoothstep, float, 2, EXPECTED(float, 0.0f, 0.0553936f), 0.0000001,
        2.5f, 6.0f, va3.swizzle<1, 0>());
   if (dev.has(sycl::aspect::fp64)) {
@@ -261,21 +310,18 @@ int main() {
     TEST(sycl::smoothstep, double, 2, EXPECTED(double, 1.0f, 0.0f), 0, 6.0f,
          8.0f, va9.swizzle<1, 0>());
   }
-  // sign
-  TEST(sycl::sign, float, 2, EXPECTED(float, +0.0f, -1.0f), 0, va14);
+  // sign swizzled
   TEST(sycl::sign, float, 2, EXPECTED(float, -1.0f, +0.0f), 0,
        va14.swizzle<1, 0>());
   if (dev.has(sycl::aspect::fp64)) {
-    TEST(sycl::sign, double, 2, EXPECTED(double, -0.0, 1.0), 0, va15);
     TEST(sycl::sign, double, 2, EXPECTED(double, 1.0, -0.0), 0,
          va15.swizzle<1, 0>());
   }
   if (dev.has(sycl::aspect::fp16)) {
-    TEST(sycl::sign, sycl::half, 3, EXPECTED(sycl::half, 1.0, 1.0, 1.0), 0,
-         va12);
     TEST(sycl::sign, sycl::half, 3, EXPECTED(sycl::half, 1.0, 1.0, 1.0), 0,
          va12.swizzle<2, 1, 0>());
   }
+#endif // __INTEL_PREVIEW_BREAKING_CHANGES
 
   return 0;
 }
diff --git a/sycl/test-e2e/Basic/built-ins/vec_geometric.cpp b/sycl/test-e2e/Basic/built-ins/vec_geometric.cpp
index 0d0acdee0c82..5b9e6fdad06e 100644
--- a/sycl/test-e2e/Basic/built-ins/vec_geometric.cpp
+++ b/sycl/test-e2e/Basic/built-ins/vec_geometric.cpp
@@ -1,7 +1,7 @@
 // RUN: %{build} -o %t.out
 // RUN: %{run} %t.out
-// RUN: %if preview-breaking-changes-supported %{ %clangxx -fsycl -fpreview-breaking-changes %s -o %t2.out %}
-// RUN: %if preview-breaking-changes-supported %{  %{run} %t2.out %}
+// RUN: %if preview-breaking-changes-supported %{ %{build} -fpreview-breaking-changes -o %t2.out %}
+// RUN: %if preview-breaking-changes-supported %{ %{run} %t2.out %}
 
 // https://github.com/intel/llvm/issues/7585 to fix the failure:
 // XFAIL: gpu
@@ -69,70 +69,38 @@ int main() {
        VFloatD3_2);
   TEST(sycl::cross, float, 4, EXPECTED(float, -1.f, -4.f, 3.f, 0.f), 0,
        VFloatD4, VFloatD4_2);
-  TEST(sycl::cross, float, 3, EXPECTED(float, -1.f, -4.f, 3.f), 0,
-       VFloatD4.swizzle<0, 1, 2>(), VFloatD3_2);
-  TEST(sycl::cross, float, 3, EXPECTED(float, -1.f, -4.f, 3.f), 0, VFloatD3,
-       VFloatD4_2.swizzle<0, 1, 2>());
-  TEST(sycl::cross, float, 3, EXPECTED(float, -1.f, -4.f, 3.f), 0,
-       VFloatD4.swizzle<0, 1, 2>(), VFloatD4_2.swizzle<0, 1, 2>());
   if (Dev.has(sycl::aspect::fp64)) {
     TEST(sycl::cross, double, 3, EXPECTED(double, -1.f, -4.f, 3.f), 0,
          VDoubleD3, VDoubleD3_2);
     TEST(sycl::cross, double, 4, EXPECTED(double, -1.f, -4.f, 3.f, 0.f), 0,
          VDoubleD4, VDoubleD4_2);
-    TEST(sycl::cross, double, 3, EXPECTED(double, -1.f, -4.f, 3.f), 0,
-         VDoubleD3, VDoubleD4_2.swizzle<0, 1, 2>());
-    TEST(sycl::cross, double, 3, EXPECTED(double, -1.f, -4.f, 3.f), 0,
-         VDoubleD4.swizzle<0, 1, 2>(), VDoubleD3_2);
-    TEST(sycl::cross, double, 3, EXPECTED(double, -1.f, -4.f, 3.f), 0,
-         VDoubleD4.swizzle<0, 1, 2>(), VDoubleD4_2.swizzle<0, 1, 2>());
   }
 
   TEST2(sycl::dot, float, 13.f, 0, VFloatD2, VFloatD2_2);
   TEST2(sycl::dot, float, 32.f, 0, VFloatD3, VFloatD3_2);
   TEST2(sycl::dot, float, 48.f, 0, VFloatD4, VFloatD4_2);
-  TEST2(sycl::dot, float, 32.f, 0, VFloatD4.swizzle<0, 1, 2>(), VFloatD3_2);
-  TEST2(sycl::dot, float, 32.f, 0, VFloatD3, VFloatD4_2.swizzle<0, 1, 2>());
-  TEST2(sycl::dot, float, 32.f, 0, VFloatD4.swizzle<0, 1, 2>(),
-        VFloatD4_2.swizzle<0, 1, 2>());
   if (Dev.has(sycl::aspect::fp64)) {
     TEST2(sycl::dot, double, 13, 0, VDoubleD2, VDoubleD2_2);
     TEST2(sycl::dot, double, 32, 0, VDoubleD3, VDoubleD3_2);
     TEST2(sycl::dot, double, 48, 0, VDoubleD4, VDoubleD4_2);
-    TEST2(sycl::dot, double, 32, 0, VDoubleD4.swizzle<0, 1, 2>(), VDoubleD3_2);
-    TEST2(sycl::dot, double, 32, 0, VDoubleD3, VDoubleD4_2.swizzle<0, 1, 2>());
-    TEST2(sycl::dot, double, 32, 0, VDoubleD4.swizzle<0, 1, 2>(),
-          VDoubleD4_2.swizzle<0, 1, 2>());
   }
 
   TEST2(sycl::length, float, 2.236068f, 1e-6, VFloatD2);
   TEST2(sycl::length, float, 3.741657f, 1e-6, VFloatD3);
   TEST2(sycl::length, float, 5.477225f, 1e-6, VFloatD4);
-  TEST2(sycl::length, float, 3.741657f, 1e-6, VFloatD4.swizzle<0, 1, 2>());
   if (Dev.has(sycl::aspect::fp64)) {
     TEST2(sycl::length, double, 2.236068, 1e-6, VDoubleD2);
     TEST2(sycl::length, double, 3.741657, 1e-6, VDoubleD3);
     TEST2(sycl::length, double, 5.477225, 1e-6, VDoubleD4);
-    TEST2(sycl::length, double, 3.741657, 1e-6, VDoubleD4.swizzle<0, 1, 2>());
   }
 
   TEST2(sycl::distance, float, 3.605551f, 1e-6, VFloatD2, VFloatD2_2);
   TEST2(sycl::distance, float, 5.f, 0, VFloatD3, VFloatD3_2);
   TEST2(sycl::distance, float, 5.f, 0, VFloatD4, VFloatD4_2);
-  TEST2(sycl::distance, float, 5.f, 0, VFloatD4.swizzle<0, 1, 2>(), VFloatD3_2);
-  TEST2(sycl::distance, float, 5.f, 0, VFloatD3, VFloatD4_2.swizzle<0, 1, 2>());
-  TEST2(sycl::distance, float, 5.f, 0, VFloatD4.swizzle<0, 1, 2>(),
-        VFloatD4_2.swizzle<0, 1, 2>());
   if (Dev.has(sycl::aspect::fp64)) {
     TEST2(sycl::distance, double, 3.605551, 1e-6, VDoubleD2, VDoubleD2_2);
     TEST2(sycl::distance, double, 5.0, 0, VDoubleD3, VDoubleD3_2);
     TEST2(sycl::distance, double, 5.0, 0, VDoubleD4, VDoubleD4_2);
-    TEST2(sycl::distance, double, 5.0, 0, VDoubleD4.swizzle<0, 1, 2>(),
-          VDoubleD3_2);
-    TEST2(sycl::distance, double, 5.0, 0, VDoubleD3,
-          VDoubleD4_2.swizzle<0, 1, 2>());
-    TEST2(sycl::distance, double, 5.0, 0, VDoubleD4.swizzle<0, 1, 2>(),
-          VDoubleD4_2.swizzle<0, 1, 2>());
   }
 
   TEST(sycl::normalize, float, 2, EXPECTED(float, 0.447213f, 0.894427f), 1e-6,
@@ -142,8 +110,6 @@ int main() {
   TEST(sycl::normalize, float, 4,
        EXPECTED(float, 0.182574f, 0.365148f, 0.547723f, 0.730297f), 1e-6,
        VFloatD4);
-  TEST(sycl::normalize, float, 3, EXPECTED(float, 0.267261, 0.534522, 0.801784),
-       1e-6, VFloatD4.swizzle<0, 1, 2>());
   if (Dev.has(sycl::aspect::fp64)) {
     TEST(sycl::normalize, double, 2, EXPECTED(double, 0.447213, 0.894427), 1e-6,
          VDoubleD2);
@@ -152,25 +118,15 @@ int main() {
     TEST(sycl::normalize, double, 4,
          EXPECTED(double, 0.182574, 0.365148, 0.547723, 0.730297), 1e-6,
          VDoubleD4);
-    TEST(sycl::normalize, double, 3,
-         EXPECTED(double, 0.267261, 0.534522, 0.801784), 1e-6,
-         VDoubleD4.swizzle<0, 1, 2>());
   }
 
   TEST2(sycl::fast_distance, float, 3.605551f, 1e-6, VFloatD2, VFloatD2_2);
   TEST2(sycl::fast_distance, float, 5.f, 0, VFloatD3, VFloatD3_2);
   TEST2(sycl::fast_distance, float, 5.f, 0, VFloatD4, VFloatD4_2);
-  TEST2(sycl::fast_distance, float, 5.f, 0, VFloatD4.swizzle<0, 1, 2>(),
-        VFloatD3_2);
-  TEST2(sycl::fast_distance, float, 5.f, 0, VFloatD3,
-        VFloatD4_2.swizzle<0, 1, 2>());
-  TEST2(sycl::fast_distance, float, 5.f, 0, VFloatD4.swizzle<0, 1, 2>(),
-        VFloatD4_2.swizzle<0, 1, 2>());
 
   TEST2(sycl::fast_length, float, 2.236068f, 1e-6, VFloatD2);
   TEST2(sycl::fast_length, float, 3.741657f, 1e-6, VFloatD3);
   TEST2(sycl::fast_length, float, 5.477225f, 1e-6, VFloatD4);
-  TEST2(sycl::fast_length, float, 3.741657f, 1e-6, VFloatD4.swizzle<0, 1, 2>());
 
   TEST(sycl::fast_normalize, float, 2, EXPECTED(float, 0.447213f, 0.894427f),
        1e-3, VFloatD2);
@@ -179,9 +135,73 @@ int main() {
   TEST(sycl::fast_normalize, float, 4,
        EXPECTED(float, 0.182574f, 0.365148f, 0.547723f, 0.730297f), 1e-3,
        VFloatD4);
+
+#ifdef __INTEL_PREVIEW_BREAKING_CHANGES
+  TEST(sycl::cross, float, 3, EXPECTED(float, -1.f, -4.f, 3.f), 0,
+       VFloatD4.swizzle<0, 1, 2>(), VFloatD3_2);
+  TEST(sycl::cross, float, 3, EXPECTED(float, -1.f, -4.f, 3.f), 0, VFloatD3,
+       VFloatD4_2.swizzle<0, 1, 2>());
+  TEST(sycl::cross, float, 3, EXPECTED(float, -1.f, -4.f, 3.f), 0,
+       VFloatD4.swizzle<0, 1, 2>(), VFloatD4_2.swizzle<0, 1, 2>());
+  if (Dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::cross, double, 3, EXPECTED(double, -1.f, -4.f, 3.f), 0,
+         VDoubleD3, VDoubleD4_2.swizzle<0, 1, 2>());
+    TEST(sycl::cross, double, 3, EXPECTED(double, -1.f, -4.f, 3.f), 0,
+         VDoubleD4.swizzle<0, 1, 2>(), VDoubleD3_2);
+    TEST(sycl::cross, double, 3, EXPECTED(double, -1.f, -4.f, 3.f), 0,
+         VDoubleD4.swizzle<0, 1, 2>(), VDoubleD4_2.swizzle<0, 1, 2>());
+  }
+
+  TEST2(sycl::dot, float, 32.f, 0, VFloatD4.swizzle<0, 1, 2>(), VFloatD3_2);
+  TEST2(sycl::dot, float, 32.f, 0, VFloatD3, VFloatD4_2.swizzle<0, 1, 2>());
+  TEST2(sycl::dot, float, 32.f, 0, VFloatD4.swizzle<0, 1, 2>(),
+        VFloatD4_2.swizzle<0, 1, 2>());
+  if (Dev.has(sycl::aspect::fp64)) {
+    TEST2(sycl::dot, double, 32, 0, VDoubleD4.swizzle<0, 1, 2>(), VDoubleD3_2);
+    TEST2(sycl::dot, double, 32, 0, VDoubleD3, VDoubleD4_2.swizzle<0, 1, 2>());
+    TEST2(sycl::dot, double, 32, 0, VDoubleD4.swizzle<0, 1, 2>(),
+          VDoubleD4_2.swizzle<0, 1, 2>());
+  }
+
+  TEST2(sycl::length, float, 3.741657f, 1e-6, VFloatD4.swizzle<0, 1, 2>());
+  if (Dev.has(sycl::aspect::fp64)) {
+    TEST2(sycl::length, double, 3.741657, 1e-6, VDoubleD4.swizzle<0, 1, 2>());
+  }
+
+  TEST2(sycl::distance, float, 5.f, 0, VFloatD4.swizzle<0, 1, 2>(), VFloatD3_2);
+  TEST2(sycl::distance, float, 5.f, 0, VFloatD3, VFloatD4_2.swizzle<0, 1, 2>());
+  TEST2(sycl::distance, float, 5.f, 0, VFloatD4.swizzle<0, 1, 2>(),
+        VFloatD4_2.swizzle<0, 1, 2>());
+  if (Dev.has(sycl::aspect::fp64)) {
+    TEST2(sycl::distance, double, 5.0, 0, VDoubleD4.swizzle<0, 1, 2>(),
+          VDoubleD3_2);
+    TEST2(sycl::distance, double, 5.0, 0, VDoubleD3,
+          VDoubleD4_2.swizzle<0, 1, 2>());
+    TEST2(sycl::distance, double, 5.0, 0, VDoubleD4.swizzle<0, 1, 2>(),
+          VDoubleD4_2.swizzle<0, 1, 2>());
+  }
+
+  TEST(sycl::normalize, float, 3, EXPECTED(float, 0.267261, 0.534522, 0.801784),
+       1e-6, VFloatD4.swizzle<0, 1, 2>());
+  if (Dev.has(sycl::aspect::fp64)) {
+    TEST(sycl::normalize, double, 3,
+         EXPECTED(double, 0.267261, 0.534522, 0.801784), 1e-6,
+         VDoubleD4.swizzle<0, 1, 2>());
+  }
+
+  TEST2(sycl::fast_distance, float, 5.f, 0, VFloatD4.swizzle<0, 1, 2>(),
+        VFloatD3_2);
+  TEST2(sycl::fast_distance, float, 5.f, 0, VFloatD3,
+        VFloatD4_2.swizzle<0, 1, 2>());
+  TEST2(sycl::fast_distance, float, 5.f, 0, VFloatD4.swizzle<0, 1, 2>(),
+        VFloatD4_2.swizzle<0, 1, 2>());
+
+  TEST2(sycl::fast_length, float, 3.741657f, 1e-6, VFloatD4.swizzle<0, 1, 2>());
+
   TEST(sycl::fast_normalize, float, 3,
        EXPECTED(float, 0.267261f, 0.534522f, 0.801784f), 1e-3,
        VFloatD4.swizzle<0, 1, 2>());
+#endif // __INTEL_PREVIEW_BREAKING_CHANGES
 
   return 0;
 }
diff --git a/sycl/test-e2e/Graph/RecordReplay/buffer_ordering.cpp b/sycl/test-e2e/Graph/RecordReplay/buffer_ordering.cpp
index f12bce4091a2..1e668d47356a 100644
--- a/sycl/test-e2e/Graph/RecordReplay/buffer_ordering.cpp
+++ b/sycl/test-e2e/Graph/RecordReplay/buffer_ordering.cpp
@@ -5,8 +5,8 @@
 
 // RUN: %{build} -o %t.out
 // RUN: %{run} %t.out
-// Extra run to check for leaks in Level Zero using ZE_DEBUG
-// RUN: %if ext_oneapi_level_zero %{env ZE_DEBUG=4 %{run} %t.out 2>&1 | FileCheck %s %}
+// Extra run to check for leaks in Level Zero using UR_L0_LEAKS_DEBUG
+// RUN: %if level_zero %{env UR_L0_LEAKS_DEBUG=1 %{run} %t.out 2>&1 | FileCheck %s %}
 //
 // CHECK-NOT: LEAK
 
diff --git a/sycl/test-e2e/Matrix/XMX8/element_wise_all_sizes.cpp b/sycl/test-e2e/Matrix/XMX8/element_wise_all_sizes.cpp
index eef91f0258e4..7548bf71f24d 100644
--- a/sycl/test-e2e/Matrix/XMX8/element_wise_all_sizes.cpp
+++ b/sycl/test-e2e/Matrix/XMX8/element_wise_all_sizes.cpp
@@ -7,6 +7,10 @@
 //===----------------------------------------------------------------------===//
 // REQUIRES: matrix-xmx8
 
+// TODO: Currently fails and regularly times out on DG2. Re-enable when this has
+//       been addressed.
+// UNSUPPORTED: gpu-intel-dg2
+
 // RUN: %{build} -o %t.out
 // RUN: %{run} %t.out
 
diff --git a/sycl/test-e2e/format.py b/sycl/test-e2e/format.py
index c44fadde2680..4060e0c99ab2 100644
--- a/sycl/test-e2e/format.py
+++ b/sycl/test-e2e/format.py
@@ -12,9 +12,9 @@ import os
 import re
 
 def get_triple(test, backend):
-    if backend == 'ext_oneapi_cuda':
+    if backend == 'cuda':
         return 'nvptx64-nvidia-cuda-syclmlir'
-    if backend == 'ext_oneapi_hip':
+    if backend == 'hip':
         if test.config.hip_platform == 'NVIDIA':
             return 'nvptx64-nvidia-cuda-syclmlir'
         else:
@@ -164,10 +164,13 @@ class SYCLEndToEndTest(lit.formats.ShTest):
             # so that device might still be accessible to some of the tests yet
             # we won't set the environment variable below for such scenario.
             extra_env = []
-            if 'ext_oneapi_level_zero:gpu' in sycl_devices and litConfig.params.get('ze_debug'):
-                extra_env.append('ZE_DEBUG={}'.format(test.config.ze_debug))
+            if 'level_zero:gpu' in sycl_devices and litConfig.params.get('ur_l0_debug'):
+                extra_env.append('UR_L0_DEBUG={}'.format(test.config.ur_l0_debug))
 
-            if 'ext_oneapi_cuda:gpu' in sycl_devices:
+            if 'level_zero:gpu' in sycl_devices and litConfig.params.get('ur_l0_leaks_debug'):
+                extra_env.append('UR_L0_LEAKS_DEBUG={}'.format(test.config.ur_l0_leaks_debug))
+
+            if 'cuda:gpu' in sycl_devices:
                 extra_env.append('SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT=1')
 
             return extra_env
@@ -203,9 +206,9 @@ class SYCLEndToEndTest(lit.formats.ShTest):
                 # Expand device-specific condtions (%if ... %{ ... %}).
                 tmp_script = [ cmd ]
                 conditions = {x: True for x in sycl_device.split(':')}
-                for op_sys in ['linux', 'windows']:
-                    if op_sys in test.config.available_features:
-                        conditions[op_sys] = True
+                for cond_features in ['linux', 'windows', 'preview-breaking-changes-supported']:
+                    if cond_features in test.config.available_features:
+                        conditions[cond_features] = True
 
                 tmp_script = lit.TestRunner.applySubstitutions(
                     tmp_script, [], conditions, recursion_limit=test.config.recursiveExpansionLimit)

69e5edd broke CI as it was not running on `sycl-mlir` PRs.

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
Add new:
- `Basic/image/image_accessor_readwrite.cpp`
- `Basic/image/image_write_fp16.cpp`
- `Basic/swizzle_op.cpp`
- `Basic/vector/int-convert.cpp`
- `InvokeSimd/Spec/ImplicitSubgroup/uniform_retval.cpp`
- `Regression/vec_logical_ops.cpp`
- `Regression/vec_rel_swizzle_ops.cpp`

Drop:
- `Plugin/level_zero_queue_profiling.cpp`
- `Plugin/level_zero_usm_residency.cpp`

Rename:
- `sycl/test-e2e/Basic/vector/vec_bool.cpp` ->
  `sycl/test-e2e/Basic/vector/bool.cpp`
- `sycl/test-e2e/Basic/vector/vector_operators.cpp` ->
  `sycl/test-e2e/Basic/vector/operators.cpp`
- `sycl/test-e2e/Basic/vector/scalar_vec_access.cpp` ->
  `sycl/test-e2e/Basic/vector/scalar_access.cpp`

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
@victor-eds victor-eds marked this pull request as ready for review January 5, 2024 11:18
@victor-eds victor-eds requested a review from a team as a code owner January 5, 2024 11:18
@victor-eds
Copy link
Contributor Author

@victor-eds victor-eds merged commit e0df215 into intel:sycl-mlir Jan 5, 2024
15 checks passed
@victor-eds victor-eds deleted the rebase branch January 5, 2024 14:59
@victor-eds
Copy link
Contributor Author

Hi @keryell!
It was a big one, but we have merged DPC++ commits until 0913045 in SYCL-MLIR. We have a couple of regressions I mention in this PR. Only the -O0 one is user-facing. Thanks for your efforts trying to rebase. Feel free to close that PR now 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disable-lint Skip linter check step and proceed with build jobs sycl-mlir Pull requests or issues for sycl-mlir branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.