Skip to content

Conversation

@SF-N
Copy link
Owner

@SF-N SF-N commented Mar 19, 2025

No description provided.

tehrengruber and others added 30 commits October 11, 2024 18:41
Adds a pass that transforms expressions like
```
as_fieldop(λ(__arg0, __arg1) → ·__arg0 + ·__arg1, c⟨ IDimₕ: [0, 1) ⟩)(
  as_fieldop(λ(__arg0, __arg1) → ·__arg0 × ·__arg1, c⟨ IDimₕ: [0, 1) ⟩)(inp1, inp2), inp3
)
```
into
```
as_fieldop(λ(inp1, inp2, inp3) → ·inp1 × ·inp2 + ·inp3, c⟨ IDimₕ: [0, 1) ⟩)(inp1, inp2, inp3)
```
…1688)

Add IR transformation that removes cast expressions where the argument is
already in the target type.
In #1531 the `itir.Node` class got a `type` attribute, that until now
contributed to the hash computation of all nodes. As such two
`itir.SymRef` with the same `id`, but one with a type inferred and one
without (i.e. `None`) got a different hash value. Consequently the
`inline_lambda` pass did not recognize them as a reference to the same
symbol and erroneously inlined the expression even with
`opcount_preserving=True`. This PR fixes the hash computation, such that
again `node1 == node2` implies `hash(node1) == hash(node2)`.
…op` (#1689)

In case we don't have a domain argument to `as_fieldop` we can not infer
the exact result type. In order to still allow some passes which don't
need this information to run before the domain inference, we continue
with a dummy domain. One example is the CollapseTuple pass which only
needs information about the structure, e.g. how many tuple elements does
this node have, but not the dimensions of a field.

Note that it might appear as if using the TraceShift pass would allow us
to deduce the return type of `as_fieldop` without a domain, but this is
not the case, since we don't have information on the ordering of
dimensions. In this example
```
as_fieldop(it1, it2 -> deref(it1) + deref(it2))(i_field, j_field)
```
it is unclear if the result has dimension I, J or J, I.
…ectivity) (#1683)

This PR adds support for lowering of `map_` and `make_const_list`
builtin functions. However, the current implementation only supports
neighbor tables with full connectivity (no skip values). The support for
skip values will be added in next PR.

To be noted:
- This PR generalizes the handling of tasklets without arguments inside
a map scope. The return type for `input_connections` is extended to
contain a `TaskletConnection` variant, which is lowered to an empty edge
from map entry node to the tasklet node.
- The result of `make_const_list` is a scalar value to be broadcasted on
a local field. However, in order to keep the lowering simple, this value
is represented as a 1D 1-element array (`shape=(1,)`).
…1693)

This PR implements the lowering of expressions like:
`as_fieldop(deref, domain)(x[0] + (x[1][0] × 2.0 + x[1][1] × 3.0))`
where the elements of tuple `x` being accessed are all scalar values, so
`x[0] + (x[1][0] × 2.0 + x[1][1] × 3.0)` is a scalar expression.

Therefore, this PR allows to lower expressions where the argument to
`as_field_op` is a scalar expression rather than a lambda. Note that:
- in case of a lambda, the `as_fieldop` function computes the lambda
over the field domain;
- in case of scalar expression, it broadcasts the scalar value over the
field domain.

Additionally, some cleanup:
- Removed call to `gt_simplify` so that dace backend returns a plain
unoptimized SDFG.
- Removed `patch_gtir`, since the functionality of this IR pass is
provided by similar changes delivered in #1677 and #1689.
DaCe backends are optional backends in the cartesian version of GT4Py.
Currently, we silently drop support for DaCe backends if DaCe can't be
imported. This can can happen because DaCe isn't available or for a
couple other reasons (e.g. a circular import in the import path). With
this PR we thus add a warning message allowing developers to easily
figure out that/why DaCe backends are disabled.
New temporary extraction pass. Transforms an `itir.Program` like
```
testee(inp, out) {
  out @ c⟨ IDimₕ: [0, 1) ⟩
       ← as_fieldop(deref, c⟨ IDimₕ: [0, 1) ⟩)(as_fieldop(deref, c⟨ IDimₕ: [0, 1) ⟩)(inp));
}
```
into
```
testee(inp, out) {
  __tmp_1 = temporary(domain=c⟨ IDimₕ: [0, 1) ⟩, dtype=float64);
  __tmp_1 @ c⟨ IDimₕ: [0, 1) ⟩ ← as_fieldop(deref, c⟨ IDimₕ: [0, 1) ⟩)(inp);
  out @ c⟨ IDimₕ: [0, 1) ⟩ ← as_fieldop(deref, c⟨ IDimₕ: [0, 1) ⟩)(__tmp_1);
}
```
Note that this pass intentionally unconditionally extracts. In case you
don't want a temporary you should fuse the `as_fieldop` before. As such
the fusion pass (see #1670)
contains the heuristics on what to fuse.
This PR simplifies the representation of program scalar arguments in the
SDFG: instead of promoting them to symbols, we represent them as scalar
data. Scalar to symbol promotion is a transformation pass available in
DaCe that should be applied after lowering to SDFG, eventually.

Additional changes:
- Fixes a problem in naming of input connectors for scalar expression
tasklets: the connector name cannot match the argument name, otherwise
we cannot pass the same value to two connectors in expressions like `out
= tmp * tmp`.
- Only propagate to lambda scope the symbols that are referenced.
#1694)

This PR extends the solution for map-reduce provided in #1683 with the
support for connectivity tables with skip values.

The field definition is extended with a `local_offset` attribute that
stores the offset provider used to build the values in the local
dimension. In case the local dimension is built by the `neighbors`
expression, the `local_offset` corresponds to the offset provider used
to access the neighbor dimension. Since this information is carried
along the data itself, whenever the data is accessed it is also possible
to access the corresponding offset provider and check whether the
neighbor index is valid or if there is a skip value. For local
dimensions already present in the program argument, this information is
retrieved from the field domain (enabled in new test case).

The data is accessed in the `map_` and `reduce` expressions. Here it is
now possible to check for skip values. Therefore, the main objective of
this PR is the lowering of map-reduce with skip values.

A secondary objective is to pave the road to simplify the lowering
logic, by getting rid of the `reduce_identity` value. The current
approach is propagate the `reduce_identity` value while visiting the
arguments to `reduce` expressions. By introducing `local_offset`, the
argument visitor will return the information needed to implement
`reduce` of local values in presence of skip values.
Wrap every broadcast in an `as_fieldop` (not only scalars). The
materialization of intermediate broadcasted fields need to be optimized
by transformations.
When making a bad call to at a `gtscript.function` we add the function
name in the error message for quick reference.
Casting the condition expression on an inter-state edge to `bool`
prevented dead-state elimination in the SDFG in case the expression was
specialized to True/False.
This PR implements an alternative design to local expressions on lists
containing skip values. The previous design was based on the assumption
that such operations only exist in the context of a reduction (which is
the most common case, but necessarily the only one) and was carrying the
reduce identity value from a reduce node to the visiting context of the
arguments and all their child nodes. The reduce identity value was used
till fill the skip values when building lists of neighbors or when
applying map expressions. Although quite effective for reduce
expressions, this design led to a very complicated code. Besides, this
design was mixing lowering with optimization, by implicitly optimizing
the SDFG.

With this PR, we stop carrying the reduce identity to the arguments
context. Instead, in presence of skip values we just write a dummy
value. When it is time to reduce, we override the dummy value with the
reduce identity. In this way, the reduce identity is only used in the
context of the reduction expression.
This PR fixes one corner case of nested let-statements, discovered in
`test_tuple_unpacking_star_multi` during GTIR integration. Test case
added.

Additionally, fixed handling of symbol already defined in SDFG for
#1695.
Fix lowering of `as_fieldop` with broadcast expression after changes in
PR #1701.

Additional change:
- Add support for GT4Py zero-dimensional fields, equivalent of numpy
zero-dimensional arrays. Test case added.
… when buiding the SDFG for DaCe (#1709)

When building an SDFG for DaCe it is sometimes necessary to specify the
starting point of the graph. In the past, `is_start_state=` was used for
this. In never versions of Dace, `is_start_state` has been deprecated
and replaced with `is_start_block`. This PR replaces the two occurrences
in the `cartesian/` folder and removes warnings generated during test
runs. It looks like the `next/` folder has already been cleaned.
The license file configured in `pyproject.toml` was missing the `*.txt`
extension, leading to warnings in tests.

Parent: GEOS-ESM/SMT-Nebulae#89
This PR updates the GitHub Actions (GHA) workflows to use
`actions/checkout@v4` instead of `v3` (or `v2` in the cartesian case)
and `actions/setup-python@v5`. Development for `actions/checkout@v3` and
`actions/setup-python@v4` stopped ~1 year ago and GH is currently
enforcing newer node versions than the one that these actions were
designed with, leading to the following warnings

![image](https://github.com/user-attachments/assets/52924274-a00c-451c-a5a3-810fba1e6b27)
_warnings in next workflows_


![image](https://github.com/user-attachments/assets/af7f0027-f6d1-4998-a28a-c76c13dcaebb)
_warnings in cartesian workflows_

`deploy_release` action was following `actions/checkout@master`. Was
this on purpose? Happy to revert if so. Unless there's a good reason, I
suggest to keep all actions pinned at ideally the same major version.

`pre-commit/action` was updated to keep its dependencies up to date and
avoid transitive warnings similar to the ones above.

No changes made to currently disabled workflows, i.e. the ones under
`.github/workflows/_disabled/`.

Parent: GEOS-ESM/SMT-Nebulae#89
Use the `_map` function in all cases where mapping of with
as_fieldop/lifted stencil *and* mapping of lists is required.
updates minimum gridtools_cpp version for #1699
#1703)

Lists returned from an `as_fieldop`ed stencil will be turned into a local dimension of the the resulting field.
In case of a `make_const_list`, a magic local dimension `_CONST_DIM` is used. This is a hack, but localized to `itir.embedded`. A clean implementation will probably involve to tag the `make_const_list` with the neighborhood it is meant to be used with.
A couple of tests were skipped in the DaCe backend with issue
#1084 because they were
throwing compiler errors. Current versions of DaCe can handle these
cases again. Re-enabling the test cases.
DaCe requires C-compatible strings for the names of data containers,
such as arrays and scalars. GT4Py uses a unicode symbols (`ᐞ`) as name
separator in the SSA pass, which generates invalid symbols for DaCe.
This PR introduces a helper method to find new names for invalid symbols
present in the IR.
Adds index builtin for embedded and gtfn backends.
DaCe's `MapReduceFusion` and `MapWCRFusion` are interesting as they move
the initialization of the reduction accumulator away, which enables more
fusion.
However, they currently have a bug, as they assume that the reduction
node is in the global scope and not inside a map scope.
edopao and others added 29 commits February 18, 2025 09:10
The current mapping from GTIR logical operators to python code was
inconsistent. The mapping was using bitwise operators instead of logical
ones. This still resulted in functionally correct code because the
boolean type has an integer representation in python language. This PR
introduces the correct mapping, and leaves the dace toolchain and target
compiler the possibility to generate optimized code.
…tput of a mapped nested SDFG (#1877)

This PR provides a better fix than the one delivered earlier in
#1828. It adds a check to detect
whether the temporary output data has compile-time or runtime size. In
case of runtime size, the transient array on the output connector of a
mapped nested SDFG is removed. This is needed in order to avoid dynamic
memory allocation inside the cuda kernel that represents a parallel map
scope.
<!--
Delete this comment and add a proper description of the changes
contained in this PR. The text here will be used in the commit message
since the approved PRs are always squash-merged. The preferred format
is:

- PR Title: <type>[<scope>]: <one-line-summary>

    <type>:
- build: Changes that affect the build system or external dependencies
        - ci: Changes to our CI configuration files and scripts
        - docs: Documentation only changes
        - feat: A new feature
        - fix: A bug fix
        - perf: A code change that improves performance
- refactor: A code change that neither fixes a bug nor adds a feature
        - style: Changes that do not affect the meaning of the code
        - test: Adding missing tests or correcting existing tests

    <scope>: cartesian | eve | next | storage
    # ONLY if changes are limited to a specific subsystem

- PR Description:

Description of the main changes with links to appropriate
issues/documents/references/...
-->

## Description

16-bit integers were missing in the set of data types that return true
for `DataType.isinteger()`. Added missing test coverage.

## Requirements

- [x] All fixes and/or new features come with corresponding tests.
- [ ] Important design decisions have been documented in the appropriate
ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md)
folder.
  N/A

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
…ble k offsets (#1882)

<!--
Delete this comment and add a proper description of the changes
contained in this PR. The text here will be used in the commit message
since the approved PRs are always squash-merged. The preferred format
is:

- PR Title: <type>[<scope>]: <one-line-summary>

    <type>:
- build: Changes that affect the build system or external dependencies
        - ci: Changes to our CI configuration files and scripts
        - docs: Documentation only changes
        - feat: A new feature
        - fix: A bug fix
        - perf: A code change that improves performance
- refactor: A code change that neither fixes a bug nor adds a feature
        - style: Changes that do not affect the meaning of the code
        - test: Adding missing tests or correcting existing tests

    <scope>: cartesian | eve | next | storage
    # ONLY if changes are limited to a specific subsystem

- PR Description:

Description of the main changes with links to appropriate
issues/documents/references/...
-->

## Description

We figured that DaCe backends are currently missing support for casting
in variable k offsets. This PR

- adds a codegen test with a cast in a variable k offset
- adds a node validator for the DaCe backends complaining about missing
for support.
- adds an `xfail` test for the node validator

This should be fixed down the road. Here's the issue
#1881 to keep track.

The PR also has two smaller and unrelated commits

- 741c448 increases test coverage with
another codgen test that has a couple of read after write access
patterns which were breaking the "new bridge" (see
GEOS-ESM/NDSL#53).
- e98ddc5 just forwards all keyword
arguments when visiting offsets. I don't think this was a problem until
now, but it's best practice to forward everything.

## Requirements

- [x] All fixes and/or new features come with corresponding tests.
- [ ] Important design decisions have been documented in the appropriate
ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md)
folder.
  N/A

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>
The problem was that in the auto optimizer the "set stride correctly"
function was called after the "go to GPU transformation". The GPU
transformation will call `CopyToMap` transformation, this is needed such
that we can set the GPU block size and their order. However, the
decision if we call `CopyToMap` also depends on the strides, so we need
to set them first.

---------

Co-authored-by: edopao <edoardo16@gmail.com>
Be strict about tests marked with xfail. If they pass unexpectedly, fail the testsuite instead of logging an XPASS.
Follow-up from PR #1882 to merge
two test cases which only differed in the expected outcome. With this
PR, we'll have one single test case with a conditional `xfail` as
@havogt proposed in the GridTools slack channel.

Related issue: #1881
…1883)

Changes the `InlineCenterDerefLiftVars` pass to respect evaluation order
by lazily evaluating the inlined values.

Consider the following case that is common in boundary conditions:
```
let(var, (↑deref)(it2))(if ·on_bc then 0 else ·var)
```
Then var should only be dereferenced in case `·on_bc` evalutes to False.
Previously we just evaluated all values unconditionally:
```
let(_icdlv_1, ·it)(if ·on_bc then 0 else _icdlv_1)
```
Now we instead create a 0-ary lambda function for `_icdlv_1` and
evaluate it when the value is needed.

```
let(_icdlv_1, λ() → ·it)(if ·on_bc then 0 else _icdlv_1())
```
Note that as a result we do evaluate the function multiple times. To
avoid redundant recompuations usage of the common subexpression
elimination is required.
## Description

The computation of the minimal k-ranges that happen during the
vaildate-args step is allowing for inconsistent computation to happen.
This PR stiffens the requirements on fields:

- intervals `[START+X ,END+y)` are now also considered:
- `interval(3,-1)` requires a minimal size of 5 for the interval to not
be empty
  - `interval(3,None)` now requires a minimal size of 4
- intervals `[START+X, START+Y)` and `[END+X,END+Y)` are not affected. 
- empty intervals are still allowed to have a 0-domain as to accomodate
2-dimensional fields
  - `interval(...)` still requires no k-size

## Requirements

- [x] All fixes and/or new features come with corresponding tests.

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>
The transformation `MultiStateGlobalSelfCopyElimination` is very
similar to the `GT4PyGlobalSelfCopyElimination` but they target slightly
different cases. The transformation `GT4PyGlobalSelfCopyElimination`,
which is already there and was renamed to `SingleStateGlobalSelfCopyElimination`,
handles the pattern `(G) -> (T) -> (G)`, i.e. the global data `G` is copied into
the transient `T` is then immediately copied back into `G`.
Because of ADR-18 we know that this has no effect, because `G` is used
as input and output and must therefore be point wise, so `G[i, j]` in
the output can only be `G[i, j]` at the beginning.

The new transformation `MultiStateGlobalSelfCopyElimination` handles a
different case, it looks for patterns `(G) -> (T)` and `(T) -> (G)`,
which is essentially the same, but this time the definition of `T` and
the write back of `T` into `G` does not need to be in the same state.

In the long run, the two transformation should be combined.

---------

Co-authored-by: edopao <edoardo16@gmail.com>
This PR introduces a global ordering relation of dimensions, replacing
the previous mechanism in promote_dims. The ordering relation is:
sorting first by `Dimension.kind` (`HORIZONTAL` < `VERTICAL` < `LOCAL`)
and then lexicographically by `Dimension.value`.

Reason: 
An as_fieldop call as emitted by the frontend has no domain, and
inferring the type of the domain was not possible, since no global
ordering relation of dimensions existed, e.g. for an as_fieldop
operating on a `Vertex` and a `K` field it was unclear if the dimensions
of the output were `(Vertex, K)` or (`K, Vertex)`, which lead to many
negative consequences in other places, that will be tackled in [PR
1853](#1853) and following ones.
## Description

Following the [YAGNI
principle](https://github.com/GridTools/gt4py/blob/main/CODING_GUIDELINES.md#software-design),
stick to the device types that are actually supported in the codebase.
We can add support for other devices later.

@havogt as discussed the other day, let's have a discussion about
whether or not we need all these device types defined. I know that
@FlorianDeconinck would like to work towards over protection and
throwing errors rather sooner than later for unsupported things (very
much in general).

This is part of a clean-up series tracked in issue
#1880.

## Requirements

- [ ] All fixes and/or new features come with corresponding tests.
  N/A
- [ ] Important design decisions have been documented in the appropriate
ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md)
folder.
  N/A

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
## Description

This PR cleans up all the warnings that would be present in the
test_code_generation.
## Description

In preparation for PR #1894, pull
out some refactors and cleanups. Notable in this PR are the changes to
`src/gt4py/cartesian/gtc/dace/oir_to_dace.py`

- visit `stencil.vertical_loops` directly instead of calling
`generic_visit` (simplification since there's nothing else to visit)
- rename library nodes from `f"{sdfg_name}_computation_{id(node)}"` to
`f"{sdfg_name}_vloop_{counter}_{node.loop_order}_{id(node)}"`. This adds
a bit more information (because `sdfg_name` is the same for all library
nodes) and thus simplifies debugging workflows.

Related issue: GEOS-ESM/NDSL#53

## Requirements

- [x] All fixes and/or new features come with corresponding tests.
  covered by existing test suite
- [ ] Important design decisions have been documented in the appropriate
ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md)
folder.
  N/A

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
The `domain` ir maker now only accepts Dimensions, not strings. This
simplifies the typing in some places and is less error prone, since one
can not accidentally create a domain with the wrong kind, e.g. by using
`"KDim"`.

Co-authored-by: Till Ehrengruber <till.ehrengruber@cscs.ch>
We use `CopyToMap` in CUDA lowering for copies between arrays that do
not necessarily have the same strides. This happens in case of
`concat_where`, where we copy a source array into a subset of the
destination array.

For this reason, the option `ignore_strides` must be set to `True`
(`False` by default).
This PR adds the `CopyChainRemover` transformation.

The introduction of `concat_where` introduced a new pattern, which is
essentially a chain of copies.
As an example imagine the case that a domain is split into 3 subdomain.
The result on the first subdomain is stored in `T1`, the one of the
second in `T2` and the one for the third domain in `T3`.
`T1` and `T2` are then copied into `T4`, finally `T4` together with `T3`
are then copied into `T5`.
This transformation will remove `T1`, `T2`, `T3` and `T4` thus the
results will be written into `T5` directly.

There are some limitation, if we have the pattern `(A1) -> (A2)`, then
we eliminate `A1` only if:
- `A1` is fully read; this is to avoid some nasty adjustments of map
bounds.
- There can only be one connection between `A1` and `A2`.

The transformation was added twice to the simplify pass, which allows us
to mitigate DaCe [issue#1959](spcl/dace#1959).

---------

Co-authored-by: edopao <edoardo16@gmail.com>
…#1910)

The issue was discovered by Edoardo (@edopao).
The underlying problem is that `get_src_subset()` tries to set the
direction of the Memlet, for that it looks for the source of the data.
However, when we called it then the source is unspecific because the new
edge is already there and the old one was not removed. This might
implicitly change the direction of the Memlet.

I have removed the calls where they are potentially dangerous and added
some counter measure to avoid the problem.
This PR slightly changes how the GPU transformation works.
It mainly changes how trivial Maps are eliminated.
First, there is now a dedicated function, `gt_remove_trivial_gpu_maps()`
for this. Furthermore, before it was only using the `TrivialGPUMapElimination`
transformation, which tries to promote and fuse the trivial maps, but
only with downstream maps. Now there is a second stage that tries to
fuse the trivial maps together. This is mostly to reduce the number of
kernel calls that we are doing.
Documents describing the contents of a (sub-)folder are commonly named README.md. This is so common that e.g. GitHub will display the contents of a README.md file when looking at the folder.
## Description

Documents describing the contents of a (sub-)folder are commonly named
`README.md`. This is so common that e.g. GitHub will display the
contents of a `README.md` file when looking at the folder, e.g.

https://github.com/GridTools/gt4py/tree/main/docs/development/ADRs

will show the contents of renamed file (after the proposed name change).

## Requirements

- [ ] All fixes and/or new features come with corresponding tests.
  N/A
- [ ] Important design decisions have been documented in the appropriate
ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md)
folder.
  N/A

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
## Description

This PR centralizes the definition of `CUPY_DEVICE_TYPE` in
`_core/definitions`. This effectively de-duplicates the definition of
`CUPY_DEVICE` in gt4py next and cartesian. Fixes a couple of typos along
the way.

Related issue: #1880

## Requirements

- [x] All fixes and/or new features come with corresponding tests.
  Should be covered by existing tests.
- [ ] Important design decisions have been documented in the appropriate
ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md)
folder.
  N/A

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
The following new transformations are introduced:
- _CANONICALIZE_OP_FUNCALL_SYMREF_LITERAL_:
`literal, symref` -> `symref, literal`, prerequisite for
FOLD_FUNCALL_LITERAL, FOLD_NEUTRAL_OP
`literal, funcall` -> `funcall, literal`, prerequisite for
FOLD_FUNCALL_LITERAL, FOLD_NEUTRAL_OP
`funcall, op` -> `op, funcall` for s[0] + (s[0] + 1), prerequisite for
FOLD_MIN_MAX_PLUS
- _CANONICALIZE_MINUS_: `a - b` -> `a + (-b)`, prerequisite for
FOLD_MIN_MAX_PLUS
- _CANONICALIZE_MIN_MAX_: `maximum(a, maximum(...))` ->
`maximum(maximum(...), a)`, prerequisite for
- _FOLD_FUNCALL_LITERAL_:  `(a + 1) + 1` -> `a + (1 + 1)`
- _FOLD_MIN_MAX_:
`maximum(maximum(a, 1), a)` -> `maximum(a, 1)`
`maximum(maximum(a, 1), 1)` -> `maximum(a, 1)`
- FOLD_MIN_MAX_PLUS: `maximum(plus(a, 1), a)` -> `plus(a, 1)`
`maximum(plus(a, 1), plus(a, -1))` -> `plus(a, maximum(1, -1))`
- FOLD_NEUTRAL_OP: `a + 0` -> `a`, `a * 1` -> `a`

In the end, unary minuses are transformed back to `minus`-calls where
possible by `UndoCanonicalizeMinus`:
`a + (-b)` -> `a - b` , `-a + b` -> `b - a`, `-a + (-b)` -> `-a - b`

In addition the pass now transforms until a fixed point is reached (like
in the `CollapseTuple` pass) instead of just transforming once. The
`FixedPointTransformation` class from [PR
1826](#1826) is used here.

Previously, large nested maximum expressions like
`maximum(maximum(maximum(maximum(sym, 1),...), 1), maximum(maximum(sym,
1), 1))` caused timeouts in PMAP-GO as the runtime of the domain
inference increased significantly due to the large domain expressions
that could not be simplified.

This replaces [this PR](tehrengruber#4).

---------

Co-authored-by: Till Ehrengruber <till.ehrengruber@cscs.ch>
## Description

This PR refactors the GT4Py/DaCe bridge to expose control flow elements
(`if` statements and `while` loops) to DaCe. Previously, the whole
contents of a vertical loop was put in one big Tasklet. With this PR,
that Tasklet is broken apart in case control flow is found such that
control flow is visible in the SDFG. This allows DaCe to better analyze
code and will be crucial in future (within the current milestone)
performance optimization work.

The main ideas in this PR are the following

1. Introduce `oir.CodeBlock` to recursively break down
`oir.HorizontalExecution`s into smaller pieces that are either code flow
or evaluated in (smaller) Tasklets.
2. Introduce `dcir.Condition`and `dcir.WhileLoop` to represent if
statements and while loops that are translated into SDFG states. We keep
the current `dcir.MaskStmt` / `dcir.While` for if statements / while
loops inside horizontal regions, which aren't yet exposed to DaCe (see
#1900).
3. Add support for `if` statements and `while` loops in the state
machine of `sdfg_builder.py`
4. We are breaking up vertical loops inside stencils in multiple
Tasklets. It might thus happen that we write a "local" scalar in one
Tasklet and read it in another Tasklet (downstream). We thus create
output connectors for all scalar writes in a Tasklet and input
connectors for all reads (unless previously written in the same
Tasklet).
5. Memlets can't be generated per horizontal execution anymore and need
to be more fine grained. `TaskletAccessInfoCollector` does this work for
us, duplicating some logic in `AccessInfoCollector`. A refactor task has
been logged to fix/re-evaluate this later.

This PR depends on the following (downstream) DaCe fixes

- spcl/dace#1954
- spcl/dace#1955

which have been merged by now.

Follow-up issues

- unrelated changes have been moved to
#1895
- #1896
- #1898
- #1900

Related issue: GEOS-ESM/NDSL#53

## Requirements

- [x] All fixes and/or new features come with corresponding tests.
Added new tests and increased coverage of horizontal regions with PRs
#1807 and #1851.
- [ ] Important design decisions have been documented in the appropriate
ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md)
folder.
Docs are [in our knowledge
base](https://geos-esm.github.io/SMT-Nebulae/technical/backend/dace-bridge/)
for now. Will be ported.

---------

Co-authored-by: Roman Cattaneo <>
Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
…USE_HIP` (#1916)

## Description

This PR breaks the dependency cycle between `gt4py.cartesian` and
`gt4py.storage`. The last puzzle piece was the distinction between AMD
and NVIDIA GPUs. This was controlled by `GT4PY_USE_HIP`. With this PR we
re-use `CUPY_DEVICE_TYPE` (for `_core/definitions`) for this purpose.

`GT4PY_USE_HIP` could be set by an environment variable or be
auto-detected. Auto-detection for `GT4PY_USE_HIP` and `CUPY_DEVICE_TYPE`
is the same. And according to @stubbiali, the environment variable was
never[^1] needed because auto-detection worked well. We thus don't
expect issues removing the environment variable.

Related issue: #1880

## Requirements

- [x] All fixes and/or new features come with corresponding tests.
  Assumed to covered by existing tests.
- [ ] Important design decisions have been documented in the appropriate
ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md)
folder.
  N/A

[^1]:
#1867 (comment)

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
@SF-N SF-N merged commit 9759a1b into SF-N:main Mar 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.