forked from GridTools/gt4py
-
Notifications
You must be signed in to change notification settings - Fork 0
Merge gt4py/main #12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Merge gt4py/main #12
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Adds a pass that transforms expressions like ``` as_fieldop(λ(__arg0, __arg1) → ·__arg0 + ·__arg1, c⟨ IDimₕ: [0, 1) ⟩)( as_fieldop(λ(__arg0, __arg1) → ·__arg0 × ·__arg1, c⟨ IDimₕ: [0, 1) ⟩)(inp1, inp2), inp3 ) ``` into ``` as_fieldop(λ(inp1, inp2, inp3) → ·inp1 × ·inp2 + ·inp3, c⟨ IDimₕ: [0, 1) ⟩)(inp1, inp2, inp3) ```
…1688) Add IR transformation that removes cast expressions where the argument is already in the target type.
In #1531 the `itir.Node` class got a `type` attribute, that until now contributed to the hash computation of all nodes. As such two `itir.SymRef` with the same `id`, but one with a type inferred and one without (i.e. `None`) got a different hash value. Consequently the `inline_lambda` pass did not recognize them as a reference to the same symbol and erroneously inlined the expression even with `opcount_preserving=True`. This PR fixes the hash computation, such that again `node1 == node2` implies `hash(node1) == hash(node2)`.
…op` (#1689) In case we don't have a domain argument to `as_fieldop` we can not infer the exact result type. In order to still allow some passes which don't need this information to run before the domain inference, we continue with a dummy domain. One example is the CollapseTuple pass which only needs information about the structure, e.g. how many tuple elements does this node have, but not the dimensions of a field. Note that it might appear as if using the TraceShift pass would allow us to deduce the return type of `as_fieldop` without a domain, but this is not the case, since we don't have information on the ordering of dimensions. In this example ``` as_fieldop(it1, it2 -> deref(it1) + deref(it2))(i_field, j_field) ``` it is unclear if the result has dimension I, J or J, I.
…ectivity) (#1683) This PR adds support for lowering of `map_` and `make_const_list` builtin functions. However, the current implementation only supports neighbor tables with full connectivity (no skip values). The support for skip values will be added in next PR. To be noted: - This PR generalizes the handling of tasklets without arguments inside a map scope. The return type for `input_connections` is extended to contain a `TaskletConnection` variant, which is lowered to an empty edge from map entry node to the tasklet node. - The result of `make_const_list` is a scalar value to be broadcasted on a local field. However, in order to keep the lowering simple, this value is represented as a 1D 1-element array (`shape=(1,)`).
…1693) This PR implements the lowering of expressions like: `as_fieldop(deref, domain)(x[0] + (x[1][0] × 2.0 + x[1][1] × 3.0))` where the elements of tuple `x` being accessed are all scalar values, so `x[0] + (x[1][0] × 2.0 + x[1][1] × 3.0)` is a scalar expression. Therefore, this PR allows to lower expressions where the argument to `as_field_op` is a scalar expression rather than a lambda. Note that: - in case of a lambda, the `as_fieldop` function computes the lambda over the field domain; - in case of scalar expression, it broadcasts the scalar value over the field domain. Additionally, some cleanup: - Removed call to `gt_simplify` so that dace backend returns a plain unoptimized SDFG. - Removed `patch_gtir`, since the functionality of this IR pass is provided by similar changes delivered in #1677 and #1689.
DaCe backends are optional backends in the cartesian version of GT4Py. Currently, we silently drop support for DaCe backends if DaCe can't be imported. This can can happen because DaCe isn't available or for a couple other reasons (e.g. a circular import in the import path). With this PR we thus add a warning message allowing developers to easily figure out that/why DaCe backends are disabled.
New temporary extraction pass. Transforms an `itir.Program` like
```
testee(inp, out) {
out @ c⟨ IDimₕ: [0, 1) ⟩
← as_fieldop(deref, c⟨ IDimₕ: [0, 1) ⟩)(as_fieldop(deref, c⟨ IDimₕ: [0, 1) ⟩)(inp));
}
```
into
```
testee(inp, out) {
__tmp_1 = temporary(domain=c⟨ IDimₕ: [0, 1) ⟩, dtype=float64);
__tmp_1 @ c⟨ IDimₕ: [0, 1) ⟩ ← as_fieldop(deref, c⟨ IDimₕ: [0, 1) ⟩)(inp);
out @ c⟨ IDimₕ: [0, 1) ⟩ ← as_fieldop(deref, c⟨ IDimₕ: [0, 1) ⟩)(__tmp_1);
}
```
Note that this pass intentionally unconditionally extracts. In case you
don't want a temporary you should fuse the `as_fieldop` before. As such
the fusion pass (see #1670)
contains the heuristics on what to fuse.
This PR simplifies the representation of program scalar arguments in the SDFG: instead of promoting them to symbols, we represent them as scalar data. Scalar to symbol promotion is a transformation pass available in DaCe that should be applied after lowering to SDFG, eventually. Additional changes: - Fixes a problem in naming of input connectors for scalar expression tasklets: the connector name cannot match the argument name, otherwise we cannot pass the same value to two connectors in expressions like `out = tmp * tmp`. - Only propagate to lambda scope the symbols that are referenced.
#1694) This PR extends the solution for map-reduce provided in #1683 with the support for connectivity tables with skip values. The field definition is extended with a `local_offset` attribute that stores the offset provider used to build the values in the local dimension. In case the local dimension is built by the `neighbors` expression, the `local_offset` corresponds to the offset provider used to access the neighbor dimension. Since this information is carried along the data itself, whenever the data is accessed it is also possible to access the corresponding offset provider and check whether the neighbor index is valid or if there is a skip value. For local dimensions already present in the program argument, this information is retrieved from the field domain (enabled in new test case). The data is accessed in the `map_` and `reduce` expressions. Here it is now possible to check for skip values. Therefore, the main objective of this PR is the lowering of map-reduce with skip values. A secondary objective is to pave the road to simplify the lowering logic, by getting rid of the `reduce_identity` value. The current approach is propagate the `reduce_identity` value while visiting the arguments to `reduce` expressions. By introducing `local_offset`, the argument visitor will return the information needed to implement `reduce` of local values in presence of skip values.
Wrap every broadcast in an `as_fieldop` (not only scalars). The materialization of intermediate broadcasted fields need to be optimized by transformations.
When making a bad call to at a `gtscript.function` we add the function name in the error message for quick reference.
with features from #1648
Casting the condition expression on an inter-state edge to `bool` prevented dead-state elimination in the SDFG in case the expression was specialized to True/False.
This PR implements an alternative design to local expressions on lists containing skip values. The previous design was based on the assumption that such operations only exist in the context of a reduction (which is the most common case, but necessarily the only one) and was carrying the reduce identity value from a reduce node to the visiting context of the arguments and all their child nodes. The reduce identity value was used till fill the skip values when building lists of neighbors or when applying map expressions. Although quite effective for reduce expressions, this design led to a very complicated code. Besides, this design was mixing lowering with optimization, by implicitly optimizing the SDFG. With this PR, we stop carrying the reduce identity to the arguments context. Instead, in presence of skip values we just write a dummy value. When it is time to reduce, we override the dummy value with the reduce identity. In this way, the reduce identity is only used in the context of the reduction expression.
This PR fixes one corner case of nested let-statements, discovered in `test_tuple_unpacking_star_multi` during GTIR integration. Test case added. Additionally, fixed handling of symbol already defined in SDFG for #1695.
Fix lowering of `as_fieldop` with broadcast expression after changes in PR #1701. Additional change: - Add support for GT4Py zero-dimensional fields, equivalent of numpy zero-dimensional arrays. Test case added.
… when buiding the SDFG for DaCe (#1709) When building an SDFG for DaCe it is sometimes necessary to specify the starting point of the graph. In the past, `is_start_state=` was used for this. In never versions of Dace, `is_start_state` has been deprecated and replaced with `is_start_block`. This PR replaces the two occurrences in the `cartesian/` folder and removes warnings generated during test runs. It looks like the `next/` folder has already been cleaned.
The license file configured in `pyproject.toml` was missing the `*.txt` extension, leading to warnings in tests. Parent: GEOS-ESM/SMT-Nebulae#89
This PR updates the GitHub Actions (GHA) workflows to use `actions/checkout@v4` instead of `v3` (or `v2` in the cartesian case) and `actions/setup-python@v5`. Development for `actions/checkout@v3` and `actions/setup-python@v4` stopped ~1 year ago and GH is currently enforcing newer node versions than the one that these actions were designed with, leading to the following warnings  _warnings in next workflows_  _warnings in cartesian workflows_ `deploy_release` action was following `actions/checkout@master`. Was this on purpose? Happy to revert if so. Unless there's a good reason, I suggest to keep all actions pinned at ideally the same major version. `pre-commit/action` was updated to keep its dependencies up to date and avoid transitive warnings similar to the ones above. No changes made to currently disabled workflows, i.e. the ones under `.github/workflows/_disabled/`. Parent: GEOS-ESM/SMT-Nebulae#89
Use the `_map` function in all cases where mapping of with as_fieldop/lifted stencil *and* mapping of lists is required.
updates minimum gridtools_cpp version for #1699
#1703) Lists returned from an `as_fieldop`ed stencil will be turned into a local dimension of the the resulting field. In case of a `make_const_list`, a magic local dimension `_CONST_DIM` is used. This is a hack, but localized to `itir.embedded`. A clean implementation will probably involve to tag the `make_const_list` with the neighborhood it is meant to be used with.
A couple of tests were skipped in the DaCe backend with issue #1084 because they were throwing compiler errors. Current versions of DaCe can handle these cases again. Re-enabling the test cases.
DaCe requires C-compatible strings for the names of data containers, such as arrays and scalars. GT4Py uses a unicode symbols (`ᐞ`) as name separator in the SSA pass, which generates invalid symbols for DaCe. This PR introduces a helper method to find new names for invalid symbols present in the IR.
Adds index builtin for embedded and gtfn backends.
added for documentation purposes
DaCe's `MapReduceFusion` and `MapWCRFusion` are interesting as they move the initialization of the reduction accumulator away, which enables more fusion. However, they currently have a bug, as they assume that the reduction node is in the global scope and not inside a map scope.
The current mapping from GTIR logical operators to python code was inconsistent. The mapping was using bitwise operators instead of logical ones. This still resulted in functionally correct code because the boolean type has an integer representation in python language. This PR introduces the correct mapping, and leaves the dace toolchain and target compiler the possibility to generate optimized code.
…tput of a mapped nested SDFG (#1877) This PR provides a better fix than the one delivered earlier in #1828. It adds a check to detect whether the temporary output data has compile-time or runtime size. In case of runtime size, the transient array on the output connector of a mapped nested SDFG is removed. This is needed in order to avoid dynamic memory allocation inside the cuda kernel that represents a parallel map scope.
<!--
Delete this comment and add a proper description of the changes
contained in this PR. The text here will be used in the commit message
since the approved PRs are always squash-merged. The preferred format
is:
- PR Title: <type>[<scope>]: <one-line-summary>
<type>:
- build: Changes that affect the build system or external dependencies
- ci: Changes to our CI configuration files and scripts
- docs: Documentation only changes
- feat: A new feature
- fix: A bug fix
- perf: A code change that improves performance
- refactor: A code change that neither fixes a bug nor adds a feature
- style: Changes that do not affect the meaning of the code
- test: Adding missing tests or correcting existing tests
<scope>: cartesian | eve | next | storage
# ONLY if changes are limited to a specific subsystem
- PR Description:
Description of the main changes with links to appropriate
issues/documents/references/...
-->
## Description
16-bit integers were missing in the set of data types that return true
for `DataType.isinteger()`. Added missing test coverage.
## Requirements
- [x] All fixes and/or new features come with corresponding tests.
- [ ] Important design decisions have been documented in the appropriate
ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md)
folder.
N/A
Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
…ble k offsets (#1882) <!-- Delete this comment and add a proper description of the changes contained in this PR. The text here will be used in the commit message since the approved PRs are always squash-merged. The preferred format is: - PR Title: <type>[<scope>]: <one-line-summary> <type>: - build: Changes that affect the build system or external dependencies - ci: Changes to our CI configuration files and scripts - docs: Documentation only changes - feat: A new feature - fix: A bug fix - perf: A code change that improves performance - refactor: A code change that neither fixes a bug nor adds a feature - style: Changes that do not affect the meaning of the code - test: Adding missing tests or correcting existing tests <scope>: cartesian | eve | next | storage # ONLY if changes are limited to a specific subsystem - PR Description: Description of the main changes with links to appropriate issues/documents/references/... --> ## Description We figured that DaCe backends are currently missing support for casting in variable k offsets. This PR - adds a codegen test with a cast in a variable k offset - adds a node validator for the DaCe backends complaining about missing for support. - adds an `xfail` test for the node validator This should be fixed down the road. Here's the issue #1881 to keep track. The PR also has two smaller and unrelated commits - 741c448 increases test coverage with another codgen test that has a couple of read after write access patterns which were breaking the "new bridge" (see GEOS-ESM/NDSL#53). - e98ddc5 just forwards all keyword arguments when visiting offsets. I don't think this was a problem until now, but it's best practice to forward everything. ## Requirements - [x] All fixes and/or new features come with corresponding tests. - [ ] Important design decisions have been documented in the appropriate ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md) folder. N/A --------- Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com> Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>
The problem was that in the auto optimizer the "set stride correctly" function was called after the "go to GPU transformation". The GPU transformation will call `CopyToMap` transformation, this is needed such that we can set the GPU block size and their order. However, the decision if we call `CopyToMap` also depends on the strides, so we need to set them first. --------- Co-authored-by: edopao <edoardo16@gmail.com>
Be strict about tests marked with xfail. If they pass unexpectedly, fail the testsuite instead of logging an XPASS.
…1883) Changes the `InlineCenterDerefLiftVars` pass to respect evaluation order by lazily evaluating the inlined values. Consider the following case that is common in boundary conditions: ``` let(var, (↑deref)(it2))(if ·on_bc then 0 else ·var) ``` Then var should only be dereferenced in case `·on_bc` evalutes to False. Previously we just evaluated all values unconditionally: ``` let(_icdlv_1, ·it)(if ·on_bc then 0 else _icdlv_1) ``` Now we instead create a 0-ary lambda function for `_icdlv_1` and evaluate it when the value is needed. ``` let(_icdlv_1, λ() → ·it)(if ·on_bc then 0 else _icdlv_1()) ``` Note that as a result we do evaluate the function multiple times. To avoid redundant recompuations usage of the common subexpression elimination is required.
## Description The computation of the minimal k-ranges that happen during the vaildate-args step is allowing for inconsistent computation to happen. This PR stiffens the requirements on fields: - intervals `[START+X ,END+y)` are now also considered: - `interval(3,-1)` requires a minimal size of 5 for the interval to not be empty - `interval(3,None)` now requires a minimal size of 4 - intervals `[START+X, START+Y)` and `[END+X,END+Y)` are not affected. - empty intervals are still allowed to have a 0-domain as to accomodate 2-dimensional fields - `interval(...)` still requires no k-size ## Requirements - [x] All fixes and/or new features come with corresponding tests. --------- Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>
The transformation `MultiStateGlobalSelfCopyElimination` is very similar to the `GT4PyGlobalSelfCopyElimination` but they target slightly different cases. The transformation `GT4PyGlobalSelfCopyElimination`, which is already there and was renamed to `SingleStateGlobalSelfCopyElimination`, handles the pattern `(G) -> (T) -> (G)`, i.e. the global data `G` is copied into the transient `T` is then immediately copied back into `G`. Because of ADR-18 we know that this has no effect, because `G` is used as input and output and must therefore be point wise, so `G[i, j]` in the output can only be `G[i, j]` at the beginning. The new transformation `MultiStateGlobalSelfCopyElimination` handles a different case, it looks for patterns `(G) -> (T)` and `(T) -> (G)`, which is essentially the same, but this time the definition of `T` and the write back of `T` into `G` does not need to be in the same state. In the long run, the two transformation should be combined. --------- Co-authored-by: edopao <edoardo16@gmail.com>
This PR introduces a global ordering relation of dimensions, replacing the previous mechanism in promote_dims. The ordering relation is: sorting first by `Dimension.kind` (`HORIZONTAL` < `VERTICAL` < `LOCAL`) and then lexicographically by `Dimension.value`. Reason: An as_fieldop call as emitted by the frontend has no domain, and inferring the type of the domain was not possible, since no global ordering relation of dimensions existed, e.g. for an as_fieldop operating on a `Vertex` and a `K` field it was unclear if the dimensions of the output were `(Vertex, K)` or (`K, Vertex)`, which lead to many negative consequences in other places, that will be tackled in [PR 1853](#1853) and following ones.
## Description Following the [YAGNI principle](https://github.com/GridTools/gt4py/blob/main/CODING_GUIDELINES.md#software-design), stick to the device types that are actually supported in the codebase. We can add support for other devices later. @havogt as discussed the other day, let's have a discussion about whether or not we need all these device types defined. I know that @FlorianDeconinck would like to work towards over protection and throwing errors rather sooner than later for unsupported things (very much in general). This is part of a clean-up series tracked in issue #1880. ## Requirements - [ ] All fixes and/or new features come with corresponding tests. N/A - [ ] Important design decisions have been documented in the appropriate ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md) folder. N/A Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
## Description This PR cleans up all the warnings that would be present in the test_code_generation.
## Description In preparation for PR #1894, pull out some refactors and cleanups. Notable in this PR are the changes to `src/gt4py/cartesian/gtc/dace/oir_to_dace.py` - visit `stencil.vertical_loops` directly instead of calling `generic_visit` (simplification since there's nothing else to visit) - rename library nodes from `f"{sdfg_name}_computation_{id(node)}"` to `f"{sdfg_name}_vloop_{counter}_{node.loop_order}_{id(node)}"`. This adds a bit more information (because `sdfg_name` is the same for all library nodes) and thus simplifies debugging workflows. Related issue: GEOS-ESM/NDSL#53 ## Requirements - [x] All fixes and/or new features come with corresponding tests. covered by existing test suite - [ ] Important design decisions have been documented in the appropriate ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md) folder. N/A --------- Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
The `domain` ir maker now only accepts Dimensions, not strings. This simplifies the typing in some places and is less error prone, since one can not accidentally create a domain with the wrong kind, e.g. by using `"KDim"`. Co-authored-by: Till Ehrengruber <till.ehrengruber@cscs.ch>
We use `CopyToMap` in CUDA lowering for copies between arrays that do not necessarily have the same strides. This happens in case of `concat_where`, where we copy a source array into a subset of the destination array. For this reason, the option `ignore_strides` must be set to `True` (`False` by default).
This PR adds the `CopyChainRemover` transformation. The introduction of `concat_where` introduced a new pattern, which is essentially a chain of copies. As an example imagine the case that a domain is split into 3 subdomain. The result on the first subdomain is stored in `T1`, the one of the second in `T2` and the one for the third domain in `T3`. `T1` and `T2` are then copied into `T4`, finally `T4` together with `T3` are then copied into `T5`. This transformation will remove `T1`, `T2`, `T3` and `T4` thus the results will be written into `T5` directly. There are some limitation, if we have the pattern `(A1) -> (A2)`, then we eliminate `A1` only if: - `A1` is fully read; this is to avoid some nasty adjustments of map bounds. - There can only be one connection between `A1` and `A2`. The transformation was added twice to the simplify pass, which allows us to mitigate DaCe [issue#1959](spcl/dace#1959). --------- Co-authored-by: edopao <edoardo16@gmail.com>
…#1910) The issue was discovered by Edoardo (@edopao). The underlying problem is that `get_src_subset()` tries to set the direction of the Memlet, for that it looks for the source of the data. However, when we called it then the source is unspecific because the new edge is already there and the old one was not removed. This might implicitly change the direction of the Memlet. I have removed the calls where they are potentially dangerous and added some counter measure to avoid the problem.
This PR slightly changes how the GPU transformation works. It mainly changes how trivial Maps are eliminated. First, there is now a dedicated function, `gt_remove_trivial_gpu_maps()` for this. Furthermore, before it was only using the `TrivialGPUMapElimination` transformation, which tries to promote and fuse the trivial maps, but only with downstream maps. Now there is a second stage that tries to fuse the trivial maps together. This is mostly to reduce the number of kernel calls that we are doing.
Documents describing the contents of a (sub-)folder are commonly named README.md. This is so common that e.g. GitHub will display the contents of a README.md file when looking at the folder.
## Description Documents describing the contents of a (sub-)folder are commonly named `README.md`. This is so common that e.g. GitHub will display the contents of a `README.md` file when looking at the folder, e.g. https://github.com/GridTools/gt4py/tree/main/docs/development/ADRs will show the contents of renamed file (after the proposed name change). ## Requirements - [ ] All fixes and/or new features come with corresponding tests. N/A - [ ] Important design decisions have been documented in the appropriate ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md) folder. N/A Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
## Description This PR centralizes the definition of `CUPY_DEVICE_TYPE` in `_core/definitions`. This effectively de-duplicates the definition of `CUPY_DEVICE` in gt4py next and cartesian. Fixes a couple of typos along the way. Related issue: #1880 ## Requirements - [x] All fixes and/or new features come with corresponding tests. Should be covered by existing tests. - [ ] Important design decisions have been documented in the appropriate ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md) folder. N/A --------- Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
The following new transformations are introduced: - _CANONICALIZE_OP_FUNCALL_SYMREF_LITERAL_: `literal, symref` -> `symref, literal`, prerequisite for FOLD_FUNCALL_LITERAL, FOLD_NEUTRAL_OP `literal, funcall` -> `funcall, literal`, prerequisite for FOLD_FUNCALL_LITERAL, FOLD_NEUTRAL_OP `funcall, op` -> `op, funcall` for s[0] + (s[0] + 1), prerequisite for FOLD_MIN_MAX_PLUS - _CANONICALIZE_MINUS_: `a - b` -> `a + (-b)`, prerequisite for FOLD_MIN_MAX_PLUS - _CANONICALIZE_MIN_MAX_: `maximum(a, maximum(...))` -> `maximum(maximum(...), a)`, prerequisite for - _FOLD_FUNCALL_LITERAL_: `(a + 1) + 1` -> `a + (1 + 1)` - _FOLD_MIN_MAX_: `maximum(maximum(a, 1), a)` -> `maximum(a, 1)` `maximum(maximum(a, 1), 1)` -> `maximum(a, 1)` - FOLD_MIN_MAX_PLUS: `maximum(plus(a, 1), a)` -> `plus(a, 1)` `maximum(plus(a, 1), plus(a, -1))` -> `plus(a, maximum(1, -1))` - FOLD_NEUTRAL_OP: `a + 0` -> `a`, `a * 1` -> `a` In the end, unary minuses are transformed back to `minus`-calls where possible by `UndoCanonicalizeMinus`: `a + (-b)` -> `a - b` , `-a + b` -> `b - a`, `-a + (-b)` -> `-a - b` In addition the pass now transforms until a fixed point is reached (like in the `CollapseTuple` pass) instead of just transforming once. The `FixedPointTransformation` class from [PR 1826](#1826) is used here. Previously, large nested maximum expressions like `maximum(maximum(maximum(maximum(sym, 1),...), 1), maximum(maximum(sym, 1), 1))` caused timeouts in PMAP-GO as the runtime of the domain inference increased significantly due to the large domain expressions that could not be simplified. This replaces [this PR](tehrengruber#4). --------- Co-authored-by: Till Ehrengruber <till.ehrengruber@cscs.ch>
## Description This PR refactors the GT4Py/DaCe bridge to expose control flow elements (`if` statements and `while` loops) to DaCe. Previously, the whole contents of a vertical loop was put in one big Tasklet. With this PR, that Tasklet is broken apart in case control flow is found such that control flow is visible in the SDFG. This allows DaCe to better analyze code and will be crucial in future (within the current milestone) performance optimization work. The main ideas in this PR are the following 1. Introduce `oir.CodeBlock` to recursively break down `oir.HorizontalExecution`s into smaller pieces that are either code flow or evaluated in (smaller) Tasklets. 2. Introduce `dcir.Condition`and `dcir.WhileLoop` to represent if statements and while loops that are translated into SDFG states. We keep the current `dcir.MaskStmt` / `dcir.While` for if statements / while loops inside horizontal regions, which aren't yet exposed to DaCe (see #1900). 3. Add support for `if` statements and `while` loops in the state machine of `sdfg_builder.py` 4. We are breaking up vertical loops inside stencils in multiple Tasklets. It might thus happen that we write a "local" scalar in one Tasklet and read it in another Tasklet (downstream). We thus create output connectors for all scalar writes in a Tasklet and input connectors for all reads (unless previously written in the same Tasklet). 5. Memlets can't be generated per horizontal execution anymore and need to be more fine grained. `TaskletAccessInfoCollector` does this work for us, duplicating some logic in `AccessInfoCollector`. A refactor task has been logged to fix/re-evaluate this later. This PR depends on the following (downstream) DaCe fixes - spcl/dace#1954 - spcl/dace#1955 which have been merged by now. Follow-up issues - unrelated changes have been moved to #1895 - #1896 - #1898 - #1900 Related issue: GEOS-ESM/NDSL#53 ## Requirements - [x] All fixes and/or new features come with corresponding tests. Added new tests and increased coverage of horizontal regions with PRs #1807 and #1851. - [ ] Important design decisions have been documented in the appropriate ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md) folder. Docs are [in our knowledge base](https://geos-esm.github.io/SMT-Nebulae/technical/backend/dace-bridge/) for now. Will be ported. --------- Co-authored-by: Roman Cattaneo <> Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
…USE_HIP` (#1916) ## Description This PR breaks the dependency cycle between `gt4py.cartesian` and `gt4py.storage`. The last puzzle piece was the distinction between AMD and NVIDIA GPUs. This was controlled by `GT4PY_USE_HIP`. With this PR we re-use `CUPY_DEVICE_TYPE` (for `_core/definitions`) for this purpose. `GT4PY_USE_HIP` could be set by an environment variable or be auto-detected. Auto-detection for `GT4PY_USE_HIP` and `CUPY_DEVICE_TYPE` is the same. And according to @stubbiali, the environment variable was never[^1] needed because auto-detection worked well. We thus don't expect issues removing the environment variable. Related issue: #1880 ## Requirements - [x] All fixes and/or new features come with corresponding tests. Assumed to covered by existing tests. - [ ] Important design decisions have been documented in the appropriate ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md) folder. N/A [^1]: #1867 (comment) Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.