Skip to content

Conversation

@tkoskela
Copy link
Contributor

@tkoskela tkoskela commented Nov 30, 2023

Closes #280

Uses assumed shape arrays for remote index arrays. These can vary in size and using a fixed size leads to accessing uninitialized memory.

Edit: I wanted to clarify that the inputs you gave in #292 still fail for me on all multiply kernels. So I'm not claiming to fix that issue. But I discovered a different issue and I'm fixing it 😀

Edit2: The reason the inputs from #292 fail for me is that my machine runs out of memory. Based on our last discussion that is to be expected with the size of the problem.

@tkoskela tkoskela requested a review from davidbowler November 30, 2023 10:48
@tkoskela tkoskela changed the base branch from master to develop November 30, 2023 10:50
@tkoskela tkoskela marked this pull request as ready for review November 30, 2023 11:38
@tkoskela
Copy link
Contributor Author

I tested this change on all multiply kernels in https://github.com/OrderN/CONQUEST-release/actions/runs/7045441423 and the tests pass so it's ready

@tkoskela tkoskela requested a review from ilectra November 30, 2023 11:56
Copy link
Contributor

@ilectra ilectra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine, but I still don't understand why it worked! See my review comment in #292 (comment)

@tkoskela
Copy link
Contributor Author

tkoskela commented Nov 30, 2023

Actually, with this change we can just pass slices of part_array into the multiply kernel, there is no need for pointers (other than making the subroutine calls look a bit more tidy).

See #294

@tkoskela tkoskela added area: main-source Relating to the src/ directory (main Conquest source code) improves: stability Fix or enhance issues with stability or robustness priority: critical type: bug labels Dec 13, 2023
While part_array may not matter anymore, ndimi and ndimj from
the halo structure will.
@davidbowler
Copy link
Contributor

I have added two small but important changes: we now zero part_array and ndimi and ndimj in the halo structure, all of which are passed to maxval. This should now fix the problem in #280

@davidbowler davidbowler merged commit acc8cc2 into develop Dec 21, 2023
@davidbowler davidbowler deleted the tk-fix-multiply-kernel-memory-leak branch December 21, 2023 09:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: main-source Relating to the src/ directory (main Conquest source code) improves: stability Fix or enhance issues with stability or robustness priority: critical type: bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Memory leak for linear scaling ompGemm_m kernel on atom movement

5 participants