Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make coverity happy for 4.0 #623

Draft
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

bosilca
Copy link
Contributor

@bosilca bosilca commented Jan 28, 2024

Coverity identified 692 legitimate issues with the node. Luckily for us most of them are in the generated code and have relatively simple solutions. This PR is a placeholder to address all of the issues that can be addressed in a reasonable time frame.

@bosilca bosilca requested a review from a team as a code owner January 28, 2024 04:34
@bosilca bosilca changed the title Fix/coverity happy for 4.0 Make coverity happy for 4.0 Jan 28, 2024
@abouteiller abouteiller added this to the v4.0 milestone Jan 31, 2024
parsec/interfaces/ptg/ptg-compiler/jdf2c.c Outdated Show resolved Hide resolved
parsec/utils/output.c Show resolved Hide resolved
tests/dsl/dtd/dtd_test_simple_gemm.c Show resolved Hide resolved
parsec/interfaces/ptg/ptg-compiler/jdf2c.c Show resolved Hide resolved
Don't generate the args code if there are no successors
compute the key using 64 bits arithmetic
ensure vpid is within expected range

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
This is only used by the DTD interface, but it needs to be completely
initialized in all cases.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
@bosilca bosilca marked this pull request as draft February 5, 2024 16:47
@bosilca
Copy link
Contributor Author

bosilca commented Feb 5, 2024

let's keep it as a draft for now, there are some tests failing and I can't yet figure out why.

@abouteiller
Copy link
Contributor

abouteiller commented Feb 7, 2024

Snapshot of investigation:

During parsec_redistribute_dtd_New we create an adt that we cleanup later in that same function.

376         adt = parsec_dtd_create_arena_datatype(parsec, &TARGET);
...
(gdb) n
426         adt = parsec_dtd_get_arena_datatype(parsec, TARGET);
(gdb) p *adt
null

we later dereference null+8 as an input to parsec_datatype_free, which may also explain why we would see these weird MPI_TYPE_FREE errors in some cases recently.

The same code for SOURCE works, and I suspect that this code in add2arena causes the index to be overwritten with 0 when it should be 1.

260     adt->ht_item.next_item = NULL;  /* keep Coverity happy */
261     adt->ht_item.hash64    = 0;  /* keep Coverity happy */
262     adt->ht_item.key       = 0;  /* keep Coverity happy */

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
@abouteiller
Copy link
Contributor

For posterity, a piece of discussion from Slack about the dtd_arena_create and our plan to address the problem

I see what is happening. In general we create the arena with a datatype and we keep it for as long as we need it.
13:56 well … not in DTD. Here we areate an empty arena, I guess just to get an ID that will then be attached to the arena we create with the normal function
13:57 so the “create_arena” cannot reset the ht_key (which is used as the ID) because in DTD this holds important data
13:57 so create is now not really a create, but more like “initialize some fields”
13:58 well, coverity was not complaining about the use of ht_item in DTD, because there it is used. IT was complaining about it in PTG where it is not used at all
14:00we need to go back to a sane use of the arena. We create and that function will set all fields, and then in DTD we will register it with the taskpool, and then it will get its ID (edited)
14:01 with this approach we will keep most of the infrastructure we have today in place. Which means we will still use the hash-table for the DTD arenas for something with an id that monotonically increases …

@abouteiller abouteiller self-requested a review February 22, 2024 14:04
abouteiller added a commit to abouteiller/dplasma that referenced this pull request Apr 22, 2024
@abouteiller abouteiller linked an issue Apr 24, 2024 that may be closed by this pull request
@abouteiller
Copy link
Contributor

CI failure is due to #641 this is ready

parsec/data.c Outdated Show resolved Hide resolved
parsec/data_dist/matrix/apply_wrapper.c Outdated Show resolved Hide resolved
@@ -351,9 +349,11 @@ char *parsec_argv_join_range(char **argv, size_t start, size_t end, int delimite
for (p = &argv[start], i=start; *p && i < end; ++p, ++i) {
str_len += strlen(*p) + 1;
}
if (0 == str_len) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why returning a copy of an empty string instead of NULL ? It is not like this function is heavily used, but why do we need to change the logic ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you did that as part of the first commit addressing coverity errors. I suppose we free the value later on without checking for NULL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why I did that, it turns out that function is never used.

@@ -252,8 +242,6 @@ int main(int argc, char *argv[])

ctp->arenas_datatypes[PARSEC_remote_no_re_reshape_DEFAULT_ADT_IDX] = adt_default;
ctp->arenas_datatypes[PARSEC_remote_no_re_reshape_LOWER_TILE_ADT_IDX] = adt_lower;
PARSEC_OBJ_RETAIN(adt_default.arena);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this means we are not tracking the refcount on the arena anymore ? How does it get deallocated ?

Copy link
Contributor

@abouteiller abouteiller Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still track the arena counts, but the adt is not destructed implicitly by the generated code, thus there is no need for the extra retain (which was to prevent said deallocation by the generated code), this is documented in both jdf2c (generate_constructors) and data.c (class declaration),

the arena gets deallocated when we destruct the adt (destructor called when the wrapper or main calls the obj_release(adt) or obj_destruct(adt), or the dtd helper function that also calls obj_release)

Copy link
Contributor

@abouteiller abouteiller May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During my own review of these changes I found the following issue that contradict some aspect of my previous message:

  1. What I said above is correct for all codes that use parsec_matrix_adt_define_xyz, and codes that are using the DTD attach type, they have all the correct hooking points ready to call the symetric cleanup (and have to, because they have to free the actual MPI datatype they created).
  2. This is not the case for the examples/Ex0?.jdf, they use parsec_arena_datatype_set_type and use a predefined datatype to populate the static PTG adt arrays. Without the auto OBJ_DESTRUCT in the generated code, these will need to call an explicit cleanup function to be leak free. These should be straightforward to adapt.
  3. This is also incorrect for a number of PTG testers (e.g., touch.jdf, a2a.jdf, BT_reduction_wrapper, merge_sort_wrapper, rtt_wrapper, collections/reduce.c, all the ptgpp testers,

We had a discussion about this on Friday during the team meeting
Q: because we don't auto cleanup the examples are now incorrect (they leak the arenas), should I (Aurelien) revert the change that removed the automatic OBJ_DESTRUCT on the static ADTs or do we prefer adapting the offending testers.
A: (Thomas): I like that the user has to make symmetrical calls, so users will need to call parsec_arena_datatype_unset_type (new function wrapping around OBJ_DESTRUCT, essentially)
A: (Aurelien): a subset of offenders in group 3. are missing the corresponding _finalize call to match their xyz_initialize and directly call taskpool_free, these require a larger change to adapt.

Documenting the discussion so that @bosilca can react.

abouteiller added a commit to abouteiller/dplasma that referenced this pull request Oct 24, 2024
@abouteiller abouteiller modified the milestones: v4.0, v5.0 Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rename parsec_add2arena
2 participants