Run Unit Tests for Different Parameters #182

michaelmckinsey1 · 2024-06-24T19:11:41Z

This PR expands the unit tests to also cover (1) Thickets built with intersection call trees (the default is union) and (2) Thickets built without filling the performance data (the default is to fill it). Testing every combination runs all the unit tests against four types of Thickets:

Union+Filled
Union+NonFilled
Intersection+Filled
Intersection+NonFilled

This follows the example of using parametrized fixtures from the pytest docs.
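A minimal sketch of the pattern, for readers who have not used it. The fixture names, parameter values, and ids below are assumptions chosen to mirror the test ids shown in this PR; they are not copied from thicket/tests/conftest.py.

```python
import pytest

# Assumed parameter values and ids; the ids become part of each test's name.
INTERSECTION_PARAMS = [True, False]
INTERSECTION_IDS = ["Intersection", "Union"]
FILL_PARAMS = [True, False]
FILL_IDS = ["FillPerfdata", "NoFillPerfdata"]


@pytest.fixture(params=INTERSECTION_PARAMS, ids=INTERSECTION_IDS)
def intersection(request):
    # request.param is the current value from params.
    return request.param


@pytest.fixture(params=FILL_PARAMS, ids=FILL_IDS)
def fill_perfdata(request):
    return request.param


def test_indices(intersection, fill_perfdata):
    # pytest collects this test once per combination of the two fixtures,
    # i.e. 2 x 2 = 4 times, each with its own bracketed id.
    assert isinstance(intersection, bool)
    assert isinstance(fill_perfdata, bool)
```

Because both fixtures carry ids, each collected test gets a readable suffix built from them instead of an opaque index.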

When tests fail, we can see which configuration of parameters failed:

FAILED thicket/tests/test_tree.py::test_indices[Intersection-FillPerfdata] - Key...
FAILED thicket/tests/test_tree.py::test_indices[Union-NoFillPerfdata] - KeyError: ...

We can run a single test for a single set of parameters by passing its full test ID to pytest (in shells such as zsh, quote the ID so the brackets are not treated as a glob pattern):

$ python -m pytest thicket/tests/test_tree.py::test_indices[Union-NoFillPerfdata]

============================= test session starts =============================
platform win32 -- Python 3.11.7, pytest-8.0.0, pluggy-1.4.0
rootdir: C:\Users\Micha\Documents\Github\thicket
configfile: pytest.ini
collected 1 item

thicket\tests\test_tree.py F                                             [100%]

michaelmckinsey1 · 2024-07-09T21:46:06Z

Depends on ~~#193~~

michaelmckinsey1 · 2024-07-09T22:05:46Z

depends on ~~#181~~

ilumsden

This looks good, but there are a couple of things I'd like you to clarify @michaelmckinsey1.

thicket/tests/test_concat_thickets.py

thicket/tests/test_intersection.py

michaelmckinsey1 · 2024-07-17T18:49:42Z

@ilumsden Tests that use the thicket_axis_columns fixture are already parametrized, since thicket_axis_columns is parametrized itself. Unlike the other fixtures, which are lists of files, thicket_axis_columns and stats_thicket_axis_columns create the Thickets in the fixtures.

So running python -m pytest thicket/tests/test_concat_thickets.py::test_filter_stats_concat_thickets_columns results in:

0.44s call     thicket/tests/test_concat_thickets.py::test_filter_stats_concat_thickets_columns[Intersection-FillPerfdata]
0.41s call     thicket/tests/test_concat_thickets.py::test_filter_stats_concat_thickets_columns[Intersection-NoFillPerfdata]
0.33s call     thicket/tests/test_concat_thickets.py::test_filter_stats_concat_thickets_columns[Union-FillPerfdata]
0.33s call     thicket/tests/test_concat_thickets.py::test_filter_stats_concat_thickets_columns[Union-NoFillPerfdata]
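The behavior described above can be sketched as follows: a fixture that requests parametrized fixtures is itself run once per combination, and so is every test that uses it. The names columnar_thicket and build_columnar_stand_in are hypothetical, and the returned dict only stands in for the Thickets that the real fixtures construct.

```python
import pytest


@pytest.fixture(params=[True, False], ids=["Intersection", "Union"])
def intersection(request):
    return request.param


@pytest.fixture(params=[True, False], ids=["FillPerfdata", "NoFillPerfdata"])
def fill_perfdata(request):
    return request.param


def build_columnar_stand_in(intersection, fill_perfdata):
    """Hypothetical stand-in for building Thickets inside a fixture."""
    return {
        "calltree": "intersection" if intersection else "union",
        "fill_perfdata": fill_perfdata,
    }


@pytest.fixture
def columnar_thicket(intersection, fill_perfdata):
    # Requesting the parametrized fixtures here is what makes this fixture,
    # and any test that uses it, run once per parameter combination.
    return build_columnar_stand_in(intersection, fill_perfdata)


def test_uses_columnar_thicket(columnar_thicket):
    # No parametrization is declared on the test itself.
    assert columnar_thicket["calltree"] in {"intersection", "union"}
```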

michaelmckinsey1 · 2024-07-17T20:08:36Z

thicket/thicket.py

+        if isinstance(self.dataframe.columns, pd.MultiIndex):
+            rows = []
+            nodes = self.dataframe.index.get_level_values("node").unique()
+            extend_len = len(self.dataframe)//len(nodes)
+            for node in nodes:
+                df = self.dataframe.loc[node]
+                keep = all([df[header].notna() # We are checking for NaNs
+                            .all() # For all values in a row
+                            .all() # For all rows in the slice
+                            for header in self.dataframe.columns.get_level_values(0)] # For each column header df[header] == slice
+                        )
+                rows.extend([keep]*extend_len) # Extend by extend_len for MultiIndex
+            tkc = self.deepcopy()
+            tkc.dataframe = tkc.dataframe[rows]
+            tkc = tkc.squash()
+            return tkc


@ilumsden Do you think this can be a query? This code works, but I was not able to express it as a query.

This performs an intersection when the columns are a MultiIndex. The nodes to drop can be identified because all of the metric values for a given header are NA, like the three rows seen below under the l header.
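For illustration, the same keep/drop decision can be written with vectorized pandas operations. This is a sketch on made-up data, not the code in this PR: the node names, the r header, and the values are assumptions. Unlike the loop in the diff, it does not assume that every node has the same number of rows, and it only filters the dataframe (no deepcopy or squash).

```python
import numpy as np
import pandas as pd

# Toy data: two nodes, two profiles, two top-level column headers.
index = pd.MultiIndex.from_product(
    [["MEMCPY", "MEMSET"], [0, 1]], names=["node", "profile"]
)
columns = pd.MultiIndex.from_product([["l", "r"], ["time"]])
df = pd.DataFrame(
    [[np.nan, 1.0], [np.nan, 1.1], [0.2, 2.0], [0.3, 2.1]],
    index=index,
    columns=columns,
)

# True for each row that has no NaN under any header.
row_complete = df.notna().all(axis=1)

# Broadcast per node: a node is kept only if all of its rows are complete.
keep = row_complete.groupby(level="node").transform("all")

# MEMCPY has NaNs under the "l" header, so only the MEMSET rows remain.
intersected = df[keep]
```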

I don't recall if there is support for MultiIndex columns in the query language.

There is support for MultiIndex columns in the QL nowadays, but I would not recommend doing this with the QL unless it's really needed. If we care about performance, the QL shouldn't be used for everything. At the end of the day, the problem that the QL is solving (a special variant of subgraph isomorphism) is NP-hard. Although I can make the QL faster, it will never be super fast due to that fact.

The QL should be used when you need to select or filter a Thicket while considering the node relationships encoded in the graph. If you don't need to take those relationships into account, then the QL could introduce unnecessary bottlenecks or slowdowns into your code. A good example of when you should use the QL is what you do in NCUReader @michaelmckinsey1. A good example of when you should not use the QL is when you just want to apply a filter to each node independently.

Besides all that stuff related to the QL, I'd also recommend you don't do the mentioned change in this PR. That's a different change to the code, so it would belong in a different PR if you were to do it.

Ignore my point about not changing this in this PR. I hadn't gone over your changes before I said that. My comment about not using the QL willy-nilly stands though.

ilumsden

Got a few more changes that I'd like you to make. The most notable is that I'd encourage you to remove the use of the QL in Thicket.intersection since you are already editing it.

thicket/tests/conftest.py

ilumsden · 2024-07-22T02:30:02Z

thicket/thicket.py

+                )
+            else:
+                # If perfdata not padded
+                query = Query().match(".", lambda df: len(df) == len(self.profile))


I didn't see this before. I get why you were asking about using the QL with MultiIndex now. I'd still recommend not using the QL here. At the end of the day, it's almost always better to use polynomial-time code than NP-hard code.
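As a sketch of what a non-QL version of that predicate could look like: the query keeps nodes that have one row per profile, which in pandas amounts to counting rows per node. The data below is made up, and the snippet only filters the dataframe; it does not update the graph the way the existing intersection code does.

```python
import pandas as pd

# Toy performance data: MEMSET is missing from profile 1.
index = pd.MultiIndex.from_tuples(
    [("MEMCPY", 0), ("MEMCPY", 1), ("MEMSET", 0)], names=["node", "profile"]
)
df = pd.DataFrame({"time": [0.2, 0.3, 0.4]}, index=index)
profiles = [0, 1]

# Number of rows each node has, broadcast back onto every row of that node.
rows_per_node = df.groupby(level="node")["time"].transform("size")

# Keep only nodes that appear in every profile.
in_every_profile = df[rows_per_node == len(profiles)]
```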

I suppose if there were a case where the parent was NaN, e.g.:

1.781 RAJAPerf
├─ nan Algorithm
│  ├─ 0.002 Algorithm_MEMCPY
│  ├─ 0.002 Algorithm_MEMSET
│  ├─ 0.003 Algorithm_REDUCE_SUM

becomes

1.781 RAJAPerf
├─ 0.002 Algorithm_MEMCPY
├─ 0.002 Algorithm_MEMSET
├─ 0.003 Algorithm_REDUCE_SUM

when it should be

1.781 RAJAPerf

I can't think of a situation where this would happen, since in reality if the parent node was filled with NaNs because it did not exist, then the children shouldn't have existed either. The children can't exist without the parent, so they would also have NaNs. However, the QL would properly remove the children in this example.
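The difference between the two outcomes can be shown on a toy model of that tree. This is plain Python written for illustration; it is neither Thicket's squash nor the QL, and the function names are hypothetical.

```python
import math

# Toy call tree from the example: node -> list of children.
children = {
    "RAJAPerf": ["Algorithm"],
    "Algorithm": ["Algorithm_MEMCPY", "Algorithm_MEMSET", "Algorithm_REDUCE_SUM"],
    "Algorithm_MEMCPY": [],
    "Algorithm_MEMSET": [],
    "Algorithm_REDUCE_SUM": [],
}
time = {
    "RAJAPerf": 1.781,
    "Algorithm": math.nan,
    "Algorithm_MEMCPY": 0.002,
    "Algorithm_MEMSET": 0.002,
    "Algorithm_REDUCE_SUM": 0.003,
}


def rowwise(node):
    """Drop NaN nodes one at a time; their children move up to the parent."""
    subtree = {}
    for child in children[node]:
        if math.isnan(time[child]):
            # Splice the dropped node's surviving descendants in its place.
            subtree.update(rowwise(child))
        else:
            subtree[child] = rowwise(child)
    return subtree


def graph_aware(node):
    """Drop NaN nodes together with all of their descendants."""
    return {
        child: graph_aware(child)
        for child in children[node]
        if not math.isnan(time[child])
    }


flat = {"RAJAPerf": rowwise("RAJAPerf")}        # children reattached to the root
pruned = {"RAJAPerf": graph_aware("RAJAPerf")}  # only the root remains
```

Here flat corresponds to the "becomes" tree and pruned to the "should be" tree.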

I prefer using the QL where I can, because it is better tested and more readable than writing new code like in this PR. So I take it you think we would benefit more from a performance data filter function than from the QL here? And would a performance data filter replace the code in intersection, including the existing queries?

Going back and trying to knock out some of these reviews now.

A performance data filter function (similar to Hatchet's filter function) would be a great thing to have. At the end of the day, if you are doing filtering that doesn't care about the structure of the graph, you really should be using Pandas operations over the QL, and that's what this type of function would provide. I have some thoughts on how to improve the QL's performance, but it should still never be faster than Pandas on single node filtering (unless Pandas is much worse in performance than I think).
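A rough sketch of what such a function might look like, assuming a row-predicate interface. The name filter_perfdata and its signature are hypothetical; nothing like it exists in this PR, and the sketch leaves the graph untouched.

```python
import pandas as pd


def filter_perfdata(dataframe, predicate):
    """Hypothetical helper: keep the rows for which predicate(row) is truthy.

    Each row is judged on its own; the call graph is never consulted.
    """
    mask = dataframe.apply(predicate, axis=1).astype(bool)
    return dataframe[mask]


# Toy performance data indexed by (node, profile).
index = pd.MultiIndex.from_tuples(
    [("MEMCPY", 0), ("MEMCPY", 1), ("MEMSET", 0), ("MEMSET", 1)],
    names=["node", "profile"],
)
perfdata = pd.DataFrame({"time": [0.002, 0.004, 0.001, 0.003]}, index=index)

# Keep rows slower than 0.002, independent of where the node sits in the tree.
slow = filter_perfdata(perfdata, lambda row: row["time"] > 0.002)
```

A row-wise apply is the simplest interface but not the fastest; when the condition can be written as a vectorized mask, such as perfdata[perfdata["time"] > 0.002], that form avoids calling Python once per row.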

That being said I do get your point about using the QL in this case. I think a good thing to do here would be to continue using the QL in intersection (for now at least, we may revisit later) and then create a performance data filter function.

ilumsden

LGTM, but see my last comment regarding performance data filtering and the QL.

@pearce8 this PR is ready for your review.

slabasan · 2024-10-25T13:33:59Z

@michaelmckinsey1 Can you please resolve conflicts on this PR?

This was referenced Jun 25, 2024

Add Nodes in Slice Before Printing Tree #181

Merged

Intersection Broken When _fill_perfdata=False #186

Closed

michaelmckinsey1 force-pushed the run-allparams branch from f87ded1 to 35508f8 July 9, 2024 22:04

michaelmckinsey1 force-pushed the run-allparams branch 3 times, most recently from 7da35c1 to 90b2907 July 9, 2024 22:33

michaelmckinsey1 self-assigned this Jul 9, 2024

michaelmckinsey1 force-pushed the run-allparams branch from 90b2907 to db2fd67 July 11, 2024 18:28

michaelmckinsey1 marked this pull request as ready for review July 11, 2024 19:08

michaelmckinsey1 requested a review from ilumsden July 11, 2024 19:09

michaelmckinsey1 added the status-ready-for-review label (This PR is ready to be reviewed by assigned reviewers) and removed the status-work-in-progress label (PR is currently being worked on) Jul 11, 2024

ilumsden reviewed Jul 17, 2024

michaelmckinsey1 commented Jul 17, 2024

ilumsden requested changes Jul 22, 2024

michaelmckinsey1 requested a review from ilumsden July 22, 2024 21:53

ilumsden approved these changes Aug 12, 2024

ilumsden requested a review from pearce8 August 12, 2024 14:05

ilumsden added the status-approved label (No more revisions are required on this PR and it is ready for merge) and removed the status-ready-for-review label (This PR is ready to be reviewed by assigned reviewers) Aug 12, 2024

michaelmckinsey1 added this to the 2024.2.0 milestone Sep 4, 2024

michaelmckinsey1 mentioned this pull request Oct 25, 2024

Running Thicket.concat_thickets with calltree="intersection" produces empty query #199

Open

michaelmckinsey1 and others added 16 commits October 25, 2024 13:47

Add fixtures

1ea56df

Add fixtures to some unit tests

1f1bdb7

Add ids to parameter values

8bd85bc

Add parametrization to remaining unit tests

7204bcd

Fix unit test

b9fa9d4

black

1fd3e93

Fix logic for flake

0d3a8a0

Fix unit tests

05747f8

Update

d017604

run intersection on column thicket

f15ecfc

Fix docstring

b5342b6

Add intersection test

8e392f8

Fix intersection for multiindex columns

ca6f9da

Fix bug

1cbaeb9

black

5afcae9

Change syntax

8905b43

michaelmckinsey1 force-pushed the run-allparams branch from 6df4876 to 8905b43 October 25, 2024 18:51

black

4ea1fd1

slabasan approved these changes Oct 28, 2024

slabasan merged commit c1cd36e into LLNL:develop Oct 28, 2024
4 checks passed
