
Add basic smoke tests for topology branch #897

Open · wants to merge 54 commits into base: maxgamill-sheffield/topology

Conversation

SylviaWhittle (Collaborator)

This PR adds basic smoke tests for disordered tracing, node stats, ordered tracing and splining.

@SylviaWhittle (Collaborator, Author)

Pre-commit problems list (shown below)

I believe that the files and code listed are not code that I have touched. Correct me if I've not noticed something 😄

[screenshot: list of pre-commit failures]

@ns-rse (Collaborator) commented Sep 11, 2024

I believe that the files and code listed are not code that I have touched.

Nope, none of those are in the files you've touched.

We should all be using the pre-commit configuration that is part of the distribution so that the CI checks pass (yes, the target branch here isn't main, but eventually it will be merged into main and will have to pass all these checks, so it's better practice to use these checks and get things right in the first instance).

Within the TopoStats directory run the following to install...

pre-commit install

It will then highlight all these problems before changes can be pushed.

@ns-rse (Collaborator) left a comment


Bunch of comments in-line, great work writing all these tests @SylviaWhittle 👍

The tests/resources/ directory is starting to look somewhat cluttered. I think many of the .csv files could be removed if we used pytest-regtest (some comments in-line on this).

That still leaves a lot of files though; many of the filenames carry common prefixes and it's clear that many of the objects (.pkl and .npy) are related. I can think of two options to improve organisation...

  1. Bundle similar objects into a dictionary with the keys formed from the component that distinguishes them and save as a single pickle.
example_rep_int_all_images.pkl
example_rep_int_all_images_nodestats.pkl
example_rep_int_disordered_crop_data.pkl
example_rep_int_disordered_tracing_stats.csv
example_rep_int_grainstats_additions_df.csv
example_rep_int_grainstats_additions_nodestats.csv
example_rep_int_labelled_grain_mask_thresholded.npy
example_rep_int_nodestats_branch_images.pkl
example_rep_int_nodestats_data.pkl
example_rep_int_ordered_tracing_data.pkl
example_rep_int_ordered_tracing_full_images.pkl
example_rep_int_ordered_tracing_grainstats_additions.csv
example_rep_int_splining_data.pkl
example_rep_int_splining_grainstats_additions.csv
example_rep_int_splining_molstats.csv

Would go into a dictionary with keys of...

{
    "all_images": <object>,
    "all_images_nodestats": <object>,
    "disordered_crop_data": <object>,
    "disordered_tracing_stats": <object>,
    "grainstats_additions_df": <object>,
    "grainstats_additions_nodestats": <object>,
    "labelled_grain_mask_thresholded": <object>,
    "nodestats_branch_images": <object>,
    "nodestats_data": <object>,
    "ordered_tracing_data": <object>,
    "ordered_tracing_full_images": <object>,
    "ordered_tracing_grainstats_additions": <object>,
    "splining_data": <object>,
    "splining_grainstats_additions": <object>,
    "splining_molstats": <object>,
}

...and that could be saved as tests/resources/example_rep_int.pkl (see the sketch after this list).

  2. Alternatively, create a nested directory structure under tests/resources reflecting the common prefixes...
tests/resources/node/
tests/resources/catenanes/
tests/resources/example_rep_int/

...and drop the prefixes from the filenames.
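
To sketch option 1 in code (the suffixes and paths come from the listing above; only a few of the pickled objects are shown, and the .npy and .csv files would need np.load() / pd.read_csv() instead):

import pickle
from pathlib import Path

RESOURCES = Path("tests/resources")

# Keys are the distinguishing suffixes from the filenames above.
bundle = {}
for suffix in ("all_images", "nodestats_data", "ordered_tracing_data", "splining_data"):
    with (RESOURCES / f"example_rep_int_{suffix}.pkl").open("rb") as f:
        bundle[suffix] = pickle.load(f)

# Save everything as a single fixture file.
with (RESOURCES / "example_rep_int.pkl").open("wb") as f:
    pickle.dump(bundle, f)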

tests/tracing/conftest.py (resolved review thread)
tests/tracing/test_nodestats.py (resolved review thread)
np.testing.assert_equal(node_dict_result, nodestats_catenane_node_dict)
np.testing.assert_equal(image_dict_result, nodestats_catenane_image_dict)
np.testing.assert_array_equal(nodestats_catenane.all_connected_nodes, nodestats_catenane_all_connected_nodes)
# Debugging
@ns-rse (Collaborator)


This is presumably to update the test files when the underlying code changes?

The syrupy package, which is compatible with pytest, might be an alternative to this. I've not used it yet, I only became aware of it at RSECon2024, but it's similar to pytest-regtest I think.
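
From its documentation, a syrupy test is roughly the following (a sketch, not verified against the objects in this PR; the snapshot fixture and --snapshot-update flag are syrupy's documented interface, the value is illustrative):

# Requires the syrupy pytest plugin, which injects the `snapshot` fixture.
def test_node_dict(snapshot) -> None:
    node_dict_result = {"node_0": {"branches": 3}}  # illustrative value
    # The first run writes a snapshot file; later runs diff against it.
    # Snapshots are refreshed with `pytest --snapshot-update`.
    assert node_dict_result == snapshot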

tests/tracing/test_nodestats.py (resolved review thread)
# )

# Load the nodestats catenane node dict from pickle
with Path(RESOURCES / "nodestats_analyse_nodes_catenane_node_dict.pkl").open("rb") as f:
@ns-rse (Collaborator)


If the arrays and dictionaries aren't too large I'd be inclined to use the pytest-regtest approach to comparing these.

@@ -34,6 +34,54 @@
# pylint: disable=too-many-lines


# Sylvia: Ruff says too complex but I think breaking this out would be more complex.
def dict_almost_equal(dict1: dict, dict2: dict, abs_tol: float = 1e-9): # noqa: C901
@ns-rse (Collaborator)


Two things...

  1. Could we use DeepDiff? It seems a very general solution. dict_almost_equal() as it stands only compares floats via np.allclose().

  2. I see that this is used in some of the tests introduced, but we should probably have a tests/test_io.py::test_dict_almost_equal() to check that it behaves as expected.
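
For point 2, something along these lines would do (a sketch, assuming dict_almost_equal() returns a bool and recurses into nested dicts; import path per the move discussed further down):

import numpy as np

from topostats.io import dict_almost_equal


def test_dict_almost_equal() -> None:
    """Nested dicts differing by less than abs_tol should compare equal."""
    dict1 = {"a": 1.0, "nested": {"b": np.array([0.1, 0.2])}}
    dict2 = {"a": 1.0 + 1e-12, "nested": {"b": np.array([0.1, 0.2])}}
    assert dict_almost_equal(dict1, dict2, abs_tol=1e-9)
    assert not dict_almost_equal(dict1, {"a": 2.0, "nested": {"b": np.array([0.1, 0.2])}})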

@SylviaWhittle (Collaborator, Author)


1 - Upon a quick glance, DeepDiff seems like it might do what is needed.

2 - When I wrote dict_almost_equal, I did add a test in test_io.py, line 172 in this branch.

@SylviaWhittle (Collaborator, Author)


This function was already in the codebase; I just had to move it since I could not import it in the tests/tracing/test_*.py files: the function was in tests/test_io.py, pylint would not let me do from ..test_io.py import dict_almost_equal, and Python wouldn't recognise from tests/test_io.py import dict_almost_equal, so I have moved it to io.py to allow importing in all tests.
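
In code terms, the situation was roughly (the commented lines are the attempts as described above; the last line is what works after the move, assuming io.py here means topostats/io.py):

# pylint would not allow this from tests/tracing/test_*.py:
# from ..test_io.py import dict_almost_equal
# and Python would not recognise this:
# from tests/test_io.py import dict_almost_equal
# Works everywhere now that the function lives in topostats/io.py:
from topostats.io import dict_almost_equal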

@ns-rse (Collaborator)


Self-referential! Test code testing itself!

topostats/tracing/splining.py (resolved review thread)
spline_degree=3,
)

# # Debugging
@ns-rse (Collaborator)


Can the Debugging section here be removed? If not, would it be useful to have this as a function that we can use to plot co-ordinates?

@SylviaWhittle (Collaborator, Author)


I'd be inclined to keep it as-is if possible because:

1 - I really think it's useful since we are probably going to be updating this test a lot over the next few weeks. I'd only end up re-writing it each time I update the test, slowing down test updates.

2 - I can't think of anywhere else in our codebase where we would need to debug by viewing the splines like this, and it would take some time to make a function out of it and test it.

That being said, it is a lot of code mess, and I very much empathise with wanting to get rid of commented-out code since it does tend to linger indefinitely.

@ns-rse (Collaborator)


Perhaps a middle ground of making it a function within the test?

It would then be possible to have a single line calling the function, which can be commented/uncommented for debugging, and the test file wouldn't have a large amount of lingering code.

I did similar with a plotting function I'd written, topostats.measure.feret.plot_feret(), which I used for debugging. I already had images that I'd looked at during the debugging process so tests were relatively straightforward to add.
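
i.e. something like this sketch (names, co-ordinate layout and the asserted value are illustrative, not the actual test code):

import matplotlib.pyplot as plt
import numpy as np


def _plot_splines(splines: list[np.ndarray]) -> None:
    """Debug helper: plot each spline's (row, col) co-ordinates."""
    for spline in splines:
        plt.plot(spline[:, 1], spline[:, 0])
    plt.gca().invert_yaxis()  # match image co-ordinates (origin top-left)
    plt.show()


def test_splining() -> None:
    result_splines = [np.array([[0.0, 0.0], [1.0, 1.5], [2.0, 1.0]])]  # illustrative
    # _plot_splines(result_splines)  # single line to uncomment when debugging
    assert len(result_splines) == 1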

@ns-rse (Collaborator)


I think perhaps we should use pytest-regtest to compare the statistics that are produced, rather than having code to update the CSV files when methods change and then read them into pd.DataFrame() and compare with pd.testing.assert_frame_equal().

We use this approach for other CSVs that the pipeline produces, so it should be OK here unless there is a specific reason for this approach?
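
i.e. a sketch along these lines (the regtest fixture and --regtest-reset flag are pytest-regtest's documented interface; the DataFrame is illustrative):

import pandas as pd


def test_disordered_tracing_stats(regtest) -> None:
    """pytest-regtest diffs whatever is written to the fixture against a stored
    snapshot; the snapshot is refreshed with `pytest --regtest-reset`."""
    stats = pd.DataFrame({"grain": [0, 1], "branch_length": [12.3, 45.6]})  # illustrative
    print(stats.to_csv(index=False), file=regtest)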

@SylviaWhittle (Collaborator, Author)


Very happy to have pushback on this, but my thinking was to keep it all the same style of test, with assertions and variables loaded explicitly, since when a test fails I find it easier to debug using a debugger & IDE tools. I tried debugging tests that use pytest-regtest and it's rather difficult with large objects. Perhaps I went about it wrong?

I think we are anticipating quite a bit of iteration, so making it as smooth as possible to view exactly what changed, check it's valid, and then easily update the values is useful.

IIRC pytest-regtest has the excellent override method of pytest --regtest-reset, which makes updating tests a dream, but I always have to add code and dig to see if the change is legitimate. Do you have a good way of inspecting changes in regtests?

@ns-rse (Collaborator)


When I've had to update tests from pytest-regtest it is in essence a diff, which is what many of the asserts show (in one form or another, whether that is default Python / pytest / np.testing.assert* / pd.testing.assert*).

Diffs can be tricky to use and understand at times, particularly when it's a few numbers in the middle of a large CSV; perhaps NumPy/Pandas help with this, but there are some alternatives and features that make it easier.

I find the specific changes within a line that Git can be configured to show really useful. One tool for this is delta, but personally I use difftastic as it also understands the structure of different programming languages and Git can be easily configured to work with it.

The --regtest-reset is in my view a lot quicker than having to uncomment a bunch of lines to write a CSV or pkl out.

Perhaps we should look at how syrupy compares to pytest-regtest and the manual approach?

Broadly, I think it is useful to be consistent across a project though: pick one approach and stick with it. It reduces cognitive overhead and makes it easier for others to get to grips with how the repository works (this is true of more than just tests, e.g. always using pathlib rather than mixing it up with the os module).

@SylviaWhittle (Collaborator, Author)

(quoting @ns-rse's comment above about installing and using the distribution's pre-commit configuration)

I do have pre-commit; I was going to tidy it up in a follow-up PR focused on just that, keeping this PR self-contained, but I am happy to fix it in this PR 👍

SylviaWhittle and others added 25 commits on September 18, 2024 at 15:03
@SylviaWhittle (Collaborator, Author)

DeepDiff doesn't seem to be a viable alternative currently. It cannot handle np.nan values properly.
[screenshots: DeepDiff reporting differences for dictionaries containing np.nan]
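
The root cause is presumably that NaN never compares equal to itself, which any equality-based comparison trips over unless it special-cases NaN:

import numpy as np

print(np.nan == np.nan)  # False: IEEE 754 NaN is not equal to itself
np.testing.assert_equal(np.nan, np.nan)  # passes: numpy's test helpers treat NaN as equal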
