Weaken tests by using random sampling #781

pinkwah · 2020-12-18T15:00:31Z

pinkwah
Dec 18, 2020
Maintainer

@eivindjahren is refactoring some tests (:tada:). Some of the unit tests check a property of large matrices by iterating over every cell. This is pretty hefty in terms of time required. Instead, we can utilise Catch2's generators, and essentially pick cells uniformly at random.

However, my concern is that this will weaken the tests. In particular, for matrices, the corners and edges are far less likely to be randomly selected (assuming a uniform distribution). Thus, if there is an obscure error in just the corners, the tests will keep passing until, months down the line, something suddenly breaks.
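To put a rough number on that concern, here is a minimal sketch, assuming each sampled cell is drawn independently and uniformly from the whole grid. The function name `prob_no_corner_sampled` is made up for illustration and is not part of any library.

```cpp
#include <cassert>
#include <cmath>

// Probability that `samples` independent, uniformly chosen cells of an
// nx * ny * nz grid include none of the 8 corner cells.
// Assumes every dimension is at least 2, so the 8 corners are distinct.
double prob_no_corner_sampled(int nx, int ny, int nz, int samples) {
    assert(nx >= 2 && ny >= 2 && nz >= 2 && samples >= 0);
    const double cells = static_cast<double>(nx) * ny * nz;
    const double p_corner = 8.0 / cells; // chance that one sample is a corner
    return std::pow(1.0 - p_corner, samples);
}
```

For a 10x10x10 grid and 27 samples this evaluates to roughly 0.8, so under these assumptions about four runs out of five would never touch a corner.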

I think @eivindjahren can elaborate on the particularities. Would also be nice to have @markusdregi 's opinion on this.

eivindjahren · 2020-12-18T15:39:21Z

eivindjahren
Dec 18, 2020
Maintainer

I think input from @jokva would be nice as well. So the general idea would follow this schema:

TEST_CASE("Test ", "[unittest]") {
    GIVEN("A grid") {
       ecl_grid_type *ecl_grid = ecl_grid_alloc_rectangular(5, 5, 5, 1, 1, 1, nullptr);

       AND_GIVEN("Any cell in that grid") {
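            // random(a, b) includes both bounds for integers, so the
            // upper bound is 4 to stay inside the 5x5x5 grid.
            auto i = GENERATE(take(3, random(0, 4)));
            auto j = GENERATE(take(3, random(0, 4)));
            auto k = GENERATE(take(3, random(0, 4)));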

            THEN("some property") {
                REQUIRE(some_cell_property(ecl_grid, i, j, k));
            }
       }
       ecl_grid_free(ecl_grid);
    }
}

This tests a randomly chosen set of points, as opposed to the following, which tests the cell property on every single cell every time:

TEST_CASE_METHOD(Tmpdir, "Test ", "[unittest]") {
    GIVEN("A grid") {
      ecl_grid_type *ecl_grid = ecl_grid_alloc_rectangular(5, 5, 5, 1, 1, 1, nullptr);
 
      THEN("some property") {
          for(int i = 0; i < 5; i++){
            for(int j = 0; j < 5; j++){
              for(int k = 0;k < 5; k++){
                REQUIRE(some_cell_property(ecl_grid, i, j, k));
              }
            }
          }
      }
      ecl_grid_free(ecl_grid);
    }
}

The main advantage is the ability to cover more cases by diversifying the kind of input. For instance we might
extend the above test to:

TEST_CASE("Test ", "[unittest]") {
   GIVEN("Dimensions of a grid") {
      auto nx = GENERATE(take(3, random(3, 10)));
      auto ny = GENERATE(take(3, random(3, 10)));
      auto nz = GENERATE(take(3, random(3, 10)));

      AND_GIVEN("A grid of those dimensions") {
        ecl_grid_type *ecl_grid = ecl_grid_alloc_rectangular(nx, ny, nz, 1, 1, 1, nullptr);

        AND_GIVEN("Any cell in that grid") {
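          // GENERATE_COPY is needed to capture the local nx, ny, nz;
          // random(a, b) includes both bounds, hence the "- 1".
          auto i = GENERATE_COPY(take(3, random(0, nx - 1)));
          auto j = GENERATE_COPY(take(3, random(0, ny - 1)));
          auto k = GENERATE_COPY(take(3, random(0, nz - 1)));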

          THEN("some property") {
            REQUIRE(some_cell_property(ecl_grid, i, j, k));
          }
        }
        ecl_grid_free(ecl_grid);
      }
   }
}

Were you to use for loops for the above test, the run time would likely be too long. But by choosing only a subset
of cases, we can still cover more ground.

Side note: it is important to know that Catch2 re-runs the whole test case once per section. For example, this:

TEST_CASE(...) {
    std::cout << 'A';
    SECTION(...) {
        std::cout << 'A';
    }
    SECTION(...) {
        std::cout << 'B';
    }
    SECTION(...) {
        std::cout << 'C';
    }
    std::cout << '\n';
}

will print out

AA
AB
AC

Turns out that this is very convenient for writing tests.

0 replies

eivindjahren · 2020-12-18T18:58:54Z

eivindjahren
Dec 18, 2020
Maintainer

Now if we want we could make sure that the generator contains some edge cases, as in the edges of the matrix:

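  // Both boundary indices, plus 3 random indices from the interior
  // (random(a, b) includes both bounds, so the interior ends at n - 2).
  auto i = GENERATE_COPY(0, nx - 1, take(3, random(1, nx - 2)));
  auto j = GENERATE_COPY(0, ny - 1, take(3, random(1, ny - 2)));
  auto k = GENERATE_COPY(0, nz - 1, take(3, random(1, nz - 2)));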

Here, i, j, k are guaranteed to hit all eight corners of the grid.
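Outside of Catch2, the same idea for a single axis can be sketched in plain C++. The helper `boundary_plus_random_interior` is hypothetical and only illustrates which indices such a generator would produce; it is not part of Catch2 or of the code under test.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: returns both boundary indices of an axis of
// length n, followed by `extra` indices drawn uniformly from the
// interior [1, n - 2].
std::vector<int> boundary_plus_random_interior(int n, int extra, std::mt19937 &rng) {
    assert(n >= 3 && extra >= 0); // n >= 3 guarantees a non-empty interior
    std::vector<int> indices{0, n - 1};
    std::uniform_int_distribution<int> interior(1, n - 2); // inclusive bounds
    for (int s = 0; s < extra; ++s)
        indices.push_back(interior(rng));
    return indices;
}
```

Taking the Cartesian product of three such index lists, one per axis, always includes every combination of boundary indices, which is what makes the corners part of every run.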

0 replies

pinkwah · 2020-12-18T20:53:03Z

pinkwah
Dec 18, 2020
Maintainer Author

According to https://github.com/catchorg/Catch2/blob/devel/docs/generators.md#data-generators invoking GENERATE multiple times will yield a Cartesian product of all choices. Thus you can replace the triple for-loop with:

// range(a, b) excludes b; GENERATE_COPY is needed to capture nx, ny, nz.
auto i = GENERATE_COPY(range(0, nx));
auto j = GENERATE_COPY(range(0, ny));
auto k = GENERATE_COPY(range(0, nz));

I'm not sure whether this parameterises the tests in such a way that the matrix has to be recreated for every combination, along with other such overhead.

I would perhaps propose a set of slow tests that are always run on CI, backed by "good enough" randomly sampled tests that the developer herself can develop with. At the very least I'd advocate for N samples to be at least 1000. In cases like this, I think it'd be faster to just test the entire 10000 element matrix rather than doing uniform randomness math 1000 times.

1 reply

eivindjahren Dec 19, 2020
Maintainer

That does indeed generate the entire Cartesian product, but it also regenerates the grid for each one of those cases.

eivindjahren · 2020-12-19T09:34:49Z

eivindjahren
Dec 19, 2020
Maintainer

One way of achieving what you want with the "slow tests", @dotfloat, is to increase the number of samples that are run on the master branch.

I don't think generating every i,j,k while keeping nx,ny,nz fixed (as the original code does) will catch more bugs than varying nx,ny,nz as well. Unfortunately, taking the Cartesian product of all 6 dimensions quickly gets hairy in running time, and it leaves you with no options if you want to add yet another dimension. Instead, we could occasionally run the tests with an increased sample size. It is unlikely that we want to do this sort of number crunching on a PR (though we could); instead, we could run it on master (say, every night?) to detect whether any bugs have entered the code. Note that the random seed changes every time, so each run potentially tests new parts of the space. The union of points tested over time will likely be greater than what you would be able to search through exhaustively every time the test runs.
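The point about the union of tested points can be illustrated with a small simulation. This is only a sketch, assuming each run draws its cells independently and uniformly; the function `union_coverage` and the seeding scheme (`base_seed + run`) are invented for this example and do not reflect how Catch2 seeds its generators.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Fraction of the `cells` grid cells touched at least once after `runs`
// test runs that each draw `samples_per_run` cells uniformly at random.
double union_coverage(int cells, int runs, int samples_per_run, unsigned base_seed) {
    assert(cells > 0 && runs >= 0 && samples_per_run >= 0);
    std::vector<bool> seen(cells, false);
    std::uniform_int_distribution<int> pick(0, cells - 1);
    for (int run = 0; run < runs; ++run) {
        std::mt19937 rng(base_seed + run); // a fresh seed for every run
        for (int s = 0; s < samples_per_run; ++s)
            seen[pick(rng)] = true;
    }
    int covered = 0;
    for (bool hit : seen)
        covered += hit ? 1 : 0;
    return static_cast<double>(covered) / cells;
}
```

Under the same assumptions, the expected fraction is 1 - (1 - 1/N)^(runs * samples_per_run) for N cells, so a single run with 27 samples on 1000 cells covers at most 2.7% of them, while 100 such runs are expected to cover over 90%.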

2 replies

pinkwah Dec 20, 2020
Maintainer Author

I agree that varying the sizes is a win too. How about testing a small matrix completely, and having a separate test that samples the size and also the points within?

eivindjahren Dec 21, 2020
Maintainer

A small matrix has a high probability of being tested completely anyway, so adding the machinery for testing it completely every time is just added complexity.

markusdregi · 2020-12-22T06:21:23Z

markusdregi
Dec 22, 2020
Maintainer

I remember writing these tests :) And they are probably all too extensive, but at the time we experienced some wrong answers when testing point inclusion in cells...

I think it would be useful to keep at least one test with a grid that is at least 3x3 (as then you get a cell completely within the grid) that checks:

I. all cell corners and their containment,
II. containment of at least one point on each face of each cell (6 faces per cell), and
III. at least one point within each cell.

If we would like to strengthen point II, then one can notice that each of the faces actually decomposes into two triangles. Hence, it is perhaps interesting to test one point on the diagonal of the face as well as one strictly within each triangle (perhaps the centroid)?
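As a rough sketch of how such face points could be computed: the types `Vec3` and `FacePoints` and the function `face_test_points` are hypothetical, and the choice of splitting along the diagonal between corners 0 and 2 is an assumption, since the thread does not say which diagonal the grid code uses.

```cpp
#include <array>
#include <cassert>

struct Vec3 { double x, y, z; };

// Test points for one quadrilateral face whose corners c[0..3] are
// listed in order around the face. The face is split along the diagonal
// c[0]-c[2] into the triangles (c0, c1, c2) and (c0, c2, c3).
struct FacePoints {
    Vec3 on_diagonal; // midpoint of the diagonal c[0]-c[2]
    Vec3 centroid_a;  // centroid of triangle (c0, c1, c2)
    Vec3 centroid_b;  // centroid of triangle (c0, c2, c3)
};

// Component-wise mean of three points, i.e. the centroid of a triangle.
static Vec3 average(const Vec3 &a, const Vec3 &b, const Vec3 &c) {
    return {(a.x + b.x + c.x) / 3.0, (a.y + b.y + c.y) / 3.0, (a.z + b.z + c.z) / 3.0};
}

FacePoints face_test_points(const std::array<Vec3, 4> &c) {
    const Vec3 mid{(c[0].x + c[2].x) / 2.0, (c[0].y + c[2].y) / 2.0, (c[0].z + c[2].z) / 2.0};
    return {mid, average(c[0], c[1], c[2]), average(c[0], c[2], c[3])};
}
```

Each of the three points could then be passed to the containment check for the cell that owns the face. If the grid code triangulates along the other diagonal, the corner order would need to be rotated by one.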

Besides this I'm all for randomising!

0 replies
