flaky tests #120

jmr · 2025-03-20T08:29:09Z

Some tests are flaky due to use of randomness. Here are the failure counts for 1000 runs:

     10 TestCellContainsPointConsistentWithS2CellIDFromPoint
      1 TestCellDistanceToEdge
     24 TestCellUnionNormalizePseudoRandom
      2 TestChordAngleBetweenPoints
      3 TestConvexHullQueryPointsInsideHull
     12 TestEdgeClippingClipToPaddedFace
      2 TestPaddedCellShrinkToFit
     24 TestTestingFractal

The text was updated successfully, but these errors were encountered:

Some tests are flaky due to the random seed. It's not clear whether the problem is the code or the test. These all need further investigation. Just fix the seed for now so that the tests pass. golang#120

rsned · 2025-03-25T18:50:53Z

Need to add a seed flag and plumb it through the tests.
If tests are still flaky after using a consistent seed throughout, will need to refactor the helpers that use the random generation to allow an optional rand.Source as needed.

rsned · 2025-03-27T05:00:57Z

If there are tests that are still flaky after using new seeded random in testing comes through, update the flagged flaky methods to create a specific random Source to pass into the calls to random things

e.g. r := rand.New(rand.NewSource(1))

jmr · 2025-03-27T07:48:04Z

e.g. r := rand.New(rand.NewSource(1))

This should always be considered a temporary workaround. Tests should pass with all possible seeds.

jmr · 2025-04-02T11:23:07Z

Now, out of 1k runs, I see:

      2 TestCellContainsPointConsistentWithS2CellIDFromPoint
      7 TestCellDistanceToEdge

=== RUN   TestCellDistanceToEdge
    third_party/golang/github_com/golang/geo/v/v0/s2/cell_test.go:758: {4 29 2 11000670145955975532 {{0.9862056245871376 0.9862056295284304} {-0.38971637592477315 -0.38971637226702704}}}.DistanceToEdge((0.267375917282735797719795, 0.686078227799468698400176, 0.676614206322004418936444), (0.267375926952832576599661, 0.686078217696954650861585, 0.676614212744517828923563)) = 179.9999976, want 180.0000000
--- FAIL: TestCellDistanceToEdge (0.00s)
=== RUN   TestCellDistanceToEdge
    third_party/golang/github_com/golang/geo/v/v0/s2/cell_test.go:758: {4 26 3 9556877047513654528 {{-0.06904137022373527 -0.06904134839468679} {-0.4024873254542941 -0.4024872959350345}}}.DistanceToEdge((0.372615487166876868663223, 0.925781986026771197551000, -0.063917236108254532611639), (0.372615480827304668132172, 0.925781988906149266860268, -0.063917231360653428695606)) = 179.9999976, want 180.0000000
--- FAIL: TestCellDistanceToEdge (0.00s)
=== RUN   TestCellDistanceToEdge
    third_party/golang/github_com/golang/geo/v/v0/s2/cell_test.go:758: {3 29 0 8842001826768255220 {{-0.07966830116367281 -0.07966829839924419} {0.006045478339771722 0.006045480845718505}}}.DistanceToEdge((0.996823407456148169458743, 0.006026273905833879653005, -0.079415227572085633767074), (0.996823407695055063726386, 0.006026266183654568724115, -0.079415225159296345958104)) = 179.9999976, want 180.0000000
--- FAIL: TestCellDistanceToEdge (0.00s)
=== RUN   TestCellDistanceToEdge
    third_party/golang/github_com/golang/geo/v/v0/s2/cell_test.go:758: {3 29 1 8921361455338558108 {{-0.23330310236446708 -0.23330309912643066} {0.3398573945985196 0.3398573981279007}}}.DistanceToEdge((0.924526646664121320995378, 0.314207216719537940630147, -0.215694934037303193141710), (0.924526646678424102177019, 0.314207218112570574319875, -0.215694931946737417094440)) = 179.9999976, want 180.0000000
--- FAIL: TestCellDistanceToEdge (0.00s)
=== RUN   TestCellDistanceToEdge
    third_party/golang/github_com/golang/geo/v/v0/s2/cell_test.go:758: {4 27 1 10981155640602174656 {{0.9066021650982308 0.9066021842579453} {-0.15111907794184543 -0.15111906596575575}}}.DistanceToEdge((0.111262505947894135838183, 0.736257169376785869374658, 0.667492348504069132886229), (0.111262505938919648018626, 0.736257169402754541032152, 0.667492348476921182331978)) = 179.9999976, want 180.0000000
--- FAIL: TestCellDistanceToEdge (0.01s)
=== RUN   TestCellDistanceToEdge
    third_party/golang/github_com/golang/geo/v/v0/s2/cell_test.go:758: {4 26 2 11434922374428377344 {{0.4987063433775232 0.49870637476753643} {-0.46248924825598936 -0.462489217556751}}}.DistanceToEdge((0.382417884354746773212241, 0.826868738584582430029002, 0.412364706148793724871382), (0.382417884491440707162013, 0.826868737862826774787095, 0.412364707469283109375624)) = 179.9999976, want 180.0000000
--- FAIL: TestCellDistanceToEdge (0.00s)
=== RUN   TestCellDistanceToEdge
    third_party/golang/github_com/golang/geo/v/v0/s2/cell_test.go:758: {0 28 2 1183189571281627152 {{0.10346630437590645 0.10346631006182486} {0.06665937837939376 0.06665938382047898}}}.DistanceToEdge((-0.992510609445896863078929, -0.102691410122022236395267, -0.066160142264697643921245), (-0.992510609444993141536884, -0.102691410156553503130183, -0.066160142224656506848568)) = 179.9999976, want 180.0000000
--- FAIL: TestCellDistanceToEdge (0.00s)

=== RUN   TestCellContainsPointConsistentWithS2CellIDFromPoint
    third_party/golang/github_com/golang/geo/v/v0/s2/cell_test.go:574: For p=(-0.370762717941454711390037, -0.350223940409457201727861, -0.860161728135318992549685), CellFromCellID(cellIDFromPoint(p)).ContainsPoint(p) was false
--- FAIL: TestCellContainsPointConsistentWithS2CellIDFromPoint (0.00s)
=== RUN   TestCellContainsPointConsistentWithS2CellIDFromPoint
    third_party/golang/github_com/golang/geo/v/v0/s2/cell_test.go:574: For p=(0.808990606873618012251370, -0.390616995366782127074856, -0.439263657637281756951353), CellFromCellID(cellIDFromPoint(p)).ContainsPoint(p) was false
    third_party/golang/github_com/golang/geo/v/v0/s2/cell_test.go:574: For p=(-0.350835790694997651240072, 0.009349383896125742360317, -0.936390322989392509533957), CellFromCellID(cellIDFromPoint(p)).ContainsPoint(p) was false
--- FAIL: TestCellContainsPointConsistentWithS2CellIDFromPoint (0.00s)

Remove TODO from tests that are no longer flaky. (Tested 1k times, see golang#120 (comment))

Remove TODO from tests that are no longer flaky. (Tested 1k times, see #120 (comment))

flwyd · 2025-04-22T06:48:36Z

If I understand TestCellContainsPointConsistentWithS2CellIDFromPoint correctly, it's uncovering a bug, so the RNG occasionally produces points which correctly fail the assertion. My annotated thinking:

for iter := 0; iter < 1000; iter++ {
        cell := CellFromCellID(randomCellID())
        i1 := randomUniformInt(4) // i1 is a random conrer
        i2 := (i1 + 1) & 3 // i2 is the next corner, counterclockwise
        v1 := cell.Vertex(i1) // v1 is a random corner of the given cell
        v2 := samplePointFromCap( // pick a random point in a cap
          CapFromCenterAngle(cell.Vertex(i2), // centered at the second corner
             s1.Angle(epsilon))) // with a radius of 1e-15
        // if the corner is a right angle, there's a 25% chance that v2 is in the interior of the cell and a 75% chance it's outside
        // v2 is within 0.0 and 1e-15 of the true vertex, with likelihood increasing proportional to the square of the radius
        p := Interpolate(randomUniformFloat64(0, 1.0), v1, v2) // p is a random point between v1 and v2
        // p could be outside cell by at most 1e-15, but that's larger than dblEpsilon of approx. 2.2e-16
        if !CellFromCellID(
          cellIDFromPoint(p) // this could be adjacent to cell, since p can be on or over the edge
        ).ContainsPoint(p) { // cellIdFromPoint says this should always be true
                t.Errorf("For p=%v, CellFromCellID(cellIDFromPoint(p)).ContainsPoint(p) was false", p)
        }
}

Regardless of what p is, CellFromPoint(p).ContainsPoint(p) should be true. ContainsPoint expands by dblEpsilon to handle points that are very close to the edge, but v2 could be slightly more than dblEpsilon away from a cell vertex. cellIDFromPoint is supposed to return a cell that contains the point, since points on the edge are contained by both adjacent cells (and thus vertex points are contained by four cells). So perhaps cellIDFromPoint has a looser tolerance than dblEpsilon when identifying a potential cell match, but ContainsPoint then applies a tighter constraint.

jmr · 2025-05-08T12:56:40Z

The details of how p is generated aren't important, that's just to efficiently generate hard cases.

I can also reproduce this in C++. It fails about 8/1M times, on level 29 or 30 cells.

0.38203141040035632, 0.030196609707941954, 0.9236558700239289 is a specific example, if that helps to debug.

alan-strohm · 2025-05-08T19:56:35Z

Gemini thinks we should make this change

--- util/geometry/s2cell.cc   2024-03-08 07:19:05.000000000 +0100
+++ util/geometry/s2cell.cc     2025-05-08 20:27:13.000000000 +0200
@@ -287,9 +287,10 @@
   //   S2Cell(S2CellId(p)).Contains(p)
   //
   // is always true.  To do this, we need to account for the error when
-  // converting from (u,v) coordinates to (s,t) coordinates.  At least in the
-  // case of S2_QUADRATIC_PROJECTION, the total error is at most DBL_EPSILON.
-  return uv_.Expanded(DBL_EPSILON).Contains(uv);
+  // converting from (u,v) coordinates to (s,t) coordinates.  At least in
+  // the case of S2_QUADRATIC_PROJECTION, the total error is at most
+  // 2 * DBL_EPSILON.
+  return uv_.Expanded(2 * DBL_EPSILON).Contains(uv);
 }

Here is it's reasoning. Prompt "Is there an argument for why 2*DBL_EPSILON can guarantee that we will avoid future bugs like this? Is this the smallest possible expansion we could make?"

The choice of 2 * DBL_EPSILON is based on an analysis of potential floating-point error accumulation in the sequence of operations that determine cell ownership and check for containment:

Error in S2CellId(p) affecting S2Cell's uv_ bounds:
- When an S2CellId is constructed from an S2Point p, p is first converted from (x,y,z) to (face, u, v) coordinates using S2::XYZtoFaceUV. This can introduce an error of up to 0.5 * DBL_EPSILON in u and v.
- These u,v coordinates are then converted to s,t coordinates using S2::UVtoST (a quadratic projection). This transformation can propagate and potentially amplify the initial error, and its own arithmetic can add further small errors.
- The s,t coordinates are then discretized to integer i,j coordinates (S2::STtoIJ) to form the S2CellId.
- When S2Cell(S2CellId(p)) is constructed, its uv_ member (defining the cell's boundaries in UV space) is calculated from these i,j coordinates. This involves S2CellId::IJLevelToBoundUV, which uses S2::IJtoSTMin (converting i,j to s,t cell boundaries, which is fairly exact for the discrete boundaries) and then S2::STtoUV (converting these s,t boundaries back to u,v). The arithmetic operations within this S2::STtoUV conversion for the cell's uv_ boundaries can introduce an error. Given that STtoUV is a quadratic function involving several multiplications, additions/subtractions, and a division by 3, its arithmetic error for the calculated u,v boundary values could be estimated to be in the range of 1.0 * DBL_EPSILON to 1.5 * DBL_EPSILON relative to the true UV coordinates for those ST boundaries.
Error in S2Cell::Contains(p)'s uv check value:
- The Contains method takes the original S2Point p and converts it again to (face, u, v) using S2::FaceXYZtoUV. Let's call this uv_check. This conversion, as before, can introduce an error of up to 0.5 * DBL_EPSILON in each component of uv_check relative to the true UV coordinates of p.

Why 2 * DBL_EPSILON is argued:

Consider a point p that is truly on the boundary of the cell it gets assigned to by S2CellId(p).

The calculated UV coordinate for p (uv_check.u) might be u_true_boundary - 0.5 * DBL_EPSILON.
The cell's calculated UV boundary (cell.uv_.lo().u) might be u_true_boundary + 1.5 * DBL_EPSILON (if the errors align unfavorably).

In this scenario, uv_check.u would be (u_true_boundary - 0.5 * DBL_EPSILON), and the cell's lower bound cell.uv_.lo().u would appear as (u_true_boundary + 1.5 * DBL_EPSILON). The uv_check.u is now less than cell.uv_.lo().u by 2.0 * DBL_EPSILON.
To ensure that uv_check.u is considered inside the expanded boundary cell.uv_.lo().u - epsilon_expansion, we need epsilon_expansion to be at least 2.0 * DBL_EPSILON.

More generally, the uv_check for the point can be off by ~0.5 * DBL_EPSILON from its true UV position. The uv_ boundary of the cell, derived from the discretized ID, can also be off from its corresponding true UV boundary by ~1.5 * DBL_EPSILON due to the STtoUV calculation. In a worst-case scenario where these errors conspire to push the calculated point coordinate and the calculated cell boundary away from each other (relative to their true relationship), the total separation to bridge could be the sum of these error magnitudes, i.e., 0.5 * DBL_EPSILON + 1.5 * DBL_EPSILON = 2.0 * DBL_EPSILON.

Is this the smallest possible expansion?

Not formally proven minimal: A rigorous mathematical proof of the absolute smallest epsilon would require a detailed forward error analysis of all floating-point operations in the entire chain, which is exceptionally complex.
Empirically necessary: The failing case demonstrates that DBL_EPSILON is insufficient.
Consistent with other S2 practices: The S2 library uses 2 * DBL_EPSILON in other areas (e.g., S2Cell::GetRectBound) to account for accumulated errors from coordinate conversions and normalizations.
Covers primary error sources: The 2.0 * DBL_EPSILON value is derived from summing reasonable estimates of the dominant, direct errors in the two UV values being compared in Contains().

While a slightly smaller value (e.g., 1.9 * DBL_EPSILON) might work for all known cases, 2.0 * DBL_EPSILON is a more standard and robust choice when combining a few error sources of this nature. It provides a margin that accounts for the main known error contributions without being excessively large. Finding a point that fails with (2.0 - delta) * DBL_EPSILON but passes with 2.0 * DBL_EPSILON would be needed to argue for strict minimality of 2.0 * DBL_EPSILON.

Thus, 2 * DBL_EPSILON is a well-justified engineering choice to ensure robustness against accumulated floating point errors in this specific context, rather than a formally proven tightest possible bound.

jmr · 2025-05-09T12:19:37Z

@rsned You've done some error analysis with theorem provers. Any opinions here? Is this comment implying that only the (u, v) -> (s, t) error needs to be considered?

geo/s2/cell.go

Line 486 in 5b58c72

// converting from (u,v) coordinates to (s,t) coordinates. In the

jmr mentioned this issue Mar 20, 2025

Deflake tests #122

Merged

rsned self-assigned this Mar 25, 2025

rsned mentioned this issue Mar 26, 2025

Changes to better support random seeds for flaky tests. #129

Merged

rsned closed this as completed in #129 Apr 1, 2025

rsned closed this as completed in 45267f9 Apr 1, 2025

jmr reopened this Apr 2, 2025

jmr added a commit to jmr/geo that referenced this issue Apr 9, 2025

tests: Remove outdated flaky test comments

4dc5f4d

Remove TODO from tests that are no longer flaky. (Tested 1k times, see golang#120 (comment))

jmr mentioned this issue Apr 9, 2025

tests: Remove outdated flaky test comments #147

Merged

rsned pushed a commit that referenced this issue Apr 10, 2025

tests: Remove outdated flaky test comments (#147)

1bf973d

Remove TODO from tests that are no longer flaky. (Tested 1k times, see #120 (comment))

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

flaky tests #120

flaky tests #120

jmr commented Mar 20, 2025

rsned commented Mar 25, 2025

rsned commented Mar 27, 2025

jmr commented Mar 27, 2025

jmr commented Apr 2, 2025

flwyd commented Apr 22, 2025

jmr commented May 8, 2025

alan-strohm commented May 8, 2025

jmr commented May 9, 2025

flaky tests #120

flaky tests #120

Comments

jmr commented Mar 20, 2025

rsned commented Mar 25, 2025

rsned commented Mar 27, 2025

jmr commented Mar 27, 2025

jmr commented Apr 2, 2025

flwyd commented Apr 22, 2025

jmr commented May 8, 2025

alan-strohm commented May 8, 2025

jmr commented May 9, 2025