Potential enhancement to Mask interface to help avoid FPEs #117

jgfouca · 2021-06-24T16:29:57Z

As I'm cleaning up FPEs in SHOC, I'm encountering this pattern over and over (this pattern is widespread in P3 too):

Spack pack1(data1), pack2(data2);
pack2.set(pack1 != 0, val / pack1);

... which needs to be changed in over to avoid the FPE (if they are on) to:

Spack pack1(data1), pack2(data2);
const auto mask = pack1 != 0;
if (mask.any()) {
  pack2.set(mask, val / pack1);
}

Because, even if pack2 is protected from getting infs by the mask, the divide-by-zero still occurs when val/pack1 gets evaluated. The code that needed to protect against this makes the code uglier and may impact performance since it introduces another branch.

I was thinking a macro like this would be nice:

#define FPE_SAFE_SET(pack, mask, val)
  #if defined(SCREAM_FPE)
    if (mask.any()) {
      pack.set(mask, val);
    }
  #else
    pack.set(mask, val);
  #endif

Which would be used like this:

Spack pack1(data1), pack2(data2);
FPE_SAFE_SET(pack2, pack1 != 0, val / pack1);

Thoughts?

The text was updated successfully, but these errors were encountered:

ambrad · 2021-06-24T17:42:23Z

Jim, would you point to an example of this pattern in the code? Thanks.

jgfouca · 2021-06-24T17:54:21Z

@ambrad , sure. Here's a change I had to make in my dev branch:

--- a/components/scream/src/physics/shoc/atmosphere_macrophysics.hpp
+++ b/components/scream/src/physics/shoc/atmosphere_macrophysics.hpp
@@ -298,10 +298,12 @@ public:
 
         inv_qc_relvar(i,k) = 1;
         const auto condition = (qc(i,k) != 0 && qc2(i,k) != 0);
-        inv_qc_relvar(i,k).set(condition,
-                               ekat::min(inv_qc_relvar_max,
-                                         ekat::max(inv_qc_relvar_min,
-                                                   ekat::square(qc(i,k))/qc2(i,k))));
+        if (condition.any()) {
+          inv_qc_relvar(i,k).set(condition,
+                                 ekat::min(inv_qc_relvar_max,
+                                           ekat::max(inv_qc_relvar_min,
+                                                     ekat::square(qc(i,k))/qc2(i,k))));
+        }

This was one of a dozen or so similar changes I needed to make.

ambrad · 2021-06-24T18:16:21Z

Jim, would you point me to your branch? I'm curious if all of the cases involve clipping with min and max.

bartgol · 2021-06-24T18:20:04Z

Here's a crazy idea, which you guys might deem too "bulky": we could wrap binary ops results into a "pack expression", which is only evaluated when it's accessed. Something along the line of:

// Does not eval the expression.
PackExpression operator / (const Pack& p1, const Pack& p2);

// Only evaluates the expression if m[i] is true.
Pack& Pack::set(const Mask& m, const PackExpression& expr);

// Evaluates the expression everywhere
Pack& Pack::operator=(const PackExpression& expr);

The struct PackExpression can simply hold references to the args.

I think Sacado in Trilinos had a similar issue, which is due to expressions in function calls being evaluated before passing control to the fcn.

jgfouca · 2021-06-24T18:21:45Z

@ambrad , I am about to make a PR; my branch is jgfouca/fix_fpe_testing

ambrad · 2021-06-24T18:23:07Z

One of my concerns here is it's not clear to me the original F90 is quite what we want. Then again, I might just be confused. Here is the original F90 corresponding to the C++ above:

    relvar(:,:) = 1.0_r8
    relvarmax = 10.0_r8
    where (rcm(:ncol,:pver) /= 0.0 .and. rcm2(:ncol,:pver) /= 0.0) &
           relvar(:ncol,:pver) = min(relvarmax,max(0.001_r8,rcm(:ncol,:pver)**2.0/rcm2(:ncol,:pver)))

For simplicity, assume these are all scalars, not arrays. Suppose rcm2 = 0 and rcm = 1. Then rcm/rcm2 = +Inf. In that case, one would think relvar should be set to relvarmax = 10. But, instead, the above code makes relvar = 1. Yet if rcm2 were just a bit above 0, say rcm2 = 1e-6, the above code would indeed make it 10. So that makes the overall function discontinuous. A similar situation occurs when rcm = 0 but rcm2 != 0. relvar is set to 1 but if rcm were just a bit above 0, it would instead be 0.001.

Someone please correct me if I'm just confused.

I'd like to look at the other cases to see if similar strange things are happening.

bartgol · 2021-06-24T18:29:16Z

@ambrad what you say seems right to me, but it has to do with the algorithm, rather than the impl. In general, the question is how to deal with "compute x=f(a,b), then set y=x only where condition(a,b) is satisfied". It seems to me that the two issues are related, but could be tackled independently. In the example above, e.g., the solution might be to init relvar=relvarmax (which is the value to be used when handling the FPE case).

jgfouca · 2021-06-24T18:31:20Z

@ambrad , i think maybe i picked one that was more complicated than necessary. Most of the cases are very simple where you have a zero if check in fortran and a != mask on the C++ side.

Here's a simpler one:

C++:

@@ -207,10 +211,13 @@ void Functions<S,D>::shoc_assumed_pdf(
       Spack r_qwthl_1(0);
       {
         const Spack testvar = a*sqrtqw2_1*sqrtthl2_1 + (1 - a)*sqrtqw2_2*sqrtthl2_2;
-        r_qwthl_1.set(testvar != 0,
-                      ekat::max(-1,
-                                ekat::min(1, (qwthlsec - a*(qw1_1 - qw_first)*(thl1_1 - thl_first)
-                                              - (1 - a)*(qw1_2 - qw_first)*(thl1_2 - thl_first))/testvar)));
+        const auto testvar_ne_zero = testvar != 0;
+        if (testvar_ne_zero.any()) {
+          r_qwthl_1.set(testvar_ne_zero,
+                        ekat::max(-1,
+                                  ekat::min(1, (qwthlsec - a*(qw1_1 - qw_first)*(thl1_1 - thl_first)
+                                                - (1 - a)*(qw1_2 - qw_first)*(thl1_2 - thl_first))/testvar)));
+        }
       }

F90:

  testvar=(a*sqrtqw2_1*sqrtthl2_1+(1._rtype-a)*sqrtqw2_2*sqrtthl2_2)

  if (testvar .eq. 0._rtype) then
    r_qwthl_1=0._rtype
  else
    r_qwthl_1=max(-1.0_rtype,min(1.0_rtype,(qwthlsec-a*(qw1_1-qw_first) &
      *(thl1_1-thl_first)-(1._rtype-a)*(qw1_2-qw_first) &
      *(thl1_2-thl_first))/testvar))
  endif

ambrad · 2021-06-24T18:41:36Z

This case seems to have the same issue as above. If testvar = 0, then r_qwthl_1 = 0. But if it deviates from 0 by a tiny bit, then r_qwthl_1 = 1 or -1. So again this divide-by-0 situation arises in a context where something bad is happening.

bartgol · 2021-06-24T18:42:43Z

I was thinking a macro like this would be nice:

#define FPE_SAFE_SET(pack, mask, val)
  #if defined(SCREAM_FPE)
    if (mask.any()) {
      pack.set(mask, val);
    }
  #else
    pack.set(mask, val);
  #endif

Wait, doesn't this evaluate val anyways? That is, the FPE is still generated, even if mask.any()==false...

ambrad · 2021-06-24T18:43:28Z

@bartgol I'm not convinced yet that code that properly implements the evaluation of a continuous function will necessarily require delayed evaluation to look cleaner.

Re: the question you just asked, the issue here is when pack size = 1 in an FPE build, so the macro above does accomplish what is intended.

bartgol · 2021-06-24T18:45:50Z

I think the issue of the continuous fcn is separate. Meaning, even if you init relvar=relvarmax int he 1st example (hence, it's continuous w.r.t rcm2), you still have the FPE generated by evaluating rcm/rcm2 when calling pack.set(m,rcm/rcm2) (the f90 code does not have this issue, since the division is only performed for those rcm2 that are nonzero).

I agree that the fact that relvar is not continuous w.r.t. rcm2 might be an issue, but I think that has nothing to do with FPEs.

bartgol · 2021-06-24T18:49:30Z

Re: the question you just asked, the issue here is when pack size = 1 in an FPE build, so the macro above does accomplish what is intended.

Ah, yes, you are right, for pack size 1 it would be fine.

ambrad · 2021-06-24T18:49:53Z

Luca, I don't disagree and have no opinion re: Jim's macro, your suggestions, etc. Obviously please feel free to introduce those additional mechanisms. My point is only that in the course of implementing a continuous function with clipping, it's likely the .any(), etc., usage will end up leading to no divide-by-0 in the packsize=1 case relevant to the FPE builds, obviating any new pack mechanism.

jgfouca added enhancement New feature or request discussion labels Jun 24, 2021

jgfouca assigned bartgol, ambrad and jgfouca Jun 24, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential enhancement to Mask interface to help avoid FPEs #117

Potential enhancement to Mask interface to help avoid FPEs #117

jgfouca commented Jun 24, 2021 •

edited

Loading

ambrad commented Jun 24, 2021

jgfouca commented Jun 24, 2021

ambrad commented Jun 24, 2021

bartgol commented Jun 24, 2021

jgfouca commented Jun 24, 2021

ambrad commented Jun 24, 2021 •

edited

Loading

bartgol commented Jun 24, 2021 •

edited

Loading

jgfouca commented Jun 24, 2021

ambrad commented Jun 24, 2021

bartgol commented Jun 24, 2021

ambrad commented Jun 24, 2021

bartgol commented Jun 24, 2021 •

edited

Loading

bartgol commented Jun 24, 2021

ambrad commented Jun 24, 2021

Potential enhancement to Mask interface to help avoid FPEs #117

Potential enhancement to Mask interface to help avoid FPEs #117

Comments

jgfouca commented Jun 24, 2021 • edited Loading

ambrad commented Jun 24, 2021

jgfouca commented Jun 24, 2021

ambrad commented Jun 24, 2021

bartgol commented Jun 24, 2021

jgfouca commented Jun 24, 2021

ambrad commented Jun 24, 2021 • edited Loading

bartgol commented Jun 24, 2021 • edited Loading

jgfouca commented Jun 24, 2021

ambrad commented Jun 24, 2021

bartgol commented Jun 24, 2021

ambrad commented Jun 24, 2021

bartgol commented Jun 24, 2021 • edited Loading

bartgol commented Jun 24, 2021

ambrad commented Jun 24, 2021

jgfouca commented Jun 24, 2021 •

edited

Loading

ambrad commented Jun 24, 2021 •

edited

Loading

bartgol commented Jun 24, 2021 •

edited

Loading

bartgol commented Jun 24, 2021 •

edited

Loading