Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential enhancement to Mask interface to help avoid FPEs #117

Open
jgfouca opened this issue Jun 24, 2021 · 14 comments
Open

Potential enhancement to Mask interface to help avoid FPEs #117

jgfouca opened this issue Jun 24, 2021 · 14 comments
Assignees
Labels
discussion enhancement New feature or request

Comments

@jgfouca
Copy link
Member

jgfouca commented Jun 24, 2021

As I'm cleaning up FPEs in SHOC, I'm encountering this pattern over and over (this pattern is widespread in P3 too):

Spack pack1(data1), pack2(data2);
pack2.set(pack1 != 0, val / pack1);

... which needs to be changed in over to avoid the FPE (if they are on) to:

Spack pack1(data1), pack2(data2);
const auto mask = pack1 != 0;
if (mask.any()) {
  pack2.set(mask, val / pack1);
}

Because, even if pack2 is protected from getting infs by the mask, the divide-by-zero still occurs when val/pack1 gets evaluated. The code that needed to protect against this makes the code uglier and may impact performance since it introduces another branch.

I was thinking a macro like this would be nice:

#define FPE_SAFE_SET(pack, mask, val)
  #if defined(SCREAM_FPE)
    if (mask.any()) {
      pack.set(mask, val);
    }
  #else
    pack.set(mask, val);
  #endif

Which would be used like this:

Spack pack1(data1), pack2(data2);
FPE_SAFE_SET(pack2, pack1 != 0, val / pack1);

Thoughts?

@jgfouca jgfouca added enhancement New feature or request discussion labels Jun 24, 2021
@ambrad
Copy link
Member

ambrad commented Jun 24, 2021

Jim, would you point to an example of this pattern in the code? Thanks.

@jgfouca
Copy link
Member Author

jgfouca commented Jun 24, 2021

@ambrad , sure. Here's a change I had to make in my dev branch:

--- a/components/scream/src/physics/shoc/atmosphere_macrophysics.hpp
+++ b/components/scream/src/physics/shoc/atmosphere_macrophysics.hpp
@@ -298,10 +298,12 @@ public:
 
         inv_qc_relvar(i,k) = 1;
         const auto condition = (qc(i,k) != 0 && qc2(i,k) != 0);
-        inv_qc_relvar(i,k).set(condition,
-                               ekat::min(inv_qc_relvar_max,
-                                         ekat::max(inv_qc_relvar_min,
-                                                   ekat::square(qc(i,k))/qc2(i,k))));
+        if (condition.any()) {
+          inv_qc_relvar(i,k).set(condition,
+                                 ekat::min(inv_qc_relvar_max,
+                                           ekat::max(inv_qc_relvar_min,
+                                                     ekat::square(qc(i,k))/qc2(i,k))));
+        }

This was one of a dozen or so similar changes I needed to make.

@ambrad
Copy link
Member

ambrad commented Jun 24, 2021

Jim, would you point me to your branch? I'm curious if all of the cases involve clipping with min and max.

@bartgol
Copy link
Contributor

bartgol commented Jun 24, 2021

Here's a crazy idea, which you guys might deem too "bulky": we could wrap binary ops results into a "pack expression", which is only evaluated when it's accessed. Something along the line of:

// Does not eval the expression.
PackExpression operator / (const Pack& p1, const Pack& p2);

// Only evaluates the expression if m[i] is true.
Pack& Pack::set(const Mask& m, const PackExpression& expr);

// Evaluates the expression everywhere
Pack& Pack::operator=(const PackExpression& expr);

The struct PackExpression can simply hold references to the args.

I think Sacado in Trilinos had a similar issue, which is due to expressions in function calls being evaluated before passing control to the fcn.

@jgfouca
Copy link
Member Author

jgfouca commented Jun 24, 2021

@ambrad , I am about to make a PR; my branch is jgfouca/fix_fpe_testing

@ambrad
Copy link
Member

ambrad commented Jun 24, 2021

One of my concerns here is it's not clear to me the original F90 is quite what we want. Then again, I might just be confused. Here is the original F90 corresponding to the C++ above:

    relvar(:,:) = 1.0_r8
    relvarmax = 10.0_r8
    where (rcm(:ncol,:pver) /= 0.0 .and. rcm2(:ncol,:pver) /= 0.0) &
           relvar(:ncol,:pver) = min(relvarmax,max(0.001_r8,rcm(:ncol,:pver)**2.0/rcm2(:ncol,:pver)))

For simplicity, assume these are all scalars, not arrays. Suppose rcm2 = 0 and rcm = 1. Then rcm/rcm2 = +Inf. In that case, one would think relvar should be set to relvarmax = 10. But, instead, the above code makes relvar = 1. Yet if rcm2 were just a bit above 0, say rcm2 = 1e-6, the above code would indeed make it 10. So that makes the overall function discontinuous. A similar situation occurs when rcm = 0 but rcm2 != 0. relvar is set to 1 but if rcm were just a bit above 0, it would instead be 0.001.

Someone please correct me if I'm just confused.

I'd like to look at the other cases to see if similar strange things are happening.

@bartgol
Copy link
Contributor

bartgol commented Jun 24, 2021

@ambrad what you say seems right to me, but it has to do with the algorithm, rather than the impl. In general, the question is how to deal with "compute x=f(a,b), then set y=x only where condition(a,b) is satisfied". It seems to me that the two issues are related, but could be tackled independently. In the example above, e.g., the solution might be to init relvar=relvarmax (which is the value to be used when handling the FPE case).

@jgfouca
Copy link
Member Author

jgfouca commented Jun 24, 2021

@ambrad , i think maybe i picked one that was more complicated than necessary. Most of the cases are very simple where you have a zero if check in fortran and a != mask on the C++ side.

Here's a simpler one:

C++:

@@ -207,10 +211,13 @@ void Functions<S,D>::shoc_assumed_pdf(
       Spack r_qwthl_1(0);
       {
         const Spack testvar = a*sqrtqw2_1*sqrtthl2_1 + (1 - a)*sqrtqw2_2*sqrtthl2_2;
-        r_qwthl_1.set(testvar != 0,
-                      ekat::max(-1,
-                                ekat::min(1, (qwthlsec - a*(qw1_1 - qw_first)*(thl1_1 - thl_first)
-                                              - (1 - a)*(qw1_2 - qw_first)*(thl1_2 - thl_first))/testvar)));
+        const auto testvar_ne_zero = testvar != 0;
+        if (testvar_ne_zero.any()) {
+          r_qwthl_1.set(testvar_ne_zero,
+                        ekat::max(-1,
+                                  ekat::min(1, (qwthlsec - a*(qw1_1 - qw_first)*(thl1_1 - thl_first)
+                                                - (1 - a)*(qw1_2 - qw_first)*(thl1_2 - thl_first))/testvar)));
+        }
       }

F90:

  testvar=(a*sqrtqw2_1*sqrtthl2_1+(1._rtype-a)*sqrtqw2_2*sqrtthl2_2)

  if (testvar .eq. 0._rtype) then
    r_qwthl_1=0._rtype
  else
    r_qwthl_1=max(-1.0_rtype,min(1.0_rtype,(qwthlsec-a*(qw1_1-qw_first) &
      *(thl1_1-thl_first)-(1._rtype-a)*(qw1_2-qw_first) &
      *(thl1_2-thl_first))/testvar))
  endif

@ambrad
Copy link
Member

ambrad commented Jun 24, 2021

This case seems to have the same issue as above. If testvar = 0, then r_qwthl_1 = 0. But if it deviates from 0 by a tiny bit, then r_qwthl_1 = 1 or -1. So again this divide-by-0 situation arises in a context where something bad is happening.

@bartgol
Copy link
Contributor

bartgol commented Jun 24, 2021

I was thinking a macro like this would be nice:

#define FPE_SAFE_SET(pack, mask, val)
  #if defined(SCREAM_FPE)
    if (mask.any()) {
      pack.set(mask, val);
    }
  #else
    pack.set(mask, val);
  #endif

Wait, doesn't this evaluate val anyways? That is, the FPE is still generated, even if mask.any()==false...

@ambrad
Copy link
Member

ambrad commented Jun 24, 2021

@bartgol I'm not convinced yet that code that properly implements the evaluation of a continuous function will necessarily require delayed evaluation to look cleaner.

Re: the question you just asked, the issue here is when pack size = 1 in an FPE build, so the macro above does accomplish what is intended.

@bartgol
Copy link
Contributor

bartgol commented Jun 24, 2021

I think the issue of the continuous fcn is separate. Meaning, even if you init relvar=relvarmax int he 1st example (hence, it's continuous w.r.t rcm2), you still have the FPE generated by evaluating rcm/rcm2 when calling pack.set(m,rcm/rcm2) (the f90 code does not have this issue, since the division is only performed for those rcm2 that are nonzero).

I agree that the fact that relvar is not continuous w.r.t. rcm2 might be an issue, but I think that has nothing to do with FPEs.

@bartgol
Copy link
Contributor

bartgol commented Jun 24, 2021

Re: the question you just asked, the issue here is when pack size = 1 in an FPE build, so the macro above does accomplish what is intended.

Ah, yes, you are right, for pack size 1 it would be fine.

@ambrad
Copy link
Member

ambrad commented Jun 24, 2021

Luca, I don't disagree and have no opinion re: Jim's macro, your suggestions, etc. Obviously please feel free to introduce those additional mechanisms. My point is only that in the course of implementing a continuous function with clipping, it's likely the .any(), etc., usage will end up leading to no divide-by-0 in the packsize=1 case relevant to the FPE builds, obviating any new pack mechanism.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants