Skip to content

Commit

Permalink
Merge pull request #19578 from ronawho/fix-ofi-fenced-am-cp
Browse files Browse the repository at this point in the history
Fix an MCM issue between AMs and PUTs in ofi msg-order-fence (cherry-pick #19575)

Fix a race where an AM wasn't waiting for outstanding PUTs to complete
for ofi message-order-fence. For a case like:

```chpl
var a = 0;
on Locales[numLocales-1] {
  a = 1;
  on Locales[0] {
    assert(a == 1);
  }
}
```

the AM (`on Locales[0]`) wasn't waiting for the pending PUT (`a = 1`) to
complete. The comments in the code indicate we thought we were handling
this correctly by issuing our AM with `FI_FENCE` to wait for any pending
operations, but that does not appear to be happening.

Here I'm applying a hotfix to extend how we force message visibility to
stop skipping the target node. We want to understand what's going on
here better, but for now I just want to get a base fix in to address
failures we're seeing.

Also add a CHANGES.md entry since we're tying to land this hotfix in the
1.26 release.
  • Loading branch information
ronawho authored Mar 31, 2022
2 parents b558a9e + 28488b9 commit 06b1198
Show file tree
Hide file tree
Showing 2 changed files with 11 additions and 4 deletions.
8 changes: 8 additions & 0 deletions CHANGES.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,14 @@
Release Changes List
====================

version 1.26.0.1
================

This version is a slight variation on the 1.26.0 release containing a
late-breaking bug fix that is specific to CHPL_COMM='ofi' when using the
'cxi' provider.


version 1.26.0
==============

Expand Down
7 changes: 3 additions & 4 deletions runtime/src/comm/ofi/comm-ofi.c
Original file line number Diff line number Diff line change
Expand Up @@ -4288,8 +4288,7 @@ void amReqFn_msgOrdFence(c_nodeid_t node,
// to any node are visible. Similarly, for GETs we have to ensure
// that previous AMOs and PUTs to the target node are visible, and for
// PUTs we have to ensure that previous AMOs to the target node are
// visible. Do that here for all nodes except this op's target. For
// that node, we'll use a fenced send instead.
// visible. Do that here for all nodes.
//
chpl_bool havePutsOut = false;
chpl_bool haveAmosOut = false;
Expand All @@ -4298,7 +4297,7 @@ void amReqFn_msgOrdFence(c_nodeid_t node,
case am_opExecOn:
case am_opExecOnLrg:
forceMemFxVisAllNodes(true /*checkPuts*/, true /*checkAmos*/,
node /*skipNode*/, tcip);
-1 /*skipNode*/, tcip);
havePutsOut = (tcip->putVisBitmap != NULL
&& bitmapTest(tcip->putVisBitmap, node));
haveAmosOut = (tcip->amoVisBitmap != NULL
Expand All @@ -4308,7 +4307,7 @@ void amReqFn_msgOrdFence(c_nodeid_t node,
{
chpl_bool amoHasMemFx = (req->amo.ofiOp != FI_ATOMIC_READ);
forceMemFxVisAllNodes(amoHasMemFx /*checkPuts*/, true /*checkAmos*/,
node /*skipNode*/, tcip);
-1 /*skipNode*/, tcip);
havePutsOut = (amoHasMemFx
&& tcip->putVisBitmap != NULL
&& bitmapTest(tcip->putVisBitmap, node));
Expand Down

0 comments on commit 06b1198

Please sign in to comment.