Doc update: deprecate NC_SHARE and --disable-file-sync
wkliao committed Feb 10, 2024
1 parent 0368c82 commit 90f6f78
Showing 4 changed files with 62 additions and 37 deletions.
56 changes: 41 additions & 15 deletions doc/README.consistency.md
@@ -1,27 +1,53 @@
## Note on parallel I/O data consistency

PnetCDF follows the same parallel I/O data consistency as MPI-IO standard.
Refer the URL below for more information.
PnetCDF follows the same parallel I/O data consistency as MPI-IO standard,
quoted below.

```
Consistency semantics define the outcome of multiple accesses to a single file.
All file accesses in MPI are relative to a specific file handle created from a
collective open. MPI provides three levels of consistency:
* sequential consistency among all accesses using a single file handle,
* sequential consistency among all accesses using file handles created from a
single collective open with atomic mode enabled, and
* user-imposed consistency among accesses other than the above.
Sequential consistency means the behavior of a set of operations will be as if
the operations were performed in some serial order consistent with program
order; each access appears atomic, although the exact ordering of accesses is
unspecified. User-imposed consistency may be obtained using program order and
calls to MPI_FILE_SYNC.
```

Users are referred to the MPI standard Chapter 14.6 Consistency and Semantics
for more information.
http://www.mpi-forum.org/docs/mpi-2.2/mpi22-report/node296.htm#Node296

Readers are also referred to the following paper.
Rajeev Thakur, William Gropp, and Ewing Lusk, On Implementing MPI-IO Portably
and with High Performance, in the Proceedings of the 6th Workshop on I/O in
Parallel and Distributed Systems, pp. 23-32, May 1999.

If users would like PnetCDF to enforce a stronger consistency, they should add
NC_SHARE flag when open/create the file. By doing so, PnetCDF adds
MPI_File_sync() after each MPI I/O calls.
* For PnetCDF collective APIs, an MPI_Barrier() will also be called right
after MPI_File_sync().
* For independent APIs, there is no need for calling MPI_Barrier().

Users are warned that the I/O performance when using NC_SHARE flag could become
significantly slower than not using it.

If NC_SHARE is not set, then users are responsible for their desired data
consistency. To enforce a stronger consistency, users can explicitly call
ncmpi_sync(). In ncmpi_sync(), MPI_File_sync() and MPI_Barrier() are called.
* NC_SHARE has been deprecated as of PnetCDF release 1.13.0.
+ NC_SHARE is a legacy flag inherited from NetCDF-3, whose purpose is to
provide some degree of data consistency for multiple processes concurrently
accessing a shared file. To achieve a stronger consistency, user
applications are required to also synchronize the processes, such as
calling MPI_Barrier, together with nc_sync.
+ Because PnetCDF follows the MPI file consistency, which only addresses the
case when all file accesses are relative to a specific file handle created
from a collective open, NC_SHARE becomes invalid. Note that NetCDF-3
supports only sequential I/O and thus has no collective file open per se.

If users would like a stronger consistency, they may consider using the code
fragment below after each collective write API call (e.g.
`ncmpi_put_vara_int_all`, `ncmpi_wait_all`, `ncmpi_enddef`, `ncmpi_redef`,
`ncmpi_begin_indep_data`, `ncmpi_end_indep_data`).
```
ncmpi_sync(ncid);
MPI_Barrier(comm);
ncmpi_sync(ncid);
```
Users are warned that the I/O performance could become significantly slower.

### Note on header consistency in memory and file
In data mode, changes to file header can happen in the following scenarios.
15 changes: 5 additions & 10 deletions man/pnetcdf.m4
@@ -495,10 +495,9 @@ Creates a new netCDF dataset at ARG(path) collectively by a group of MPI
processes specified by ARG(comm), returning a netCDF ID in ARG(ncid). The
argument ARG(cmode) may <<include>> the bitwise-or of the following flags:
MACRO(NOCLOBBER) to protect existing datasets (default is MACRO(CLOBBER),
silently blows them away), MACRO(SHARE) for stronger metadata data consistency
control, MACRO(64BIT_OFFSET) to create a file in the 64-bit offset format
(CDF-2), as opposed to classic format, the default, or MACRO(64BIT_DATA) to
create a file in the 64-bit data format (CDF-5).
silently blows them away), MACRO(64BIT_OFFSET) to create a file in the
64-bit offset format (CDF-2), as opposed to classic format, the default, or
MACRO(64BIT_DATA) to create a file in the 64-bit data format (CDF-5).
Use either MACRO(64BIT_OFFSET) or MACRO(64BIT_DATA).
The 64-bit offset format allows the creation of very large files with far fewer
restrictions than netCDF classic format, but can only be read by the netCDF
@@ -530,7 +529,7 @@ Opens an existing netCDF dataset at ARG(path) collectively by a group of MPI
processes specified by ARG(comm), returning a netCDF ID in ARG(ncid). The type
of access is described by the ARG(mode) parameter, which may <<include>> the
bitwise-or of the following flags: MACRO(WRITE) for read-write access (default
read-only), MACRO(SHARE) for stronger metadata data consistency control.
read-only).
.sp
ifelse(DAP,TRUE,
<<As of NetCDF version 4.1, and if DAP support was enabled
@@ -559,11 +558,7 @@ After a successful call, variable data can be read or written to the dataset.
.HP
FDECL(sync, (INCID()))
.sp
Unless the
MACRO(SHARE)
bit is set in
FREF(open) or FREF(create),
data written by PnetCDF APIs may be cached by local file system on each compute
Data written by PnetCDF APIs may be cached by local file system on each compute
node. This <<API>> flushes cached data by calling MPI_File_sync.
.HP
FDECL(abort, (INCID()))
15 changes: 5 additions & 10 deletions man/pnetcdf_f90.m4
@@ -74,10 +74,9 @@ Creates a new netCDF dataset at \fIpath\fP collectively by a group of MPI
processes specified by \fIcomm\fP, returning a netCDF ID in \fIncid\fP. The
argument \fIcmode\fP may include the bitwise-or of the following flags:
\fBnf90_noclobber\fR to protect existing datasets (default is \fBnf90_clobber\fR,
silently blows them away), \fBnf90_share\fR for stronger metadata data consistency
control, \fBnf90_64bit_offset\fR to create a file in the 64-bit offset format
(CDF-2), as opposed to classic format, the default, or \fBnf90_64bit_data\fR to
create a file in the 64-bit data format (CDF-5).
silently blows them away), \fBnf90_64bit_offset\fR to create a file in the
64-bit offset format (CDF-2), as opposed to classic format, the default, or
\fBnf90_64bit_data\fR to create a file in the 64-bit data format (CDF-5).
Use either \fBnf90_64bit_offset\fR or \fBnf90_64bit_data\fR.
The 64-bit offset format allows the creation of very large files with far fewer
restrictions than netCDF classic format, but can only be read by the netCDF
@@ -115,7 +114,7 @@ Opens an existing netCDF dataset at \fIpath\fP collectively by a group of MPI
processes specified by \fIcomm\fP, returning a netCDF ID in \fIncid\fP. The type
of access is described by the \fImode\fP parameter, which may include the
bitwise-or of the following flags: \fBnf90_write\fR for read-write access (default
read-only), \fBnf90_share\fR for stronger metadata data consistency control.
read-only).
.sp

The argument \fImode\fP must be consistent among all MPI processes that
@@ -158,11 +157,7 @@ integer, intent(in) :: ncid
integer :: nf90mpi_sync
.fi
.sp
Unless the
\fBnf90_share\fR
bit is set in
\fBnf90mpi_open(\|)\fR or \fBnf90mpi_create(\|)\fR,
data written by PnetCDF APIs may be cached by local file system on each compute
Data written by PnetCDF APIs may be cached by local file system on each compute
node. This API flushes cached data by calling MPI_File_sync.
.RE
.HP
13 changes: 11 additions & 2 deletions sneak_peek.md
@@ -22,6 +22,9 @@ This is essentially a placeholder for the next release note ...
+ none
* Configure options
+ `--disable-file-sync` is now deprecated. This configure option alone does
not provide sufficient data consistency. Users are advised to call
`ncmpi_sync` and `MPI_Barrier` to achieve the desired consistency.
+ `--enable-install-examples` to install example programs under folder
`${prefix}/pnetcdf_examples` along with run script files. An example is
`${prefix}/pnetcdf_examples/C/run_c_examples.sh`. The default of this
@@ -53,10 +56,16 @@ This is essentially a placeholder for the next release note ...
+ none
* API syntax changes
+ none
+ File open flag NC_SHARE is now deprecated. It is still defined, but has
no effect.
* API semantics updates
+ none
+ NC_SHARE alone is not sufficient to provide data consistency for accessing
a shared file in parallel and thus is now deprecated. Because PnetCDF
follows the MPI file consistency, which only addresses the case when all
file accesses are relative to a specific file handle created from a
collective open, NC_SHARE becomes invalid. See doc/README.consistency.md
for more information.
* New error code precedence
+ none