Implementation Details for the "Overwrite" Option
Note that, by default, the preprocessing script should not generate any auxiliary files: the only output should be a .db file created using Python's sqlite3 module. However, passing certain options to the preprocessing script (-pg, -px, -spqr, -nbdf, -npdf, -sp, etc.) can cause extra "auxiliary" files to be generated during a run of the script.
We create certain auxiliary files directly from the Python code in the preprocessing script: this includes *.gv, *.xdot, *_links, and *_single_links files, as well as the sp_[bubbles|chains|etc].txt files generated by -sp. When -w is not passed, these auxiliary files are created using Python's os.open() function with the O_EXCL flag set, so on modern computer systems creating them shouldn't overwrite existing files.
However, this isn't necessarily the case for auxiliary files generated outside of the Python code (e.g. spqrD.gml or component_D.info files). Those write operations are technically vulnerable to the race condition described below, although it's an admittedly uncommon one.
When we call check_file_existence() before creating a new auxiliary file from within save_aux_file() in the preprocessing script, a user or a process could get around this check by creating a file or directory at the checked filepath after check_file_existence() is called but before we start writing to that filepath. This could result in data loss for whoever owns the recently created file/directory, or it could result in this script running into an error. In either case, it's not a desirable situation (although it is an uncommon one).
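The window described above can be reproduced with a small sketch. Note that naive_save() and its _between_check_and_write hook are hypothetical names used only for illustration, and os.path.exists() is an assumed stand-in for the check; this is not the body of check_file_existence() or save_aux_file().

```python
import os


def naive_save(path, text, _between_check_and_write=None):
    """Check-then-write pattern (hypothetical; for illustration only).

    The existence check and the write are two separate steps, so anything
    that creates `path` between them goes unnoticed.
    """
    if os.path.exists(path):
        raise FileExistsError(path)

    # Hypothetical hook that simulates another process acting inside the
    # window between the check and the write.
    if _between_check_and_write is not None:
        _between_check_and_write()

    # Mode "w" truncates whatever is at `path` by the time we get here.
    with open(path, "w") as f:
        f.write(text)
```

If the hook writes a file at the same path, naive_save() silently replaces its contents, which is the data loss case mentioned above.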
We circumvent this by using os.fdopen() wrapped around os.open(), with certain flags set (based on whether or not the user passed -w), in order to create files here. (This function is the one place where MetagenomeScope's preprocessing script directly writes to a file; all other file creation operations are done by other processes, e.g. the SPQR script or pysqlite.) This approach allows us to guarantee that an error will be thrown and no data will be erroneously written if the aforementioned race condition happens.
(Note that, for NFS, this approach only works "...when using NFSv3 or later on kernel 2.6 or later," according to the open(2) man page as of June 8, 2018. That being said, NFSv3 dates back to June 1995 and the Linux kernel v2.6 dates back to December 2003, so most modern systems shouldn't encounter this race condition.)
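A minimal sketch of the os.fdopen()/os.open() approach follows. The function name safe_save(), its overwrite parameter (standing in for -w), the file mode, and the exact flag combination are assumptions made for illustration; the actual flags used by the preprocessing script may differ.

```python
import os


def safe_save(path, text, overwrite=False):
    """Create `path` and write `text` to it (hypothetical helper).

    Without `overwrite`, O_CREAT | O_EXCL makes the existence check and
    the file creation a single operation, so os.open() raises an error
    if anything already exists at `path`.
    """
    flags = os.O_CREAT | os.O_WRONLY
    if overwrite:
        # Assumed behavior when -w is passed: replace existing contents.
        flags |= os.O_TRUNC
    else:
        # Fail rather than open a file that already exists.
        flags |= os.O_EXCL

    fd = os.open(path, flags, 0o644)
    # os.fdopen() wraps the raw descriptor in a regular file object and
    # closes the descriptor when the with-block exits.
    with os.fdopen(fd, "w") as f:
        f.write(text)
```

In Python 3 the failure surfaces as FileExistsError (a subclass of OSError), so a caller can catch it and report that the file already exists instead of overwriting it.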
The use of os.open() in conjunction with the os.O_EXCL flag in order to prevent the race condition, as well as the background information for this writeup, is based on Adam Dinwoodie (username me_and)'s answer to this Stack Overflow question.