Add checksums to hdf5 datasets #4831
Conversation
Force-pushed from f4b073d to 540728c
The codeclimate issue is basically because I wanted to match the source code as closely as possible. However, this just looks like the HDF5 file-locking mechanism, and if we just want to bypass that, I can remove it.
My concern with this is that we've copied a block of code from `Group.create_group` in h5py into here. What happens if h5py changes their own code? However, because h5py is using explicit class names, I don't see a better way to do this.
That's my concern as well - I will try to keep on top of any changes to that code (I've set up a daily checker for changes to the GitHub commit history).
…during what?
I missed that I hadn't finished the sentence! ... during the time between write and read.
Add fletcher32 checksums by default to hdf5 datasets
Standard information about the request
This is a new feature
This change affects everywhere that hdf5 file objects are used
This change follows style guidelines (See e.g. PEP8), has been proposed using the contribution guidelines
Motivation
This will check whether data has been corrupted in any way during the interim period from writing to loading.
From the documentation
Implementation
Add a wrapper around the h5py `Group` object which does two things:
- overrides the `create_group` function so that it points to this wrapper
- overrides the `create_dataset` method

Add an inheritance from this wrapper to the `HFile` class, so that these are used in the file generally.
Links to any issues or associated PRs
#1525 - (I was sorting issues in order, oldest first)
Testing performed
- Run `fit_sngls_over_multiparam` - which uses both direct assignment on the file as well as `create_group()` and assignment to that - and check that the files are identical at the output. Using `h5ls -rv` on the output, we see that all datasets have the line `Filter-0: fletcher32-3 {}`
- Same test on `pycbc_add_statmap`, same result

Additional notes
Some notes about the implementation:
- The `create_group` method's code is copied from the h5py source code, but I couldn't see a nicer implementation.
- I can add compression to the `create_dataset` wrapper if wanted, and then we can replace the stuff in `pycbc_coinc_statmap`, though I'm not sure of the I/O penalty for that.
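If compression were added to the `create_dataset` wrapper as suggested, the injected defaults might look like the dictionary below. The dictionary and the filename are illustrative, not the PR's code; `gzip` is one of h5py's built-in compression filters, and requesting either filter makes h5py store the dataset chunked:

```python
import os
import tempfile

import h5py
import numpy as np

# Illustrative defaults the wrapper could inject; not the PR's actual code
dataset_defaults = {'fletcher32': True, 'compression': 'gzip'}

path = os.path.join(tempfile.mkdtemp(), 'example.h5')
with h5py.File(path, 'w') as f:
    dset = f.create_dataset('stat', data=np.random.normal(size=100000),
                            **dataset_defaults)
    # Mirrors the `h5ls -rv` check: both filters should be present
    assert dset.fletcher32
    assert dset.compression == 'gzip'
```

The I/O penalty could then be estimated by timing writes of representative data with and without `compression` in the defaults.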