Archiving ERDDAP datasets at NOAA's NCEI and avoiding file fixity issue for netCDF files #367
MathewBiddle
started this conversation in
General
Replies: 1 comment
I think an important thing to discuss as well is who uses ArchiveADataset and whether it has a place in a new workflow (assuming we provide something like the proposed .ncNCEI or find another alternative), or whether we can deprecate/stop supporting ArchiveADataset. I do think whether cachefromurl can cover the use case is a very important question. If not, is there an improvement we can make so that it could?
Mainly described in this conversation: ioos/ioos-atn-data#28. Is there a way to use ERDDAP as a platform to submit (or generate) packages for NCEI long-term archival?
Problem
When downloading a netCDF file from ERDDAP we run into a file fixity problem. Essentially, every time a user goes to an ERDDAP endpoint and downloads a netCDF file (the preferred format for archival at NCEI), the resultant file has a different checksum, even if the data contents are the same.
It's been stated that this is because some of the metadata ERDDAP includes in the resultant file is dynamically generated (e.g. the :history attribute, which contains information about when the file was downloaded) and thus changes every time someone downloads the dataset, resulting in a different checksum even though the data don't actually change.
For NCEI to understand when files change, they need some certainty that a change in checksum reflects an actual change to the data or to some other part of the dataset that requires an update to the archive.
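To make the fixity problem concrete, here is a minimal sketch using only the Python standard library. It does not read real netCDF files: the dictionary returned by `simulated_download` is a hypothetical stand-in for an ERDDAP response, and the attribute values and helper names are assumptions for illustration only. It shows that a request-specific history value changes the checksum of the whole payload, while a checksum computed with that attribute excluded stays stable.

```python
import hashlib
import json


def checksum(payload: dict) -> str:
    """SHA-256 of a canonical JSON serialization of the payload."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def simulated_download(timestamp: str) -> dict:
    """Hypothetical stand-in for a download: same data, request-specific history."""
    return {
        "global_attributes": {
            "title": "example dataset",  # made-up value
            "history": f"{timestamp} downloaded from an ERDDAP server",  # made-up wording
        },
        "data": {"time": [0, 1, 2], "temperature": [10.1, 10.3, 10.2]},
    }


def without_history(payload: dict) -> dict:
    """Copy of the payload with the history attribute dropped."""
    attrs = {k: v for k, v in payload["global_attributes"].items() if k != "history"}
    return {"global_attributes": attrs, "data": payload["data"]}


first = simulated_download("2024-01-01T00:00:00Z")
second = simulated_download("2024-01-02T00:00:00Z")

# Whole-payload checksums differ because only the history text differs.
print(checksum(first) == checksum(second))  # False
# Checksums that ignore history agree because the data are identical.
print(checksum(without_history(first)) == checksum(without_history(second)))  # True
```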
Discussion
The proposal discussed in ioos/ioos-atn-data#28 suggests that NCEI downloads a temporary version of the file without the metadata (.jsonl) and computes checksums on that, then pulls the .nc files only if there are changes. That's a lot of overhead for NCEI to manage.
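A rough sketch of what that change-detection step might look like is below. It is not taken from the linked conversation: the function names are hypothetical, and the two fetch callables are placeholders for whatever requests NCEI would actually make (one for the metadata-free representation, one for the netCDF file), so no ERDDAP URLs are assumed here.

```python
import hashlib
from typing import Callable, Optional, Tuple


def sha256_hex(content: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(content).hexdigest()


def sync_if_changed(
    fetch_data_only: Callable[[], bytes],
    fetch_netcdf: Callable[[], bytes],
    stored_checksum: Optional[str],
) -> Tuple[str, Optional[bytes]]:
    """Return (current checksum, netCDF bytes), or (checksum, None) if unchanged.

    fetch_data_only: hypothetical callable returning the metadata-free content.
    fetch_netcdf: hypothetical callable returning the netCDF file to archive.
    stored_checksum: checksum recorded at the last archive run, if any.
    """
    current = sha256_hex(fetch_data_only())
    if current == stored_checksum:
        # Data are unchanged, so skip the netCDF download entirely.
        return current, None
    return current, fetch_netcdf()


# Usage with in-memory placeholders instead of real downloads.
checksum_1, archive_1 = sync_if_changed(lambda: b"row1\nrow2\n", lambda: b"netcdf-bytes", None)
checksum_2, archive_2 = sync_if_changed(lambda: b"row1\nrow2\n", lambda: b"netcdf-bytes", checksum_1)
print(archive_1 is not None, archive_2 is None)
```

Even in this simplified form, the archive has to store a checksum per dataset and make two requests whenever the data change, which is the overhead the paragraph above refers to.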
Axiom has built a packager for building ready-to-archive packages for NCEI from ERDDAP datasets. However, it uses the .ncCFMA endpoint, so it will run into the same file fixity issue.
Would it be possible to add another output format, let's say .ncNCEI, that outputs the file without updating the :history global attribute?
Is there another approach we are not considering here? (Thinking about ERDDAP's capability to copy data with cachefromurl: maybe NCEI stands up an ERDDAP and registers the datasets approved for archiving there? Then the copied files could be archived. Just thinking out loud.)
cc: @relphj, @apkrelling, @ksauby, @ChrisJohnNOAA