It would be very useful to get an estimate of the total size of the target dataset produced by a recipe in GB / TB. For example, this information could be used by bakery managers to decide whether to accept a dataset into their storage.
Here are some different ways we could do this without actually running the whole recipe.
Create a test version of the recipe (see Testing / profiling recipes #97) and examine the total size of the test target. Scale up based on the "pruning factor" (what fraction of the full data did the test dataset pull).
Go through each file in the recipe's FilePattern and inspect its size. Sum to get an estimated size. Only works for static file inputs (not APIs like OPeNDAP). May not accurately reflect target size if there is lots of processing involved.
Randomly sample files from the FilePattern and scale up.
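The three options above can be sketched roughly as follows. This is only an illustration, not part of any existing pangeo-forge API: the function names `scale_by_pruning_factor` and `estimate_input_size` are hypothetical, `files` is assumed to be a plain list of paths or URLs already extracted from the FilePattern, and `get_size` is assumed to be any callable returning a size in bytes (it defaults to `os.path.getsize`, which only works for local files).

```python
import os
import random
from typing import Callable, Optional, Sequence


def scale_by_pruning_factor(test_size_bytes: int, pruning_factor: float) -> float:
    """Option 1: extrapolate the full target size from a pruned test target.

    pruning_factor is the fraction of the full data the test recipe pulled.
    """
    if not 0 < pruning_factor <= 1:
        raise ValueError("pruning_factor must be in (0, 1]")
    return test_size_bytes / pruning_factor


def estimate_input_size(
    files: Sequence[str],
    get_size: Callable[[str], int] = os.path.getsize,  # assumption: local files
    sample_size: Optional[int] = None,
    seed: Optional[int] = None,
) -> float:
    """Options 2 and 3: sum all input file sizes, or sample and scale up."""
    if sample_size is not None and sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not files:
        return 0.0
    # Option 2: inspect every file and sum the sizes.
    if sample_size is None or sample_size >= len(files):
        return float(sum(get_size(f) for f in files))
    # Option 3: mean size of a random sample, scaled to the full file count.
    rng = random.Random(seed)
    sampled = rng.sample(list(files), sample_size)
    mean_size = sum(get_size(f) for f in sampled) / sample_size
    return mean_size * len(files)
```

Both input-based estimates measure the source files, not the target, so they inherit the caveat above: heavy processing or a different compression scheme in the target could make the real size differ noticeably.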
Create a test version of the recipe (see Testing / profiling recipes #97) and examine the total size of the test target. Scale up based on the "pruning factor" (what fraction of the full data did the test dataset pull).
Are there known reasons why this is not the obvious best direction to pursue? It seems to dovetail nicely with other objectives, and should be relatively accurate, assuming the as-yet-unimplemented prune method referenced in pangeo-forge/staged-recipes#28 (comment) is "prune factor"-aware.