Modify workflow to allow individual contributions of Tier 1 #117
Comments
Thanks for this suggestion, @ksonda. I agree fully with points 2-3: having individual state transformer files for new Tier 1 boundaries will be critical, and this should be easy to incorporate as new data becomes available. Given that you're suggesting individual pwsid boundaries rather than state-level ones, I think your suggestion that we modify the original state transformer to incorporate the one-off pwsid boundaries is straightforward and should be implemented when we have that data. Detailed commenting and a modified developer guide can support this change pretty seamlessly. My recommendation for point 1 is that, rather than have subdirectories that maintain the data on GitHub, we ask states to host their own FTP site, Drive folder, or website so that we can pull data from a reliable, maintained URL. This is the current workflow arrangement, where all incoming data is brought in from upstream sources. This ensures that 1) upstream data has a clear, reproducible source; 2) there are no conflicts between GitHub data and state- or agency-maintained data as changes happen over time; and 3) we do not risk hitting GitHub's file size limit (100 MB; unlikely for individual pwsids, but I could easily see a state offering a smaller subset of data with many pwsids). In short, the repository is designed to ingest, transform, and load, but not to store and maintain external data, which is a formidable task to do well if the data is to remain current and accessible beyond the repository.
Thanks, @jess-goddard. I see that there are good reasons to separate this repository from data storage. Regarding the "states host their own FTP" recommendation, I agree fully with that for large aggregations that might be made available by more states. The issue is that in the short term there will likely be an EPIC-led activity to source the ~200 'very large' systems directly from the relevant utilities, which are generally in states that do not currently have any kind of boundary collection program. This process will require a publicly visible submission and version-tracking mechanism of its own, to be transparent about which individual boundaries were submitted by whom and with what underlying source, so that the data can be folded over and replaced by state sources if and when that is appropriate. GitHub is as good an option as any at this scale, since
Perhaps EPIC and I need to coordinate creating a separate repo that has this directory structure. Then steps 2-3 can be implemented against those URLs.
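The tracking requirement described above (who submitted which boundary, from what underlying source, and whether a state source has since replaced it) could be captured in a small record per contribution. The following is only a minimal sketch in Python: the `Contribution` class, its field names, and the `active_contributions` helper are hypothetical and do not come from any existing repository.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Contribution:
    """One contributed boundary. All field names here are illustrative assumptions."""
    pwsid: str         # public water system ID the boundary belongs to
    submitted_by: str  # person or organization that submitted the boundary
    source: str        # description or URL of the underlying source data


def active_contributions(
    contributions: Iterable[Contribution],
    state_covered_pwsids: Set[str],
) -> List[Contribution]:
    """Keep only contributions not yet superseded by a state-maintained source.

    A contribution is dropped once its pwsid appears in a state source,
    which mirrors the "replaced by state sources" idea in the comment above.
    """
    return [c for c in contributions if c.pwsid not in state_covered_pwsids]
```

Keeping provenance next to each boundary in this way would make it possible to retire individual contributions one pwsid at a time as state programs come online.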
@ksonda Yes, I see the value in what you're suggesting! I like the idea of modularizing the data uploads into a small repo just for that purpose, but we can also discuss offline the pros and cons of keeping it separate from here. Let's connect when I'm back in the office on May 17.
I've mocked something up here: https://github.com/cgs-earth/national-cws-boundary-update
We now have a contribution workflow set up at https://github.com/cgs-earth/ref_pws. Any time a contribution is made, it generates or updates a geopackage here: https://www.hydroshare.org/resource/c9d8a6a6d87d4a39a4f05af8ef7675ad/data/contents/contributed_pws.gpkg If this is of interest to ping
@ksonda Great, we have it on our agenda to connect with you this month about an integration.
There may be an upcoming activity prioritizing the harvesting of Tier 1 boundaries from the remaining "Very Large" systems, and these should be able to be integrated without too much fanfare into the existing workflow.
A proposal:
1. `/contributions-tier1/{state}` subdirectories.
2. `{st}{pwsid}.geojson` files of Tier 1 boundaries.
3. `src/transformers/states/transform_wsb_{st}.R` modified as appropriate to merge in these new Tier 1 boundaries prior to the match and modeling steps.
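The merge step in the proposal can be sketched as follows. This is a hedged illustration in Python using only the standard library; the project's transformers are R scripts, so it is not the actual `transform_wsb_{st}.R` logic. The function names, the `pwsid` property name, and the `prefer_state` precedence rule are all assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_contributions(state_dir: str) -> List[dict]:
    """Read every *.geojson file in one state subdirectory into a list of features."""
    features: List[dict] = []
    for path in sorted(Path(state_dir).glob("*.geojson")):
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        if data.get("type") == "FeatureCollection":
            features.extend(data["features"])
        elif data.get("type") == "Feature":
            features.append(data)
    return features


def merge_tier1(
    state_features: List[dict],
    contributed_features: List[dict],
    id_field: str = "pwsid",   # assumed property name holding the system ID
    prefer_state: bool = True,  # assumed rule: state data wins on conflict
) -> List[dict]:
    """Combine state and contributed features, keeping one feature per pwsid."""
    if prefer_state:
        ordered = list(contributed_features) + list(state_features)
    else:
        ordered = list(state_features) + list(contributed_features)
    merged: Dict[str, dict] = {}
    for feature in ordered:
        # Later entries overwrite earlier ones with the same ID.
        merged[feature["properties"][id_field]] = feature
    return list(merged.values())
```

With `prefer_state=True`, a contributed boundary only fills in a pwsid that the state data lacks; passing `prefer_state=False` would let contributed boundaries override state ones instead. Which rule is appropriate is a project decision that this sketch does not settle.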