
Modify workflow to allow individual contributions of Tier 1 #117

Open
ksonda opened this issue Apr 21, 2022 · 6 comments

Comments

@ksonda
Collaborator

ksonda commented Apr 21, 2022

There may be an upcoming activity prioritizing the harvesting of Tier 1 boundaries from the remaining "Very Large" systems, and it should be possible to integrate these into the existing workflow without too much fanfare.

A proposal:

  1. Set up /contributions-tier1/{state} subdirectories.
  2. Authorized contributors place individual {st}{pwsid}.geojson files of Tier 1 boundaries in the appropriate state folder.
  3. Add or modify src/transformers/states/transform_wsb_{st}.R as appropriate to merge these new Tier 1 boundaries in prior to the match and modeling steps.
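The merge in step 3 could look roughly like the following. The repo's transformers are written in R, but here is a language-agnostic sketch in Python using only the standard library; the function names, the tier property, and the replace-on-pwsid policy are my assumptions for illustration, not the project's API:

```python
import json
from pathlib import Path

def load_contributions(state_dir):
    """Read every {st}{pwsid}.geojson file in a contributions-tier1/{state}/
    folder; return a dict of GeoJSON features keyed by pwsid (taken from
    the file name, e.g. "TX1234567.geojson" -> "TX1234567")."""
    contributed = {}
    for path in sorted(Path(state_dir).glob("*.geojson")):
        with open(path) as f:
            gj = json.load(f)
        # A contribution may be a bare Feature or a one-feature FeatureCollection.
        feats = gj["features"] if gj.get("type") == "FeatureCollection" else [gj]
        for feat in feats:
            props = feat.setdefault("properties", {})
            props["pwsid"] = path.stem
            props["tier"] = 1          # mark as a contributed Tier 1 boundary
            contributed[path.stem] = feat
    return contributed

def merge_tier1(state_features, contributed):
    """Fold contributed Tier 1 boundaries into the state transformer's
    output before the match/model steps; a contributed boundary replaces
    any existing feature with the same pwsid."""
    merged = {f["properties"]["pwsid"]: f for f in state_features}
    merged.update(contributed)  # Tier 1 contributions win on conflict
    return {"type": "FeatureCollection", "features": list(merged.values())}
```

The key design choice here is keying everything on pwsid so a one-off contribution cleanly supersedes whatever the state transformer produced for that system.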
@jess-goddard
Contributor

Thanks for this suggestion, @ksonda.

I agree fully with points 2-3: having the individual state transformer files merge in new Tier 1 boundaries will be critical, and this should be easy to incorporate as new data becomes available. Given that you're suggesting individual pwsid boundaries rather than state-level ones, your suggestion that we modify the original state transformer to incorporate the one-off pwsid boundaries is straightforward and should be implemented when we have that data. Detailed commenting and a modified developer guide can support this change pretty seamlessly.

My recommendation for point 1 is that, rather than maintaining the data in subdirectories on GitHub, we ask states to host their own FTP, Drive folder, or site from which we can pull data at a reliable, maintained URL. This is the current workflow arrangement, where all incoming data is brought in from upstream sources. This ensures that 1) upstream data has a clear, reproducible source; 2) there are no conflicts between GitHub data and state/agency-maintained data as changes happen over time; and 3) we do not risk hitting GitHub's 100 MB file size limit (unlikely for individual pwsids, but I could easily see a state offering a smaller subset of data with many pwsids).

In short, the repository is designed to ingest, transform, and load external data, not to store and maintain it, which is a formidable task to do well if the data is to remain current and accessible beyond the repository.

@ksonda
Collaborator Author

ksonda commented Apr 30, 2022

Thanks @jess-goddard. I see that there are good reasons to separate this repository from data storage. Regarding the "states host their own FTP" recommendation, I agree fully for large aggregations that might be made available by more states. The issue is that in the short term there will likely be an EPIC-led activity to source the ~200 "very large" systems directly from the relevant utilities, which are generally in states that do not currently have any kind of boundary collection program.

This process will require a publicly visible submission and version-tracking mechanism of its own, so it is transparent which individual boundaries were submitted by whom and with what underlying source, and so the data can be superseded by state sources if and when that is appropriate. GitHub is as good an option as any at this scale, since:

  • The submission mechanism will only operate on this collection of individual boundaries for a short period of time.
  • GitHub size limits will not be relevant for individual boundaries.
  • EPIC does not want to be in the business of maintaining, for a long period of time, whatever boundaries do need to be web accessible as state-based FeatureCollections on something like SharePoint/GDrive/Dropbox.

Perhaps EPIC and I need to coordinate on creating a separate repo with this directory structure. Steps 2-3 can then be implemented against those URLs.

@jess-goddard
Contributor

@ksonda Yes, I see the value in what you're suggesting!

I like the idea of modularizing the data uploads into a small repo just for that purpose, but we can also discuss offline the pros/cons of keeping it separate from this one. Let's connect when I'm back in the office May 17.

@ksonda
Collaborator Author

ksonda commented May 25, 2022

I've mocked something up here: https://github.com/cgs-earth/national-cws-boundary-update

@ksonda
Collaborator Author

ksonda commented Aug 3, 2022

We now have a contribution workflow set up here: https://github.com/cgs-earth/ref_pws

It generates/updates a geopackage here any time a contribution is made: https://www.hydroshare.org/resource/c9d8a6a6d87d4a39a4f05af8ef7675ad/data/contents/contributed_pws.gpkg

Ping me if this is of interest.

@jess-goddard
Contributor

@ksonda Great, we have it on our agenda to connect with you this month about an integration.
