Looking to 6.0 - potential big change: nuke and rebuild the repo #2657
-
OK, well, it turns out we are way closer to this limit than I thought, and it's breaking 5.0 as I type this. So I'll likely be moving forward with this later today for 5.0 (like it or not!). Just to recap:
I'm going to hold off on moving the shapefiles into a single zip, as this took me by surprise and I'll need to test those solutions fairly carefully before doing anything. It will also likely take me a little while to get the PR submission, bot and issues moved over. But I want to get the core stuff back online today. Upside: it should only take about an hour, because 80% of the processing time for everything is spent on massive git commits caused by large binaries in the legacy versions!
Clarification Edit: My plan is to --mirror this repo over to another, and then clear the history of this repository. So, with luck, I won't have to re-initialize a new repo, and we can keep discussions, configs, etc. But we'll see :). The core objective is ensuring we have all the key infrastructure back up today.
Dan
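A minimal sketch of the mirror step, written in Python as a dry-run plan of git invocations that can be reviewed before anything runs. The repository URLs are hypothetical placeholders. The two LFS steps follow the general git-lfs recipe for duplicating a repository: a bare or mirror clone does not download LFS file contents, so they have to be fetched and pushed separately from the ordinary refs.

```python
import shlex
import subprocess
from typing import List

# Hypothetical placeholders: the thread does not give the actual repository URLs.
SOURCE_URL = "https://github.com/example-org/geoBoundaries.git"
ARCHIVE_URL = "https://github.com/example-org/geoBoundariesArchive.git"


def mirror_commands(source_url: str, archive_url: str,
                    workdir: str = "geoBoundaries-mirror.git") -> List[List[str]]:
    """Return the git invocations that copy every ref, plus LFS content, to the archive."""
    return [
        # Bare clone that keeps all refs (branches, tags, notes), not just the default branch.
        ["git", "clone", "--mirror", source_url, workdir],
        # A bare clone does not download LFS file contents, so fetch them explicitly.
        ["git", "-C", workdir, "lfs", "fetch", "--all"],
        # Push every ref to the (empty) archive repository.
        ["git", "-C", workdir, "push", "--mirror", archive_url],
        # Upload the LFS objects referenced anywhere in the history.
        ["git", "-C", workdir, "lfs", "push", "--all", archive_url],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(mirror_commands(SOURCE_URL, ARCHIVE_URL))
```

With the default `dry_run=True` this only prints the plan; passing `dry_run=False` performs the copy.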
-
Just for historical value, here is the total size of the repo before I migrate it to LFS: Wow!
-
Posting this here as I look forward to 6.0 a bit (which I think will be hot on the heels of 5.0, as 5.0 was rather late at the end of last year).
The core gB repo is now approaching 100GB, which is the maximum allowable size on GitHub. We currently use LFS for our larger files, but even so, our history (along with many binary files) is starting to cause issues.
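One way to track how close a clone is to that ceiling is to ask git how much space its object store uses. The sketch below parses the output of `git count-objects -v`, which reports loose-object usage (`size`) and packed usage (`size-pack`) in KiB. Two caveats: the 100GB figure is taken from the post rather than checked here, and LFS file contents live outside the git object store, so they are not counted. Reading "100GB" as decimal gigabytes is an assumption.

```python
import subprocess
from typing import Dict

KIB = 1024
# Assumption: treats the post's "100GB" as decimal gigabytes.
LIMIT_BYTES = 100 * 10**9


def parse_count_objects(output: str) -> Dict[str, int]:
    """Turn numeric 'key: value' lines from `git count-objects -v` into a dict."""
    stats: Dict[str, int] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        # Skip non-numeric lines such as 'alternate: <path>'.
        if sep and value.isdigit():
            stats[key.strip()] = int(value)
    return stats


def object_store_bytes(stats: Dict[str, int]) -> int:
    """Loose objects plus packs; git reports both fields in KiB."""
    return (stats.get("size", 0) + stats.get("size-pack", 0)) * KIB


def measure(repo_path: str = ".") -> float:
    """Return the repository's object-store size as a fraction of LIMIT_BYTES."""
    out = subprocess.run(
        ["git", "-C", repo_path, "count-objects", "-v"],
        check=True, capture_output=True, text=True,
    ).stdout
    return object_store_bytes(parse_count_objects(out)) / LIMIT_BYTES
```

Calling `measure()` from inside a clone returns, for example, `0.5` when the packed history is about 50GB.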
Ultimately, I'd like to make a few changes to future-proof gB, bring the repo size (way) down, and make sure that we don't hit limits that inhibit future updates. In that context, here is what I am thinking (comments welcome, as always!):
I'm only going to distribute shapefiles as *.zip, i.e., to get a shapefile, you'll have to download the geoBoundaries-USA-ADM1-all.zip file. This will save a lot of space, and almost everyone using shapefiles has to get the *.zip anyway (as you also need the *.shx, *.prj, etc.).
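A minimal sketch of building such a bundle with Python's standard `zipfile` module. It collects the sidecar files that share the `.shp` file's base name and refuses to build an archive if any of the three mandatory components (`.shp`, `.shx`, `.dbf`) is missing. The flat archive layout and the optional-extension list are assumptions; the real `-all.zip` may contain more than the shapefile components.

```python
import zipfile
from pathlib import Path
from typing import List, Union

# A shapefile is unusable without these three files.
REQUIRED = (".shp", ".shx", ".dbf")
# Common optional sidecars: projection (.prj) and character encoding (.cpg).
OPTIONAL = (".prj", ".cpg")


def bundle_shapefile(shp_path: Union[str, Path], zip_path: Union[str, Path]) -> List[str]:
    """Zip the .shp and its same-named sidecar files into a flat archive."""
    shp = Path(shp_path)
    members: List[Path] = []
    for ext in REQUIRED + OPTIONAL:
        part = shp.with_suffix(ext)
        if part.exists():
            members.append(part)
        elif ext in REQUIRED:
            raise FileNotFoundError(f"missing required shapefile component: {part}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for part in members:
            zf.write(part, arcname=part.name)  # store without any directory prefix
    return [part.name for part in members]
```

For example, `bundle_shapefile("geoBoundaries-USA-ADM1.shp", "geoBoundaries-USA-ADM1-all.zip")`, where the input file name is illustrative.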
I'm going to move all release files to git LFS. This really needed to happen a while ago; right now it's a split case, where some files are on LFS and others are in traditional git storage.
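A sketch of what moving release files onto LFS could look like. `git lfs track "<pattern>"` appends lines in the format generated below to `.gitattributes`, which covers new commits. Converting files that are already in history is the job of `git lfs migrate import`, which rewrites commits (and therefore their hashes) and needs a force-push afterward. The pattern list is hypothetical, since the post only says "all release files".

```python
import shlex
from typing import Iterable, List

# Hypothetical pattern list: the post only says "all release files".
RELEASE_PATTERNS = ["*.zip", "*.geojson", "*.topojson"]


def lfs_gitattributes(patterns: Iterable[str]) -> str:
    """Lines in the format `git lfs track "<pattern>"` appends to .gitattributes."""
    return "".join(f"{p} filter=lfs diff=lfs merge=lfs -text\n" for p in patterns)


def lfs_migrate_command(patterns: Iterable[str]) -> List[str]:
    """Rewrite all branches and tags so matching files become LFS pointers."""
    return ["git", "lfs", "migrate", "import", "--everything",
            "--include=" + ",".join(patterns)]


if __name__ == "__main__":
    print(lfs_gitattributes(RELEASE_PATTERNS), end="")
    # shlex.join quotes the globs so a copied command is not expanded by the shell.
    print(shlex.join(lfs_migrate_command(RELEASE_PATTERNS)))
```

Because the migrate step rewrites history, it is best run only after a full backup such as a mirror of the repository exists.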
I'm going to copy the current repository into a new repo (geoBoundariesArchive), and then copy only the current files (i.e., 6.0) into the geoBoundaries repo (i.e., this one). I'm going to do a bit of digging to see if I can do this while retaining issues/discussion - pretty sure the answer is yes, but if not I'll manually move things over.
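Once the full history is safely stored in geoBoundariesArchive, one common way to keep only the current files here is an orphan branch: a new branch with no parent, whose single commit is the current working tree. The sketch below prints that sequence as a dry run. The branch name `main`, the remote name `origin`, and the commit message are assumptions. Other branches and tags on the remote would still reference the old history, so they need to be deleted separately before the old objects stop counting against the repository.

```python
import shlex
import subprocess
from typing import List


def history_reset_commands(branch: str = "main", remote: str = "origin",
                           message: str = "geoBoundaries 6.0: start fresh history") -> List[List[str]]:
    """Return git invocations that replace `branch` with a single parentless commit."""
    tmp = f"{branch}-fresh"
    return [
        # New branch with no history; the index and working tree keep the current files.
        ["git", "checkout", "--orphan", tmp],
        # Stage everything that is not ignored.
        ["git", "add", "-A"],
        # The only commit on the new branch.
        ["git", "commit", "-m", message],
        # Delete the old branch, then give the current (new) branch its name.
        ["git", "branch", "-D", branch],
        ["git", "branch", "-m", branch],
        # Overwrite the remote branch, discarding its old history there.
        ["git", "push", "--force", remote, branch],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(history_reset_commands())
```

Since the published branch is rewritten, existing clones and forks will no longer share history with it and would need to re-clone.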
As a part of all of this, I'd like to deprecate the gbID API (not the core API - this is the second, lesser-known API that lets you reference a file ID). It's almost never used, and tracking back through the revised commit hashes would be very difficult.
This should be fairly transparent to most folk, unless you're using a specific hash on the repo for something directly (i.e., not through the API).
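Why old hashes cannot simply be carried over: git names each object by a SHA-1 over a short header plus the object's content, and a commit's content includes its parent's hash. The sketch below reproduces that naming scheme with `hashlib` (it matches git's well-known IDs for the empty blob and the empty tree) and shows that an identical snapshot committed with no parent, as in a history reset, gets a different commit ID. The author identity, timestamp, parent hash, and message are made up for illustration, and the thread does not describe how gbID maps its IDs to files.

```python
import hashlib
from typing import Optional

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>', a NUL byte, then the content: git's object ID."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parent: Optional[str], message: str) -> bytes:
    """Minimal commit object text; the identity and timestamp are made-up examples."""
    ident = "Example Author <author@example.com> 1700000000 +0000"
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return ("\n".join(lines) + "\n").encode()


if __name__ == "__main__":
    old_parent = "1" * 40  # stand-in for some commit in the legacy history
    before = git_object_id("commit", commit_body(EMPTY_TREE, old_parent, "6.0 release"))
    after = git_object_id("commit", commit_body(EMPTY_TREE, None, "6.0 release"))
    print(before)
    print(after)  # same tree and message, different ID
```

File contents that do not change keep the same blob ID, but every commit ID changes once its ancestry does, which is why references pinned to old commit hashes stop resolving.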