
Fairhaven2022 LOGOFF and Git On

Alexandre Rademaker edited this page Jul 23, 2022 · 19 revisions

LOGOFF and Git On

Migrating Legacy Projects to Git Repositories

Moderator: Michael Goodman

Background

The long-running Oslo SVN server, including the LOGON repository, is expected to remain functional for the foreseeable future. However, support for active developer communities (e.g. user creation and management of access rights) has been limited in recent years, so those communities may consider migrating to other services. As DELPH-IN has largely moved to GitHub, that is the default choice, but individual maintainers may choose their own hosting arrangement. A similar discussion in the previous year (see VirtualInfrastructure) covered many more topics, such as the migration of the wiki (which is complete, or very nearly so) and of the mailing lists (which are now inaccessible; communication now happens on GitHub and the DELPH-IN Discourse site). This discussion is about code migration. For currently inactive projects, or ones that do not require much support, Stephan Oepen expects to maintain the SVN infrastructure at Oslo for years to come, if only in part for archival purposes and to keep published download addresses and URLs valid.

Identifying Projects

http://svn.delph-in.net/

Others?

Candidates:

  • REPP (MWG will move)
  • NorSource
  • Non-FOS LKB (Jon will talk to oe)
  • SRG (Olga will ask Montse, strong intention to move)
  • Jigsaw (unclear; ask Yi)
  • GG (ask Berthold)
  • HaG (ask Berthold)
  • BURGER (Francis will ask Petya)
  • KRG (Francis will ask Sanghoun)
  • Emily to move current transfer matrix stuff (ace-enabled) from UW-local to GitHub
  • SDSU TADM -- ask if anyone is using

Identifying Authors

$ svn log --quiet http://svn.delph-in.net/ \
  | grep '^r[0-9]' | cut -d'|' -f2 | sed -e 's/^ *//' -e 's/ *$//' \
  | sort | uniq -c | sort -nk1 \
  > authors.txt

(email addresses have been partially redacted)

Commit Count  SVN User        GitHub Username
2             adolphs
2             alex
2             andreku
2             eric-n
2             linghelp@
2             root
3             sshieber
3             tbaldwin
3             uc
4             ezra99
4             kiefer
5             kordoni
5             rpearah@
5             uschaefer
6             simoes
7             jbernd
8             dag
9             ccb
9             gslayden@       glenn-slayden
17            jbeavers
18            ericn
18            olasba
19            cj
20            frermann
22            bender          emilymbender
22            biehl
22            gisle
22            test
22            tobiasvl
31            bart
33            murhaff
39            liljao
42            ebender@        emilymbender
49            ilianas
50            dan
50            sweaglesw       sweaglesw
55            rdrid           becdridan
56            mingwen
63            petterha
71            marsuk
75            fettig
78            gisley
78            jread
79            johanbev
82            yzhang
86            sanghoun
98            montse
101           brodbd
109           bond
117           milen
127           erikve
131           j.a.carroll@    john-a-carroll
142           lluisp
152           angelii
158           rdridan         becdridan
210           emanuel
215           sweagles        sweaglesw
235           malouf
242           arnskj
266           crysmann
280           johnca          john-a-carroll
905           bmw
948           aac
1068          danf
8333          oe
14205         (no author)

oe: I am surprised at the large number of anonymous commits – how is that possible?

Migrating From Subversion To Git

The git svn tool (https://git-scm.com/docs/git-svn) does a reasonably good job of importing Subversion repositories, including converting conventional branches/ and tags/ subdirectories into the appropriate Git structures:

$ git svn clone http://svn.delph-in.net/... --stdlayout --authors-file=authors.txt

I have some notes about doing this for ACE here: https://gist.github.com/goodmami/b2e70fe2fd47fb92bb27576d8c59f758

It's useful to map the SVN authors to GitHub @users.noreply.github.com email addresses so their personal emails are not exposed while still mapping to their GitHub profiles, if they exist.
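The authors.txt produced by the svn log pipeline above holds commit counts, whereas --authors-file expects one `svnuser = Name <email>` line per SVN user. A minimal sketch of the conversion follows. The function name, the hand-written mapping file of "svnuser githubuser" pairs, and the `@svn.invalid` placeholder domain for unmapped users are all assumptions made for illustration. Entries whose addresses were redacted would need their full SVN user name restored before they can match.

```shell
# to_authors_file COUNTS MAPPING
#   COUNTS:  output of the `sort | uniq -c` pipeline (lines like "    215 sweagles")
#   MAPPING: hand-written "svnuser githubuser" pairs, one per line (hypothetical file)
# Prints lines in the "svnuser = Name <email>" form that --authors-file expects.
to_authors_file() {
  awk '
    FILENAME == ARGV[1] { gh[$1] = $2; next }   # mapping file: SVN user -> GitHub username
    {
      sub(/^ *[0-9]+ /, "")                     # strip the commit count added by uniq -c
      user = $0
      if (user in gh) {
        printf "%s = %s <%s@users.noreply.github.com>\n", user, gh[user], gh[user]
      } else {
        addr = user
        gsub(/[^A-Za-z0-9._-]/, "", addr)       # e.g. "(no author)" -> "noauthor"
        printf "%s = %s <%s@svn.invalid>\n", user, user, addr   # placeholder address (assumption)
      }
    }
  ' "$2" "$1"
}

# Example (file names are hypothetical):
# to_authors_file authors.txt github-users.txt > authors-file.txt
```

Writing the result to a separate file keeps the commit counts available for deciding which authors are worth tracking down.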

Notes

The above SVN-to-Git import does not push the repositories to GitHub, so you will need to do this separately. You may also need to perform additional steps to convert tags to proper Git tags instead of branches.
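As a rough sketch of the tag conversion: with --stdlayout, SVN tags typically arrive as remote-tracking refs under refs/remotes/origin/tags/ (this prefix is an assumption; older versions of git svn used a different one, so check `git for-each-ref` first). The helper name below is hypothetical, and the GitHub remote URL in the comments is a placeholder.

```shell
# Run inside the repository created by `git svn clone --stdlayout`.
# Turns each ref under refs/remotes/origin/tags/ into a lightweight Git tag.
convert_svn_tags() {
  git for-each-ref --format='%(refname)' refs/remotes/origin/tags |
  while read -r ref; do
    tag=${ref#refs/remotes/origin/tags/}
    git tag "$tag" "$ref" &&         # tag the commit the SVN tag points at
    git update-ref -d "$ref"         # then drop the branch-like remote ref
  done
}

# Pushing afterwards (remote name and URL are placeholders):
# git remote add github git@github.com:OWNER/REPO.git
# git push github --all
# git push github --tags
```

Lightweight tags are used here for simplicity; annotated tags would need a tagger identity and message, which SVN does not supply in the same form.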

GitHub, being a free service, has size limits on individual files and on full repositories (including the history). Individual files should be smaller than 50MB (GitHub warns about larger ones) and must stay under 100MB (GitHub blocks pushes containing larger files). Repositories should aim to be less than 1GB. See https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github. To stay within these limits, we may need to filter files out of larger projects. Some guidelines:

  • Avoid checking in binary files (.grm, .dat, etc.)
    • Binary files that are trivially reproducible should be excluded
    • Those that are hard to reproduce but change infrequently may be included, provided they are < 100MB
    • Consider creating a "release" on GitHub and attaching binaries to it
  • Very large repositories may need to be split into multiple repositories
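To see which files would run into these limits, one option is to list every blob in the imported history together with its size. The sketch below uses only standard Git plumbing; the function name and the byte threshold passed to it are illustrative choices, not an agreed procedure.

```shell
# list_large_blobs LIMIT_BYTES
# Prints "size path" for every blob anywhere in history whose size is at
# least LIMIT_BYTES, largest first. Run inside the imported repository.
list_large_blobs() {
  git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk -v limit="$1" '$1 == "blob" && $2 >= limit { sub(/^blob /, ""); print }' |
  sort -rn
}

# Example: everything at or above GitHub's 50MB warning threshold
# list_large_blobs $((50 * 1024 * 1024))
```

Because the scan covers the whole history, it also finds files that were deleted long ago but would still count toward the repository size.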

Backups

We believe that for codebases, local checkouts will serve as backups, but we need other plans for things like the wiki and the discussions on the `participants' team.

GitHub suggests that organization owners can run a 'github migration' to get a full dump, which can then be backed up; we should prepare to do that. EMB worries this would be an expensive way to create backups to store on the patas cluster. It may be better to create a cron job that does a `git pull' for the docs repo (aka the wiki) and makes whatever the appropriate API call is to pull the information on issues for some list of projects (at least Matrix, maybe orphan projects).
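The cron-job idea could look roughly like the sketch below. It assumes the GitHub REST API's "list repository issues" endpoint (which also returns pull requests, and does not include comment threads), unauthenticated access within the API's rate limits, and hypothetical repository names, paths, and function names throughout. Note that a GitHub wiki is cloned from a URL ending in `.wiki.git`, so the repository argument for the wiki may need that suffix.

```shell
#!/bin/sh
# Sketch of a backup job. All repository names and paths below are assumptions.
BACKUP_DIR=${BACKUP_DIR:-./delphin-backup}

# issues_url OWNER/REPO PAGE: one page of issues (open and closed) from the GitHub REST API.
issues_url() {
  printf 'https://api.github.com/repos/%s/issues?state=all&per_page=100&page=%s\n' "$1" "$2"
}

# backup_repo OWNER/REPO: clone on the first run, fast-forward on later runs.
backup_repo() {
  dest="$BACKUP_DIR/git/$1"
  if [ -d "$dest/.git" ]; then
    git -C "$dest" pull --ff-only
  else
    mkdir -p "$(dirname "$dest")" && git clone "https://github.com/$1.git" "$dest"
  fi
}

# backup_issues OWNER/REPO: save pages of JSON until an empty array comes back.
backup_issues() {
  dest="$BACKUP_DIR/issues/$1"
  mkdir -p "$dest" || return 1
  page=1
  while [ "$page" -le 1000 ]; do          # hard cap so a bad response cannot loop forever
    curl -fsS "$(issues_url "$1" "$page")" -o "$dest/page-$page.json" || return 1
    [ "$(tr -d '[:space:]' < "$dest/page-$page.json")" = "[]" ] && break
    page=$((page + 1))
  done
}

# Example calls (commented out so that sourcing this file has no side effects):
# backup_repo delph-in/docs
# backup_issues delph-in/matrix
# Example crontab entry, weekly on Sunday at 03:00 (path is hypothetical):
# 0 3 * * 0 /path/to/backup.sh
```

Storing the raw JSON pages keeps the job simple; anything that needs the issue comments as well would require additional API calls.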
