
Fairhaven2022 LOGOFF and Git On

Glenn Slayden edited this page Jul 21, 2022 · 19 revisions

LOGOFF and Git On

Migrating Legacy Projects to Git Repositories

Moderator: Michael Goodman

Background

The long-running Oslo Subversion and related servers, including the LOGON repository, are being phased out. To preserve the projects and their history, we are looking for new hosts for the repositories. As DELPH-IN has largely moved to GitHub, this is the default choice, but individual maintainers may choose their own hosting arrangement. There was a similar discussion in the previous year (see VirtualInfrastructure), which covered many more topics, such as the migration of the wiki (which is complete, or very nearly so) and of the mailing lists (which are now inaccessible; communication now happens on GitHub and the DELPH-IN Discourse site). This discussion is about code migration.

Identifying Projects

http://svn.delph-in.net/

Others?

Candidates:

  • REPP (MWG will move)
  • NorSource
  • Non-FOS LKBs (Jon will talk to oe)
  • SRG (Olga will ask Montse, strong intention to move)
  • wsi (+WeSearch?) (EMB will talk to oe)
  • Jigsaw (unclear; ask Yi)
  • GG (ask Berthold)
  • HaG (ask Berthold)
  • BURGER (ask Petya)
  • KRG (Francis will ask Sanghoun)
  • WeSearch?
  • WikiWoods?

Identifying Authors

$ svn log --quiet http://svn.delph-in.net/ \
  | grep '^r[0-9]' | cut -d'|' -f2 | sed -e 's/^ *//' -e 's/ *$//' \
  | sort | uniq -c | sort -nk1 \
  > authors.txt

(email addresses have been partially redacted)

Commit Count  SVN User        GitHub Username
2             adolphs
2             alex
2             andreku
2             eric-n
2             linghelp@
2             root
3             sshieber
3             tbaldwin
3             uc
4             ezra99
4             kiefer
5             kordoni
5             rpearah@
5             uschaefer
6             simoes
7             jbernd
8             dag
9             ccb
9             gslayden@       glenn-slayden
17            jbeavers
18            ericn
18            olasba
19            cj
20            frermann
22            bender
22            biehl
22            gisle
22            test
22            tobiasvl
31            bart
33            murhaff
39            liljao
42            ebender@
49            ilianas
50            dan
50            sweaglesw
55            rdrid
56            mingwen
63            petterha
71            marsuk
75            fettig
78            gisley
78            jread
79            johanbev
82            yzhang
86            sanghoun
98            montse
101           brodbd
109           bond
117           milen
127           erikve
131           j.a.carroll@    john-a-carroll
142           lluisp
152           angelii
158           rdridan
210           emanuel
215           sweagles        sweaglesw
235           malouf
242           arnskj
266           crysmann
280           johnca          john-a-carroll
905           bmw
948           aac
1068          danf
8333          oe
14205         (no author)

Migrating From Subversion To Git

The git svn tool (https://git-scm.com/docs/git-svn) does a reasonably good job of importing Subversion repositories, including converting conventional branches/ and tags/ subdirectories into the appropriate Git structures:

$ git svn clone http://svn.delph-in.net/... --stdlayout --authors-file=authors.txt

I have some notes about doing this for ACE here: https://gist.github.com/goodmami/b2e70fe2fd47fb92bb27576d8c59f758

It's useful to map the SVN authors to GitHub noreply email addresses (of the form USERNAME@users.noreply.github.com), so that their personal emails are not exposed while their commits still link to their GitHub profiles, if they exist.
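
One detail worth noting: the svn log pipeline above writes "COUNT SVN_USER" lines, whereas git svn's --authors-file option expects lines of the form "svn_user = Full Name <email>". The following is a minimal sketch of a converter, not a tested migration tool. The function name make_authors_map, the tab-separated mapping file (SVN user, then GitHub username), and the @svn.invalid placeholder domain are all assumptions made for illustration; the generated names should be reviewed and edited by hand.

```shell
#!/bin/sh
# Convert "COUNT SVN_USER" lines into git-svn authors-file lines:
#   svn_user = Name <email>
# $1: counts file (output of the svn log pipeline)
# $2: optional, hypothetical mapping file: SVN_USER<TAB>GITHUB_USERNAME
make_authors_map() {
  counts=$1
  map=${2:-/dev/null}
  awk -v map="$map" '
    BEGIN {
      # Load the SVN user -> GitHub username mapping, if one was given.
      while ((getline l < map) > 0) {
        split(l, f, "\t")
        gh[f[1]] = f[2]
      }
    }
    {
      user = $0
      sub(/^[ \t]*[0-9]+[ \t]+/, "", user)   # drop the commit count
      sub(/[ \t]+$/, "", user)               # drop trailing whitespace
      if (user == "") next
      if (user in gh) {
        # Known GitHub user: use the noreply address.
        printf "%s = %s <%s@users.noreply.github.com>\n", user, gh[user], gh[user]
      } else {
        # Unknown user: placeholder address under the reserved .invalid TLD.
        local_part = user
        gsub(/[^A-Za-z0-9._-]/, "", local_part)
        printf "%s = %s <%s@svn.invalid>\n", user, user, local_part
      }
    }
  ' "$counts"
}
```

Assuming the counts were saved as authors.txt and the mapping as github-users.tsv (a hypothetical file name), the result could be written with `make_authors_map authors.txt github-users.tsv > authors-map.txt` and passed to git svn clone as --authors-file=authors-map.txt. Note that the "(no author)" entry also gets a line, since git svn stops when it meets an SVN author missing from the file.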

Notes

The above SVN-to-Git import does not push the repositories to GitHub, so you will need to do this separately. You may also need to perform additional steps to convert tags to proper Git tags instead of branches.
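
As a rough sketch of the tag conversion: git svn stores SVN tags as remote-tracking refs, and these can be turned into lightweight Git tags with a short loop. The ref prefix refs/remotes/origin/tags is an assumption (it is the layout recent Git versions use with --stdlayout; older versions may use refs/remotes/tags), and the function name svn_tags_to_git_tags is made up for this example.

```shell
#!/bin/sh
# Create a lightweight Git tag for every SVN tag ref imported by git svn.
# $1: ref prefix holding the imported tags (assumed default shown below)
svn_tags_to_git_tags() {
  prefix=${1:-refs/remotes/origin/tags}
  git for-each-ref --format='%(refname)' "$prefix" |
  while read -r ref; do
    name=${ref#"$prefix"/}
    # Skip refs such as NAME@123, which git svn creates for
    # tags that were deleted or replaced in SVN.
    case $name in
      *@*) continue ;;
    esac
    git tag "$name" "$ref"
  done
}

# Pushing afterwards might look like this (the remote name and URL
# are placeholders, not real project locations):
#   git remote add github git@github.com:OWNER/PROJECT.git
#   git push github --all
#   git push github --tags
```

Run the function from inside the converted repository. Since `git push --all` only pushes local branches, any SVN branches that should survive would first need local branches created from their remote-tracking refs.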

GitHub, being a free service, has size limits on individual files and on full repositories (including the history). Individual files should be smaller than 50MB (GitHub warns about larger files) and must be smaller than 100MB (GitHub rejects larger files). Repositories should aim to be smaller than 1GB. See https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github. To stay within these limits, we may need to filter files out of larger projects. Some guidelines:

  • Avoid checking in binary files (.grm, .dat, etc.)
    • Binary files that are trivially reproducible should be excluded
    • Those that are hard to reproduce but change infrequently may be included, provided they are < 100MB
    • Consider creating a "release" on GitHub and attaching binaries to it
  • Very large repositories may need to be split into multiple repositories
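
To decide what to filter, it helps to know which files in the converted history exceed the limits. The following sketch lists every blob reachable from any ref whose size is above a threshold; the function name large_blobs and the default threshold of 52428800 bytes (50 MiB) are choices made for this example.

```shell
#!/bin/sh
# List blobs anywhere in the history that are larger than a threshold.
# $1: size threshold in bytes (default: 50 MiB)
# Output: SIZE<TAB>PATH, largest first.
large_blobs() {
  limit=${1:-52428800}
  # rev-list prints "SHA PATH" for every object reachable from any ref;
  # cat-file looks up each object's type and size and echoes the path.
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk -v limit="$limit" '
      $1 == "blob" && $2 + 0 > limit + 0 {
        size = $2
        sub(/^[^ ]+ [^ ]+ /, "")   # keep only the path (may contain spaces)
        print size "\t" $0
      }' |
    sort -rn
}
```

Running `large_blobs` inside a converted repository prints the offending paths; `large_blobs 104857600` would restrict the list to files above the 100MB hard limit. Files found this way are candidates for exclusion or for attaching to a release instead.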