---
title: "Version Control"
---
## Overview
Synthesis projects require an intensely collaborative team spirit complemented by an emphasis on quantitative rigor. While other modules focus on elements of each of these, version control is a useful tool that lives at their intersection. We will shortly embark on a deep dive into version control but broadly it is a method of tracking changes to files (especially code "scripts") that lends itself remarkably well to coding in groups. The LTER Network Office has a [Scientific Computing Team](https://lter.github.io/scicomp/) that developed a robust workshop-style set of materials aimed at teaching version control to LTER-funded synthesis working groups. Rather than re-invent these materials for the purposes of SSECR, we will work through parts of these workshop materials.
## Learning Objectives
After completing this module you will be able to:
- <u>Describe</u> how version control facilitates code collaboration
- <u>Navigate</u> GitHub via a web browser
- <u>Create</u> and edit a repository through GitHub
- <u>Define</u> fundamental git vocabulary
- <u>Sketch</u> the RStudio-to-GitHub order of operations
- <u>Use</u> RStudio, Git, and GitHub to collaborate with version control
## Preparation
Complete the ["Workshop Preparation" steps](https://lter.github.io/workshop-github/#workshop-preparation) identified in the linked workshop (see below).
## LTER SciComp Workshop Materials
The workshop materials we will be working through live [here](https://lter.github.io/workshop-github/) but for convenience we have also embedded the workshop directly into the SSECR course website (see below).
```{=html}
<iframe src="https://lter.github.io/workshop-github/" height="550" width="800" style="border: 1px solid #2e3846;"></iframe>
```
## Collaborating with Git
It is important to remember that while Git is a phenomenal tool for collaboration, it is _not_ Google Docs! <u>You can work together but you cannot work simultaneously in the same files</u>. Working in the same file at the same time is how merge conflicts happen, and those can be a huge pain to untangle after the fact. Fortunately, avoiding merge conflicts is relatively simple! Here are a few strategies for avoiding conflicts.
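Before turning to those strategies, it may help to see how a conflict arises in the first place. The following is a minimal sketch, not a prescribed workflow: it assumes Git is installed, uses a local bare repository as a stand-in for GitHub, and invents two collaborators (`alice`, `bob`) and a file (`analysis.R`) purely for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: alice, bob, and analysis.R are hypothetical names.
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"

# A local bare repository stands in for the shared GitHub repository
git init --quiet --bare shared.git

# Alice clones, adds the first version of the script, and pushes it
git clone --quiet shared.git alice
git -C alice config user.name "Alice"
git -C alice config user.email "alice@example.com"
echo "threshold <- 0.05" > alice/analysis.R
git -C alice add analysis.R
git -C alice commit --quiet -m "Add analysis script"
git -C alice push --quiet -u origin HEAD

# Bob clones the repository once it has content
git clone --quiet shared.git bob
git -C bob config user.name "Bob"
git -C bob config user.email "bob@example.com"

# Both people now change the SAME line at the same time
echo "threshold <- 0.01" > alice/analysis.R
echo "threshold <- 0.10" > bob/analysis.R

# Alice commits and pushes first, so her push succeeds
git -C alice commit --quiet -am "Alice: stricter threshold"
git -C alice push --quiet

# Bob commits, then pulls; Git cannot choose between the two edits
git -C bob commit --quiet -am "Bob: looser threshold"
if git -C bob pull --quiet --no-rebase --no-edit >/dev/null 2>&1; then
  pull_result="merged cleanly"
else
  pull_result="merge conflict"
fi

# List the files Git has marked as conflicted in Bob's clone
conflicted=$(git -C bob diff --name-only --diff-filter=U)
echo "$pull_result in: $conflicted"
```

In this sketch Bob would now have to open `analysis.R`, decide which edit to keep, and commit the result by hand, which is exactly the untangling described above.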
:::{.panel-tabset}
## Separate Scripts
At its simplest, you can make a separate script for each group member and have each of you work _exclusively_ in your own script. If no one else ever works in your script you will never have a merge conflict, even if you are working in your script at the same time as someone else is working in theirs.
You can do this by all working on separate scripts that are trying to do the same thing or you can delegate a particular script in the workflow to a single person (e.g., one person is the only one allowed to edit the 'data wrangling' script, another is the only one allowed to edit the 'analysis' script, etc.).
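As a rough illustration of why this works, the sketch below has two people commit at the same time but in different files. It assumes Git is installed and uses a local bare repository in place of GitHub. The collaborators (`alice`, `bob`) and script names (`wrangle.R`, `analysis.R`) are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: alice, bob, wrangle.R, and analysis.R are hypothetical names.
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"

# A local bare repository stands in for the shared GitHub repository
git init --quiet --bare shared.git

# Alice seeds the repository with one script per person
git clone --quiet shared.git alice
git -C alice config user.name "Alice"
git -C alice config user.email "alice@example.com"
echo "# wrangling steps go here" > alice/wrangle.R
echo "# analysis steps go here" > alice/analysis.R
git -C alice add wrangle.R analysis.R
git -C alice commit --quiet -m "Add wrangling and analysis scripts"
git -C alice push --quiet -u origin HEAD

git clone --quiet shared.git bob
git -C bob config user.name "Bob"
git -C bob config user.email "bob@example.com"

# At the same time: Alice edits only wrangle.R, Bob edits only analysis.R
echo "clean_data <- TRUE" >> alice/wrangle.R
echo "fit_model <- TRUE" >> bob/analysis.R

git -C alice commit --quiet -am "Alice: update wrangling"
git -C alice push --quiet

git -C bob commit --quiet -am "Bob: update analysis"
# Bob pulls Alice's work; the edits touch different files so Git merges them
git -C bob pull --quiet --no-rebase --no-edit
git -C bob push --quiet

# No files are left in a conflicted state, and Bob now has both edits
unmerged=$(git -C bob ls-files --unmerged)
echo "unmerged files: '${unmerged}'"
```

Bob still has to pull before he can push, but because no file was edited by both people, Git can combine the two sets of changes without asking anyone to resolve anything.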
**Recommendation: Worth Discussing!**
## Separate Shifts
You might also decide to work together on the same scripts and simply stagger when each person works so that all of one person's changes are made, committed, and pushed before the next person begins. This is a particularly nice option if you have people in different time zones because someone in Maine can finish working on code before a team member living in Oregon has even woken up, much less started coding.
For this to work _you will need to communicate extensively with the rest of your team_ so that you are absolutely sure that you won't start working before someone else has finished their edits.
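One way to picture a "shift" is as a pull, edit, commit, push cycle that finishes before anyone else starts. The sketch below is an illustration under assumptions rather than a required procedure: it assumes Git is installed, uses a local bare repository in place of GitHub, and the collaborator names (`maine`, `oregon`), the file `wrangle.R`, and the helper function `work_shift` are all made up for this example.

```bash
#!/usr/bin/env bash
# Sketch only: maine, oregon, wrangle.R, and work_shift are hypothetical.
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"

# A local bare repository stands in for the shared GitHub repository
git init --quiet --bare shared.git

# The Maine collaborator starts the project
git clone --quiet shared.git maine
git -C maine config user.name "Maine collaborator"
git -C maine config user.email "maine@example.com"
echo "# shared wrangling script" > maine/wrangle.R
git -C maine add wrangle.R
git -C maine commit --quiet -m "Start wrangling script"
git -C maine push --quiet -u origin HEAD

# The Oregon collaborator joins by cloning
git clone --quiet shared.git oregon
git -C oregon config user.name "Oregon collaborator"
git -C oregon config user.email "oregon@example.com"

# One complete shift: usage is work_shift <clone> <line to add> <message>
work_shift() {
  # --ff-only stops with an error if this clone has commits that were never pushed
  git -C "$1" pull --quiet --ff-only
  echo "$2" >> "$1/wrangle.R"
  git -C "$1" commit --quiet -am "$3"
  git -C "$1" push --quiet
}

# Morning in Maine, then (after a message to the team) morning in Oregon
work_shift maine "raw <- read.csv('data.csv')" "Maine shift: read in data"
work_shift oregon "tidy <- na.omit(raw)" "Oregon shift: drop missing rows"

# Both shifts edited the same file, yet the history is a straight line
commit_count=$(git -C oregon rev-list --count HEAD)
merge_commits=$(git -C oregon rev-list --merges HEAD)
echo "commits: $commit_count"
```

Using `--ff-only` here is a design choice for the sketch: if a shift accidentally overlaps with local work that was never pushed, the pull stops with an error instead of quietly creating a merge.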
**Recommendation: Worth Discussing!**
## Forks
GitHub does offer a "fork" feature where people can make a copy of a given repository that they then 'own'. Forks are connected to the source repository and you can open a pull request to get the edits from one fork into the source repository.
This may sound like a perfect fit for collaboration but in reality it introduces significant hurdles! Consider the following:
1. It is difficult to know where the "best" version of the code lives
It is equally likely for the primary code version to be in any group member's fork (or in the source repository). So if you want to re-run a set of analyses you'll need to hunt down which fork the current script lives in rather than consulting a single repository in which you all work together.
2. You essentially guarantee significant merge conflicts
If everyone is working independently and submitting pull requests to merge back into the main repository you all but ensure that people will make different edits that GitHub then doesn't know how to resolve. The pull request will tell you that there are merge conflicts but you still need to fix them yourself, and now that fixing effort must be done in someone else's fork of the repository.
3. It's not the intended use of GitHub forks
Forks are intended for when you want to take a set of code and then "go your own way" with that code base. While there is a mechanism for contributing those edits back to the main repository, forks are really better used when you never intend to do a pull request and thus don't have to worry about eventual merge conflicts. A good example here is that you might attend a workshop and decide to offer a similar workshop yourself. You could then fork the original workshop's repository to serve as a starting point for your version and save yourself from unnecessary labor. It would be bizarre for you to suggest that your workshop should _replace_ the original one even if it did begin with that content.
**Recommendation: Don't Do This**
## Branches
Git (and therefore GitHub) also offers a "branch" feature which is similar to forks in some ways. Branches create parallel workspaces within a single repository as opposed to forks that create a copy of a repository under a different user.
These have the same hurdles as forks so check out the first two points in the "Forks" tab. Also, just like forks, this isn't how branches were meant to be used either! Branches exist so that you can leave some version of the code untouched while simultaneously developing some improvement in a branch. That way the user experiences a seamless upgrade while still allowing you to have a messy development period. Branches _are not_ intended for multiple people to be working on the same things at the same time and merge conflicts are the likely outcome of using branches in this way.
**Recommendation: Don't Do This**
## Single Author
You may be tempted to just delegate all code editing to a single person in the group. While this strategy does guarantee that there will never be a merge conflict, it is also deeply inequitable as it places an unfair share of the labor of the project on one person.
Practically speaking, this also encourages an atmosphere where only one person can even _read_ your group's code. This makes it difficult for other group members to contribute and ultimately may cause your group to miss out on novel insights.
**Recommendation: Don't Do This**
:::
## Additional Resources
### Papers & Documents
- Pereira Braga, P.H. _et al._ [Not Just for Programmers: How GitHub can Accelerate Collaborative and Reproducible Research in Ecology and Evolution](https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.14108). **2023**. _Methods in Ecology and Evolution_
- GitHub. [Git Cheat Sheet](https://education.github.com/git-cheat-sheet-education.pdf). **2023**.
### Workshops & Courses
- Bryan J. _et al._ [Happy Git and GitHub for the useR](https://happygitwithr.com/). **2024**.
- National Center for Ecological Analysis and Synthesis (NCEAS) Learning Hub. [coreR: Intro to Git and GitHub](https://learning.nceas.ucsb.edu/2023-10-coreR/session_06.html). **2023**.