Integrating Li Kui's 6/4 feedback notes. Text clarified, typos fixed, additional headings added, sidebar items refined, skeleton of spatial module drafted
njlyon0 committed Jun 6, 2024
1 parent 28942ef commit 2c5ad50
Showing 14 changed files with 104 additions and 17 deletions.
4 changes: 2 additions & 2 deletions _freeze/mod_data-viz/execute-results/html.json

Large diffs are not rendered by default.

Binary file modified _freeze/mod_data-viz/figure-html/multi-modal-1.png
4 changes: 2 additions & 2 deletions _freeze/mod_spatial/execute-results/html.json
@@ -1,8 +1,8 @@
{
"hash": "bf53f688e30e538d8f9679505425afd4",
"hash": "a4eb01ff58493741083fae5f9ee93b5f",
"result": {
"engine": "knitr",
"markdown": "---\ntitle: \"Working with Spatial Data\"\ncode-annotations: hover\n---\n\n\n## Overview\n\nUnder Construction\n\n## Learning Objectives\n\nAfter completing this topic you will be able to: \n\n- <u>Define</u> characteristics of common types of spatial data\n- <u>Manipulate</u> spatial data with R\n- <u>Integrate</u> spatial data with tabular data\n\n## Needed Packages\n\nIf you'd like to follow along with the code chunks included throughout this module, you'll need to install the following packages:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Note that these lines only need to be run once per computer\n## So you can skip this step if you've installed these before\ninstall.packages(\"tidyverse\")\n```\n:::\n\n\n## Module Content\n\n\n\n## Additional Spatial Resources\n\n### Papers & Documents\n\n- \n\n### Workshops & Courses\n\n- The Carpentries' [Introduction to Geospatial Raster & Vector Data with R](https://datacarpentry.org/r-raster-vector-geospatial/)\n- The Carpentries' [Introduction to R for Geospatial Data](https://datacarpentry.org/r-intro-geospatial/index.html)\n- Arctic Data Center's [Spatial and Image Data Using GeoPandas](https://learning.nceas.ucsb.edu/2023-03-arctic/sections/geopandas.html) chapter of their Scalable Computing course\n- Jason Flower's (UC Santa Barbara) [Introduction to rasters with `terra`](https://jflowernet.github.io/intro-terra-ecodatascience/)\n\n### Websites\n\n- NASA's <u>App</u>lication for <u>E</u>xtracting and <u>E</u>xploring <u>A</u>nalysis <u>R</u>eady <u>S</u>amples [(AppEEARS) Portal](https://appeears.earthdatacloud.nasa.gov/)\n",
"markdown": "---\ntitle: \"Working with Spatial Data\"\ncode-annotations: hover\n---\n\n\n## Overview\n\nUnder Construction\n\n## Learning Objectives\n\nAfter completing this topic you will be able to: \n\n- <u>Define</u> characteristics of common types of spatial data\n- <u>Manipulate</u> spatial data with R\n- <u>Integrate</u> spatial data with tabular data\n\n## Needed Packages\n\nIf you'd like to follow along with the code chunks included throughout this module, you'll need to install the following packages:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Note that these lines only need to be run once per computer\n## So you can skip this step if you've installed these before\ninstall.packages(\"tidyverse\")\n```\n:::\n\n\n## Raster versus Vector Data\n\n\n\n## Coordinate Reference Systems\n\n\n\n## Making Maps\n\n\n\n## Extracting Spatial Data\n\n\n\n## Additional Spatial Resources\n\n### Papers & Documents\n\n- \n\n### Workshops & Courses\n\n- The Carpentries' [Introduction to Geospatial Raster & Vector Data with R](https://datacarpentry.org/r-raster-vector-geospatial/)\n- The Carpentries' [Introduction to R for Geospatial Data](https://datacarpentry.org/r-intro-geospatial/index.html)\n- Arctic Data Center's [Spatial and Image Data Using GeoPandas](https://learning.nceas.ucsb.edu/2023-03-arctic/sections/geopandas.html) chapter of their Scalable Computing course\n- Jason Flower's (UC Santa Barbara) [Introduction to rasters with `terra`](https://jflowernet.github.io/intro-terra-ecodatascience/)\n\n### Websites\n\n- NASA's <u>App</u>lication for <u>E</u>xtracting and <u>E</u>xploring <u>A</u>nalysis <u>R</u>eady <u>S</u>amples [(AppEEARS) Portal](https://appeears.earthdatacloud.nasa.gov/)\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
4 changes: 2 additions & 2 deletions _freeze/mod_stats/execute-results/html.json

Large diffs are not rendered by default.

Binary file modified _freeze/mod_stats/figure-html/mem-explore-graph-1.png
2 changes: 1 addition & 1 deletion _quarto.yml
@@ -52,7 +52,7 @@ website:
href: mod_stats.qmd
- section: "Phase IV -- Magnify"
contents:
- text: "Next Steps & Proposals"
- text: "Next Steps"
href: mod_next-steps.qmd
- section: "Phase V -- Share"
contents:
10 changes: 10 additions & 0 deletions mod_data-viz.qmd
@@ -434,6 +434,16 @@ ggplot() +

:::

## Ordination

If you are working with multivariate data (i.e., data where multiple columns are all response variables collectively) you may find ordination helpful. Ordination is a general term for many types of multivariate visualization but is typically used to refer to visualizing a distance or dissimilarity measure of the data. Such measures collapse all of those columns of response variables into fewer (typically two) index values that are easier to visualize. Common examples include <u>P</u>rincipal <u>C</u>omponents <u>A</u>nalysis (PCA), <u>N</u>on-<u>M</u>etric <u>M</u>ultidimensional <u>S</u>caling (NMS / NMDS), and <u>P</u>rincipal <u>Co</u>ordinates <u>A</u>nalysis (PCoA / "metric multidimensional scaling").
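As a quick sketch, here is one way an NMDS ordination might look in R. Note that the `vegan` package and its built-in `dune` dataset are assumptions made for illustration; this module does not prescribe a particular package:

```r
# NOTE: `vegan` is just one common option for ordination, not a requirement
# install.packages("vegan") # <- only needed once per computer
library(vegan)

# `dune` is vegan's built-in dune meadow dataset where every column
# is one species' abundance (i.e., many response variables collectively)
data(dune)

# Collapse all of those species columns into two NMDS axes
# using Bray-Curtis dissimilarity
dune_mds <- metaMDS(comm = dune, distance = "bray", k = 2, trace = FALSE)

# Visualize sites in ordination space
plot(dune_mds, display = "sites")
```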

## Maps

You may find it valuable to create a map as an additional way of visualizing data. Many synthesis groups do this--particularly when there is a strong spatial component to the research questions and/or hypotheses.

Check out the [bonus spatial data module](https://lter.github.io/ssecr/mod_spatial.html) for more information on map-making if this is of interest!

## Additional Resources

### Papers & Documents
10 changes: 9 additions & 1 deletion mod_findings.qmd
@@ -15,10 +15,18 @@ After completing this module you will be able to:
- <u>Determine</u> audience motivations and interest
- <u>Translate</u> communication into various formats based on efficacy with target group

## Module Content
## Effective Communication



## Publishing Code, Data, and Results




## Data Management Plans


## Additional Resources

### Papers & Documents
2 changes: 1 addition & 1 deletion mod_next-steps.qmd
@@ -1,5 +1,5 @@
---
title: "Next Steps & Proposal Writing"
title: "Next Steps & Logic Models"
---

## Overview
4 changes: 3 additions & 1 deletion mod_reproducibility.qmd
@@ -63,6 +63,8 @@ Here are some rules to keep in mind as you decide how to organize your project:

Keeping all inputs, outputs, and documentation in a single folder makes it easier to collaborate and share all project materials. Also, most programming applications (RStudio, VS Code, etc.) work best when all needed files are in the same folder.

Note that <u>how you define "project" may affect the number of folders you need</u>! Some synthesis projects may separate data harmonization into its own project, while for others that same effort might not warrant a separate project. Similarly, you may want to make a separate folder for each manuscript your group plans on writing so that the code for each paper is kept separate.

2. **Organize content with sub-folders**

Putting files that share a purpose or source into logical sub-folders is a great idea! This makes it easy to figure out where to put new content and reduces the effort of documenting project organization. Don't feel like you need to use an intricate web of sub-folders either! Just one level of sub-folders is enough for many projects.
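As a sketch, a single level of sub-folders could be created like this (the folder names below are illustrative only, not prescribed by this module):

```shell
# Create one project folder with a single level of sub-folders
# (names are illustrative -- use whatever fits your project)
mkdir -p my-synthesis-project/data \
         my-synthesis-project/scripts \
         my-synthesis-project/outputs \
         my-synthesis-project/docs

# List the resulting structure
ls my-synthesis-project
```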
@@ -256,7 +258,7 @@ library(dplyr); library(magrittr); library(ggplot2)
. . .
```

In R the semicolon allows you to put multiple code operations in the same line of the script. Listing the needed libraries in this way thus lets everyone reading the code know exactly which packages they will need to have installed.
In R the semicolon allows you to put multiple code operations in the same line of the script. Listing the needed libraries in this way cuts down on the number of lines while still being precise about which packages are needed in the script.

If you are feeling generous you could use the [`librarian` R package](https://cran.r-project.org/web/packages/librarian/index.html) to install packages that are not yet installed and simultaneously load all needed libraries. Note that users would still need to install `librarian` itself, but this at least limits possible errors to one location. This is done like so:
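A minimal sketch (these particular package names are just illustrative):

```r
# install.packages("librarian") # <- the one install users still need
# `shelf` installs any missing packages, then loads everything listed
librarian::shelf(dplyr, magrittr, ggplot2)
```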

14 changes: 13 additions & 1 deletion mod_spatial.qmd
@@ -27,7 +27,19 @@ If you'd like to follow along with the code chunks included throughout this modu
install.packages("tidyverse")
```

## Module Content
## Raster versus Vector Data



## Coordinate Reference Systems



## Making Maps



## Extracting Spatial Data



7 changes: 6 additions & 1 deletion mod_stats.qmd
@@ -5,7 +5,12 @@ code-annotations: hover

## Overview

Given the wide range in statistical training in graduate curricula (and corresponding breadth of experience among early career researchers), we'll be approaching this module in a different way than the others. One half of the module will use a "flipped approach" where project teams will share their proposed analyses with one another. The other half of the module will be dedicated to analyses that are more common in--or exclusive to--synthesis research. Content produced by project teams during the flipped half may be linked in the '[Additional Resources](https://lter.github.io/ssecr/mod_stats.html#additional-resources)' section at the bottom of this module at the discretion of each team. Otherwise the content of this module will focus only on the non-flipped content.
Given the wide range in statistical training in graduate curricula (and corresponding breadth of experience among early career researchers), we'll be approaching this module by splitting it into two halves.

1. First half: a "flipped approach" where project teams will share their proposed analyses with one another
2. Second half: a typical instructional module dedicated to **analyses that are more common in--or exclusive to--synthesis research**

Content produced by project teams during the flipped half may be linked in the '[Additional Resources](https://lter.github.io/ssecr/mod_stats.html#additional-resources)' section at the bottom of this module at the discretion of each team. Otherwise the content of this module will focus only on the non-flipped content.

## Learning Objectives

58 changes: 54 additions & 4 deletions mod_version-control.qmd
@@ -17,20 +17,70 @@ After completing this module you will be able to:
- <u>Sketch</u> the RStudio-to-GitHub order of operations
- <u>Use</u> RStudio, Git, and GitHub to collaborate with version control

## Module Content
## NCEAS SciComp Workshop Materials

The workshop materials we will be working through live [here](https://nceas.github.io/scicomp-workshop-collaborative-coding/) but for convenience we have also embedded the workshop directly into the SSECR course website (see below).

```{=html}
<iframe src="https://nceas.github.io/scicomp-workshop-collaborative-coding/" height="550" width="900" style="border: 1px solid #2e3846;"></iframe>
<iframe src="https://nceas.github.io/scicomp-workshop-collaborative-coding/" height="550" width="800" style="border: 1px solid #2e3846;"></iframe>
```

## Collaborating with Git

It is important to remember that while Git is a phenomenal tool for collaboration, it is _not_ Google Docs! <u>You can work together but you cannot work simultaneously in the same files</u>. Working at the same time is how merge conflicts happen, which can be a huge pain to untangle after the fact. Fortunately, avoiding merge conflicts is relatively simple! Here are a few strategies.

:::{.panel-tabset}
## Separate Scripts

At its simplest, you can make a separate script for each group member and have each of you work _exclusively_ in your own script. If no one ever works in your script, you will never have a merge conflict--even if you are working in your script at the same time as someone else is working in theirs.

You can do this by all working on separate scripts that attempt the same task, or you can delegate a particular script in the workflow to a single person (e.g., one person is the only one allowed to edit the 'data wrangling' script, another is the only one allowed to edit the 'analysis' script, etc.).

**Recommendation: Worth Discussing!**

## Work in Shifts

You might also decide to work together on the same scripts and simply stagger your working hours so that all of your changes are made, committed, and pushed before the next person begins work. This is a particularly nice option if you have people in different time zones: someone in Maine can likely finish their work on the code before a team member living in Oregon has even woken up, much less started coding.

For this to work _you will need to communicate extensively with the rest of your team_ so that you are absolutely sure that you won't start working before someone else has finished their edits.

**Recommendation: Worth Discussing!**

## Work in Forks

GitHub does offer a "fork" feature where people can make a copy of a given repository that they then 'own'. Forks remain connected to the source repository, and you can open a pull request to merge edits from a fork back into it.

This may sound like a perfect fit for collaboration but in reality it introduces significant hurdles! Consider the following:

1. It is difficult to know where the "best" version of the code lives

The primary code version is equally likely to live in any group member's fork (or in the original repository). So if you want to re-run a set of analyses, you'll need to hunt down which fork the current script lives in rather than consulting a single repository in which you all work together.

2. You essentially guarantee significant merge conflicts

If everyone is working independently and submitting pull requests to merge back into the main repository, you all but ensure that people will make different edits that GitHub then doesn't know how to resolve. The pull request will tell you that there are merge conflicts, but you still need to fix them yourself--and now that fixing effort must be done in someone else's fork of the repository.

3. It's not the intended use of GitHub forks

Forks are intended for when you want to take a set of code and then "go your own way" with that code base. While there is a mechanism for contributing those edits back to the main repository, forks are really better used when you never intend to open a pull request and thus don't have to worry about eventual merge conflicts. For example, you might attend a workshop and decide to offer a similar workshop yourself. You could then fork the original workshop's repository to serve as a starting point for your version and save yourself unnecessary labor. It would be bizarre to suggest that your workshop should _replace_ the original one even if it did begin with that content.

**Recommendation: Don't Do This**

## Single Code Author

You may be tempted to just delegate all code editing to a single person in the group. While this strategy does guarantee that there will never be a merge conflict, it is also deeply inequitable, as it places an unfair share of the project's labor on one person.

Practically speaking, this also encourages an atmosphere where only one person can even _read_ your group's code. This makes it difficult for other group members to contribute and ultimately may cause your group to 'miss out on' novel insights.

**Recommendation: Don't Do This**
:::
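Whichever strategy your group picks, the everyday rhythm is the same: start from the current pushed version, then commit and push when you finish. Below is a minimal, self-contained sketch of that rhythm using two local clones to stand in for two collaborators (every name and path here is invented purely for illustration):

```shell
# Simulate two collaborators sharing one repository locally (no GitHub needed).
# All names and paths below are invented for illustration.
set -e
hub=$(mktemp -d)

# A bare repository stands in for the shared GitHub remote
git init -q --bare "$hub/project.git"

# Collaborator 1: clone, add a script, commit, push
git clone -q "$hub/project.git" "$hub/collab1" 2>/dev/null
cd "$hub/collab1"
git config user.name "Collaborator One"
git config user.email "one@example.com"
echo 'library(dplyr)' > wrangling.R
git add wrangling.R
git commit -q -m "Start the data wrangling script"
git push -q origin HEAD

# Collaborator 2: start *after* those changes are pushed, so their copy
# already reflects the current version and no merge conflict is possible
git clone -q "$hub/project.git" "$hub/collab2"
cd "$hub/collab2"
ls   # the pushed script is already here
```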

## Additional Resources

### Papers & Documents

- [Not Just for Programmers: How GitHub can Accelerate Collaborative and Reproducible Research in Ecology and Evolution](https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.14108). Pereira Braga _et al._, 2023. **Methods in Ecology and Evolution**
- [Git Cheat Sheet](https://education.github.com/git-cheat-sheet-education.pdf). GitHub
- Pereira Braga _et al._, [Not Just for Programmers: How GitHub can Accelerate Collaborative and Reproducible Research in Ecology and Evolution](https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.14108). **2023**. _Methods in Ecology and Evolution_
- GitHub, [Git Cheat Sheet](https://education.github.com/git-cheat-sheet-education.pdf). **2023**.

### Workshops & Courses

2 changes: 1 addition & 1 deletion policy_ai.qmd
@@ -1,5 +1,5 @@

Artificial intelligence (AI) tools are increasingly well-known and widely discussed in the context of data science. AI products can increase the efficiency of code writing and are becoming a common part of the data science landscape. For the purposes of this course, we **strongly recommend that you _do not_ use AI tools**. There is an under-discussed ethical consideration to the use and training of these tools in addition to their known practical limitations. However, the main reason we suggest you not use them for this class though is that leaning too heavily upon AI tools is likely to negatively impact your learning and skill acquisition.
Artificial intelligence (AI) tools are increasingly well-known and widely discussed in the context of data science. AI products can increase the efficiency of code writing and are becoming a common part of the data science landscape. For the purposes of this course, we **strongly recommend that you _do not_ use AI tools to write code**. There is an under-discussed ethical consideration to the use and training of these tools in addition to their known practical limitations. However, the main reason we suggest you not use them for this class is that leaning too heavily upon AI tools is likely to negatively impact your learning and skill acquisition.

You may have prior experience with some of the quantitative skills this course aims to teach but others are likely new to you. During the first steps of learning any new skill, it can be really helpful to struggle a bit in solving problems. Your efforts now will help refine your troubleshooting skills and will likely make it easier to remember how you solved a given problem the next time it arises. Over-use of AI tools can short circuit this pathway to mastery. Once you have become a proficient coder, you will be better able to identify and avoid any distortions or assumptions introduced by relying on AI.

