---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# corella <a href="https://corella.ala.org.au"><img src="man/figures/logo.png" align="right" height="139" alt="corella website" /></a>
<!-- badges: start -->
[![CRAN status](https://www.r-pkg.org/badges/version/corella)](https://CRAN.R-project.org/package=corella)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
<!-- badges: end -->
## Overview
`corella` is an R package that helps users standardize their data to the
[*Darwin Core*](https://dwc.tdwg.org) data standard, which is used for biodiversity data such as species occurrences. `corella` provides tools to prepare, manipulate and validate data against the standard's criteria. Once standardized, data can be shared as a [*Darwin Core Archive*](https://ipt.gbif.org/manual/en/ipt/latest/dwca-guide#what-is-darwin-core-archive-dwc-a) and published to open data infrastructures like the [Atlas of Living Australia](https://www.ala.org.au) and [GBIF](https://www.gbif.org/).
`corella` was built, and is maintained, by the [Science & Decision Support Team](https://labs.ala.org.au) at the [Atlas of Living Australia](https://www.ala.org.au) (ALA). It is named for the Little Corella ([_Cacatua sanguinea_](https://bie.ala.org.au/species/https%3A//biodiversity.org.au/afd/taxa/34b31e86-7ade-4cba-960f-82a6ae586206)). The logo was designed by [Dax Kellie](https://daxkellie.com/).
If you have any comments, questions or suggestions, please [contact us](mailto:support@ala.org.au).
## Installation
You can install the development version of `corella` from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("AtlasOfLivingAustralia/corella")
```
## Usage
Here is a small sample of species occurrence data that we would like to convert to the Darwin Core standard.
```{r}
library(corella)
library(tibble)
# A simple example of species occurrence data
df <- tibble(
  species = c("Callocephalon fimbriatum", "Eolophus roseicapilla"),
  latitude = c(-35.310, "-35.273"), # deliberate error for demonstration purposes
  longitude = c(149.125, 149.133),
  eventDate = c("14-01-2023", "15-01-2023"),
  status = c("present", "present")
)
df
```
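The `latitude` column above is flagged as a deliberate error because combining a number and a quoted string in `c()` coerces the whole vector to character. The following base R sketch (no `corella` functions involved) shows the coercion and the `as.numeric()` conversion that the `set_coordinates()` call further down relies on:

```r
# Mixing a number and a string in c() coerces every element to character
latitude <- c(-35.310, "-35.273")
class(latitude)

# Converting back to numeric gives a column suitable for decimal coordinates
latitude_fixed <- as.numeric(latitude)
class(latitude_fixed)
```

Because the conversion is an ordinary R expression, it can be written directly inside a `set_` function argument rather than as a separate cleaning step.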
One of the most important aspects of the Darwin Core standard is using standard column names (Darwin Core *terms*). We can update column names in our data to match Darwin Core terms with `set_` functions.
Each `set_` function name corresponds to a type of data, and its argument names correspond to the Darwin Core terms available as column names. `set_` functions support `dplyr::mutate()` functionality, so columns can be changed or fixed within your pipe. They also automatically check each column's data to make sure it is in the correct format, and indicate if anything needs fixing.
```{r}
suppressMessages( # for readability
  df |>
    set_coordinates(
      decimalLatitude = as.numeric(latitude), # fix latitude
      decimalLongitude = longitude
    ) |>
    set_scientific_name(
      scientificName = species
    ) |>
    set_datetime(
      eventDate = lubridate::dmy(eventDate) # specify date format
    ) |>
    set_occurrences(occurrenceStatus = status)
)
```
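The `lubridate::dmy()` call above tells R that the date strings are in day-month-year order. As a rough base R equivalent, assuming the same `%d-%m-%Y` layout as the example data, the strings can be parsed into `Date` values, which print in the ISO 8601 form that Darwin Core recommends for `eventDate`:

```r
# Example date strings in day-month-year order, as in the sample data
event_date <- c("14-01-2023", "15-01-2023")

# Parse with an explicit format so day and month are not confused
parsed <- as.Date(event_date, format = "%d-%m-%Y")

# Date objects format as ISO 8601 (YYYY-MM-DD) by default
iso_dates <- format(parsed)
iso_dates
```

Stating the format explicitly matters here because a string such as `"03-04-2023"` is ambiguous without it.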
Not sure where to start? Use `suggest_workflow()` to see which steps you need to take to make your data Darwin Core compliant.
```{r}
df |>
  suggest_workflow()
```
Or, if your data is nearly ready, run `check_dataset()` to check every column whose name is a valid Darwin Core term.
```{r}
df |>
  check_dataset()
```
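To make "columns that match Darwin Core terms" concrete, here is a minimal base R sketch of the matching idea. It is not how `corella` implements its checks: `dwc_terms` is a small hand-picked subset of real Darwin Core terms chosen for illustration, and `df_names` simply repeats the column names of the example data.

```r
# Hand-picked subset of Darwin Core terms (an assumption for illustration,
# not corella's internal term list)
dwc_terms <- c(
  "decimalLatitude", "decimalLongitude",
  "scientificName", "eventDate", "occurrenceStatus"
)

# Column names of the example data before any set_ functions are applied
df_names <- c("species", "latitude", "longitude", "eventDate", "status")

# Columns already named with a Darwin Core term
matched <- intersect(df_names, dwc_terms)

# Columns that would still need renaming
unmatched <- setdiff(df_names, dwc_terms)

matched
unmatched
```

In the unmodified example data only `eventDate` already uses a Darwin Core term, so it is the only column a name-based check would pick up until the other columns are renamed.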
## Citing corella
To generate a citation for the package version you are using, you can
run:
``` r
citation(package = "corella")
```
The current recommended citation is:
> Kellie D, Balasubramaniam S & Westgate MJ (2024) corella:
> Tools to standardize biodiversity data to Darwin Core. R Package version
> 0.1.0.