README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# corella <a href="https://corella.ala.org.au"><img src="man/figures/logo.png" align="right" height="139" alt="corella website" /></a>

<!-- badges: start -->
[![CRAN status](https://www.r-pkg.org/badges/version/corella)](https://CRAN.R-project.org/package=corella)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
<!-- badges: end -->

## Overview

`corella` is an R package that helps users standardize their data using the 
[*Darwin Core*](https://dwc.tdwg.org) data standard, used for biodiversity data like species occurrences. `corella` provides tools to prepare, manipulate and validate data against the standard's criteria. Once standardized, data can be subsequently shared as a [*Darwin Core Archive*](https://ipt.gbif.org/manual/en/ipt/latest/dwca-guide#what-is-darwin-core-archive-dwc-a) and published to open data infrastructures like the [Atlas of Living Australia](https://www.ala.org.au) and [GBIF](https://www.gbif.org/). 

`corella` was built, and is maintained, by the [Science & Decision Support Team](https://labs.ala.org.au) at the [Atlas of Living Australia](https://www.ala.org.au) (ALA). It is named for the Little Corella ([_Cacatua sanguinea_](https://bie.ala.org.au/species/https%3A//biodiversity.org.au/afd/taxa/34b31e86-7ade-4cba-960f-82a6ae586206)). The logo was designed by [Dax Kellie](https://daxkellie.com/).

If you have any comments, questions or suggestions, please [contact us](mailto:support@ala.org.au).


## Installation

You can install the development version of `corella` from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("AtlasOfLivingAustralia/corella")
```

## Usage

Here we have a small sample of some example data. We'd like to convert our data to use Darwin Core standards.

```{r}
library(corella)
library(tibble)

# A simple example of species occurrence data
df <- tibble(
  species = c("Callocephalon fimbriatum", "Eolophus roseicapilla"),
  latitude = c(-35.310, "-35.273"), # deliberate error for demonstration purposes
  longitude = c(149.125, 149.133),
  eventDate = c("14-01-2023", "15-01-2023"),
  status = c("present", "present")
)

df
```

One of the most important aspects of Darwin Core standard is using standard column names (Darwin Core *terms*). We can update column names in our data to match Darwin Core terms with `set_` functions. 

Each `set_` function name corresponds to the type of data, and argument names correspond to the available Darwin Core terms to use as column names. `set_` functions support data wrangling operations & `dplyr::mutate()` functionality, meaning columns can be changed or fixed in your pipe. `set_` functions will indicate if anything needs fixing because they also automatically run checks on each column data to make sure each column is in the correct format.

```{r}
suppressMessages( # for readability

df |>
  set_coordinates(
    decimalLatitude = as.numeric(latitude), # fix latitude
    decimalLongitude = longitude
    ) |>
  set_scientific_name(
    scientificName = species
    ) |>
  set_datetime(
    eventDate = lubridate::dmy(eventDate) # specify date format
    ) |>
  set_occurrences(occurrenceStatus = status)

)
```

Not sure where to start? Use `suggest_workflow()` to know what steps you need to make to make your data Darwin Core compliant.

```{r}
df |> 
  suggest_workflow()
```

Or, if your data is nearly ready and you want to run checks over all columns that match Darwin Core terms, run `check_dataset()`. `check_dataset()` checks all columns with valid Darwin Core terms as column names.

```{r}
df |>
  check_dataset()
```


## Citing corella

To generate a citation for the package version you are using, you can
run:

``` r
citation(package = "corella")
```

The current recommended citation is:

> Kellie D, Balasubramaniam S & Westgate MJ (2024) corella:
> Tools to standardize biodiversity data to Darwin Core. R Package version
> 0.1.0.