-
Notifications
You must be signed in to change notification settings - Fork 7
/
zenodo_to_gbif.Rmd
118 lines (93 loc) · 2.75 KB
/
zenodo_to_gbif.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
---
title: "Prepare Movebank data for GBIF IPT upload (make Darwin Core)"
author:
- Peter Desmet
- Sanne Govaert
date: "`r Sys.Date()`"
output:
html_document
---
This script prepares Movebank bird tracking data for upload to a [GBIF IPT](https://gbif.org/ipt). It is designed for datasets deposited on Zenodo.
```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
```
Load libraries:
```{r}
library(magrittr)
library(here)
library(movepub) # devtools::install_github("inbo/movepub")
library(EML)
library(jsonlite)
```
## Set user-defined values
```{r}
project_id <- "O_WESTERSCHELDE"
versioned_doi <- "https://doi.org/10.5281/zenodo.5879096"
rights_holder <- "INBO"
contact <- person(
given = "Peter",
family = "Desmet",
email = "peter.desmet@inbo.be",
comment = c(ORCID = "0000-0002-8442-8025")
)
```
Create file paths:
```{r}
dir <- here::here("data", "processed", project_id, "dwc")
record_id <- gsub("https://doi.org/10.5281/zenodo.", "", versioned_doi)
zenodo_url <- paste0("https://zenodo.org/records/", record_id, "/export/json")
datapackage_url <- paste0("https://zenodo.org/records/", record_id, "/files/datapackage.json")
```
## Create and write EML
```{r}
eml <- movepub::write_eml(
doi = versioned_doi,
directory = dir,
contact = contact,
derived_paragraph = TRUE
)
```
Process EML:
```{r}
# Get description from Zenodo (with HTML, see https://github.com/inbo/movepub/issues/65)
zenodo <- jsonlite::read_json(zenodo_url)
description_full <- zenodo$metadata$description
```
```{r}
# Split description in paragraphs
paragraphs <- unlist(strsplit(description_full, "<p>|</p>|\n", perl = TRUE))
paragraphs <- paragraphs[paragraphs != ""]
# Keep desired paragraphs
# 1: overview of the study (wrapped in CDATA)
index_para1 <- grep(paste(project_id, "-"), paragraphs)[1]
paragraphs[index_para1] <- paste0("<![CDATA[", paragraphs[index_para1], "]]>")
# 2: Reference to paper
index_para2 <- grep("for a more detailed description of this dataset", paragraphs)
# 3: Acknowledgements
index_para3 <- grep("Acknowledgements", paragraphs) + 1
# 4: derived paragraph from write_eml()
derived_paragraph <- tail(eml$dataset$abstract$para, 1)
# Combine and update EML
description <- paragraphs[c(index_para1, index_para2, index_para3)]
eml$dataset$abstract$para <- c(description, derived_paragraph)
```
Write EML (again):
```{r}
EML::write_eml(eml, file = file.path(dir, "eml.xml"))
```
## Create and write Darwin Core
Read package:
```{r}
package <- movepub::read_package(datapackage_url)
```
Write files:
```{r}
movepub::write_dwc(
package = package,
directory = dir,
dataset_id = package$id,
dataset_name = eml$dataset$title,
license = toupper(eml$dataset$intellectualRights$para),
rights_holder = rights_holder
)
```