Skip to content
srldl edited this page Sep 17, 2021 · 80 revisions

Standards for data input

  1. Use "." as decimal separator for numbers.
  2. Use Svalbard placenames (check correct spelling) whenever it applies.
  3. Write in English unless Norwegian is specifically demanded.
  4. Use the following template for titles if possible: <species abbr 4 letters>, f.ex. "KS_COEI_MILKYS_2019_Kongsfjorden"
  5. For additional parameters, please check parameter list and reuse parameter names if possible.
  6. Use underscore ("_") to connect words in title and for additional parameters. Use small letters throughout (except placenames and title).
  7. Only use additional parameters if you think it is experimental data unlikely to be copied by others. If so, please check the parameter list and reuse parameter names whereever possible.
  8. Please create metadata for the dataset. You can do this here: https://data.npolar.no/dataset/ Under links, please include the link for the fieldwork, address similar to https://data.npolar.no/ecotox/fieldwork//edit f.ex. https://data.npolar.no/ecotox/fieldwork/3a7ce1ca-cae2-4e0e-a2ec-423e9017a752/edit
  9. Please include reference. Although this will have to be added later, it is informative to connect a dataset with publications.
  10. Units. The database does not have own columns for units - hence use the following units:

volume - liter latitude,longitude - decimal degrees weight - kg girth - cm? length - cm? tarsus - cm? bill - cm? bill height - cm? scull - cm? wing - cm? egg length - cm, mm? egg width - blubber thickness back - tusk volume - tusk length - tusk girth - caudal length -

How to fill in the ecotox database

  1. Find your list of fields variables. Go to https://data.npolar.no/ecotox and choose "template" in the upper left corner. Name the template:

    <your initials>_<species>_<project name or description>

    Use underscore to separate words f.ex. KS_GLGU_SEATRACK_2019_Kongsfjorden

    Create a new template based on the variables needed. Check off the parameters in the list of predefined variables first. Save the result and check that select alternatives for your selected fields are included. See image below.

    Go through previously created additional fields

    Also, for parameters placename, latitude and longitude it is possible to use autocomplete from Svalbard placenames if selected.

    At the bottom of the page is the additional variables others have chosen. Please go through the list and use the same variable names if you need them. If a variable name occur often in excel sheets, the technical staff should be contacted to include it in the list of predefined variables.

    Control select alternatives

    Finally add your own additional variables as needed.

  2. Register the dataset at https://data.npolar.no/dataset if not already described. Describe the variables here if not self explanatory.

  3. Mostly cruise-based fieldwork: Fill in expedition information (parent of dataset) in https://data.npolar.no/expedition to combine your data with other researcher's data taken at the same time and place.

How to get data back

Via mapview

Looking for a map presentation or want to look at the original Excel files? Try mapview: https://data.npolar.no/mapview/

Via the API, https://api.npolar.no (R language)

#First we fetch the files from the API:
#Receive as JSON file
> library(jsonlite)
> fieldwork_json = fromJSON("https://api-test.data.npolar.no/ecotox/fieldwork/?q=&filter-matrix=egg&format=json&limit=all")
> lab_json = fromJSON("https://api-test.data.npolar.no/lab/ecotox/?q=&filter-matrix=egg&fields=id,database_sample_id,analyte,analyte_value,fat_percentage,unit&format=json&limit=all")

#Go down some levels in the list structure
#Result is one long column only with all entries applied.
#Now we have two lists converted to data frames
> fieldwork_df = fieldwork_json$feed$entries
> lab_df = lab_json$feed$entries

#To view the table headers
> names(lab_df)
To view the whole table (note the big V in View!)
> View(fieldwork_df)

#If the result will contain a lot of rows, it is important to reduce the 
#number of columns as much as possible.
#Try to remove columns one by one:
> fieldwork_df$schema <- NULL
#or pick the columns to keep:
> keeps_field <- c("people_responsible","NPI_sample_id","reference", "placename","latitude","longitude","species","event_date","id","age","species_identification","comment","project_group","egg_width","length","weight","bill_height","station_name")
> fieldwork_write= fieldwork_df[ , keeps_field, drop = FALSE]

#Only three columns can be used to pivot the analytes. Usually use at least
#"analyte_value", "analyte","database_sample_id", but more can be added
#like "fat_percentage", "unit" etc.
> keeps_lab <- c("analyte_value", "analyte","database_sample_id","unit","id")
> lab_write = lab_df[ , keeps_lab, drop = FALSE]
#Pivot the analytes from the y-axis to the x-axis
> library(tidyr)
> lab_pivot = reshape(lab_write, direction = "wide", idvar = "database_sample_id", timevar = "analyte")

#Rename fieldwork_write column id to database_sample_id so R can merge besed on this column
> library(tidyverse)
> fieldwork_col_conv = fieldwork_write %>% rename (database_sample_id = id)

#To avoid spending too long time in merging, I found it wise to sort the two
#database_sample_id columns
> lab_merge = lab_pivot[order(lab_pivot$database_sample_id),]
> fieldwork_merge = fieldwork_col_conv[order(fieldwork_col_conv$database_sample_id),]

#Try to merge the two datasets unless the data frames are too big.
#If so it must either be split in two or do a manually merge.
> merged = merge(fieldwork_merge, lab_merge, all=TRUE) 

#Finally, create the excel sheet - file here is named "file.xslx"
#Note that with larger datasets your PC may run out of memory.
> library(xlsx)
> write.xlsx(merged, 'file.xlsx', sheetName = "Sheet1", col.names = TRUE, row.names = TRUE, append = FALSE)

#If you run out of memory, try to save as a csv file instead.
#Afterwards you can import it into Excel
> write.csv(merged,"file.txt", row.names = FALSE)

#The file can be found in your working directory, if uncertain where use
> getwd()

Via the API v2, https://v2-api.npolar.no (R language)

#First we fetch the files from the API:
#Receive as JSON file
> library(jsonlite)
> fieldwork_json = fromJSON("https://api-test.data.npolar.no/ecotox/fieldwork/?q=&filter-matrix=egg&format=json&limit=all")
> lab_json = fromJSON("https://api-test.data.npolar.no/lab/ecotox/?q=&filter-matrix=egg&fields=id,database_sample_id,analyte,analyte_value,fat_percentage,unit&format=json&limit=all")
...

How to get the https links

In order to get the data you will need the https link from the fieldwork database and the ecotox lab database along with the search you are looking for. Below are some example on how to construct these links.

Ecotox fieldwork database, search for glaucous gulls and matrix plasma:

https://v2-api.npolar.no/biology/fielddata/_search?and=scientificName:Larus+hyperboreus&and=dynamicProperties.matrix:plasma

https://v2-api.npolar.no/biology/fielddata/_all_/ecotox/_search?and=scientificName:Larus+hyperboreus&and=dynamicProperties.matrix:plasma

Note that in order to get the data, you must be logged in. A way or seeing that you get something back could be to use curl from the command line (must be installed):

curl GET "https://v2-api.npolar.no/biology/fielddata/_all_/ecotox/_search?and=scientificName:Larus+hyperboreus&and=dynamicProperties.matrix:plasma" -u your.email.address@npolar.no

Seabird observations database is an open database (no login required) so in contrast to ecotox/fieldwork you can experiment with how to create valid searches. F.ex. search for colony Zukovskijfjella, counting from boat:

https://v2-api.npolar.no/colony/seabird/_all_/observation/_search?and=dynamicProperties.colonyName:Zukovskijfjella&and=dynamicProperties.platform:boat

Seabird observations database, search for counting from boat and only for the species black guillemot:

https://v2-api.npolar.no/colony/seabird/_all_/observation/_search?and=dynamicProperties.platform:boat&and=scientificName:Cepphus+grylle

How to carry out the search (very technical):

https://v2-api.npolar.no/colony/seabird/_all_/observation/_doc

How to find which database parameter to use? JSON schemas for fieldwork,ecotox and seabird observation databases (technical):

https://gitlab.npolar.no/eds/other/v2-jsonschema/-/blob/development/biology/fielddata/fielddata.v1.0.5.json

https://gitlab.npolar.no/eds/other/v2-jsonschema/-/blob/development/biology/fielddata/ecotox.v1.0.5.json

https://gitlab.npolar.no/eds/other/v2-jsonschema/-/blob/development/colony/seabird/colony.v0.1.0.json

https://gitlab.npolar.no/eds/other/v2-jsonschema/-/blob/development/colony/seabird/colony_observation.v0.1.0.json

Alternatively, find the parameters directly from the databases, at http://hanna.npolar.no:5601

Please note that all our biology databases adhere to the Darwin Core standard.

Importing a CSV file into Excel

Finally, when you have the CSV extracted from R, conversion to Excel is simple: See https://github.com/npolar/npdc-sighting/wiki/Import-from-CSV-to-Excel

Clone this wiki locally