Convert IOPAN protist data from "total database" 2009-2013 into Darwin Core #44
Documentation of the transform, with an input/output example.

Input:

~/npolar/marine-db$ cat data/deposit/iopan/protist-biodiversity/total_database_npi2009-2013.tsv | ./bin/csv-transform --ndjson | ndjson-filter 'd.name === "MOSJ13" && d.no==="1670" && d.data==="2013-07-29" && d.takson==="Thalassiosira pacifica"'
{
"name": "MOSJ13",
"no": "1670",
"station ": "R10",
"depth [m]": "0",
"data": "2013-07-29",
"V-taken [ml]": "10",
"Vth filtered [L]": "32",
"V bottle [ml]": "100",
"Class/Phylum": "Bacillariophyceae ",
"takson": "Thalassiosira pacifica",
"takson_add": "",
"Taxon_full": "Thalassiosira pacifica ",
"AphiaID": "",
"K": "450.02",
"N": "6",
"fields": "60",
"magn": "10",
"cells in chamb": "45.002",
"cells in V bottle [ml]": "450.02",
"cells in 1000 ml": "14.063125",
"Gear": "Micro"
}

Output:

~/npolar/marine-db$ cat data/input/iopan/2009-2010-2012-2013-protist-biodiversity-iopan.ndjson | ndjson-filter 'd.fieldNumber==="MOSJ13-1670" && d.scientificName==="Thalassiosira pacifica"'
{
"maximumDepthInMeters": 0,
"magnification": 10,
"identifiedBy": "iopan.pl",
"organismQuantityType": "cells/l",
"scientificName": "Thalassiosira pacifica",
"materialSampleID": "MOSJ13-1670@MOSJ2013",
"year": 2013,
"expedition": "MOSJ2013",
"locationID": "R10",
"fieldNumber": "MOSJ13-1670",
"basisOfRecord": "Occurrence",
"organismQuantity": 14.063125,
"individualCount": 6,
"sampleSizeValue": 0.42664770454646467,
"sampleSizeUnit": "l",
"occurrenceStatus": "present",
"quantificationStatus": "verified",
"fieldsInCount": 60,
"maxFields": 450.02,
"takenVolume": 10,
"bottleVolume": 100,
"initialVolume": 32,
"cellsInChamber": 45.002,
"gear": "Niskin bottle"
}
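The example row suggests a mostly one-to-one mapping from input columns to output properties. Below is a hypothetical JavaScript sketch of that mapping, inferred from this single row only; it is not the actual ./bin/csv-transform code. The expedition rule, the use of takson rather than Taxon_full, and the GEAR lookup table are assumptions, and derived fields such as sampleSizeValue and quantificationStatus are left out.

```javascript
// Hypothetical sketch inferred from the single example row above;
// not the actual ./bin/csv-transform implementation.

// Numeric input columns copied to an output property (some input
// headers carry units or stray spaces in the name).
const NUMERIC_COLUMNS = {
  "depth [m]": "maximumDepthInMeters",
  magn: "magnification",
  N: "individualCount",
  fields: "fieldsInCount",
  K: "maxFields",
  "V-taken [ml]": "takenVolume",
  "V bottle [ml]": "bottleVolume",
  "Vth filtered [L]": "initialVolume",
  "cells in chamb": "cellsInChamber",
  "cells in 1000 ml": "organismQuantity",
};

// Assumed gear lookup: only "Micro" -> "Niskin bottle" is confirmed by
// the example; unknown values are passed through unchanged.
const GEAR = { Micro: "Niskin bottle" };

function toOccurrence(row) {
  const year = Number(row.data.slice(0, 4)); // "data" holds the date
  const fieldNumber = `${row.name}-${row.no}`;
  // Assumption: expedition = letters of "name" + four-digit year,
  // which turns "MOSJ13" into "MOSJ2013" as in the example.
  const expedition = row.name.replace(/\d+$/, "") + year;
  const gear = row.Gear.trim();
  const occurrence = {
    scientificName: row.takson.trim(),
    fieldNumber,
    materialSampleID: `${fieldNumber}@${expedition}`,
    expedition,
    year,
    locationID: row["station "].trim(), // header has a trailing space
    gear: GEAR[gear] ?? gear,
    organismQuantityType: "cells/l",
    basisOfRecord: "Occurrence",
    occurrenceStatus: "present",
  };
  for (const [column, property] of Object.entries(NUMERIC_COLUMNS)) {
    occurrence[property] = Number(row[column]);
  }
  return occurrence;
}
```

Keeping the numeric columns in one table makes it easy to compare the sketch against the output record above, property by property.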
The input Gear is a mixed bag:
After:
Passes GBIF validation. Taxon match higherrank: 4617
No errors in quantification:

~/npolar/marine-db$ cat data/input/iopan/2009-2010-2012-2013-protist-biodiversity-iopan.ndjson | ndjson-map '[d.gear, d.quantificationStatus]' | sort | uniq -c
715 ["Handnet","incalculable"]
20 ["Handnet","verified"]
22 ["Niskin bottle","calculated"]
13 ["Niskin bottle","incalculable"]
12677 ["Niskin bottle","verified"]
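The example row is consistent with this arithmetic: cellsInChamber = N * K / fields (6 * 450.02 / 60 = 45.002), cells in the bottle = cellsInChamber * V bottle / V-taken (450.02), and cells per litre = cells in the bottle / Vth filtered (450.02 / 32 = 14.063125). sampleSizeValue (about 0.4266 l) is then the volume effectively examined, i.e. individualCount / organismQuantity. A sketch of a check along those lines follows; the status rules (incalculable when an input is missing, calculated when no value was reported, verified when the recomputed value matches, and a hypothetical "error" otherwise) and the tolerance are assumptions, not confirmed by the tally above.

```javascript
// Hypothetical quantification check: the arithmetic reproduces the
// MOSJ13-1670 example, the status rules are assumptions.

// Parse a numeric cell; blank or missing values become NaN.
function num(value) {
  if (value === undefined || String(value).trim() === "") return NaN;
  return Number(value);
}

function quantify(row) {
  const N = num(row["N"]);
  const K = num(row["K"]);
  const fields = num(row["fields"]);
  const vTaken = num(row["V-taken [ml]"]);
  const vBottle = num(row["V bottle [ml]"]);
  const vFiltered = num(row["Vth filtered [L]"]);
  const reported = num(row["cells in 1000 ml"]);

  // Counted fields -> whole chamber -> whole bottle -> per litre filtered.
  const cellsInChamber = (N * K) / fields;
  const cellsPerLitre = (cellsInChamber * vBottle) / vTaken / vFiltered;
  // Litres effectively examined (equals N / cellsPerLitre when N > 0).
  const sampleSizeValue = (fields / K) * (vTaken / vBottle) * vFiltered;

  if (!Number.isFinite(cellsPerLitre) || !Number.isFinite(sampleSizeValue)) {
    return { quantificationStatus: "incalculable" };
  }
  if (Number.isNaN(reported)) {
    return { quantificationStatus: "calculated", organismQuantity: cellsPerLitre, sampleSizeValue };
  }
  // Assumed relative tolerance, to absorb floating-point and rounding noise.
  const tolerance = 1e-6 * Math.max(1, Math.abs(reported));
  const matches = Math.abs(cellsPerLitre - reported) <= tolerance;
  return {
    quantificationStatus: matches ? "verified" : "error", // "error" is hypothetical
    organismQuantity: reported,
    sampleSizeValue,
  };
}
```

Computing sampleSizeValue from the volumes rather than as N / cellsPerLitre avoids a 0 / 0 when a row has zero cells.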
These occurrences must be merged with the sampling event metadata.
Phew, not as bad as it looked: the 489 unmatched lines above belong to just 20 missing samples:

~/npolar/marine-db$ ndjson-join --left d.fieldNumber data/input/iopan/2009-2010-2012-2013-protist-biodiversity-iopan.ndjson $events | ndjson-filter 'd[1]===null' | ndjson-map 'd[0]' | ndjson-map '[d.expedition,d.fieldNumber]' | sort | uniq -c
34 ["ICE2010","ICE10-152"]
32 ["ICE2010","ICE10-155"]
16 ["ICE2010","ICE10-156"]
13 ["ICE2010","ICE10-157"]
13 ["ICE2010","ICE10-158"]
18 ["ICE2010","ICE10-253"]
27 ["ICE2010","ICE10-379"]
36 ["ICE2010","ICE10-380"]
38 ["ICE2010","ICE10-381"]
46 ["ICE2010","ICE10-382"]
21 ["ICE2010","ICE10-383"]
15 ["ICE2010","ICE10-384"]
30 ["ICE2012","Agneta"]
30 ["ICE2012","Divehole"]
14 ["ICE2012","ICE12-760"]
15 ["ICE2012","ICE12-822"]
25 ["ICE2012","ICE12-Core2.1.1"]
27 ["ICE2012","ICE12-Core2.1.2"]
20 ["ICE2012","Pond"]
19 ["ICE2012","Ridge"]
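The same check as the ndjson-join pipeline, sketched in plain JavaScript: a left anti-join on fieldNumber, with unmatched occurrences counted per [expedition, fieldNumber]. The function names are illustrative, and loading the two NDJSON files is left out. For the output above, summarise would report 20 samples and 489 lines.

```javascript
// Occurrences whose fieldNumber has no sampling event, counted per
// [expedition, fieldNumber], keyed like the `uniq -c` output above.
function unmatchedSamples(occurrences, events) {
  const eventKeys = new Set(events.map((event) => event.fieldNumber));
  const counts = new Map();
  for (const occurrence of occurrences) {
    if (eventKeys.has(occurrence.fieldNumber)) continue;
    const key = JSON.stringify([occurrence.expedition, occurrence.fieldNumber]);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Distinct missing samples vs. total unmatched occurrence lines.
function summarise(counts) {
  let lines = 0;
  for (const n of counts.values()) lines += n;
  return { samples: counts.size, lines };
}
```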
XY coordinates could be recovered by matching on locationID. For example, all ICE10-15x samples are from "R4", and all other R4 samples from ICE10 are at [22.1166, 80.605].
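A hypothetical sketch of that lookup: collect event positions per expedition and locationID, and reuse a position only when all events at that station agree; if they disagree, nothing is returned rather than guessing. The event properties longitude and latitude are assumed names, and the sketch keeps the [longitude, latitude] order that [22.1166, 80.605] appears to use.

```javascript
// Map "expedition|locationID" -> [lon, lat], keeping only stations where
// every event with a position reports the same one.
function coordinatesByLocation(events) {
  const positions = new Map(); // key -> Set of "lon,lat" strings
  for (const e of events) {
    // Assumed property names; events without numeric positions are skipped.
    if (typeof e.longitude !== "number" || typeof e.latitude !== "number") continue;
    const key = `${e.expedition}|${e.locationID}`;
    if (!positions.has(key)) positions.set(key, new Set());
    positions.get(key).add(`${e.longitude},${e.latitude}`);
  }
  const unique = new Map();
  for (const [key, set] of positions) {
    if (set.size === 1) unique.set(key, [...set][0].split(",").map(Number));
  }
  return unique;
}

// Position for an occurrence without event metadata, or null if unknown.
function resurrectXY(occurrence, lookup) {
  return lookup.get(`${occurrence.expedition}|${occurrence.locationID}`) ?? null;
}
```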
Found an alternative source of ICE10 data. Source:
Alternate source of ICE2012 data (2670 lines vs 2579 in the "total database" :/)
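To see where the 91 extra lines come from, the two sources can be compared sample by sample. A minimal sketch, assuming both files have already been mapped to rows with a fieldNumber property; the function names are illustrative.

```javascript
// Count rows per key.
function countBy(rows, keyFn) {
  const counts = new Map();
  for (const row of rows) {
    const key = keyFn(row);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Samples whose row counts differ between the two sources.
function compareSources(totalRows, altRows) {
  const a = countBy(totalRows, (r) => r.fieldNumber);
  const b = countBy(altRows, (r) => r.fieldNumber);
  const diffs = [];
  for (const fieldNumber of new Set([...a.keys(), ...b.keys()])) {
    const total = a.get(fieldNumber) ?? 0;
    const alt = b.get(fieldNumber) ?? 0;
    if (total !== alt) diffs.push({ fieldNumber, total, alt });
  }
  return diffs;
}
```

A sample with total: 0 exists only in the alternate source, which is different from a sample that merely has more taxa listed there.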
Consider swapping in ICE2012 from
Checking 2009 against the alternative source.
The "total database" contains:
The following are excluded, since there are alternate sources with more data.
After removal, we are left with:
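A minimal sketch of the exclusion step: drop every row whose expedition is covered by an alternate source, then tally what is left per expedition (the same idea as ndjson-map d.expedition | sort | uniq -c). The excluded expeditions are a parameter; the sketch does not hard-code which ones they are.

```javascript
// Keep only rows whose expedition is not in the excluded list.
function excludeExpeditions(rows, excluded) {
  const skip = new Set(excluded);
  return rows.filter((row) => !skip.has(row.expedition));
}

// Number of remaining rows per expedition.
function tallyByExpedition(rows) {
  const counts = {};
  for (const row of rows) counts[row.expedition] = (counts[row.expedition] ?? 0) + 1;
  return counts;
}
```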