cgendreau edited this page Jan 8, 2013 · 7 revisions

The goal of data processing is to help users find the most accurate results. The original data from the provider is not overwritten; the computed data is used only to ease and enhance the search functionality. The processing is based on the narwhal-processor library.

Scientific name

The scientific name will be processed using a library from eCat, a tool developed by GBIF. When possible, the authorship will be separated from the scientific name and kept in a separate field. The species name will also be computed from the scientific name.
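To illustrate the idea of separating authorship from the name, here is a minimal Python sketch. It is not the eCat parser and does not reflect its API: the function name `split_scientific_name` and the regular expression are hypothetical, and the pattern only covers a plain "Genus epithet Authorship" binomial. Infraspecific ranks, hybrids, and other cases a real name parser handles are deliberately ignored.

```python
import re

# Hypothetical, naive pattern (an assumption, not the eCat grammar):
# a capitalized genus, a lowercase epithet, then optional authorship text.
_BINOMIAL = re.compile(r"^\s*([A-Z][a-z]+)\s+([a-z][a-z-]+)(?:\s+(\S.*?))?\s*$")


def split_scientific_name(raw):
    """Return (species_name, authorship); both are None if the name is not recognized."""
    match = _BINOMIAL.match(raw)
    if match is None:
        return None, None
    genus, epithet, authorship = match.groups()
    # authorship is None when the raw value holds only the binomial.
    return f"{genus} {epithet}", authorship
```

With this sketch, `split_scientific_name("Acer saccharum Marshall")` yields the species name `"Acer saccharum"` and the authorship `"Marshall"`, while the raw value itself would stay untouched in its original field.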

Country

The country will be processed using the gbif-parsers library from GBIF. The processing will try to match the country to its official name through a controlled list, which also includes the most common misspellings.
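Matching against a controlled list can be sketched as a dictionary lookup on a normalized key. The Python below is only an illustration under assumptions: the entries in `COUNTRY_VARIANTS`, the normalization rule, and the function name `match_country` are all hypothetical and are not taken from gbif-parsers.

```python
# Hypothetical controlled list: official name -> known variants and misspellings.
COUNTRY_VARIANTS = {
    "Canada": ["canada", "ca", "can", "cananda", "canda"],
    "United States": ["united states", "usa", "us", "unites states"],
}

# Invert the list once so each variant points to its official name.
_COUNTRY_LOOKUP = {
    variant: official
    for official, variants in COUNTRY_VARIANTS.items()
    for variant in variants
}


def match_country(raw):
    """Return the official country name, or None when there is no match."""
    if raw is None:
        return None
    # Assumed normalization: lowercase and collapse whitespace.
    key = " ".join(raw.lower().split())
    return _COUNTRY_LOOKUP.get(key)
```

A value such as `"  CANANDA "` would resolve to `"Canada"`, and an unknown value returns `None` so the computed field simply stays empty.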

Canadian province

The state/province will be processed using a library from Canadensys based on gbif-parsers. The processing will try to match the province to its official name through a controlled list, which also includes the most common misspellings. This processing is applied only if the country is set to Canada.
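The condition on the country can be sketched as a guard in front of the lookup. This Python example is an assumption-laden illustration: `PROVINCE_VARIANTS`, its entries, the choice of `"Quebec"` as the official spelling, and the function `match_province` are hypothetical and do not describe the Canadensys library.

```python
# Hypothetical controlled list: official name -> known variants and misspellings.
PROVINCE_VARIANTS = {
    "Quebec": ["quebec", "québec", "qc", "que", "quebeck"],
    "Ontario": ["ontario", "on", "ont", "onterio"],
}

_PROVINCE_LOOKUP = {
    variant: official
    for official, variants in PROVINCE_VARIANTS.items()
    for variant in variants
}


def match_province(raw_province, processed_country):
    """Return the official province name, or None when not applicable."""
    # The province is only processed when the processed country is Canada.
    if processed_country != "Canada" or raw_province is None:
        return None
    key = " ".join(raw_province.lower().split())
    return _PROVINCE_LOOKUP.get(key)
```

Here `match_province("Québec", "Canada")` resolves to `"Quebec"`, while the same province value on a record whose country is not Canada is left unprocessed.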

Date

The event date will be processed with a combination of a Canadensys library and the ThreeTen library. The processing will try to standardize the data by splitting it into year/month/day in order to support partial dates.
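Splitting a date into year/month/day while allowing missing parts might look like the following Python sketch. It assumes ISO-like input (`YYYY`, `YYYY-MM`, or `YYYY-MM-DD`) and rejects a value entirely when any part is invalid; both choices are assumptions, and `split_event_date` is a hypothetical name unrelated to the Canadensys or ThreeTen APIs.

```python
import re
from datetime import date

# Assumed input shape: a year, optionally followed by a month, then a day.
_ISO_PARTIAL = re.compile(r"^(\d{4})(?:-(\d{1,2})(?:-(\d{1,2}))?)?$")


def split_event_date(raw):
    """Return (year, month, day); a missing part is None, an invalid date gives all None."""
    unparsed = (None, None, None)
    match = _ISO_PARTIAL.match(raw.strip())
    if match is None:
        return unparsed
    year, month, day = (int(p) if p is not None else None for p in match.groups())
    try:
        # Validate by building a real date, filling missing parts with 1.
        date(year, 1 if month is None else month, 1 if day is None else day)
    except ValueError:
        return unparsed
    return year, month, day
```

Keeping the three parts separate means a record dated only `"2012-05"` can still be found by a search on year or month, even though no day is known.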

Latitude/Longitude

The decimallatitude/decimallongitude will be processed using a Canadensys library. The processing will make sure the coordinates are valid numbers. It will not validate the coordinates against the other fields of the record. This means that a record whose coordinates fall in Australia but whose country is set to Canada will remain in the system for the moment.
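A coordinate check of this kind can be sketched in Python as below. The function `process_coordinates` is hypothetical, and the range limits (latitude within ±90, longitude within ±180) are an assumption about what "valid numbers" means; the actual Canadensys library may apply different rules.

```python
import math


def process_coordinates(raw_latitude, raw_longitude):
    """Return (latitude, longitude) as floats, or None if either value is invalid."""
    try:
        latitude = float(raw_latitude)
        longitude = float(raw_longitude)
    except (TypeError, ValueError):
        return None
    # Reject NaN and infinity, which float() accepts as input.
    if not (math.isfinite(latitude) and math.isfinite(longitude)):
        return None
    # Assumed validity ranges for decimal degrees.
    if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
        return None
    return latitude, longitude
```

Note that the function never looks at the country: a point such as `("-33.87", "151.21")` is accepted whatever the rest of the record says, which matches the limitation described above.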