This repo contains data types (e.g. ”Swedish municipality”), with value types (e.g. ”string”), allowed values, names lookup tables for alternative names, and definitions. It is used by the Statscraper repo, as a semi standardized ontology for scrapers.
All datatypes are listed in /datatypes.csv
. Allowed values are in the /values
folder, organized in further subfolders by domain.
datatypes.csv
contains for each datatype:
id
: A unique id. We use human readable id's.description
: Should include a definitionvalue_type
:int
,float
,str
,date
orbool
allowed_values
: See below
Each data type can have one of the following value types:
int
– a value that can be parsed as an integerfloat
– a value that can be parsed as a floating-point numberstr
– a value that can be parsed as a string. Empty strings are considered null.date
– a ISO 8601 date, e.g.2016-07-05
,2016-07-05T13:00:00
,2016-W27
, or1981-04
.bool
– 1 for True and 0 False. Blank means null.
Some data types, and some metadata fields, have a predefined set of allowed values (such as “regions”). In some domains, allowed values may be organized in categories (such as “Swedish municipalities”, “Swedish counties”).
Allowed values are specified in csv files under the values
directory, optionlly structured in subfolders by domain, e.g.regions/sweden/municipalities.csv
. They are referenced like this: regions/sweden/municipalities
, and regions
. If there is a regions/
folder, there can not be a regions.csv
in the same directory.
The allowed values csv's contain:
id
: A unique id. We use human readable id's, e.g. "Stockholms kommun", not "0180"label
: An optional labeldialect:
~: Columns prefixed withdialect:
contain corresponding id's, e.g. names useed by major statistical providers, WikiData id's, etc