This package provides you access to DBnomics data series. DBnomics is an open-source project with the goal of aggregating the world's economic data in one location, free of charge to the public. DBnomics covers hundreds of millions of series from international and national institutions (Eurostat, World Bank, IMF, ...).
To use this package, you have to provide the codes of the provider, dataset and series you want. You can retrieve them directly on the website.
To install DBnomics.jl
, go to the package manager with ]
:
add DBnomics
or install the github version with :
add https://github.com/s915/DBnomics.jl
All the functions, and their names, are derived from the R package rdbnomics which I also maintain.
Fetch time series by ids
:
# Fetch one series from dataset 'Unemployment rate' (ZUTN) of AMECO provider :
df1 = rdb(ids = "AMECO/ZUTN/EA19.1.0.0.0.ZUTN");
# Fetch two series from dataset 'Unemployment rate' (ZUTN) of AMECO provider :
df2 = rdb(ids = ["AMECO/ZUTN/EA19.1.0.0.0.ZUTN", "AMECO/ZUTN/DNK.1.0.0.0.ZUTN"]);
# Fetch two series from different datasets of different providers :
df3 = rdb(ids = ["AMECO/ZUTN/EA19.1.0.0.0.ZUTN", "IMF/BOP/A.FR.BCA_BP6_EUR"]);
In the event that you only use the argument ids
, you can drop it and run :
df1 = rdb("AMECO/ZUTN/EA19.1.0.0.0.ZUTN");
Fetch time series by mask
:
# Fetch one series from dataset 'Balance of Payments' (BOP) of IMF :
df1 = rdb("IMF", "BOP", mask = "A.FR.BCA_BP6_EUR");
# Fetch two series from dataset 'Balance of Payments' (BOP) of IMF :
df2 = rdb("IMF", "BOP", mask = "A.FR+ES.BCA_BP6_EUR");
# Fetch all series along one dimension from dataset 'Balance of Payments' (BOP) of IMF :
df3 = rdb("IMF", "BOP", mask = "A..BCA_BP6_EUR");
# Fetch series along multiple dimensions from dataset 'Balance of Payments' (BOP) of IMF :
df4 = rdb("IMF", "BOP", mask = "A.FR.BCA_BP6_EUR+IA_BP6_EUR");
In the event that you only use the arguments provider_code
, dataset_code
and mask
, you can drop the name mask
and run :
df1 = rdb("IMF", "BOP", "A.FR.BCA_BP6_EUR");
Fetch time series by dimensions
:
# Fetch one value of one dimension from dataset 'Unemployment rate' (ZUTN) of AMECO provider :
df1 = rdb("AMECO", "ZUTN", dimensions = Dict(:geo => "ea12"));
# or
df1 = rdb("AMECO", "ZUTN", dimensions = (geo = "ea12",));
# Fetch two values of one dimension from dataset 'Unemployment rate' (ZUTN) of AMECO provider :
df2 = rdb("AMECO", "ZUTN", dimensions = Dict(:geo => ["ea12", "dnk"]));
# or
df2 = rdb("AMECO", "ZUTN", dimensions = (geo = ["ea12", "dnk"],));
# Fetch several values of several dimensions from dataset 'Doing business' (DBS) of World Bank :
df3 = rdb("WB", "DBS", dimensions = Dict(:country => ["DZA", "PER"], :frequency => ["A"], :indicator => ["ENF.CONT.COEN.COST.ZS", "IC.REG.COST.PC.FE.ZS"]));
# or
df3 = rdb("WB", "DBS", dimensions = (country = ["DZA", "PER"], frequency = ["A"], indicator = ["ENF.CONT.COEN.COST.ZS", "IC.REG.COST.PC.FE.ZS"]));
Fetch time series with a query
:
# Fetch one series from dataset 'WEO:2019-10 by countries' (WEO:2019-10) of IMF provider:
df1 = rdb("IMF", "WEO:2019-10", query = "France current account balance percent");
# Fetch series from dataset 'WEO:2019-10 by countries' (WEO:2019-10) of IMF provider:
df2 = rdb("IMF", "WEO:2019-10", query = "current account balance percent");
Fetch one series from the dataset 'Doing Business' of WB provider with the link:
df1 = rdb(api_link = "https://api.db.nomics.world/v22/series/WB/DBS?dimensions=%7B%22country%22%3A%5B%22FRA%22%2C%22ITA%22%2C%22ESP%22%5D%7D&q=IC.REG.PROC.FE.NO&observations=1&format=json&align_periods=1&offset=0&facets=0");
In the event that you only use the argument api_link
, you can drop the name and run:
df1 = rdb("https://api.db.nomics.world/v22/series/WB/DBS?dimensions=%7B%22country%22%3A%5B%22FRA%22%2C%22ITA%22%2C%22ESP%22%5D%7D&q=IC.REG.PROC.FE.NO&observations=1&format=json&align_periods=1&offset=0&facets=0");
Fetch the available datasets of a provider
# Example with the IMF datasets:
df_datasets = rdb_datasets("IMF");
# Example with the IMF and BDF datasets:
df_datasets = rdb_datasets(["IMF", "BDF"]);
In the event that you only request the datasets for one provider, if you define
simplify = true
, then the result will be a DataFrame
not a Dict
.
df_datasets = rdb_datasets("IMF", simplify = true);
Fetch the possible dimensions of available datasets of a provider
# Example for the dataset WEO:2019-10 of the IMF:
df_dimensions = rdb_dimensions("IMF", "WEO:2019-10");
In the event that you only request the dimensions for one dataset for one
provider, if you define simplify = true
, then the result will be a Dict
of
DataFrame
s not a nested Dict
.
df_dimensions = rdb_dimensions("IMF", "WEO:2019-10", simplify = true);
Fetch the number of series of available datasets of a provider
# Example for the dataset WEOAGG:2019-10 of the IMF:
df_series = rdb_series("IMF", "WEOAGG:2019-10");
# With dimensions
df_series = rdb_series("IMF", "WEO:2019-10", dimensions = Dict(Symbol("weo-country") => "AGO"));
df_series = rdb_series("IMF", "WEO:2019-10", dimensions = Dict(Symbol("weo-subject") => "NGDP_RPCH"), simplify = true);
# With a query
df_series = rdb_series("IMF", "WEO:2019-10", query = "ARE");
df_series = rdb_series("IMF", ["WEO:2019-10", "WEOAGG:2019-10"], query = "NGDP_RPCH");
When using the functions rdb
or rdb_...
, if you come across an error concerning your internet connection, you can get round this situation by :
-
configuring curl of the function
HTTP.get
orHTTP.post
to use a specific and authorized proxy. -
using the functions
readlines
anddownload
if you have problem withHTTP.get
.
In DBnomics.jl, by default the function HTTP.get
or HTTP.post
are used to fetch the data. If a specific proxy must be used, it is possible to define it permanently with the package global variable curl_config
or on the fly through the argument curl_config
. In that way the object is passed to the keyword arguments of the function HTTP.get
or HTTP.post
.
To see the available parameters, visit the website https://curl.haxx.se/libcurl/c/curl_easy_setopt.html.
Once they are chosen, you define the curl object as follows :
h = Dict(:proxy => "http://<proxy>:<port>");
Regarding the functioning of HTTP.jl, you might need to modify another option to change the db/editor.nomics.world url from https:// to http:// (see JuliaWeb/HTTP.jl#390) :
DBnomics.options("secure", false);
The curl connection can be set up for a session by modifying the following package option :
DBnomics.options("curl_config", h);
After configuration, just use the standard functions of DBnomics.jl e.g. :
df1 = rdb(ids = "AMECO/ZUTN/EA19.1.0.0.0.ZUTN");
This option of the package can be disabled with :
DBnomics.options("curl_config", nothing);
If a complete configuration is not needed but just an "on the fly" execution, then use the argument curl_config
of the functions rdb
and rdb_...
:
df1 = rdb(ids = "AMECO/ZUTN/EA19.1.0.0.0.ZUTN", curl_config = h);
To retrieve the data DBnomics.jl can also use the standard functions readlines
and download
.
To activate this feature for a session, you need to enable an option of the package :
DBnomics.options("use_readlines", true);
And then use the standard function as follows :
df1 = rdb(ids = "AMECO/ZUTN/EA19.1.0.0.0.ZUTN");
This configuration can be disabled with :
DBnomics.options("use_readlines", false);
If you just want to do it once, you may use the argument use_readlines
of the functions rdb
and rdb_...
:
df1 = rdb(ids = "AMECO/ZUTN/EA19.1.0.0.0.ZUTN", use_readlines = true);
The DBnomics.jl package can interact with the Time Series Editor of DBnomics to transform time series by applying filters to them.
Available filters are listed on the filters page https://editor.nomics.world/filters.
Here is an example of how to proceed to interpolate two annual time series with a monthly frequency, using a spline interpolation:
filters = Dict(:code => "interpolate", :parameters => Dict(:frequency => "monthly", :method => "spline"));
df = rdb(ids = ["AMECO/ZUTN/EA19.1.0.0.0.ZUTN", "AMECO/ZUTN/DNK.1.0.0.0.ZUTN"], filters = filters);
If you want to apply more than one filter, the filters
argument will be a Tuple of valid filters:
filter1 = Dict(:code => "interpolate", :parameters => Dict(:frequency => "monthly", :method => "spline"));
filter2 = Dict(:code => "aggregate", :parameters => Dict(:frequency => "bi-annual", :method => "end_of_period"));
filters = (filter1, filter2);
df = rdb(ids = ["AMECO/ZUTN/EA19.1.0.0.0.ZUTN", "AMECO/ZUTN/DNK.1.0.0.0.ZUTN"], filters = filters);
The DataFrame
columns change a little bit when filters are used. There are two new columns:
period_middle_day
: the middle day oforiginal_period
(can be useful when you compare graphically interpolated series and original ones).filtered
(boolean):true
if the series is filtered,false
otherwise.
The content of two columns are modified:
series_code
: same as before for original series, but the suffix_filtered
is added for filtered series.series_name
: same as before for original series, but the suffix(filtered)
is added for filtered series.
For some analysis, it is more convenient to have a TimeArray
object instead of a DataFrame
object. To transform
it, you can use the following functions :
using DBnomics
using DataFrames
using TimeSeries
function to_namedtuples(x::DataFrames.DataFrame)
nm = names(x)
vl = try
[x[!, col] for col in names(x)]
catch
[x[:, col] for col in names(x)]
end
nm = tuple(Symbol.(nm)...)
vl = tuple(vl...)
NamedTuple{nm}(vl)
end
function to_timeseries(
x::DataFrames.DataFrame,
index = :period, variable = :series_code, value = :value
)
x = unstack(x, index, variable, value)
x = to_namedtuples(x)
x = TimeArray(x, timestamp = index)
x
end
rdb("IMF", "BOP", mask = "A.FR+ES.BCA_BP6_EUR")
#> 162×18 DataFrame. Omitted printing of 12 columns
#> │ Row │ @frequency │ FREQ │ Frequency │ INDICATOR │ Indicator │ REF_AREA │
#> │ │ String │ String │ String │ String │ String │ String │
#> ├─────┼────────────┼────────┼───────────┼─────────────┼────────────────────────────────────┼──────────┤
#> │ 1 │ annual │ A │ Annual │ BCA_BP6_EUR │ Current Account, Total, Net, Euros │ ES │
#> │ 2 │ annual │ A │ Annual │ BCA_BP6_EUR │ Current Account, Total, Net, Euros │ ES │
#> │ ... │ ... │ ... │ ... │ ... │ ... │ ... │
#> │ 161 │ annual │ A │ Annual │ BCA_BP6_EUR │ Current Account, Total, Net, Euros │ FR │
#> │ 162 │ annual │ A │ Annual │ BCA_BP6_EUR │ Current Account, Total, Net, Euros │ FR │
to_timeseries(rdb("IMF", "BOP", mask = "A.FR+ES.BCA_BP6_EUR"))
#> 81×2 TimeArray{Union{Missing, Float64},2,Date,Array{Union{Missing, Float64},2}} 1940-01-01 to 2020-01-01
#> │ │ A.ES.BCA_BP6_EUR │ A.FR.BCA_BP6_EUR │
#> ├────────────┼──────────────────┼──────────────────┤
#> │ 1940-01-01 │ missing │ missing │
#> │ 1941-01-01 │ missing │ missing │
#> │ 1942-01-01 │ missing │ missing │
#> │ ... │ ... │ ... │
#> │ 2019-01-01 │ 24899.0 │ -16239.4 │
#> │ 2020-01-01 │ missing │ missing │
Visit https://db.nomics.world/ !