Workflow for organizing and projecting GDP (Y), population (P), capital stock (K), and related variables for historical (1950-2020) and future (2010-2100) timelines
This version: last updated on March 30, 2022
This directory contains the data acquisition, clean-up, and projection notebooks used to organize and project GDP, GDP per capita (GDPpc), population, and capital stock for both historical (1950-2020) and future, or projected, (2010-2100) timelines. Many of the data sources used to generate the historical and future panels have missing data, so we impute the missing values through extrapolation or other established methods. We also keep the PPP and USD units consistent (e.g., constant 2019 PPP USD) across sources that use different vintages of PPP and USD units.
Below is a quick summary of what each file seeks to accomplish (where the header `ypk` stands for "GDP, population, and capital stock").
- `ypk1_prep_clean.ipynb`: cleans up selected raw datasets that need more attention than others to be consistent and workable with the other datasets.
- `ypk2_reorg_and_impute.ipynb`: reorganizes the raw and previously cleaned historical datasets so that each variable has a single, consistent stream of values for each country, then imputes any GDPpc, GDP, and population values still missing from the cleaned historical dataset.
- `ypk3_demo_ratios_historical_reg.ipynb`: cleans and extrapolates demographic (age-group) ratios and creates the "demographic variables" needed for the "historical regression" (following Higgins, 1998), which relates the investment-to-GDP ratio (I/Y ratio) to demographic variables, (relative) GDPpc, and the GDPpc growth rate. The regression is then used to estimate investment-to-GDP ratios for missing country-years.
- `ypk4_impute_hist_capital.ipynb`: uses the historical and estimated investment-to-GDP ratios to create current-PPP investment values. These are used to replicate the country-by-country initial-year capital stock estimation described in Inklaar, Woltjer, and Albarrán (2019). The investment values are also used together with the GEG-15 and LitPop data sources to fill in missing values in the later years of the historical capital stock data. The end product is a filled (1950-2020) capital stock dataset for all relevant countries.
- `ypk5_projected_yp.ipynb`: cleans up GDP, GDPpc, and population for the future timeline, with some basic extrapolation for countries with missing projections.
- `ypk6_projected_capital.ipynb`: generates projections of capital stocks based on the Dellink et al. (2017) methodology.
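To make the roles of the investment-to-GDP ratio and the depreciation rate in the capital-stock notebooks more concrete, the sketch below accumulates capital with a standard perpetual-inventory recursion, K[t+1] = (1 - delta) * K[t] + (I/Y)[t] * Y[t]. This is an illustrative assumption rather than the notebooks' actual code: the function name, the constant depreciation rate, and the input numbers are hypothetical, and neither the initial-year estimation of Inklaar, Woltjer, and Albarrán (2019) nor the Dellink et al. (2017) projection method is reproduced here.

```python
def accumulate_capital(k0, gdp, iy_ratio, delta):
    """Accumulate capital with a perpetual-inventory recursion (illustrative only).

    k0       : initial capital stock (hypothetical starting value)
    gdp      : GDP for each period, in the same currency units as k0
    iy_ratio : investment-to-GDP ratio for each period
    delta    : depreciation rate, assumed constant over time in this sketch
    Returns a list with len(gdp) + 1 capital stocks, starting with k0.
    """
    if len(gdp) != len(iy_ratio):
        raise ValueError("gdp and iy_ratio must have the same length")
    stocks = [k0]
    for y, ratio in zip(gdp, iy_ratio):
        investment = ratio * y  # investment implied by the I/Y ratio
        stocks.append((1 - delta) * stocks[-1] + investment)
    return stocks


# Hypothetical inputs, not taken from any dataset
capital = accumulate_capital(
    k0=300.0, gdp=[100.0, 110.0], iy_ratio=[0.2, 0.25], delta=0.05
)
```

Here `capital` holds three values: the initial stock and the stocks implied for the two following periods.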
The notebooks have to be run consecutively, in numerical order (i.e., from `ypk1` through `ypk6`). Each notebook contains basic descriptions of what each step does; in all cases, the cells must be run consecutively from top to bottom.
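Because each notebook depends on the outputs of the previous ones, one way to enforce the run order is to sort the notebooks by the number in their `ypk` prefix and execute them one at a time. The sketch below assumes that Jupyter's `nbconvert` command-line tool is installed and that the notebooks sit in a single directory; the helper names `ordered_notebooks` and `run_all` are hypothetical and not part of the repository.

```python
import re
import subprocess
from pathlib import Path


def ordered_notebooks(paths):
    """Sort notebook paths by the integer that follows the 'ypk' prefix."""

    def step(path):
        match = re.match(r"ypk(\d+)_", Path(path).name)
        if match is None:
            raise ValueError(f"not a ypk notebook: {path}")
        return int(match.group(1))

    return sorted(paths, key=step)


def run_all(directory="."):
    """Execute each ypk notebook in place, stopping at the first failure."""
    for notebook in ordered_notebooks(Path(directory).glob("ypk*.ipynb")):
        subprocess.run(
            ["jupyter", "nbconvert", "--to", "notebook",
             "--execute", "--inplace", str(notebook)],
            check=True,  # raise CalledProcessError if a notebook fails
        )
```

Calling `run_all()` from the directory that holds the notebooks would run them top to bottom in sequence; sorting on the parsed integer rather than on the raw file name keeps the order correct even if a two-digit step were ever added.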
We describe below some key variables produced by the above process. Note that our naming conventions largely follow Penn World Table 10.0.
- `cgdpo_19`: Current PPP (purchasing power parity) GDP in millions of 2017 and 2019 USD
- `cgdpo_pc_19`: Current PPP GDP per capita in ones of 2017 and 2019 USD
- `rgdpna_19`: (National account-based) GDP in millions of constant 2019 PPP USD
- `rgdpna_pc_19`: (National account-based) GDP per capita in ones of constant 2019 PPP USD
- `cn_19`: Current PPP capital stock in millions of 2019 USD
- `rnna_19`: Capital stock in millions of constant 2019 PPP USD
- `pop`: Population in millions of people
- `k_movable_ratio`: Ratio of movable capital to total physical capital (values in [0, 1])
- `iy_ratio`: Investment-to-GDP ratio
- `delta`: Physical capital depreciation rate
Note that for GDP, GDP per capita, and capital stock variables, there are also versions with `_17` at the end instead of `_19`. For current PPP variables, this means using 2017 USD; for constant PPP variables, this means using constant 2017 PPP USD (i.e., constant PPP of 2017 and 2017 USD).
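The units and suffixes above can be illustrated with a small sketch. Since GDP is stored in millions of USD and population in millions of people, dividing one by the other gives per capita values directly in ones of USD, matching the `_pc` variables. The two helper functions below are hypothetical and only illustrate the naming and unit conventions; they are not part of the notebooks.

```python
def gdp_per_capita(gdp_millions, pop_millions):
    """GDP per capita in ones of USD; the 'millions' in both inputs cancel."""
    if pop_millions <= 0:
        raise ValueError("population must be positive")
    return gdp_millions / pop_millions


def swap_vintage(name, year):
    """Rename a '_17' or '_19' variable to the other vintage (name only)."""
    base, sep, suffix = name.rpartition("_")
    if not sep or suffix not in {"17", "19"}:
        raise ValueError(f"{name!r} has no _17/_19 vintage suffix")
    return f"{base}_{year}"


# Hypothetical country-year: GDP of 500,000 million USD, 25 million people
gdppc = gdp_per_capita(500_000.0, 25.0)       # 20000.0
other = swap_vintage("rgdpna_pc_19", 17)      # "rgdpna_pc_17"
```

Note that `swap_vintage` only changes the variable name; the values in the `_17` and `_19` columns differ and cannot be obtained from each other by renaming.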
We import the SLIIDERS `settings.py` as `sset`, which can be done as follows:
`from sliiders import settings as sset`
For the aggregate long-panel format historical and future timeline variables, you may refer to the following:
- Historical: `sset.DIR_YPK_FINAL / "gdp_gdppc_pop_capital_1950_2020.parquet"`
- Future: `sset.DIR_YPK_FINAL / "gdp_gdppc_pop_capital_proj_2010_2100.parquet"`
where the metadata (e.g., units and sources) are also attached to the respective files.
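A minimal sketch of loading the two panels is given below, assuming pandas and a parquet engine (for example, pyarrow) are installed. The helper names `panel_paths` and `load_panels` are hypothetical, and nothing is assumed here about the index or column layout of the files.

```python
from pathlib import Path


def panel_paths(final_dir):
    """Build the paths of the historical and future panels under final_dir."""
    final_dir = Path(final_dir)
    return {
        "historical": final_dir / "gdp_gdppc_pop_capital_1950_2020.parquet",
        "future": final_dir / "gdp_gdppc_pop_capital_proj_2010_2100.parquet",
    }


def load_panels(final_dir):
    """Read both panels into pandas DataFrames, keyed by timeline name."""
    import pandas as pd  # imported lazily so panel_paths works without pandas

    return {
        name: pd.read_parquet(path)
        for name, path in panel_paths(final_dir).items()
    }
```

With the settings module imported as `sset`, the panels could then be loaded with `load_panels(sset.DIR_YPK_FINAL)`.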
The regression involving investment-to-GDP ratios mentioned in Section A3.2 is elaborated in the notebook `ypk3_demo_ratios_historical_reg.ipynb`. That notebook also explains how to derive each variable involved. We present the results below, where the dependent variable is the investment-to-GDP ratio (denoted as in the notebook).