This repository contains the source code and dataset needed to reproduce the parallel computing exercise described in the paper:

Time Series Econometrics at Scale - A Practical Guide to Parallel Computing in (Py)Spark

Abstract

This paper provides a practical programming guide to setting up a minimum working example of a distributed system for parallel time series analysis. The system is built in Apache Spark on top of Amazon's Hadoop-based service Elastic MapReduce (EMR). A simple forecasting exercise with 1,000 time series illustrates the proposed parallelization scheme, which reduces total runtime by about 95% relative to a single-core, single-machine setting. The ease of implementing this scheme makes this guide a useful reference for econometricians with a limited background in parallel programming. To facilitate reproducibility of the practical steps in this guide, the PySpark/Python code is available for download on GitHub.
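The general idea of forecasting many independent series in parallel can be illustrated with a minimal sketch. This is not the code from the paper or this repository: the AR(1) model, the simulated data, and the names `forecast_one`, `simulate_panel`, `forecast_local`, and `forecast_spark` are all assumptions chosen for illustration. The only Spark calls used are `SparkContext.parallelize`, `RDD.map`, and `RDD.collect`, and the Spark path requires an existing `SparkContext` passed in as `sc`.

```python
import random


def forecast_one(item):
    """Fit y_t = a + b * y_{t-1} by OLS and return (series_id, one-step forecast).

    Hypothetical stand-in for whatever model is fitted to each series.
    """
    series_id, y = item
    x, z = y[:-1], y[1:]  # lagged values and their successors
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    b = sxz / sxx if sxx > 0 else 0.0
    a = mean_z - b * mean_x
    return series_id, a + b * y[-1]


def simulate_panel(n_series=1000, n_obs=100, seed=0):
    """Simulate AR(1) series as (id, values) pairs (illustrative data only)."""
    rng = random.Random(seed)
    panel = []
    for i in range(n_series):
        y = [0.0]
        for _ in range(n_obs - 1):
            y.append(0.5 * y[-1] + rng.gauss(0.0, 1.0))
        panel.append((i, y))
    return panel


def forecast_local(panel):
    """Single-core baseline: apply forecast_one to each series in turn."""
    return dict(map(forecast_one, panel))


def forecast_spark(sc, panel, num_slices=None):
    """Same map, distributed over Spark partitions.

    `sc` is an existing pyspark.SparkContext; each series is processed
    independently, so no communication between workers is needed.
    """
    return dict(sc.parallelize(panel, num_slices).map(forecast_one).collect())
```

Because each series is handled by the same pure function, the local and Spark versions differ only in how the map is executed. On a cluster, one would call `forecast_spark(sc, simulate_panel())` in place of `forecast_local(simulate_panel())`.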

Link to the paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3226976