This project takes a data engineering approach to processing Canadian climate data: it downloads data from the Canadian Climate website and combines it into a single output file.
The work consists of downloading the data, concatenating the downloaded files into one final CSV file named all_years.csv, and uploading the scripts and the final output file to a GitHub repository.
- Shell script: toronto-climate-data-de.sh
  This script controls all operations, including downloading the data, setting up logging, and running the Python script.
- Python script: scripts/concat.py
  This script concatenates all the downloaded data into one file.
- Output file: output/all_years.csv
  This is the output file containing all concatenated downloads.
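
As a rough illustration of what the concatenation step in scripts/concat.py might look like, here is a minimal sketch. It is not the project's actual code: the function name concat_csv_files is hypothetical, it assumes every downloaded file shares the same header row, and it uses only the Python standard library rather than whatever packages the project installs with pipenv.

```python
import csv
from pathlib import Path


def concat_csv_files(input_dir, output_path):
    """Concatenate every CSV in input_dir into one file with a single header row.

    Hypothetical helper for illustration; returns the number of data rows written.
    """
    input_files = sorted(Path(input_dir).glob("*.csv"))
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)

    header = None
    rows_written = 0
    with output_path.open("w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for csv_path in input_files:
            # Skip the output file itself if it sits in the input directory.
            if csv_path.resolve() == output_path.resolve():
                continue
            # utf-8-sig drops a leading byte-order mark if one is present.
            with csv_path.open(newline="", encoding="utf-8-sig") as in_file:
                reader = csv.reader(in_file)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file, nothing to copy
                if header is None:
                    # The first non-empty file supplies the header.
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{csv_path} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Calling concat_csv_files("data", "output/all_years.csv") would write the combined file to the output path named above; "data" is an assumed name for the download directory. Files are read in sorted filename order, so the row order in the result follows the file names.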
The program procedure involves the following steps:
- Download data with a shell command.
- Install the required Python packages using pipenv.
- Concatenate data to one file with the Python script.
- Save the output file in the Python script, with both exec 1 (STDOUT) and exec 2 (STDERR) captured in the log.
- Print out SUCCESS with a shell command.
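
In this project the steps above are driven by toronto-climate-data-de.sh, using shell commands and exec redirection. Purely as an analogue for readers who think in Python, the same ordering and log handling could be sketched as below. The function run_pipeline, its parameters, and the log path are hypothetical. The download command is passed in by the caller because the exact command is not given here.

```python
import subprocess


def run_pipeline(download_cmd, log_path, run=subprocess.run):
    """Run the documented steps in order, sending STDOUT and STDERR to one log file.

    Hypothetical Python analogue of the shell script; returns the number of steps run.
    """
    steps = [
        download_cmd,                                      # download the data
        ["pipenv", "install"],                             # install required packages
        ["pipenv", "run", "python", "scripts/concat.py"],  # concatenate and save output
    ]
    with open(log_path, "a", encoding="utf-8") as log_file:
        for command in steps:
            # Both streams go to the same log file; check=True makes a failing
            # step raise, so later steps and the SUCCESS message are skipped.
            run(command, stdout=log_file, stderr=log_file, check=True)
    print("SUCCESS")
    return len(steps)
```

The run parameter defaults to subprocess.run and exists only so the sketch can be exercised with a stand-in during testing, without actually downloading anything or invoking pipenv.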