This document details the steps to reproduce the geohoods-to dataset.
- Install Anaconda 3
- Run bin/install to create a conda environment
- Activate the conda environment (under tmp/)
In Windows:
bin\install
bin\activate
In Linux/Mac OS:
source bin/install.sh
source bin/activate.sh
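To confirm that activation worked before going further, a small check like the one below can help. It is only a sketch and not part of the repository: the function name check_conda_env is hypothetical, and it relies solely on the CONDA_PREFIX variable that conda sets when an environment is activated.

```shell
# Hypothetical helper (not shipped with this repository).
# Reports whether a conda environment is active in the current shell,
# based on the CONDA_PREFIX variable that conda exports on activation.
check_conda_env() {
  if [ -z "${CONDA_PREFIX:-}" ]; then
    echo "no conda environment is active"
    return 1
  fi
  echo "active conda environment: ${CONDA_PREFIX}"
}
```

Calling check_conda_env after the activate step should print a path; if the environment was created as described above, that path is expected to sit under tmp/.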
To reproduce the dataset:
- Run the Jupyter notebook interface with bin/notebook
- Open src/download.ipynb and run all cells (this downloads all the raw data)
- Open src/preprocess.ipynb and run all cells (this cleans the raw data)
- Open src/run.ipynb and run all cells (this produces the final datasets)
- A folder with the final data should be created under tmp/dist
In Windows:
bin\notebook
In Linux/Mac OS:
source bin/notebook.sh
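As an alternative to opening each notebook by hand, the three notebooks could in principle be executed from the command line in the same order. The sketch below is an assumption rather than a documented workflow of this project: run_notebooks is a hypothetical function name, and it presumes that the activated conda environment provides the jupyter nbconvert command.

```shell
# Hypothetical helper (not shipped with this repository).
# Executes the three notebooks in order, stopping at the first failure.
# Assumes the conda environment is active and `jupyter nbconvert` is available.
run_notebooks() {
  for nb in src/download.ipynb src/preprocess.ipynb src/run.ipynb; do
    echo "Executing ${nb}"
    # --execute runs all cells; --inplace writes the outputs back into the same file
    jupyter nbconvert --to notebook --execute --inplace "${nb}" || return 1
  done
}
```

Run run_notebooks from the repository root so that the relative src/ paths resolve. Because --inplace overwrites the notebook files with their executed versions, omit that flag if the notebooks should stay unchanged in version control.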
Mac Mini (M1, 2020): Apple M1 8-core CPU, 16 GB RAM, 250 GB SSD
Approximate running times for the latest release:
- src/download.ipynb: 2 min
- src/preprocess.ipynb: 3.33 min
- src/run.ipynb: 2.66 min