Taxi Rides

Taxi Rides is an attempt to model hourly taxi demand, with predictions learned from a set of sample rides covering a period of 122 days.

The deliverable is a report, but the project also aims to let anyone reproduce the datasets and models described in that report.

The heart of the project is the set of 5 Jupyter notebooks in the notebooks/ folder.

Project Organization

├── Makefile           <- Makefile with commands like `make data` or `make train`
├──          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
├── docs               <- A default Sphinx project; see for details
├── models             <- Trained and serialized models, model predictions, or model summaries
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
├──           <- makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├──    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └──
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └──
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├──
│   │   └──
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └──
└── tox.ini            <- tox file with settings for running tox; see


Setup environment

To set up the project, use virtualenv to create a Python 3 virtual environment and install all dependencies into it. For example, with virtualenvwrapper:

mkvirtualenv --python=python3 taxi-rides
workon taxi-rides

Install taxi-rides

To install the source files and project dependencies, from the project root folder run:

pip install .
pip install -r requirements.txt

Installation of the fbprophet library may appear to fail at first, but it resumes shortly afterwards.


Use the inventory above to locate where everything is in the project. Use make to explore your options; note that some of them are there for showcasing.

All reports and figures are in the relevant folder:


To run the notebooks you need to reproduce the datasets. Place the raw data inside:


and run Jupyter Notebook:

cd ./notebooks
jupyter notebook

Recreate datasets

You need the following folders:

mkdir ./data
mkdir ./data/interim
mkdir ./data/raw
mkdir ./data/external
mkdir ./data/processed
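The same folders can also be created in one pass. This is a minimal sketch, assuming a POSIX shell and that it is run from the project root; the loop variable name is illustrative.

```shell
# Create the four data folders listed above.
# -p creates ./data if needed and does not fail when a folder already exists,
# so the loop is safe to re-run.
for dir in raw interim external processed; do
    mkdir -p "./data/$dir"
done
```

Unlike plain mkdir, this variant can be repeated after a partial setup without raising "File exists" errors.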

To recreate all datasets, make sure you have ./data/raw/routes.csv and run:

make data

The process takes several minutes to complete.
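Because the run is long, it may be worth confirming that the raw file is in place before starting. The sketch below is one way to do that; check_raw_data is a hypothetical helper and not part of the project or its Makefile.

```shell
# check_raw_data: hypothetical helper, not shipped with the project.
# Returns 0 if the raw routes file exists, 1 (with a message on stderr) if not.
# The path defaults to the location named in this README.
check_raw_data() {
    raw_file="${1:-./data/raw/routes.csv}"
    if [ ! -f "$raw_file" ]; then
        echo "Missing $raw_file: place the raw data there before running make data" >&2
        return 1
    fi
}
```

With the function defined in the current shell, `check_raw_data && make data` starts the run only when the file is present.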

Once the process is complete, you should be able to run the notebooks in any order to replicate the results and models. Models will be added to the relevant directory.

Project based on the cookiecutter data science project template. #cookiecutterdatascience