This repository includes example notebooks and data packages for Intake.
The notebooks should run locally once you download this repository. They are also available in a running cloud environment by clicking on the link below:
The subdirectory tutorial/ contains three notebooks describing workflows for the three different roles in which you
might interact with Intake. The separation of concerns is important: it lets each person concentrate on the
job in front of them, with clear communication paths between the roles. In a small
organisation one person may fill several of these roles, but it is still useful to consider problems from each
vantage point in turn:
- data scientist - for end users who want to find, load and analyse their data
- data engineer - for the curator of data and catalogues, who decides how best to store and expose data
- developer - for authors of new drivers and other extensions to Intake's capabilities. This person does not necessarily develop any code for the Intake package itself.
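As a rough illustration of how the first two roles fit together, the sketch below has a data engineer write a small catalog file and a data scientist load from it. It assumes the classic YAML catalog format and the built-in `csv` driver; the file names, the `example_table` entry and both helper functions are hypothetical and are not taken from the tutorial notebooks.

```python
import csv
from pathlib import Path

# Hypothetical catalog with a single entry. "{{ CATALOG_DIR }}" is expanded
# by Intake to the directory that contains the catalog file.
CATALOG_YAML = """\
sources:
  example_table:
    description: Small illustrative table (hypothetical data)
    driver: csv
    args:
      urlpath: "{{ CATALOG_DIR }}/example.csv"
"""


def write_example_catalog(directory: Path) -> Path:
    """Data engineer role: store a small CSV and describe it in a catalog."""
    with open(directory / "example.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "value"])
        writer.writerows([["a", 1], ["b", 2]])
    catalog_path = directory / "catalog.yml"
    catalog_path.write_text(CATALOG_YAML)
    return catalog_path


def load_example(catalog_path: Path):
    """Data scientist role: open the catalog and read the entry by name."""
    import intake  # imported lazily; requires Intake to be installed

    cat = intake.open_catalog(str(catalog_path))
    # The end user needs only the entry name, not the file location or format.
    return cat.example_table.read()
```

Calling `load_example(write_example_catalog(some_dir))` on an existing directory should return the table as a DataFrame when the `csv` driver is available. The point of the split is that a change to storage details only touches the catalog, not the analysis code.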
This directory contains examples of Intake catalogs and scripts:
- data-us-states - Conda data package with embedded data (see docs for more details). Available via `conda install -c intake data-us-states`.
- airline_flights - Conda data package with external data and extra dependencies. Available via `conda install -c intake airline_flights`.
- us_crime - Conda data package used in plotting documentation. Available via `conda install -c intake us_crime`.
- nyc_taxi - Full cloud-based installable dataset of Taxi trip data in NYC in parquet format. Available via `conda install -c intake nyc-taxi`.
- precip - Precipitation data with example notebook
- data_package - generic/pip-installable package prototype
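Once one of the conda data packages above is installed, its entries should show up in Intake's built-in catalog, `intake.cat`. The following is a minimal sketch for checking what is available, assuming a classic Intake installation. The helper names are hypothetical, and the entry names a package provides are not necessarily the same as the package name.

```python
def entry_names(catalog):
    """Return the sorted entry names of a catalog-like object.

    Iterating over an Intake catalog yields its entry names, so any
    iterable of names (or a dict keyed by name) works here as well.
    """
    return sorted(catalog)


def installed_entry_names():
    """List the entries of the built-in catalog populated by data packages."""
    import intake  # imported lazily; requires Intake to be installed

    return entry_names(intake.cat)
```

Comparing the result of `installed_entry_names()` before and after a `conda install` is a quick way to see which entries a given package contributes.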
The following example is an online data-set listing automatically generated from an Intake catalog:
- pangeo catalog - List of earth science products for the Pangeo project