How to count tabular and map datasets in a CKAN instance using the API, as a Jupyter Notebook. For more information, please see the blog post.
Make sure you have either Jupyter Notebook or JupyterLab installed. Install
the Jupytext extension: it helps Jupyter Notebooks play nicely with Git's
line-based version control, and it is required to open .md (or .py) files as
Notebooks inside Jupyter. Learn more about Jupytext in this Towards Data
Science blog post.
Create a new Python environment and activate it:
$ python -m venv env
$ source env/bin/activate
Inside the environment, install ipykernel and register the new environment
as a kernel inside Jupyter:
$ pip install ipykernel
$ python -m ipykernel install --user --name=ckanapi --display-name="CKAN API"
Now install the requirements for this specific notebook inside the new
environment. In particular, we use just ckanapi, a Python wrapper for the
CKAN API, and tqdm to display a nice progress bar.
$ pip install -r requirements.txt
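To give an idea of how the two packages fit together, here is a minimal sketch of paging through every dataset of a CKAN instance with a progress bar. It is not the notebook's actual code: the helper names iter_pages and fetch_all, the page size, and the url argument are assumptions made for illustration. The only CKAN call used is the package_search action, accessed through ckanapi's RemoteCKAN.

```python
def iter_pages(search, rows=100):
    """Yield every dataset by calling search(rows=..., start=...) page by page.

    `search` is any callable that returns a dict with a "results" list,
    which is the shape of CKAN's package_search response.
    """
    start = 0
    while True:
        page = search(rows=rows, start=start)["results"]
        if not page:
            return  # an empty page means there is nothing left to fetch
        yield from page
        start += len(page)


def fetch_all(url, rows=100):
    """Hypothetical helper: download all datasets from the CKAN site at `url`."""
    # Imported here so iter_pages can be tried without the packages installed.
    from ckanapi import RemoteCKAN
    from tqdm import tqdm

    ckan = RemoteCKAN(url)
    # Asking for zero rows returns only the total count, used to size the bar.
    total = ckan.action.package_search(rows=0)["count"]
    return list(tqdm(iter_pages(ckan.action.package_search, rows), total=total))
```

Keeping the paging logic separate from the network call means it can be checked against a fake search function before pointing it at a real instance.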
Finally, to regenerate the .ipynb file, use the command:
$ jupytext --sync tabular-and-map-datasets.md
Remember to add a rule to the .gitignore file to exclude .ipynb files if
you want notebooks to keep playing nicely with Git diffs.
Now you can just open the Jupyter Notebook provided and run it. Adjust the parameters for your CKAN instance and enjoy!
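As a rough illustration of what "adjusting the parameters" might involve, the sketch below counts datasets that have at least one tabular or map resource, based on each resource's format field. The parameter names (CKAN_URL, TABULAR_FORMATS, MAP_FORMATS), the format lists, and the sample data are all assumptions, not the notebook's actual values; a real instance may label formats differently.

```python
from collections import Counter

# Hypothetical parameters: replace with the values for your CKAN instance.
CKAN_URL = "https://ckan.example.org"  # placeholder; would be passed to ckanapi
TABULAR_FORMATS = {"csv", "tsv", "xls", "xlsx", "ods"}       # assumed list
MAP_FORMATS = {"geojson", "kml", "kmz", "shp", "wms", "wfs"}  # assumed list


def resource_formats(dataset):
    """Return the lowercased formats of all resources in a dataset dict."""
    return {
        (resource.get("format") or "").strip().lower()
        for resource in dataset.get("resources", [])
    }


def count_datasets(datasets):
    """Count datasets with at least one tabular or one map resource."""
    counts = Counter(total=0, tabular=0, map=0)
    for dataset in datasets:
        formats = resource_formats(dataset)
        counts["total"] += 1
        counts["tabular"] += bool(formats & TABULAR_FORMATS)
        counts["map"] += bool(formats & MAP_FORMATS)
    return counts


# Made-up sample shaped like CKAN package dicts, for demonstration only.
sample = [
    {"name": "budget", "resources": [{"format": "CSV"}, {"format": "PDF"}]},
    {"name": "districts", "resources": [{"format": "GeoJSON"}, {"format": "csv"}]},
    {"name": "report", "resources": [{"format": "PDF"}]},
]
print(dict(count_datasets(sample)))  # {'total': 3, 'tabular': 2, 'map': 1}
```

A dataset with both kinds of resource is counted in both categories, so the tabular and map counts can add up to more than the total.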