Geowrangler

Tools for wrangling with geospatial data

Overview

Geowrangler is a Python package for geodata wrangling. It helps you build data transformation workflows that have no out-of-the-box solutions in other geospatial libraries.

We surveyed our past geospatial projects to extract these solutions for our own work, and we hope they will be useful to others as well.

Our audience is researchers, analysts, and engineers who deliver geospatial projects.

We welcome your comments, suggestions, bug reports and code contributions to make Geowrangler better.

Modules

Grid Tile Generation
Geometry Validation
Vector Zonal Stats
Raster Zonal Stats
Demographic and Health Surveys (DHS) Pre-processing Utilities (planned)
DHS Wealth Index Calculation (planned)
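To make the grid tile and vector zonal stats ideas more concrete, here is a minimal, standard-library-only sketch: cover a bounding box with square tiles, then average point values per tile (a simple vector zonal statistic). This is not geowrangler's API; `square_grid` and `point_zonal_mean` are hypothetical helpers, and the sketch assumes planar (projected) coordinates and axis-aligned square cells.

```python
import math
from collections import defaultdict


def square_grid(minx, miny, maxx, maxy, cell_size):
    """Hypothetical helper: {(col, row): (x0, y0, x1, y1)} square cells covering a bbox."""
    ncols = math.ceil((maxx - minx) / cell_size)
    nrows = math.ceil((maxy - miny) / cell_size)
    cells = {}
    for col in range(ncols):
        for row in range(nrows):
            x0 = minx + col * cell_size
            y0 = miny + row * cell_size
            cells[(col, row)] = (x0, y0, x0 + cell_size, y0 + cell_size)
    return cells


def point_zonal_mean(points, minx, miny, cell_size):
    """Hypothetical helper: mean of point values per grid cell, keyed by (col, row)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, value in points:
        # integer division finds the containing cell without a point-in-polygon test
        key = (int((x - minx) // cell_size), int((y - miny) // cell_size))
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    cells = square_grid(0, 0, 250, 100, cell_size=100)
    print(len(cells), "cells")
    print(point_zonal_mean([(10, 10, 2.0), (20, 50, 4.0), (150, 20, 5.0)], 0, 0, 100))
```

Indexing cells by `(col, row)` lets each point be assigned to its tile with integer division rather than a geometric containment test, which only works because the cells are axis-aligned squares.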

See our Roadmap page for more details.

Installation

pip install git+https://github.com/thinkingmachines/geowrangler.git
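To confirm that the install landed in the interpreter you are using, a standard-library check like the sketch below can help. `is_installed` is a hypothetical helper; it only locates the top-level `geowrangler` module without importing it, so it does not verify that dependencies such as GEOS are usable.

```python
import importlib.util


def is_installed(package_name: str = "geowrangler") -> bool:
    """Return True if a top-level module can be located on the current import path."""
    # find_spec returns None when a top-level module cannot be found
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    print("geowrangler installed:", is_installed())
```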

Documentation

The documentation for the package is available here

Development

Development Setup

If you want to learn more about Geowrangler and explore its inner workings, you can set up a local development environment. You can then run geowrangler's Jupyter notebooks to see how the different modules are built and how they work.

Prerequisites

OS: Linux, macOS, or Windows Subsystem for Linux (WSL) on Windows
Requirements:
- Python 3.7 or higher
- virtualenv, venv, or conda for Python environment virtualization
- Poetry for dependency management
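A quick way to check these requirements from Python itself is sketched below. `check_prerequisites` is a hypothetical helper; it only tests the interpreter version and whether the listed command-line tools are on your PATH.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 7), tools=("poetry",)):
    """Return a dict reporting the Python version check and which tools are on PATH."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for tool in tools:
        # shutil.which returns the executable's full path, or None if it is not on PATH
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    # you need only one of virtualenv, venv or conda, so a False for some of them is fine
    print(check_prerequisites(tools=("poetry", "virtualenv", "conda")))
```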

GitHub Repo Fork

If you plan to contribute to geowrangler, we encourage you to create your own fork of the Geowrangler repo.

This lets you push commits to your fork and then open a Pull Request (PR) from your fork to the main geowrangler repo for review by geowrangler's maintainers.

Development Installation

We recommend creating a Python virtual environment with virtualenv or conda for geowrangler development. Please see the relevant documentation for more details.

The example below uses virtualenv to create a separate environment with Python 3.9 on Linux or WSL.

The next command installs libgeos, which is required for building pygeos and shapely from source. It uses apt (Debian/Ubuntu, including WSL); see the libgeos documentation for installation details on other systems.

sudo apt install libgeos-dev  # skip this if you already have GEOS
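To check whether GEOS is already present, and so whether you can skip the command above, a sketch like the following can help. `check_geos` is a hypothetical helper. It assumes the GEOS C API library is discoverable by `ctypes.util.find_library` under the name `geos_c`, and that source builds locate GEOS through the `geos-config` script (shipped by the Debian/Ubuntu `libgeos-dev` package); libraries in non-standard locations may not be found.

```python
import ctypes.util
import shutil


def check_geos():
    """Report whether GEOS looks available for building pygeos/shapely from source."""
    return {
        # GEOS C API shared library (e.g. libgeos_c.so.1 on Linux); None if not found
        "geos_c_library": ctypes.util.find_library("geos_c"),
        # geos-config helper script used by source builds to locate headers and libraries
        "geos_config": shutil.which("geos-config"),
    }


if __name__ == "__main__":
    print(check_geos())
```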

Replace the GitHub URL below with git@github.com:<your-github-id>/geowrangler.git if you created a fork.

git clone https://github.com/thinkingmachines/geowrangler.git
cd geowrangler
virtualenv -p /usr/bin/python3.9 .venv
source .venv/bin/activate
pip install pre-commit poetry
pre-commit install
poetry install
poetry run pip install pip --upgrade
poetry run pip uninstall pygeos shapely -y
poetry run pip install pygeos shapely --no-binary shapely --no-binary pygeos
poetry run pip install -e .

This completes the installation and setup of a local geowrangler environment.

Activating the geowrangler environment

To activate the geowrangler environment, cd into <your-local-geowrangler-folder> and run poetry shell.
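If you are unsure whether the environment is active, the following sketch reports whether the running interpreter belongs to a virtual environment. `in_virtual_env` is a hypothetical helper; it detects venv/virtualenv environments such as the .venv created above, but not conda environments, which you can recognize by the `CONDA_PREFIX` environment variable instead.

```python
import sys


def in_virtual_env() -> bool:
    """True if this interpreter runs inside a venv/virtualenv environment."""
    # venv and virtualenv 20+ point sys.prefix at the environment while sys.base_prefix
    # keeps pointing at the base installation; legacy virtualenv set sys.real_prefix instead
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtual_env())
    print("interpreter:", sys.executable)
```

Run with python after poetry shell, this should print True and an interpreter path inside the .venv folder.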

Jupyter Notebook Development

The code for the geowrangler python package resides in Jupyter notebooks located in the notebooks folder.

Using nbdev, we generate the python modules in the geowrangler folder from the notebook code cells marked with an #export comment. A #default_exp <module_name> comment in the first code cell of each notebook tells nbdev to put that notebook's exported code in a module named <module_name> in the geowrangler folder.
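The export convention can be illustrated with a simplified sketch that reads a notebook's JSON and collects the marked cells. This is not nbdev's implementation: `extract_module` is a hypothetical helper, it recognizes only the exact `#export` and `#default_exp` comments described above (nbdev supports more directives), and it returns the code instead of writing it into the geowrangler folder.

```python
import json
import re

# [ \t]* (not \s*) keeps each directive match on a single line
DEFAULT_EXP = re.compile(r"^[ \t]*#[ \t]*default_exp[ \t]+(\S+)", re.MULTILINE)
EXPORT = re.compile(r"^[ \t]*#[ \t]*export[ \t]*$", re.MULTILINE)


def cell_source(cell):
    # .ipynb files store a cell's source either as one string or as a list of lines
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def extract_module(notebook):
    """Return (module_name, code) from a parsed .ipynb dict, nbdev-style (simplified)."""
    code_cells = [cell_source(c) for c in notebook["cells"] if c.get("cell_type") == "code"]
    match = DEFAULT_EXP.search(code_cells[0]) if code_cells else None
    if match is None:
        raise ValueError("first code cell has no #default_exp comment")
    # keep only cells carrying an #export marker, with the marker line removed
    exported = [EXPORT.sub("", src).strip() for src in code_cells if EXPORT.search(src)]
    return match.group(1), "\n\n".join(exported)


def extract_module_from_file(path):
    with open(path, encoding="utf-8") as f:
        return extract_module(json.load(f))
```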

See the nbdev CLI documentation for more details on the commands that generate the package and its documentation.

Running notebooks

Run the following to open the Jupyter notebooks in the notebooks folder:

poetry run jupyter lab

Generating and viewing the documentation site

To generate and view the documentation site on your local machine, the quickest way is to use Docker. The following assumes that you have already set up Docker on your system:

poetry run nbdev_build_docs --mk_readme False --force_all True
docker-compose up jekyll

If you don't want to use Docker, you can instead install Jekyll to view the documentation site locally.

nbdev converts the notebooks in the notebooks/ folder into a Jekyll site.

From this Jekyll site, you can then build a static site.

To generate the docs, run the following:

poetry run nbdev_build_docs --mk_readme False --force_all True
cd docs && bundle i && cd ..

To run the Jekyll site, run the following:

cd docs
bundle exec jekyll serve

Running tests

We use pytest as our test framework. To run all tests and generate a coverage report, run the following (-n auto runs the tests in parallel via the pytest-xdist plugin):

poetry run pytest --cov --cov-config=.coveragerc -n auto

To run a single test or test file:

# for a single test function
poetry run pytest tests/test_grids.py::test_create_grids
# for a single test file
poetry run pytest tests/test_grids.py
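New tests follow pytest's standard conventions: files named `test_*.py` in the tests folder, functions whose names start with `test_`, and plain `assert` statements. The hypothetical file below illustrates the layout; neither the file nor the `cell_index` helper exists in the repo.

```python
# tests/test_example.py: hypothetical file illustrating pytest conventions


def cell_index(x, y, cell_size):
    """Hypothetical helper: (col, row) index of the square grid cell containing (x, y)."""
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    # floor division keeps negative coordinates in the correct cell
    return int(x // cell_size), int(y // cell_size)


def test_cell_index_basic():
    assert cell_index(150, 20, 100) == (1, 0)


def test_cell_index_handles_negative_coordinates():
    # -50 lies in the cell spanning [-100, 0), so its column index is -1
    assert cell_index(-50, 250, 100) == (-1, 2)
```

With such a file in place, poetry run pytest tests/test_example.py::test_cell_index_handles_negative_coordinates would run only the second test, and adding -k negative instead selects tests by a substring of their names.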

Contributing

Please read CONTRIBUTING.md and CODE_OF_CONDUCT.md before contributing.

Development Notes

For more details regarding our development standards and processes, please see our wiki.
