Progression path for a GIS analyst who wants to become proficient in using Python for GIS: from apprentice to guru
This is a work in progress
This is an attempt to provide a structured collection of resources that could help a GIS professional learn how to use Python for spatial data management, mapping, and analysis. The resources are organized by skill level, so everyone should be able to learn something new along the way.
The resources will include books, web pages and blog posts, online courses, videos, Q/A from GIS.SE, links to code snippets, and some bedtime reading.
The resources will be applicable to both Esri software users and open-source GIS professionals.
You should be able to write short, simple scripts in pure Python with no connection to GIS. To learn the basics of Python, you can find a ton of resources online, such as Codecademy, Learn Python the Hard Way, Dive into Python, A Whirlwind Tour of Python, and many other books listed on Python.org and in this Free programming books GitHub repo.
If you don't want to learn Python this way and would rather learn it directly in the context of GIS, use these books:
- Python Scripting for ArcGIS for Esri users
- Geoprocessing with Python for open-source users
Going through the respective book may be sufficient to learn everything you will ever need as an Esri or an open-source GIS user.
- Esri instructor-led course Introduction to Geoprocessing Scripts Using Python
- Esri free web course Python for Everyone
- Esri web course Basics of Python (for ArcGIS 10)
- Penn State free web course GIS Programming and Automation
Look for videos on the Esri Video web page: search for Python and sort by most recent.
- Esri video Working with Feature Data Using ArcPy
- Esri video ArcMap and Python: Closing the VBA Gap
- Esri video Python: Getting Started 2014
- Esri video Python: Developing Geoprocessing Tools
- Esri video Python: Getting Started 2013
- arcpy tutorials
- Esri Blog post Scheduling a Python script or model to run at a prescribed time
- PyQGIS Developer Cookbook
- Esri blog Seven easy ways to start learning Python and ArcPy
- GIS.SE What are some resources for learning ArcPy?
- GIS.SE Learning resources for PyQGIS?
- Python for GIS tutorials
- Python GDAL/OGR Cookbook!
At this point, you should be able to:
- write some simple scripts using either the arcpy site-package or the ogr/gdal/pyqgis libraries
- report information about your GIS assets (data format, geometry type, data schema, spatial reference)
- write code for calling ArcGIS geoprocessing tools (inspecting the arcpy.Result object returned) / ogr geometry methods / PyQGIS tools from Python code
- perform an operation on multiple datasets in batch mode using arcpy/ogr listing functions
- read and update attributes & geometry of features using arcpy.da cursors or ogr data source methods
- create and operate arcpy.Geometry() objects (accessing their properties and methods) or ogr.Feature() objects
- create an ArcGIS toolbox with a simple script tool executing a Python source file
- report information about map layers (e.g. data sources, broken paths, definition queries) within an ArcMap map document (.mxd) using the arcpy.mapping module or the pyqgis module
At this point, you should be familiar with:
- variables of different data types (numeric, string, Boolean, date etc.)
- data structures of different types (list, tuple, dictionary, set)
- for and while loops, if-elif-else blocks
- import of external Python modules and packages (e.g. import os)
- functions and how they work (e.g. input arguments and the return statement)
- reading/writing text files using the built-in open function
- reading/writing .csv files using the csv module and the unicodecsv module
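A minimal sketch of the csv items above (Python 3 syntax): it writes a small report about GIS datasets to a .csv file and reads it back. The dataset records are made-up sample values; in a real script you would collect them with arcpy or ogr. The Python 2.7 csv module does not handle Unicode text, which is why unicodecsv is listed as well.

```python
import csv
import os
import tempfile

# Made-up dataset descriptions; a real script would collect these with arcpy or ogr
DATASETS = [
    {"name": "roads", "format": "Shapefile", "geometry": "Polyline", "epsg": 4326},
    {"name": "parcels", "format": "File GDB", "geometry": "Polygon", "epsg": 3857},
]
FIELDNAMES = ["name", "format", "geometry", "epsg"]


def write_report(path, records):
    """Write one row per dataset, preceded by a header row."""
    # newline="" lets the csv module control line endings (avoids blank rows on Windows)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(records)


def read_report(path):
    """Read the report back as a list of dicts; every value comes back as a string."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    report_path = os.path.join(tempfile.gettempdir(), "gis_assets.csv")
    write_report(report_path, DATASETS)
    for row in read_report(report_path):
        print(row["name"], row["geometry"], row["epsg"])
```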
This section contains examples of tasks that you might need to implement at some point. Implementing these tasks in Python code would be a good sign that you have mastered the basics of Python for GIS.
- get a list of field names of Date type in a file geodatabase feature class
- copy multiple shapefiles into a file geodatabase at once
- re-project all rasters in a folder copying the results into a new folder
- update data sources for layers in a map document and save a new map document
- write information about your GIS assets to a .txt or a .csv file
- Learn about various Python IDEs and find out which one you like best. Many start with the free PyScripter and then move to something else:
- SO question What IDE to use for Python?
- Check PyCharm, WingIDE, Eclipse PyDev, Visual Studio Code, Python Tools for Visual Studio, or text editors with Python support such as Sublime Text
- Check Esri blog post Choosing the right Python Integrated Development Environment
Currently, Visual Studio Code with the Python extension is arguably the best choice for getting started with Python development. It's completely free, you can install it on any of your physical or virtual machines, and it has great support for Python development. Among commercial IDEs, Wing IDE or PyCharm would be a great choice.
- Learn about version control systems (VCS) such as Git for managing the source code. Bitbucket by Atlassian, GitLab, and GitHub all provide free public and private repositories.
- Find out whether there is a VCS solution deployed in-house within your organization, such as Microsoft TFS, which you could use to check in the code
- If you would like to dive deeper into Git, read the Pro Git book, available for free online
- Watch Python courses on training sites such as Pluralsight [Python Fundamentals](https://app.pluralsight.com/library/courses/python-fundamentals/table-of-contents), Enthought Python Foundation Series, or Safari Books Online
- Watch Esri video Python: Useful Libraries for the GIS Professional
- Learn about type hinting in Python. There is an excellent blog post on how type hints are used in PyCharm and a help page from the Wing IDE team. Find out whether your Python IDE supports static code analysis and start using type hints (they are supported in both Python 2.7 and 3.5+).
- Learn about the MyPy static type checker and support for type hints with the typing module in Python 3.6
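To illustrate the type hinting items above, here is a small sketch using the typing module. The function names and the Point/BBox aliases are hypothetical helpers, not part of any GIS library; the second function uses the comment-based hint syntax that also works in Python 2.7. A static checker such as mypy can then flag a call like bbox_of([(0, "a")]) before the script ever runs.

```python
from typing import Dict, List, Optional, Tuple

# Type aliases document intent for readers, IDEs, and static checkers
Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bbox_of(points: List[Point]) -> Optional[BBox]:
    """Return the bounding box of the points, or None for an empty list."""
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def count_by_type(geometry_types):
    # type: (List[str]) -> Dict[str, int]
    """Count geometry type names; the comment above is the Python 2.7-compatible hint syntax."""
    counts = {}  # type: Dict[str, int]
    for geometry_type in geometry_types:
        counts[geometry_type] = counts.get(geometry_type, 0) + 1
    return counts
```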
- Learn how to run other Python programs or executable files from your program using the subprocess module. This is handy when you need to run an .exe program in the middle of your Python program, for example, when your script starts with arcpy/ogr code but then needs to run an ArcObjects console app or a compiled C++ app before it can proceed further.
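A minimal sketch of calling an external program with subprocess.run (Python 3.5+; Python 2.7 offers subprocess.check_output instead). The current Python interpreter stands in for the compiled app, and the .exe path in the comment is hypothetical.

```python
import subprocess
import sys


def run_tool(args):
    """Run an external program, wait for it, and return its standard output as text.

    check=True raises subprocess.CalledProcessError on a non-zero exit code,
    so the GIS script stops instead of continuing with incomplete results.
    """
    completed = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str (text=True in Python 3.7+)
        check=True,
    )
    return completed.stdout


if __name__ == "__main__":
    # A real call might be run_tool([r"C:\tools\update_mxd.exe", "input.gdb"]) (hypothetical app);
    # here another Python interpreter stands in for the compiled program.
    print(run_tool([sys.executable, "-c", "print('step 2 done')"]))
```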
- Esri blog Generating a choice list from a field for a custom script tool
- Python and GIS blog 1 and Python and GIS blog 2
- Convert ArcGIS script toolbox (.tbx) to Python toolbox (.pyt) with tbxtopyt
- Esri blog How to Debug Python Toolboxes in 3 Easy Steps
- Esri blog Field mapping and Python scripting
- Esri ArcPy team blog
- Convenience functions for arcpy in repo arcapi
- Sample ArcGIS toolbox SampleArcPyMappingScriptTools_10_v1 for working with arcpy.mapping (20+ tools)
- Learn how to use pip for managing Python packages
- Learn virtualenv and venv for managing Python 2 and Python 3 environments, respectively
- Learn tox and retox to run your tests/programs on multiple Python installations (handy when building script tools to be used both in ArcGIS Desktop with Python 2.7 and ArcGIS Pro with Python 3.x, or in QGIS with Python 2.7 and QGIS with Python 3.x)
- Learn the R-ArcGIS bridge to combine Python and R code
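As a sketch of the venv item above, the standard-library venv module (Python 3.3+) can also create an environment from code; the more common route is running python -m venv from cmd. The folder name used in the usage note is an assumption.

```python
import os
import venv


def create_env(env_dir, with_pip=True):
    """Create an isolated Python 3 environment and return the folder holding its interpreter.

    Roughly equivalent to running "python -m venv env_dir" from cmd.
    """
    venv.create(env_dir, with_pip=with_pip)
    # Windows puts the interpreter and scripts in Scripts, other platforms use bin
    return os.path.join(env_dir, "Scripts" if os.name == "nt" else "bin")
```

After create_env("gis_env") returns, packages can be installed into that environment only, by running the python executable from the returned folder with -m pip install and the package name.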
At this point, you should be able to:
- automate map production using arcpy.mapping with data-driven pages or pyqgis
- manage .pdf files (e.g. re-ordering, merging, splitting) using arcpy or pure Python packages such as pypdf2
- export ArcMap map document layouts to various file formats such as .png and .pdf
- update the content of text elements in the layout of an ArcMap map document
- execute SQL queries from Python using arcpy.ArcSDESQLExecute() or GDALDataset::ExecuteSQL()
- use the FieldInfo, FieldMap, and FieldMappings classes from arcpy or ogr.FieldDefn() to manage data schema changes
- customize ArcGIS script tool behavior using the ToolValidator class or build simple QGIS plugins
- start using Python toolboxes and Python add-ins in ArcGIS when it makes sense
- debug arcpy-driven code with the help of geoprocessing messages
- write small unit tests for GIS workflows
- handle JSON data in Python and arcpy, and GeoJSON with ogr
- read Excel files using the xlrd Python package
- generate simple Excel files from datasets with Python and the xlsxwriter or xlwt package
- use arcpy.da.Walk() and os.walk() to traverse folders with GIS datasets recursively
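Combining two of the items above, the sketch below uses os.walk() to find .geojson files recursively and the json module to summarize their geometry types. It assumes each file is a GeoJSON FeatureCollection; the folder path in the example is hypothetical, and arcpy.da.Walk() would be the arcpy counterpart for Esri data types.

```python
import json
import os
from collections import Counter


def find_geojson(root):
    """Yield the paths of all .geojson files below root, including subfolders."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".geojson"):
                yield os.path.join(dirpath, name)


def summarize(path):
    """Return (feature count, Counter of geometry types) for a GeoJSON FeatureCollection."""
    with open(path, encoding="utf-8") as f:
        collection = json.load(f)
    features = collection.get("features", [])
    # GeoJSON allows "geometry": null, which json.load turns into None
    types = Counter((feature.get("geometry") or {}).get("type", "None") for feature in features)
    return len(features), types


if __name__ == "__main__":
    # Hypothetical folder; os.walk yields nothing if it does not exist
    for path in find_geojson(r"C:\GIS\data"):
        count, types = summarize(path)
        print(path, count, dict(types))
```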
At this point, you should be familiar with:
- installing Python packages using pip
- the PYTHONPATH environment variable, the concept of paths, and running Python programs from cmd
- Python 3, to be able to write code that will later be ported to ArcGIS Pro / QGIS 3.x
- the Python PEP-8 style guide
- collections module data structures such as defaultdict, namedtuple, and Counter
- list, dictionary, and set comprehensions + set theory operations
- enumerating sequences using the built-in enumerate function
- writing your own functions and handling the arguments with *args and **kwargs
- lambda/anonymous and convenience functions
- accessing DBMS databases using Python
- working with disk-based databases such as SQLite from Python
- using non-Latin characters in the source file, handling Unicode, and the source encoding declaration
- Python exceptions and the try/except block
- the Python traceback module
- tuple unpacking with function calls
- sending emails with Python
- accessing FTP sites with Python using the ftplib module
- running Python files from cmd and with a task scheduler
- zipping folders and files with Python and reading/unpacking archive files (using the zipfile module for .zip files and tarfile for .tar and .tar.gz files)
- sending SMS using Python and Twilio
- logging your Python programs (using the logging module), which is handy to use instead of print statements
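A sketch of the zipfile item above: it packs all the files that make up one shapefile into a single .zip archive and extracts it again. The list of component extensions is an assumption covering the common cases; tarfile offers a similar interface for .tar and .tar.gz archives.

```python
import os
import zipfile

# Common shapefile component extensions (an assumption; a shapefile may have only some of them)
SHAPEFILE_EXTENSIONS = (".shp", ".shx", ".dbf", ".prj", ".cpg", ".sbn", ".sbx", ".shp.xml")


def zip_shapefile(shp_path, zip_path):
    """Write every existing component of one shapefile into a compressed .zip archive."""
    base = os.path.splitext(shp_path)[0]
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for ext in SHAPEFILE_EXTENSIONS:
            component = base + ext
            if os.path.exists(component):
                # arcname stores just the file name instead of the full folder path
                archive.write(component, arcname=os.path.basename(component))
    return zip_path


def unzip(zip_path, out_dir):
    """Extract all members of the archive into out_dir and return their names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
        return archive.namelist()
```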
- Learn how to use ArcObjects from Python:
- GIS.SE Accessing ArcObjects from Python?
- GIS.SE Guidelines for using ArcObjects from Python
- Esri video Extending Python Using C++ and ArcObjects
- Esri recorded live-training seminar on ArcObjects
- Esri Help page Learning ArcObjects will help you determine which ArcObjects will provide the functionality required by your customization
- Esri Help page Reading OMDs provides a description of the diagram notation used on the ArcObjects object model diagrams (OMDs)
- Blog post Accessing ArcObjects in Python
- GitHub repo with tutorial on ArcObjects
- Learn how to access ArcGIS Pro .NET libraries from Python:
- Learn the basics of C# or VB.NET
- Learn to use the pythonnet package
- Learn to build a class library and access its methods from Python using clr
- Learn how to build a console app that embeds Pro libraries (see the Pro .NET SDK Help)
- Learn about other GIS packages:
- Go through a comprehensive list of Essential Python Geospatial Libraries (also available on this GitHub page)
- Watch the series of recorded workshops on using open source GIS packages: Geospatial data in Python: Database, Desktop, and the Web
- Watch the series of various talks on using open source GIS packages: Geospatial Python talks
- Learn how to build desktop GUI applications using Tkinter, wxPython, PyQt, PySide, or Kivy, and then embed them into ArcGIS or simply make them aware of spatial datasets:
- Python Add-Ins and Tkinter
- Esri blog post Using Python and QML to build native apps
- GitHub repo with samples for ArcGIS, Qt and Python integration
- Watch Esri video Developing Custom Tools with PyQt
- Go through examples of PyQt applications on the Riverbank GitHub page
- Explore the workflow of building a custom QGIS plugin with PyQt
- Explore GDBee, a PyQt5 desktop application for executing SQL queries against Esri file geodatabases
- Learn about computational geometry and find out how it can help you in your work. Maybe you need a tool that is not present in your desktop GIS, or you are looking for something that performs faster. There are two main computational geometry libraries, qhull (written in C) and CGAL (written in C++):
- qhull. Its Python wrapper is accessed via the scipy.spatial module, an exceptional tool for anyone who deals with geometrical data.
- CGAL. Its Python bindings are generated with SWIG. CGAL is somewhat difficult to install and compile, but does provide much richer functionality.
- Watch Computational Geometry in Python - PyCon 2016 and Python Powered Computational Geometry to learn more about qhull and CGAL, respectively.
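scipy.spatial.ConvexHull exposes qhull's convex hull computation (for 2-D input, ConvexHull(points).vertices gives the indices of the hull points in counterclockwise order). To show what such a computation does without any dependencies, the sketch below swaps in a different algorithm, Andrew's monotone chain, written in pure Python; it is an illustration, not a replacement for qhull.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """2-D cross product of vectors OA and OB; positive means a counterclockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Return hull vertices in counterclockwise order, starting from the lowest (x, y) point.

    Andrew's monotone chain, O(n log n); points lying on hull edges are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Pop while the last two points and p do not make a counterclockwise turn
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one
    return lower[:-1] + upper[:-1]
```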
- Learn about using Python with FME Desktop:
- Find out how to run an FME workbench from Python
- Find out how to call Python code from within an FME workbench using PythonCaller transformer
- Find out how to process data without using any FME workbench with the help of FME Objects Python API
- Learn the ArcGIS REST API:
- Learn the requests module
- ArcGIS toolbox ArcGIS Server Administration Toolkit - 10.1+
- Learn the ArcGIS API for Python to manage ArcGIS Online / Portal organizations and ArcGIS Server resources
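A minimal sketch of querying a feature layer through the ArcGIS REST API using only the standard library. It assumes a layer URL ending in /FeatureServer/ or /MapServer/ followed by a layer id (the one in the usage note is made up) and uses the where, outFields, returnGeometry, and f parameters of the layer's query operation. With the requests module, requests.get(url, params=params).json() does the same job.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(layer_url, where="1=1", out_fields="*", return_geometry=False):
    """Build the URL of a feature layer's "query" operation with a JSON response format."""
    params = {
        "where": where,
        "outFields": out_fields,
        "returnGeometry": "true" if return_geometry else "false",
        "f": "json",
    }
    # urlencode escapes spaces, quotes, and other special characters in the SQL where clause
    return layer_url.rstrip("/") + "/query?" + urllib.parse.urlencode(params)


def query_features(layer_url, **query_options):
    """Send the query and return the list of features from the JSON response."""
    with urllib.request.urlopen(build_query_url(layer_url, **query_options), timeout=30) as response:
        payload = json.loads(response.read().decode("utf-8"))
    # ArcGIS Server may report errors inside the JSON body, so check for an "error" key
    if "error" in payload:
        raise RuntimeError(payload["error"])
    return payload.get("features", [])
```

For example, query_features("https://example.com/arcgis/rest/services/Roads/FeatureServer/0", where="STATUS = 'OPEN'") would request the open roads from that (hypothetical) service.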
- Learn about managing and processing larger spatial datasets, as performance will matter:
- Learn profiling techniques (cProfile) to find out which code takes the most time to execute
- Learn benchmarking to compare execution time for functions that do the same thing using different tools (e.g. looking for the fastest way to count points in polygons)
- Read Esri blog Accessing Multidimensional Scientific Data using Python
- Learn how to use the multiprocessing module with ArcGIS in the Esri blog post Multiprocessing with ArcGIS – Approaches and Considerations (Part 1)
- Read Esri blog Be successful overlaying large, complex datasets in Geoprocessing
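A sketch of the profiling and benchmarking items above using cProfile, pstats, and timeit. Counting points in a bounding box stands in for the point-in-polygon example (a real comparison would pit, say, an arcpy tool against a shapely-based function), and the random test data is generated on the fly.

```python
import cProfile
import io
import pstats
import random
import timeit


def count_in_bbox_loop(points, bbox):
    """Count points inside a bounding box with an explicit loop."""
    xmin, ymin, xmax, ymax = bbox
    count = 0
    for x, y in points:
        if xmin <= x <= xmax and ymin <= y <= ymax:
            count += 1
    return count


def count_in_bbox_sum(points, bbox):
    """Same result, using a generator expression with sum()."""
    xmin, ymin, xmax, ymax = bbox
    return sum(1 for x, y in points if xmin <= x <= xmax and ymin <= y <= ymax)


def benchmark(funcs, points, bbox, number=10):
    """Return the total seconds each function needs for `number` repeated calls."""
    # f=f binds the current function to the lambda on each iteration
    return {f.__name__: timeit.timeit(lambda f=f: f(points, bbox), number=number) for f in funcs}


def profile(func, *args):
    """Run func under cProfile and return (result, text report of the top 5 entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(100000)]
    box = (0.25, 0.25, 0.75, 0.75)
    print(benchmark([count_in_bbox_loop, count_in_bbox_sum], pts, box))
    print(profile(count_in_bbox_loop, pts, box)[1])
```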
- Learn about using Python for Big Data management and analysis:
- Learn about PySpark, which lets you use Spark in Python, as well as various geospatial libraries for geospatial analysis with Spark: magellan, spatialspark, and GeoSpark
- Watch a video showcasing the use of Spark for large data analysis: Large Scale Geospatial Analytics with Python, Spark, and Impala
- Read an article showcasing the use of Hadoop and Presto for large data analysis at Uber: Query the planet: Geospatial big data analytics at Uber
- Learn about Presto geospatial functions and the Presto Python client
- Learn about OmniSci geospatial MapD Core and its Python JayDeBeApi
- Learn about GeoMesa and its integration with the Spark Python API for accessing data in GeoMesa data stores
- Become familiar with the Esri Geometry API for Java, as it is used by many Java-based Big Data platforms that provide some kind of geospatial support
- Learn about the Esri File Geodatabase C++ API with .NET bindings to be able to work with file geodatabases programmatically using C++ or .NET
- Learn how to use the ESRI File Geodatabase (OpenFileGDB) and ESRI File Geodatabase (FileGDB) GDAL drivers to connect to Esri file geodatabases programmatically or using open-source tools
- Explore an example of using GDALDataset::ExecuteSQL() in the PyQt desktop SQL editor GDBee
- Learn how Python is used in the enterprise by watching the Enterprise Software with Python O'Reilly video course
- Learn IPython and the concept of reproducible research:
- Learn how to use Jupyter notebook
- Learn how to combine Python and R code in the same Jupyter notebook
- Learn about using Python for web development:
- Learn flask and django. Start with flask and only then move to django
- Learn geodjango to serve spatial datasets on the web. Read through the slides ArcGIS JavaScript Plus Django Equals Dynamic Web App
- Watch Python – Beyond the Basics on Pluralsight
- Learn about the nltk Python package to work with human language data (e.g. parsing address data)
- Learn about the regex Python package to work with regular expressions in Python (e.g. finding addresses in a specific format)
- Learn about difflib and the Levenshtein C extension to do fuzzy string matching (e.g. finding the closest address string in the registry for an input address)
- Learn the Selenium Python package to be able to automate web app testing. Read the documentation for its Python bindings
- Learn about numerical computing and data science:
- Install Anaconda and learn about conda. This is helpful, as Python in ArcGIS Pro is implemented using a conda environment
- Install the Enthought Canopy scientific Python distribution and learn what tools it has
- Learn numerics, science, and data with Python with scipy-lectures
- Learn what scipy.spatial can do for your GIS work
- Read Python for Data Analysis and Python for Computational Science and Engineering (free book)
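The fuzzy address matching mentioned in the difflib item above can be sketched with the standard library alone. Note that difflib's similarity ratio is based on matching blocks of characters rather than on Levenshtein edit distance; the registry entries below are made-up sample addresses.

```python
import difflib


def closest_address(address, registry, cutoff=0.6):
    """Return the registry entry most similar to `address`, or None if none reaches `cutoff`.

    Matching is case-insensitive; difflib scores similarity in the range 0..1.
    """
    # Map upper-cased entries back to their original spelling
    normalized = {entry.upper(): entry for entry in registry}
    matches = difflib.get_close_matches(address.upper(), list(normalized), n=1, cutoff=cutoff)
    return normalized[matches[0]] if matches else None


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings (1.0 means identical)."""
    return difflib.SequenceMatcher(None, a.upper(), b.upper()).ratio()


if __name__ == "__main__":
    registry = ["12 Main Street", "120 Maine Avenue", "7 Oak Road"]
    print(closest_address("12 main st", registry))
```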
- Learn about connecting to various DBMS from Python:
- For Microsoft SQL Server - pymssql
- For Oracle - cx_Oracle
- For PostgreSQL - psycopg2 or sqlalchemy
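The drivers listed above follow the Python DB-API 2.0 (PEP 249), so the connect/cursor/execute/fetch pattern carries over between them. The sketch below uses the built-in sqlite3 module so that it runs without a database server; the parcels table is hypothetical. Placeholder syntax differs between drivers: sqlite3 uses ?, psycopg2 and pymssql use %s, and cx_Oracle uses named placeholders such as :min_area.

```python
import sqlite3


def load_parcels(conn, rows):
    """Create a parcels table and insert rows of (parcel_id, owner, area_m2)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS parcels (parcel_id TEXT PRIMARY KEY, owner TEXT, area_m2 REAL)"
    )
    # Values are passed separately from the SQL text, never formatted into the string
    conn.executemany("INSERT INTO parcels VALUES (?, ?, ?)", rows)
    conn.commit()


def large_parcels(conn, min_area):
    """Return (parcel_id, area_m2) tuples for parcels of at least min_area, largest first."""
    cursor = conn.cursor()
    cursor.execute(
        "SELECT parcel_id, area_m2 FROM parcels WHERE area_m2 >= ? ORDER BY area_m2 DESC",
        (min_area,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # a file path would create a disk-based database
    load_parcels(connection, [("A1", "Smith", 500.0), ("A2", "Jones", 1500.0)])
    print(large_parcels(connection, 1000))
```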
- Learn about using machine learning with Python:
- Start using scikit-learn for various GIS-related operations such as data classification and regression as well as scikit-image for image processing (e.g., satellite imagery recognition)
- Learn about using computer vision (CV) with Python to do image processing
- Learn about creating and parsing HTML:
- Parse and construct HTML pages with Python using BeautifulSoup. This skill is handy when you need to search a web page for information and load it into a GIS dataset, or when you are building HTML reports
- Learn how the registrant package reports information about the Esri geodatabase contents
- Learn about web scraping using Scrapy
- Learn about creating and parsing XML:
- Parse existing .xml files using the built-in xml.etree.ElementTree module and the 3rd-party package lxml
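A short sketch of the xml.etree.ElementTree item above: it reads values from, and adds an element to, a small XML document. The XML layout is a made-up example rather than a real metadata standard; for a file on disk, ET.parse(path).getroot() replaces ET.fromstring(), and lxml offers a largely compatible API.

```python
import xml.etree.ElementTree as ET

# Made-up metadata layout; real ArcGIS or ISO metadata is much richer
SAMPLE = """<metadata>
  <dataset name="roads">
    <title>Road centerlines</title>
    <keywords><keyword>transport</keyword><keyword>network</keyword></keywords>
  </dataset>
  <dataset name="parcels">
    <title>Cadastral parcels</title>
    <keywords><keyword>cadastre</keyword></keywords>
  </dataset>
</metadata>"""


def read_datasets(xml_text):
    """Return {dataset name: (title, [keywords])} parsed from the XML text."""
    root = ET.fromstring(xml_text)
    result = {}
    for ds in root.findall("dataset"):
        title = ds.findtext("title", default="").strip()
        keywords = [kw.text for kw in ds.findall("keywords/keyword")]
        result[ds.get("name")] = (title, keywords)
    return result


def add_keyword(xml_text, dataset_name, keyword):
    """Append a keyword element to one dataset and return the modified XML as a string."""
    root = ET.fromstring(xml_text)
    for ds in root.findall("dataset"):
        if ds.get("name") == dataset_name:
            ET.SubElement(ds.find("keywords"), "keyword").text = keyword
    return ET.tostring(root, encoding="unicode")
```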
- Learn about source code testing, linting, and refactoring:
- Learn the unittest built-in module and the more advanced pytest framework
- Learn coverage.py to create code coverage reports
- Learn Hypothesis for writing more powerful unit tests
- Learn Python linters such as pylint, flake8, and pyflakes to keep the code tidy
- Learn about Python style guides such as the Google style guide. This will be particularly useful when you start working in a team
- Learn about the most comprehensive Python linter, wemake-python-styleguide. It is just a flake8 plugin; however, it combines violations from a lot of other flake8 plugins
- Learn Python formatters such as yapf and autopep8 to automatically reformat the source code to conform to a style. It is best to run autopep8 with the aggressive option enabled to reformat the code and then run yapf on the resulting code
- Learn the SonarPython static code analyzer to find code smells and refactoring options. Many of the rules from SonarPython are implemented in wemake-python-styleguide
- Learn about Python interface files (PEP-484) and how to use them to help your Python IDE do static code analysis and provide better IntelliSense
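A minimal unittest sketch for two small, hypothetical GIS helper functions. Saved as, say, test_helpers.py (an assumed file name), it can be run with python -m unittest test_helpers, and pytest would collect the same TestCase class without changes.

```python
import unittest


def feet_to_meters(feet):
    """Convert international feet to meters (1 ft = 0.3048 m exactly)."""
    return feet * 0.3048


def parse_coordinate_pair(text):
    """Parse "x, y" into a (float, float) tuple; raise ValueError for anything else."""
    parts = [part.strip() for part in text.split(",")]
    if len(parts) != 2:
        raise ValueError("expected 'x, y', got %r" % text)
    return float(parts[0]), float(parts[1])


class HelperTests(unittest.TestCase):
    def test_feet_to_meters(self):
        # assertAlmostEqual tolerates floating-point rounding
        self.assertAlmostEqual(feet_to_meters(1000), 304.8)

    def test_parse_pair(self):
        self.assertEqual(parse_coordinate_pair(" 10.5 , -3 "), (10.5, -3.0))

    def test_parse_pair_rejects_bad_input(self):
        with self.assertRaises(ValueError):
            parse_coordinate_pair("10.5")
```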
- Start doing certain things outside of GIS applications using pure Python, for instance, using pandas
- Learn best practices for organizing configuration and settings for a larger workflow where you need to keep the config values separate from the business logic (e.g. using json, ConfigParser, or OOP constructors)
- Learn about extending Python with C or C++:
- Learn how to create a C++ extension for Python (a compiled .pyd file that can be imported into Python as a regular module)
- Go through the Interfacing with other languages course from Enthought to start building Python native modules
- Learn and compare various alternatives for wrapping C++ code for Python, such as Boost, SWIG, the native Python C API, and pybind11. pybind11 is the most user-friendly one
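For the configuration best practice mentioned earlier in this list, one option is an INI file read with configparser (ConfigParser in Python 2). The sketch below assumes hypothetical section and option names and parses the settings from a string so that it is self-contained.

```python
import configparser

# Hypothetical settings; in practice they would live in a separate settings.ini file
SETTINGS_TEXT = r"""
[paths]
input_gdb = C:\GIS\data\source.gdb
output_folder = C:\GIS\exports

[processing]
buffer_distance_m = 250
overwrite = yes
"""


def load_settings(text):
    """Parse INI text into a dict of typed values so the workflow code never touches the parser."""
    parser = configparser.ConfigParser()
    parser.read_string(text)  # parser.read("settings.ini") would read a file instead
    return {
        "input_gdb": parser.get("paths", "input_gdb"),
        "output_folder": parser.get("paths", "output_folder"),
        "buffer_distance_m": parser.getfloat("processing", "buffer_distance_m"),
        # getboolean accepts yes/no, true/false, on/off, and 1/0
        "overwrite": parser.getboolean("processing", "overwrite"),
    }
```

The business logic then receives only the returned dictionary, so switching the values for another environment means editing the INI file, not the code.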
At this point, you should be able to:
- execute ArcObjects code from Python using the comtypes library
- export the data from tables and feature classes into Excel with custom formatting using xlsxwriter
- generate .pdf files from scratch containing map images, custom charts, and tables using reportlab
- split, merge, crop, and transform .pdf documents using pypdf2
- generate .pdf report files using ArcGIS report templates (.rlf) and arcpy
- generate graphs using arcpy.Graph and arcpy.GraphTemplate with graph template files (.tee), and the Make Graph GP tool
- perform graph theory operations on linear datasets using networkx (e.g. point-to-point routing)
- plot geodata with Matplotlib (both vector and raster)
- use numpy and pandas for manipulating spatial dataset attribute tables
- use the requests and/or arcrest package to access ArcGIS Server sites and ArcGIS Online / Portal organizations through the ArcGIS REST API
- call FME workbenches from Python
- access readers and writers in FME with fmeobjects
- read, modify, and write a georeferenced image
- generate useful information about a point dataset (the most isolated points, the pair of points farthest apart, etc.)
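A brute-force sketch of the last item above (the most isolated point and the farthest pair) in pure Python. It runs in quadratic time, so for larger datasets a spatial index such as scipy.spatial.cKDTree would be the usual choice; coordinates are assumed to be planar (projected), not latitude/longitude.

```python
import math
from itertools import combinations


def distance(p, q):
    """Euclidean distance between two (x, y) points in projected units."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def most_isolated(points):
    """Return (point, nearest-neighbour distance) for the point farthest from its nearest neighbour.

    Needs at least two points; every pair of points is compared.
    """
    best_point, best_dist = None, -1.0
    for i, p in enumerate(points):
        nearest = min(distance(p, q) for j, q in enumerate(points) if j != i)
        if nearest > best_dist:
            best_point, best_dist = p, nearest
    return best_point, best_dist


def farthest_pair(points):
    """Return the pair of points with the largest distance between them."""
    return max(combinations(points, 2), key=lambda pair: distance(*pair))
```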
At this point, you should be familiar with:
- building desktop GUI applications using PyQt, PySide, or Kivy (e.g. visualizing a shapefile's features in an application window)
- contributing to open-source projects such as arcrest or geopandas by reporting bugs or pulling in new functionality
- creating new conda environments and installing various packages into specific environments
- refactoring by wrapping the code into functions, modules, and packages
- OOP basics and creating your own classes
- compiling a simple Python extension module (.pyd) and writing a .pyi interface file to provide IntelliSense in your Python IDE
This section contains examples of tasks that you might need to implement at some point. Implementing these tasks in Python code would be a good sign that you have mastered the advanced concepts of Python for GIS.
- hide/show the map grid of a data frame in a map layout before exporting the map in a map document, using the arcpy package and ArcObjects
- update the label text of a scale bar in a map layout using pure ArcObjects
- generate a service area (drive-time) polygon for an arbitrary point on a street network stored as a shapefile using networkx
- find out the fastest spatial join: the ArcGIS Spatial Join GP tool, the rtree package, PostGIS, SQL Server STContains, or the shapely Python package
- create a new .csv file from an existing one by filtering certain rows using pandas
- classify point dataset features into clusters using scikit-learn to mimic some of the ArcGIS Spatial Statistics tools
- write a program that will calculate the area of a lake automatically recognized from satellite imagery
- build a GUI application with PyQt for executing SQL queries against file geodatabases
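As a starting point for the service area task above, the sketch below runs Dijkstra's algorithm with a travel-time cutoff using heapq. The street network is a hypothetical list of (node, node, minutes) edges rather than a shapefile, and the function is a hand-written stand-in for networkx.single_source_dijkstra_path_length(G, source, cutoff=..., weight=...). Building the polygon itself (for example, a hull around the reachable nodes) is left out.

```python
import heapq
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency dict from (node_a, node_b, travel_minutes) tuples."""
    graph = defaultdict(list)
    for a, b, minutes in edges:
        graph[a].append((b, minutes))
        graph[b].append((a, minutes))
    return graph


def reachable_within(graph, start, max_minutes):
    """Dijkstra's algorithm with a cutoff: return {node: travel time} for nodes within max_minutes.

    Node ids must be mutually comparable (e.g. all strings), since ties in the heap compare them.
    """
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a shorter path to this node was already found
        for neighbour, minutes in graph.get(node, []):
            new_cost = cost + minutes
            if new_cost <= max_minutes and new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return best
```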