Sandbox for map-reduce style distributed template matching. Very much a work in progress, not for general consumption yet.
- Python
- The usual suspects (`pip install -r requirements.txt`)
- scarplet
- AWS EC2, S3, EFS, CloudWatch
- `Worker.py`: Classes for matcher and reducer instances
- `match.py`: Starts and maintains a template matching worker
- `reduce.py`: Initializes a reducer that reduces the working results directory, then exits
- `reduce_loop.py`: Starts a reducer that reduces all intermediate results until the working data directory is empty
- `launch_instaces.py`: Various convenience functions for AWS EC2 instance management
- `monitor.py`: Monitors and restarts idle instances
- `manage_data.py`: Fetches tiles from the S3 bucket in batches, either the entire bucket or a subset starting at a specified file. Also contains utilities for copying files within a bounding box, tiling a large GeoTIFF dataset, and padding tiles.
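The reduce-until-empty pattern behind `reduce_loop.py` can be sketched roughly as below. This is an illustration only, with made-up function and parameter names (`reduce_batch`, `poll_interval`) and a trivial stand-in reduction step; the real reducer merges intermediate matching results.

```python
import os
import time


def reduce_batch(working_dir, filenames):
    # Stand-in for the real reduction step. Here we simply delete the
    # inputs so the working directory eventually empties; the actual
    # reducer would merge partial template-matching results first.
    for name in filenames:
        os.remove(os.path.join(working_dir, name))


def reduce_until_empty(working_dir, poll_interval=0):
    """Keep reducing intermediate results until working_dir is empty.

    Returns the number of reduce passes performed.
    """
    passes = 0
    while True:
        batch = sorted(os.listdir(working_dir))
        if not batch:
            return passes  # no intermediate results left; workers are done
        reduce_batch(working_dir, batch)
        passes += 1
        time.sleep(poll_interval)
```

The loop exits only when a poll finds the working directory empty, so a long-running matcher fleet can keep depositing intermediate results while a single reducer drains them.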
All processing currently relies on a filename convention based on the UTM coordinates of each tile's bounding box, e.g. `fg396_4508.tif`. This is based on EarthScope survey naming conventions:

`ccXXX_YYYY.fmt`

where:

- `cc` is a data code (`u` for unfiltered, `fg` for filtered ground returns only, etc.)
- `XXX` and `YYYY` are the most significant digits of the dataset's lower left corner (XXX000, YYYY000) in the UTM coordinate system. In this case, I work in UTM zone 10N.
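A minimal sketch of parsing this convention, assuming tiles always use a lowercase data code, three easting digits, and four northing digits as described above (the helper name is made up for illustration):

```python
import re

# ccXXX_YYYY.fmt: data code, then the most significant digits of the
# tile's lower-left corner easting (XXX) and northing (YYYY).
TILE_PATTERN = re.compile(r"^(?P<code>[a-z]+)(?P<easting>\d{3})_(?P<northing>\d{4})\.\w+$")


def parse_tile_name(filename):
    """Return (data code, lower-left UTM easting, northing) for a tile filename."""
    match = TILE_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unrecognized tile filename: {filename}")
    # XXX and YYYY are thousands of meters, so multiply by 1000 to
    # recover full UTM coordinates (zone 10N in this project).
    return (
        match.group("code"),
        int(match.group("easting")) * 1000,
        int(match.group("northing")) * 1000,
    )
```

For example, `parse_tile_name("fg396_4508.tif")` gives `("fg", 396000, 4508000)`: filtered ground returns for the tile whose lower-left corner is at UTM (396000 E, 4508000 N).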
See the scarplet-python issue tracker for all tasks related to the scarplet project.
- Refactor to use rasterio instead of the GDAL bindings; remove various hacky subprocess calls
- Tests, ack
- Add SQS task management from private repo