
Makeflow for Drone Processing Pipeline

The Scientific Filesystem is used to provide the entry points for the different tasks available (known as "apps" in the Scientific Filesystem). These apps are used to create workflows.


Terminology used

Here are the definitions of some of the terms we use, with links to additional information.

Running the apps

This section contains information on running the different apps in the existing Docker workflow container. By tying these apps together, flexible workflows can be created and distributed.

To determine what apps are available, try the following command:

docker run --rm agdrone/canopycover-workflow:latest apps

The different components of the command line are:

  • docker run --rm tells Docker to run an image and remove the resulting container automatically after the run
  • agdrone/canopycover-workflow:latest is the Docker image to run
  • apps is the command that lists the available apps

Prerequisites

  • Docker needs to be installed to run the apps; see How to get Docker
  • Create an inputs folder in the current working directory (or other folder of your choice) to hold input files
mkdir -p "${PWD}/inputs"
  • Create an outputs folder in the current working directory (or other folder of your choice) to hold the results
mkdir -p "${PWD}/outputs"
  • Create a checkpoints folder. The checkpoints folder will contain the generated workflow checkpoint data, which allows easy error recovery and helps prevent re-running an already completed workflow. Removing the workflow checkpoint files enables a complete re-run of the workflow.
mkdir -p "${PWD}/checkpoints"

Configuration JSON file

Most of the apps described in this document need additional information to run, such as the source image name. This information is provided through a JSON file that is made available to the running container.

Each app's description lists the keys it expects to find, along with a description of the associated value.

We recommend giving the configuration JSON files names related to their purpose, such as the workflow they are a part of.
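As an illustration of that recommendation, the following minimal sketch writes the BETYdb to GeoJson configuration described later in this document into a purpose-named file using a heredoc. If python3 happens to be installed on the host, the sketch also checks that the file is valid JSON before it is mounted into a container. The file name betydb2geojson-jx-args.json is only an example.

```shell
#!/usr/bin/env bash
# Sketch: write a configuration JSON file for the betydb2geojson app and
# check its syntax before mounting it into a container.
# "betydb2geojson-jx-args.json" is a hypothetical, purpose-based file name.
set -euo pipefail

config_file="${PWD}/betydb2geojson-jx-args.json"

# Quoting the heredoc delimiter ('EOF') stops the shell from expanding
# anything inside the JSON text.
cat > "${config_file}" <<'EOF'
{
  "BETYDB_URL": "https://terraref.ncsa.illinois.edu/bety",
  "PLOT_GEOMETRY_FILE": "/output/plots.geojson"
}
EOF

# json.tool exits non-zero on malformed JSON, so a typo is caught here
# instead of inside the running container (skipped if python3 is absent).
if command -v python3 > /dev/null 2>&1; then
  python3 -m json.tool "${config_file}" > /dev/null
fi
echo "Wrote configuration: ${config_file}"
```

The file can then be mounted in place of my-jx-args.json, for example with -v ${PWD}/betydb2geojson-jx-args.json:/scif/apps/src/jx-args.json.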

Generating GeoJSON plot geometries

Plot geometries are needed when clipping source files to where they intersect the plots. The plot geometries need to be in GeoJSON format. Apps are provided to convert shapefiles and BETYdb URLs to the GeoJSON format.

BETYdb to GeoJson

This app retrieves the plots from a BETYdb instance and saves them to a file in the GeoJSON format.

JSON configuration
There are two JSON key/value pairs needed by this app.

  • BETYDB_URL: the URL of the BETYdb instance to query for plot geometries
  • PLOT_GEOMETRY_FILE: the path to write the plot geometry file to, including the file name

For example:

{
  "BETYDB_URL": "https://terraref.ncsa.illinois.edu/bety",
  "PLOT_GEOMETRY_FILE": "/output/plots.geojson"
}

Sample command line

docker run --rm -v ${PWD}/outputs:/output -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json agdrone/canopycover-workflow:latest run betydb2geojson

The different components of the command line are:

  • docker run --rm tells Docker to run an image and remove the resulting container automatically after the run (--rm)
  • -v ${PWD}/outputs:/output mounts the previously created outputs folder to the /output location on the running image
  • -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json mounts the JSON configuration file so that it's available to the app
  • agdrone/canopycover-workflow:latest is the Docker image to run
  • run betydb2geojson is the command that runs the app

Please notice that the /output folder on the command line corresponds with the starting path of the PLOT_GEOMETRY_FILE value in the configuration JSON.
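Because the app only sees paths inside the container, every path in the configuration JSON has to start with one of the container folders mounted with -v. The following sketch shows one way to check this before starting a run. path_is_mounted is a hypothetical helper, and the mount targets are passed in by hand rather than parsed from the Docker command line.

```shell
#!/usr/bin/env bash
# Sketch: check that a container-side path from the configuration JSON falls
# under one of the folders mounted with -v. The helper name is hypothetical.

# Usage: path_is_mounted <container path> <mount target>...
path_is_mounted() {
  local path="$1"
  shift
  local target
  for target in "$@"; do
    # Strip any trailing slash so "/output/" and "/output" behave the same
    target="${target%/}"
    # Match the folder itself or anything beneath it (but not "/outputs")
    if [[ "${path}" == "${target}" || "${path}" == "${target}/"* ]]; then
      return 0
    fi
  done
  return 1
}

# The betydb2geojson example mounts only /output
if path_is_mounted "/output/plots.geojson" /output; then
  echo "PLOT_GEOMETRY_FILE is inside a mounted folder"
fi
```

Matching on "${target}/" rather than on a plain prefix keeps a path such as /outputs/plots.geojson from being accepted for a /output mount.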

Shapefile to GeoJson

This app loads plot geometries from a shapefile and saves them to a file in the GeoJSON format.

JSON configuration
There are two JSON key/value pairs needed by this app.

  • PLOT_SHAPEFILE: the path to the shapefile to load and save as GeoJSON
  • PLOT_GEOMETRY_FILE: the path to write the plot geometry file to, including the file name

For example:

{
  "PLOT_SHAPEFILE": "/input/plot_shapes.shp",
  "PLOT_GEOMETRY_FILE": "/output/plots.geojson"
}

Sample command line

docker run --rm -v ${PWD}/inputs:/input -v ${PWD}/outputs:/output -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json agdrone/canopycover-workflow:latest run shp2geojson

The different components of the command line are:

  • docker run --rm tells Docker to run an image and remove the resulting container automatically after the run (--rm)
  • -v ${PWD}/inputs:/input mounts the previously created inputs folder to the /input location on the running image
  • -v ${PWD}/outputs:/output mounts the previously created outputs folder to the /output location on the running image
  • -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json mounts the JSON configuration file so that it's available to the app
  • agdrone/canopycover-workflow:latest is the Docker image to run
  • run shp2geojson is the command that runs the app

Please notice the following:

  • the /input folder on the command line corresponds with the PLOT_SHAPEFILE starting path value in the configuration JSON; this is where the app expects to find the shapefile to load and convert
  • the /output folder on the command line corresponds with the PLOT_GEOMETRY_FILE starting path value in the configuration JSON
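A shapefile is a set of files rather than a single file. The .shp, .shx, and .dbf files are required, and a .prj file, when present, holds the coordinate reference system. The following minimal sketch checks that all of them were copied into the inputs folder. check_shapefile is a hypothetical helper, and plot_shapes.shp is the name used in the example configuration. This document does not say whether the app needs the .prj file, so the sketch only warns when it is missing.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the companion files of a shapefile were copied into
# the inputs folder along with the .shp file. The helper name is hypothetical.

# Usage: check_shapefile <path to .shp file>
check_shapefile() {
  local shp="$1"
  local base="${shp%.shp}"
  local missing=0
  local ext
  # .shp, .shx and .dbf are the mandatory parts of a shapefile
  for ext in shp shx dbf; do
    if [[ ! -f "${base}.${ext}" ]]; then
      echo "missing required file: ${base}.${ext}" >&2
      missing=1
    fi
  done
  # .prj is optional, but it carries the coordinate reference system
  if [[ ! -f "${base}.prj" ]]; then
    echo "warning: ${base}.prj not found; the coordinate system may be unknown" >&2
  fi
  return "${missing}"
}

# Check the shapefile named in the example configuration
if check_shapefile "${PWD}/inputs/plot_shapes.shp"; then
  echo "shapefile looks complete"
else
  echo "copy the missing files into ${PWD}/inputs before running shp2geojson" >&2
fi
```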

Soilmask images

This app masks out soil from an image.

JSON configuration
There are four JSON key/value pairs for this app:

  • SOILMASK_SOURCE_FILE: the path to the image to mask the soil from
  • SOILMASK_MASK_FILE: the name of the mask file to write; if no path is specified, the file is written to the folder defined in SOILMASK_WORKING_FOLDER
  • SOILMASK_WORKING_FOLDER: the path to where the results of processing should be placed
  • SOILMASK_OPTIONS: any options to be passed to the script

The following JSON example would have the soilmask app write the mask to a file named orthomosaic_masked.tif in the /output/ folder of the running Docker image:

{
  "SOILMASK_SOURCE_FILE": "/input/orthomosaic.tif",
  "SOILMASK_MASK_FILE": "orthomosaic_masked.tif",
  "SOILMASK_WORKING_FOLDER": "/output",
  "SOILMASK_OPTIONS": ""
}
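The rule for where the mask is written can be sketched as follows. This mirrors the description of SOILMASK_MASK_FILE above and is not taken from the app's source. resolve_mask_path is a hypothetical helper, and treating any value that contains a slash as "a path" is an assumption.

```shell
#!/usr/bin/env bash
# Sketch of the documented rule for where the soilmask app writes its mask:
# a bare file name is placed in SOILMASK_WORKING_FOLDER, while a value that
# already contains a path is used as-is. This follows the README text, not
# the app's code; the helper name is hypothetical.

# Usage: resolve_mask_path <SOILMASK_MASK_FILE> <SOILMASK_WORKING_FOLDER>
resolve_mask_path() {
  local mask_file="$1"
  local working_folder="${2%/}"
  if [[ "${mask_file}" == */* ]]; then
    # A path was given, so the working folder is not used
    echo "${mask_file}"
  else
    echo "${working_folder}/${mask_file}"
  fi
}

# Values from the example configuration; prints /output/orthomosaic_masked.tif
resolve_mask_path "orthomosaic_masked.tif" "/output"
```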

The following options are available to be specified on the SOILMASK_OPTIONS JSON entry:

  • --metadata METADATA this option indicates a metadata YAML or JSON file to use when processing
  • --help displays the soilmask help information without any file processing

Sample command line

docker run --rm -v ${PWD}/inputs:/input -v ${PWD}/outputs:/output -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json agdrone/canopycover-workflow:latest run soilmask

The different components of the command line are:

  • docker run --rm tells Docker to run an image and remove the resulting container automatically after the run (--rm)
  • -v ${PWD}/inputs:/input mounts the previously created inputs folder to the /input location on the running image
  • -v ${PWD}/outputs:/output mounts the previously created outputs folder to the /output location on the running image
  • -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json mounts the JSON configuration file so that it's available to the app
  • agdrone/canopycover-workflow:latest is the Docker image to run
  • run soilmask is the command that runs the app

Please notice the following:

  • the /input folder on the command line corresponds with the SOILMASK_SOURCE_FILE path value in the configuration JSON; this is where the app expects to find the source image
  • the /output folder on the command line corresponds with the SOILMASK_WORKING_FOLDER path value in the configuration JSON; this is where the masked image is stored

Plotclip images

This app clips georeferenced images to plot boundaries.

JSON configuration
There are four JSON key/value pairs for this app:

  • PLOTCLIP_SOURCE_FILE: the path to the image to clip
  • PLOTCLIP_PLOTGEOMETRY_FILE: the path to the GeoJSON file containing the plot boundaries; see also BETYdb to GeoJson and Shapefile to GeoJson
  • PLOTCLIP_WORKING_FOLDER: the path to where the results of processing should be placed; each plot clip is placed in a folder corresponding to the plot name
  • PLOTCLIP_OPTIONS: any options to be passed to the script

The following JSON example would have the plot clips written to the /output/ folder of the running Docker image:

{
  "PLOTCLIP_SOURCE_FILE": "/input/orthomosaic_mask.tif",
  "PLOTCLIP_PLOTGEOMETRY_FILE": "/input/plots.geojson",
  "PLOTCLIP_WORKING_FOLDER": "/output",
  "PLOTCLIP_OPTIONS": ""
}

The following options are available to be specified on the PLOTCLIP_OPTIONS JSON entry:

  • --metadata METADATA this option indicates a metadata YAML or JSON file to use when processing
  • --keep_empty_folders specifying this option will create a folder with the plot name even if the plot doesn't intersect the image
  • --plot_column PLOT_COLUMN specifies the column name ("properties" sub-key with GeoJSON) to use as the plot name
  • --help displays the plotclip help information without any file processing

Sample command line

docker run --rm -v ${PWD}/inputs:/input -v ${PWD}/outputs:/output -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json agdrone/canopycover-workflow:latest run plotclip

The different components of the command line are:

  • docker run --rm tells Docker to run an image and remove the resulting container automatically after the run (--rm)
  • -v ${PWD}/inputs:/input mounts the previously created inputs folder to the /input location on the running image
  • -v ${PWD}/outputs:/output mounts the previously created outputs folder to the /output location on the running image
  • -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json mounts the JSON configuration file so that it's available to the app
  • agdrone/canopycover-workflow:latest is the Docker image to run
  • run plotclip is the command that runs the app

Please notice the following:

  • the /input folder on the command line corresponds with the PLOTCLIP_SOURCE_FILE and PLOTCLIP_PLOTGEOMETRY_FILE path values in the configuration JSON; this is where the app expects to find the source image and the plot geometries
  • the /output folder on the command line corresponds with the PLOTCLIP_WORKING_FOLDER path value in the configuration JSON; this is where the plot image clips are saved
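Since each clip is placed in a folder named after its plot, a quick way to review a plotclip run on the host is to list the plot folders under the outputs folder, along with how many files each one holds. This is a sketch only: summarize_plot_folders is a hypothetical helper, and empty folders are expected only when --keep_empty_folders was used.

```shell
#!/usr/bin/env bash
# Sketch: summarize plotclip results with one line per plot folder showing
# how many files it holds. The helper name is hypothetical.

# Usage: summarize_plot_folders <working folder on the host>
summarize_plot_folders() {
  local folder count
  for folder in "${1%/}"/*/; do
    # Skip the literal pattern when there are no sub-folders at all
    [[ -d "${folder}" ]] || continue
    count="$(find "${folder}" -maxdepth 1 -type f | wc -l)"
    # wc may pad its output with spaces; arithmetic expansion trims them
    printf '%s\t%d\n' "$(basename "${folder}")" "$((count))"
  done
}

summarize_plot_folders "${PWD}/outputs"
```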

Find files and write JSON

This app locates files with a specific name and writes a JSON file that can then be used to process those files. Makeflow is a deterministic scheduler, meaning that when it runs it needs to "know" everything about a job, such as which files are inputs. Apps like Plotclip are non-deterministic in that there's no way of knowing ahead of time which plots intersect an image (unless complete plot coverage is guaranteed, which isn't always the case). Even when the output of a step is deterministic, this app can still be handy for building up a JSON file.

The search is shallow: only the immediate sub-folders of the top-level search folder are searched, and files in the top-level folder itself are ignored.
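The search depth described above can be reproduced with find's -mindepth and -maxdepth options, which is useful for previewing which files the app should pick up. This sketch imitates the documented behavior and is not the app's implementation. shallow_find is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch of the documented find_files2json search depth: look for the file
# name only in the immediate sub-folders of the search folder and ignore the
# search folder itself. Not the app's own code; the helper name is hypothetical.

# Usage: shallow_find <search folder> <file name>
shallow_find() {
  # -mindepth 2 skips files directly in the search folder;
  # -maxdepth 2 stops before descending into sub-sub-folders
  find "$1" -mindepth 2 -maxdepth 2 -type f -name "$2" | sort
}

if [[ -d "${PWD}/inputs" ]]; then
  shallow_find "${PWD}/inputs" "orthomosaic_mask.tif"
fi
```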

JSON configuration
There are three JSON key/value pairs for this app:

  • FILES2JSON_SEARCH_NAME: the complete name of the file to find
  • FILES2JSON_SEARCH_FOLDER: the starting path to begin searching in
  • FILES2JSON_JSON_FILE: the path of the JSON file that the information on the found files is written to

The following JSON example would have the JSON file written to the /output/files.json file of the running Docker image:

{
  "FILES2JSON_SEARCH_NAME": "orthomosaic_mask.tif",
  "FILES2JSON_SEARCH_FOLDER": "/input",
  "FILES2JSON_JSON_FILE": "/output/files.json"
}

Sample command line

docker run --rm -v ${PWD}/inputs:/input -v ${PWD}/outputs:/output -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json agdrone/canopycover-workflow:latest run find_files2json

The different components of the command line are:

  • docker run --rm tells Docker to run an image and remove the resulting container automatically after the run (--rm)
  • -v ${PWD}/inputs:/input mounts the previously created inputs folder to the /input location on the running image
  • -v ${PWD}/outputs:/output mounts the previously created outputs folder to the /output location on the running image
  • -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json mounts the JSON configuration file so that it's available to the app
  • agdrone/canopycover-workflow:latest is the Docker image to run
  • run find_files2json is the command that runs the app

Please notice the following:

  • the /input folder on the command line corresponds with the FILES2JSON_SEARCH_FOLDER path value in the configuration JSON; this is where the app will start its search
  • the /output folder on the command line is included as part of the FILES2JSON_JSON_FILE path value in the configuration JSON; this is the folder where the JSON file describing the found files is saved

Canopy Cover calculation

This app calculates the canopy cover of soilmasked images and writes the CSV files next to the source image (in the same folder).

JSON configuration
There is one JSON key/value pair for this app:

  • CANOPYCOVER_OPTIONS: any options to be passed to the script

The following JSON example shows how to define runtime options when running this app:

{
  "CANOPYCOVER_OPTIONS": ""
}

The following options are available to be specified on the CANOPYCOVER_OPTIONS JSON entry:

  • --metadata METADATA this option indicates a metadata YAML or JSON file to use when processing
  • --help displays the canopy cover help information without any file processing. This is useful for finding options which affect the output

Sample command line

docker run --rm -v ${PWD}/inputs:/input -v ${PWD}/outputs:/output -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json -v ${PWD}/canopy_cover_files.json:/scif/apps/src/canopy_cover_files.json agdrone/canopycover-workflow:latest run canopycover

The different components of the command line are:

  • docker run --rm tells Docker to run an image and remove the resulting container automatically after the run (--rm)
  • -v ${PWD}/inputs:/input mounts the previously created inputs folder to the /input location on the running image
  • -v ${PWD}/outputs:/output mounts the previously created outputs folder to the /output location on the running image
  • -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json mounts the JSON configuration file so that it's available to the app
  • -v ${PWD}/canopy_cover_files.json:/scif/apps/src/canopy_cover_files.json mounts the JSON file containing information on the files to process so that it's available to the app; also see Find files and write JSON
  • agdrone/canopycover-workflow:latest is the Docker image to run
  • run canopycover is the command that runs the app

Please notice the following:

  • the /input folder on the command line corresponds to where the files to be processed are expected to be found and where the CSV files are written to

Greenness Indices calculation

This app calculates several greenness indices of soilmasked images and writes the CSV files next to the source image (in the same folder).

JSON configuration
There is one JSON key/value pair for this app:

  • GREENNESS_INDICES_OPTIONS: any options to be passed to the script

The following JSON example shows how to define runtime options when running this app:

{
  "GREENNESS_INDICES_OPTIONS": ""
}

The following options are available to be specified on the GREENNESS_INDICES_OPTIONS JSON entry:

  • --metadata METADATA this option indicates a metadata YAML or JSON file to use when processing
  • --help displays the greenness indices help information without any file processing. This is useful for finding options which affect the output

Sample command line

docker run --rm -v ${PWD}/inputs:/input -v ${PWD}/outputs:/output -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json -v ${PWD}/greenness_indices_files.json:/scif/apps/src/greenness-indices_files.json agdrone/canopycover-workflow:latest run greenness-indices

The different components of the command line are:

  • docker run --rm tells Docker to run an image and remove the resulting container automatically after the run (--rm)
  • -v ${PWD}/inputs:/input mounts the previously created inputs folder to the /input location on the running image
  • -v ${PWD}/outputs:/output mounts the previously created outputs folder to the /output location on the running image
  • -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json mounts the JSON configuration file so that it's available to the app
  • -v ${PWD}/greenness_indices_files.json:/scif/apps/src/greenness-indices_files.json mounts the JSON file containing information on the files to process so that it's available to the app; also see Find files and write JSON
  • agdrone/canopycover-workflow:latest is the Docker image to run
  • run greenness-indices is the command that runs the app

Please notice the following:

  • the /input folder on the command line corresponds to where the files to be processed are expected to be found and where the CSV files are written to

Merge CSV files

This app recursively merges same-named CSV files into a destination folder. If the source folder contains multiple, differently named CSV files, there will be one merged CSV file for each unique CSV file name. The source CSV files are left intact.

JSON configuration
There are three JSON key/value pairs for this app:

  • MERGECSV_SOURCE: the path to the top-level folder containing CSV files to merge
  • MERGECSV_TARGET: the path to the folder where the merged CSV files are written
  • MERGECSV_OPTIONS: any options to be passed to the script

For example:

{
  "MERGECSV_SOURCE": "/input",
  "MERGECSV_TARGET": "/output",
  "MERGECSV_OPTIONS": ""
}

The following options are available to be specified on the MERGECSV_OPTIONS JSON entry:

  • --no_header this option indicates that the source CSV files do not have header lines
  • --header_count <value> indicates the number of header lines to expect in the CSV files; defaults to 1 header line
  • --filter <file name filter> one or more comma-separated filters of files to process; files not matching a filter aren't processed
  • --ignore <file name filter> one or more comma-separated filters of files to skip; files matching a filter are ignored
  • --help displays the help information without any file processing

By combining filtering options and header options, it's possible to precisely target the CSV files to process.

The filters work by comparing the file names found on disk with the names specified in the filter to determine whether a file should be processed. Only the body and extension of a file name are compared; the path to the file is ignored when filtering.
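To make the merge behavior concrete, the following sketch groups CSV files by file name (ignoring their paths, as the filters do), writes the header lines once, and appends the remaining lines of every file in the group. It is an approximation built from find and awk, not the merge_csv app. It does not implement --filter or --ignore, it assumes the target folder is outside the source folder, and merge_same_named_csv is a hypothetical name. The optional third argument plays the role of --header_count; passing 0 behaves like --no_header.

```shell
#!/usr/bin/env bash
# Sketch of a same-name CSV merge using find and awk. This is an illustration,
# not the merge_csv app; the function name is hypothetical.

# Usage: merge_same_named_csv <source folder> <target folder> [header count]
merge_same_named_csv() {
  local src_dir="$1" target_dir="$2" header_count="${3:-1}"
  mkdir -p "${target_dir}"
  local name
  # Only the file name (body and extension) is used to group files
  find "${src_dir}" -type f -name '*.csv' -exec basename {} \; | sort -u |
    while IFS= read -r name; do
      # NR <= h keeps the header lines of the first file only;
      # FNR > h keeps the body lines of every file in the group.
      # Assumes one awk invocation per group (fine for modest file counts).
      find "${src_dir}" -type f -name "${name}" -print0 | sort -z |
        xargs -0 awk -v h="${header_count}" 'NR <= h || FNR > h' \
          > "${target_dir}/${name}"
    done
}
```

For example, merge_same_named_csv "${PWD}/inputs" "${PWD}/outputs" 1 writes one merged file per unique CSV name into the outputs folder, which can be compared with the results of the app.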

Sample command line

docker run --rm -v ${PWD}/inputs:/input -v ${PWD}/outputs:/output -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json agdrone/canopycover-workflow:latest run merge_csv

The different components of the command line are:

  • docker run --rm tells Docker to run an image and remove the resulting container automatically after the run (--rm)
  • -v ${PWD}/inputs:/input mounts the previously created inputs folder to the /input location on the running image
  • -v ${PWD}/outputs:/output mounts the previously created outputs folder to the /output location on the running image
  • -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json mounts the JSON configuration file so that it's available to the app
  • agdrone/canopycover-workflow:latest is the Docker image to run
  • run merge_csv is the command that runs the app

Please notice the following:

  • the /input folder on the command line corresponds with the MERGECSV_SOURCE path value in the configuration JSON; this is where the app expects to find the CSV files to merge
  • the /output folder on the command line corresponds with the MERGECSV_TARGET path value in the configuration JSON; this is where the merged CSV files are stored

Clean runs

Cleaning up a workflow run will delete workflow generated files and folders. Be sure to copy the data you want to a safe place before cleaning.

By adding the --clean flag to the end of the command line used to execute the workflow, the artifacts of a previous run will be cleaned up.

Cleaning up between processing runs is recommended but not required; it can be done with the --clean flag or through other means.

Example:

The following Docker command line cleans up the files generated by the BETYdb to GeoJson command above.

docker run --rm -v ${PWD}/outputs:/output -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json agdrone/canopycover-workflow:latest run betydb2geojson --clean

Notice the additional parameter at the end of the command line (--clean).
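Since --clean deletes workflow-generated files, it can help to script the "copy the data to a safe place" step. The sketch below copies a folder to a time-stamped sibling folder before cleaning. backup_folder and its naming scheme are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: copy a results folder to a time-stamped sibling before running a
# workflow with --clean. The helper name and naming scheme are hypothetical.
set -euo pipefail

# Usage: backup_folder <folder>; prints the path of the copy
backup_folder() {
  local folder="${1%/}"
  local backup
  backup="${folder}-backup-$(date +%Y%m%d-%H%M%S)"
  # -R copies recursively; -p preserves timestamps and permissions
  cp -Rp "${folder}" "${backup}" || return 1
  echo "${backup}"
}

# Keep a copy of the results before cleaning
if [[ -d "${PWD}/outputs" ]]; then
  backup_folder "${PWD}/outputs"
fi
```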

Build The Container

This section describes how to build the Docker image. Please refer to the Docker documentation for more information on building Docker images.

cp jx-args.json.example jx-args.json
docker build -t agdrone/canopycover-workflow:latest .

Monitoring the Workflow

To monitor the running workflows, you will need to be using the checkpoints folder as described in the Prerequisites section.

Makeflow has monitoring tools available that can be used to follow the progress of the workflows. The makeflow_monitor tool can be a good starting point.

A Note On Docker Sibling Containers

The OpenDroneMap workflow uses sibling containers: a technique in which one Docker container starts another Docker container to perform some work. Because this approach carries a potential security risk, it isn't suitable for shared cluster computing environments (it is also a concern for containers such as websites and databases that are exposed to the internet, but that is not the case here). We plan to find a secure alternative for future releases (see AgPipeline/issues-and-projects#240). You can run these workflows on your own computer as safely as any trusted Docker container. However, with sibling containers the second container requires administrator ("root") privileges; please see the Docker documentation for more details.

Acceptance Testing

There are automated test suites that are run via GitHub Actions. In this section we provide details on these tests so that they can be run locally as well.

These tests are run when a Pull Request or push occurs on the develop or main branches. There may be other instances when these tests are automatically run, but these are considered the mandatory events and branches.

PyLint and PyTest

These tests are run against any Python scripts that are in the repository.

PyLint is used both to check that Python code conforms to the recommended coding style and to check for syntax errors. The default behavior of PyLint is modified by the pylint.rc file in the Organization-info repository. Please also refer to our Coding Standards for information on how we use PyLint.

The following command can be used to fetch the pylint.rc file:

wget https://raw.githubusercontent.com/AgPipeline/Organization-info/main/pylint.rc

Assuming the pylint.rc file is in the current folder, the following command can be used against the betydb2geojson.py file:

# Assumes Python3.7+ is default Python version
python -m pylint --rcfile ./pylint.rc betydb2geojson.py

PyTest is used to run Unit and Integration Testing. The following command can be used to run the test suite:

# Assumes Python3.7+ is default Python version
python -m pytest -rpP

If pytest-cov is installed, it can be used to generate a code coverage report as part of running PyTest. The code coverage report shows how much of the code has been tested; it doesn't indicate how well that code has been tested. The modified PyTest command line including coverage is:

# Assumes Python3.7+ is default Python version
python -m pytest --cov=. -rpP

shellcheck and shfmt

These tests are run against shell scripts within the repository. It's expected that shell scripts will conform to these tools (no reported issues).

shellcheck is used to catch common bugs and outdated or unsafe constructs in shell scripts. The following command runs shellcheck against the "prep-canopy-cover.sh" bash shell script:

shellcheck prep-canopy-cover.sh

shfmt is used to ensure scripts conform to Google's shell script style guide. The following command runs shfmt against the "prep-canopy-cover.sh" bash shell script:

shfmt -i 2 -ci -w prep-canopy-cover.sh

Docker Testing

The Docker testing workflow replicates the examples in this document to ensure they continue to work.