GeoNode task execution engine

GeoNode needs to perform a number of tasks that can be intensive in terms of network, I/O and compute. Examples include the creation of new resources, which can take multiple forms:

Importing local data via Django admin commands
Uploading data via the GeoNode API (and the GUI, which is also meant to rely on the API)
Harvesting metadata from remote services
Harvesting data from remote services
etc.

The tasks described above have a mix of the following properties:

Require a multi-step workflow that involves coordination between multiple services, e.g. GeoNode and GeoServer
Require dynamic scheduling capabilities:
- Run on a configurable periodic frequency
- Run on demand when some event is triggered
Take a long time to run
Interface with external agents and systems
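The scheduling properties above (periodic runs at a configurable frequency, runs on demand) can be illustrated with a minimal sketch. It assumes a hypothetical `Schedule` descriptor and `next_run` helper, neither of which is part of GeoNode; the one-off kind mirrors the scheduling methods listed in the requirements. The next run is recomputed from the stored schedule on every check, so editing the interval takes effect without restarting anything.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Schedule:
    """Hypothetical schedule descriptor (illustrative only, not a GeoNode API)."""
    kind: str                               # "on_demand", "one_off" or "periodic"
    run_at: Optional[datetime] = None       # only used by "one_off"
    interval: Optional[timedelta] = None    # only used by "periodic"


def next_run(schedule: Schedule, last_run: Optional[datetime],
             now: datetime) -> Optional[datetime]:
    """Return when a workflow should run next, or None if no automatic run is due."""
    if schedule.kind == "on_demand":
        # Never scheduled automatically; an event or a user triggers it.
        return None
    if schedule.kind == "one_off":
        # Runs once at run_at and never again afterwards.
        return None if last_run is not None else schedule.run_at
    if schedule.kind == "periodic":
        if schedule.interval is None:
            raise ValueError("periodic schedules need an interval")
        # First run happens immediately; later runs follow the current interval.
        return now if last_run is None else last_run + schedule.interval
    raise ValueError(f"unknown schedule kind: {schedule.kind}")
```

Because the interval is read at the moment `next_run` is called, changing `schedule.interval` on a stored schedule changes the following run time on the next check.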

As such, it is desirable that GeoNode tasks:

Run asynchronously, in a separate process from the main GeoNode server
Are built to scale, such that they can run on hosts other than the one serving the main GeoNode web application
Provide a means to monitor their progress at any point during execution
Provide a means to cancel their execution
Notify users when they complete
Are able to deal with failure, executing in a transaction-like fashion whereby a rollback mechanism is applied whenever an error occurs
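Progress reporting, cancellation and transaction-like rollback can be combined in one minimal sketch. It assumes each task is split into hypothetical `Step`s, each pairing an action with a compensating action that reverts it; this compensating-action approach (often called the saga pattern) is one way of approximating a transaction across services such as GeoNode and GeoServer that do not share a database transaction. Cancellation here is cooperative and checked between steps, and a `threading.Event` stands in for whatever cross-process signal a real engine would use. `Step`, `run_steps` and the step names are illustrative only.

```python
import threading
from dataclasses import dataclass
from typing import Callable, List


class Cancelled(Exception):
    """Raised when a cancellation request is observed between steps."""


@dataclass
class Step:
    name: str
    do: Callable[[], None]    # performs the change
    undo: Callable[[], None]  # compensating action that reverts the change


def run_steps(steps: List[Step], cancel: threading.Event,
              on_progress: Callable[[str, float], None]) -> bool:
    """Run steps in order, reporting progress after each one.

    On an error or a cancellation request, the steps that already completed
    are undone in reverse order. Returns True if all steps completed and
    False if the run was rolled back. An exception raised by an undo
    propagates to the caller (a real engine would need to record it).
    """
    done: List[Step] = []
    try:
        for i, step in enumerate(steps):
            if cancel.is_set():
                raise Cancelled(step.name)
            step.do()
            done.append(step)
            on_progress(step.name, (i + 1) / len(steps))  # fraction complete
        return True
    except Exception:
        # Undo later changes first so each undo sees the state it expects.
        for step in reversed(done):
            step.undo()
        return False
```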

Additionally, GeoNode shall store a log of task execution details.
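A task execution log entry might look like the following minimal sketch. The fields and the `TaskExecution`, `finish` and `to_json` names are assumptions for illustration; in a Django project such as GeoNode this record would more naturally be a model persisted to the database, but a plain dataclass serialised to JSON keeps the example self-contained.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


def utc_now() -> str:
    """Current time as an ISO 8601 string in UTC."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class TaskExecution:
    """Hypothetical log entry describing one run of one task."""
    workflow: str
    task: str
    started_at: str
    finished_at: Optional[str] = None
    status: str = "running"  # e.g. "running", "succeeded", "failed", "cancelled"
    messages: List[str] = field(default_factory=list)


def finish(entry: TaskExecution, status: str, message: str = "") -> None:
    """Mark an entry as finished with a final status and optional message."""
    entry.finished_at = utc_now()
    entry.status = status
    if message:
        entry.messages.append(message)


def to_json(entry: TaskExecution) -> str:
    """Serialise the entry so it can be stored after the worker exits."""
    return json.dumps(asdict(entry))
```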

Requirements for GeoNode workflow execution engine

The following is a set of requirements that the GeoNode workflow execution engine should meet in order to fulfill the use cases and needs described above.

GeoNode workflows shall be composed of multiple tasks arranged in a directed graph of dependencies. The unit of work shall be the task, whereas the unit of configuration shall be the workflow.
The workflow execution engine shall schedule workflows (and their tasks) to run asynchronously. Moreover, tasks shall be able to run in parallel, where their dependencies allow, in processes separate from the main GeoNode web service. This allows the main GeoNode service to remain smooth and responsive even while heavy tasks are running.
The workflow execution engine shall allow distributing tasks over multiple hosts. This allows resources to be scaled horizontally as needed.
The workflow engine shall support multiple workflow scheduling methods, including on-demand, one-off and periodic execution.
Workflows shall expose a set of configuration parameters, and it shall be possible to create and parametrize workflows at runtime, i.e. without restarting any service. This includes, for example, modifying their frequency of execution.
Running workflows shall be able to report their own execution progress.
The workflow execution engine shall be able to terminate running workflows on demand.
The workflow engine shall ensure that GeoNode is not left in an inconsistent state if an error occurs while executing a workflow task. Each workflow shall be transactional: either all of the workflow's tasks run successfully or, in case of an error, all state is returned to what it was before the workflow started executing.
The workflow engine shall store workflow and task execution details for subsequent analysis and processing by GeoNode. This information shall be collected during task execution and retained after the task has ended, so that a history log is preserved.
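The first two requirements (a workflow as a directed graph of task dependencies, with tasks running in parallel where their dependencies allow) can be sketched with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). This is a minimal illustration under stated assumptions, not GeoNode's engine: `run_workflow` and the task names are hypothetical, dependencies must only name tasks that exist in `tasks`, and a thread pool stands in for the separate processes or hosts the requirements call for (a production engine would more likely hand tasks to a distributed task queue such as Celery). For simplicity it dispatches in waves, waiting for a whole batch of ready tasks before starting the next.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_workflow(tasks: Dict[str, Callable[[], None]],
                 deps: Dict[str, Set[str]],
                 max_workers: int = 4) -> List[List[str]]:
    """Run tasks in dependency order; tasks that are ready together run concurrently.

    `deps` maps a task name to the names of the tasks it depends on.
    Returns the batches of task names in the order they were dispatched.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter()
    for name in tasks:
        sorter.add(name, *deps.get(name, set()))
    sorter.prepare()  # detects cycles before anything runs

    batches: List[List[str]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())  # all dependencies satisfied
            batches.append(ready)
            futures = {name: pool.submit(tasks[name]) for name in ready}
            for name, future in futures.items():
                future.result()   # re-raises the task's exception, if any
                sorter.done(name)  # unlocks tasks that depend on this one
    return batches
```

In this sketch the workflow (the `tasks` and `deps` dictionaries) is what gets configured, while each callable is the unit of work, mirroring the split described in the first requirement.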
