
GeoNode task execution engine

Ricardo Garcia Silva edited this page Mar 10, 2021 · 1 revision

GeoNode needs to perform a set of tasks that can be intensive in terms of network, I/O and computation. Examples of such tasks include the creation of new resources, which can take multiple forms:

  1. Importing local data via Django management commands
  2. Uploading data via the GeoNode API (and via the GUI, which is also meant to rely on the API)
  3. Harvesting metadata from remote services
  4. Harvesting data from remote services
  5. etc.

The tasks described above have a mix of the following properties:

  • Require a multi-step workflow that involves coordination between multiple services, e.g. GeoNode and GeoServer
  • Require dynamic scheduling capability
    • Run on a configurable periodic frequency
    • Run on demand when some event is triggered
  • Take a long time to run
  • Interface with external agents and systems
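The scheduling property above (periodic runs plus on-demand runs, with one-off runs added by the requirements further down) could be captured by a small schedule model. The following is a minimal sketch only; `ScheduleKind`, `Schedule`, `next_run` and `is_due` are hypothetical names, not part of GeoNode.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class ScheduleKind(Enum):
    ON_DEMAND = "on_demand"  # runs only when an external event triggers it
    ONE_OFF = "one_off"      # runs once, at `start`
    PERIODIC = "periodic"    # runs at `start`, then every `interval`


@dataclass
class Schedule:
    """Hypothetical schedule model; the field names are illustrative only."""
    kind: ScheduleKind
    start: Optional[datetime] = None
    interval: Optional[timedelta] = None  # only meaningful for PERIODIC

    def next_run(self, last_run: Optional[datetime]) -> Optional[datetime]:
        """Return the next time the clock should trigger a run, or None."""
        if self.kind is ScheduleKind.ON_DEMAND:
            return None  # never triggered by the clock
        if last_run is None:
            return self.start  # first (or only) run
        if self.kind is ScheduleKind.ONE_OFF:
            return None  # already ran once
        return last_run + self.interval


def is_due(schedule: Schedule, last_run: Optional[datetime], now: datetime) -> bool:
    """True if the schedule says a run should start at or before `now`."""
    next_time = schedule.next_run(last_run)
    return next_time is not None and next_time <= now
```

If a scheduler re-reads the stored `Schedule` on every check, editing `interval` takes effect at the next check without restarting any service, which is the kind of runtime reconfiguration the requirements below ask for.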

As such, it is desirable that GeoNode tasks:

  • Run asynchronously, in a process separate from the main GeoNode server
  • Are built in a scalable way, such that they can run on hosts other than the one running the main GeoNode web application
  • Provide a means to monitor their progress at all times while they run
  • Provide a means to cancel their execution
  • Notify users when they complete
  • Are able to deal with failure, executing in a transaction-like fashion whereby a rollback mechanism is applied whenever an error occurs
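Progress monitoring, cancellation and rollback can be combined in one task runner. The sketch below is single-process and assumes each step is paired with a hypothetical compensating `undo` function (a saga-style rollback, which is one possible technique, not one this page prescribes); `Step`, `run_steps` and `Cancelled` are illustrative names, not GeoNode APIs.

```python
import threading
from dataclasses import dataclass
from typing import Callable, List


class Cancelled(Exception):
    """Raised when a cancellation request is observed between steps."""


@dataclass
class Step:
    """Hypothetical unit of work paired with its compensating action."""
    name: str
    do: Callable[[], None]    # performs the step's side effects
    undo: Callable[[], None]  # reverts what `do` changed


def run_steps(
    steps: List[Step],
    report: Callable[[str, float], None],
    cancel: threading.Event,
) -> None:
    """Run steps in order and report progress; on error or cancel, undo finished steps."""
    done: List[Step] = []
    try:
        for i, step in enumerate(steps):
            if cancel.is_set():
                raise Cancelled(step.name)
            step.do()
            done.append(step)
            report(step.name, (i + 1) / len(steps))  # fraction of steps completed
    except Exception:
        # Undo in reverse order so later steps are reverted before earlier ones.
        for step in reversed(done):
            step.undo()
        raise
```

Cancellation here is cooperative: the flag is only checked between steps, so a long-running step is not interrupted midway. In a multi-host setup the `report` callback would need to write to shared storage (such as the database) rather than to local memory.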

Additionally, GeoNode shall store a log of task execution details.
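One way to keep such a log is to write one row per task run, recording its timing and outcome. The sketch below assumes a hypothetical `task_execution` table and uses SQLite only so that it runs standalone; a Django application would typically use a model instead. `logged_task` is an illustrative helper, not a GeoNode API.

```python
import sqlite3
import time
from contextlib import contextmanager
from typing import Iterator

# Hypothetical schema for the execution log.
SCHEMA = """
CREATE TABLE IF NOT EXISTS task_execution (
    id INTEGER PRIMARY KEY,
    workflow TEXT NOT NULL,
    task TEXT NOT NULL,
    started REAL NOT NULL,
    finished REAL,
    status TEXT NOT NULL,  -- 'running', 'success' or 'failed'
    error TEXT
)
"""


@contextmanager
def logged_task(conn: sqlite3.Connection, workflow: str, task: str) -> Iterator[None]:
    """Record a task's start and outcome; the row is kept after the task ends."""
    cur = conn.execute(
        "INSERT INTO task_execution (workflow, task, started, status) "
        "VALUES (?, ?, ?, 'running')",
        (workflow, task, time.time()),
    )
    row_id = cur.lastrowid
    conn.commit()
    try:
        yield
    except Exception as exc:
        conn.execute(
            "UPDATE task_execution SET finished = ?, status = 'failed', error = ? "
            "WHERE id = ?",
            (time.time(), repr(exc), row_id),
        )
        conn.commit()
        raise  # logging must not swallow the task's error
    else:
        conn.execute(
            "UPDATE task_execution SET finished = ?, status = 'success' WHERE id = ?",
            (time.time(), row_id),
        )
        conn.commit()
```

Committing the initial `'running'` row before the task body executes means an in-progress task is visible to anyone querying the log, not only finished ones.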

Requirements for GeoNode workflow execution engine

The following requirements should be met by the GeoNode workflow execution engine in order to fulfill the use cases and needs described above.

  1. GeoNode workflows shall be composed of multiple tasks arranged in a directed acyclic graph (DAG) of dependencies. The task shall be the unit of work, whereas the workflow shall be the unit of configuration.

  2. The workflow execution engine shall schedule workflows (and their tasks) to run asynchronously. Moreover, tasks shall be able to run in parallel, in processes separate from the main GeoNode web service. This allows the main GeoNode service to remain smooth and responsive even while heavy tasks are running.

  3. The workflow execution engine shall allow tasks to be distributed over multiple hosts, so that resources can be scaled horizontally as needed.

  4. The workflow engine shall support multiple workflow scheduling methods, including on-demand, one-off and periodic execution.

  5. Workflows shall expose a set of configuration parameters, and it shall be possible to create and modify them at runtime, i.e. without restarting any service. This includes, for example, changing the frequency of execution.

  6. There shall be a mechanism by which a running workflow can report its own execution progress.

  7. The workflow execution engine shall be able to terminate running workflows on demand.

  8. The workflow engine shall ensure that GeoNode is not left in an inconsistent state if an error occurs while executing a workflow task. Each workflow shall behave transactionally: either all of its tasks run successfully or, in case of an error, all state is returned to what it was before the workflow started.

  9. The workflow engine shall store workflow and task execution details for subsequent analysis and processing by GeoNode. This information shall be collected during task execution and retained long after a task has ended, so that a history log is preserved.
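Requirements 1 and 2 together describe a workflow whose tasks form a dependency graph and where any task whose dependencies are complete may start. A minimal sketch of that execution strategy, using the standard-library `graphlib` module (Python 3.9+) and `concurrent.futures`, follows. Threads are used only to keep it self-contained; requirement 2 calls for separate processes or hosts, which would mean swapping in `ProcessPoolExecutor` (with module-level, picklable task functions) or a distributed task queue. `run_workflow` is a hypothetical name.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_workflow(
    tasks: Dict[str, Callable[[], None]],
    deps: Dict[str, Set[str]],
    max_workers: int = 4,
) -> List[str]:
    """Start each task as soon as its dependencies finish; return completion order.

    `deps` maps a task name to the names of the tasks it depends on.
    """
    sorter = TopologicalSorter(deps)
    for name in tasks:
        sorter.add(name)  # also register tasks that have no dependencies
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    order: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running: Dict[Future, str] = {}
        while sorter.is_active():
            # Submit every task whose dependencies have all completed.
            for name in sorter.get_ready():
                running[pool.submit(tasks[name])] = name
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in finished:
                name = running.pop(fut)
                fut.result()  # re-raise the task's exception, if any
                order.append(name)
                sorter.done(name)  # may unlock dependent tasks
    return order
```

For example, with hypothetical tasks where `store` depends on `fetch`, and both `publish` and `thumbnail` depend on `store`, calling `run_workflow(tasks, {"store": {"fetch"}, "publish": {"store"}, "thumbnail": {"store"}})` runs `fetch`, then `store`, and then lets `publish` and `thumbnail` run concurrently.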
