Multi-Blast 2.0 Design

Table of Contents

The Stack
Query Service
- Actions
- Dependencies
Report Service
- Actions
- Dependencies
Concepts
Metrics

The Stack

The Multi-Blast 2.0 service stack consists of 4 containers and 3 external dependencies.

Containers

The Query Service
The Report Service
A RabbitMQ message queue
A PostgreSQL database

Dependencies

The VEuPathDB Oracle user database.
The VEuPathDB BLAST+ databases.
An S3 instance with a bucket created for each of the Query Service and Report Service.

Query Service

The Query Service, built on the Async Platform, exposes a REST API through which API consumers may create, customize, and execute asynchronous BLAST+ query jobs against VEuPathDB’s BLAST+ databases.

REST API Documentation

The results of jobs executed through the query service will be cached for a configurable amount of time before they are automatically expired.

Expired jobs may be re-run at a later date by any user linked to the expired job.

Query Service Overview

Actions

Action	Source	Description
List Jobs	Client	Lists the query jobs that are linked to the requesting user.
Create Job	Client	Creates a new query job and may optionally link the new job to the requesting user.
Lookup Job	Client	Get the details and configuration for an existing job and optionally link the requesting user to the target job.
Restart Job	Client	Re-runs an existing job that has expired.
Update Job	Client	Update a user’s metadata attached to a job.
Delete Job	Client	Unlink a job from the requesting user.
Get Job Query	Client	Retrieve the raw query submitted for a target job.
Get Job Result	Client	Retrieve the ASN1 query result of the target job.
Get Job Errors	Client	Retrieve the stderr output for a target job.
Bulk Status Check	Client	Check the current statuses for a batch of job IDs.
Get All Targets	Client	List the BLAST+ databases currently visible to the service.
Link Guest	Client	Links the jobs associated with a guest user with a target non-guest user.
Execute Query	Queue	Asynchronously executes a BLAST+ query.

List Jobs

Look up the jobs that are linked to the requesting user. Optionally, the results may be filtered by project ID.

Workflow

Response

The response will be a list of entries representing jobs that are linked to the requesting user.

Result Objects

interface Result {
  queryJobID: string
  status:     string
  site:       string
  createdOn:  string
  userMeta?:  UserMeta
}

interface UserMeta {
  summary?:     string
  description?: string
}

Example Result

[
  {
    "queryJobID": "9f444b23ceec3ee5588cc4c784c16696",
    "site": "PlasmoDB",
    "status": "expired",
    "createdOn": "2020-10-31T23:00:00Z"
  },
  {
    "queryJobID": "bc49f1a3bc36cd15b84890439d19d395",
    "site": "TriTrypDB",
    "status": "complete",
    "createdOn": "2020-10-31T23:00:00Z",
    "userMeta": {
      "summary": "A blast job"
    }
  },
  {
    "queryJobID": "297a61dda47317f11d8e50e6ab8508c9",
    "site": "VectorBase",
    "status": "failed",
    "createdOn": "2020-10-31T23:00:00Z",
    "userMeta": {
      "summary": "Another blast job.",
      "description": "This job will fail."
    }
  }
]

Create Job

Creates a new job record if one does not already exist matching the POSTed configuration. See Job IDs.

Workflow

Query Job Submission Flow

Validate job submission

Handle new root job creation

Handle Parent Job
Handle Child Jobs

Handle existing root job w/o link

Handle existing root job with link

Result

The response will be an object containing the ID of the job that was created or found.

Result Object

interface Result {
  queryJobID: string
}

Example Result

{
  "queryJobID": "9f444b23ceec3ee5588cc4c784c16696"
}

Lookup Job

Retrieves a detailed record for a specific target job which will include the original configuration from which the job was created.

Additionally, as a simplistic form of job "sharing", users who make a request to get a job’s details may optionally be linked to the target job, adding it to the requesting user’s job collection.

To maintain compatibility with the legacy behavior of the v0.x and v1.x Multi-Blast API, job saving is opt-out: by default, users are linked to any job they request that they are not already linked to.

Workflow

Result

The response will be an object describing the requested job. This object will include:

job id
job status
job configuration:
- target BLAST+ databases
- target project id
blast configuration
user metadata

Result Object

interface Result {
  queryJobID:  string
  status:      string
  jobConfig:   JobConfig
  blastConfig: Object
  createdOn:   string
  userMeta?:   UserMeta
}

interface JobConfig {
  site:    string
  targets: QueryTarget[]
}

interface QueryTarget {
  targetDisplayName: string
  targetFile:        string
}

interface UserMeta {
  summary?:     string
  description?: string
}

Example Result

{
  "queryJobID": "9f444b23ceec3ee5588cc4c784c16696",
  "status": "complete",
  "jobConfig": {
    "site": "PlasmoDB",
    "targets": [
      {
        "targetDisplayName": "PfalciparumGB4",
        "targetFile": "PfalciparumGB4AnnotatedTranscripts"
      }
    ]
  },
  "blastConfig": {
    ...
  },
  "createdOn": "2020-10-31T23:00:00Z",
  "userMeta": {
    "summary": "Some blast job"
  }
}

Restart Job

Restarts an expired job. Once a job has expired from the cache, users are allowed to re-run the job without needing to resubmit the configuration.

The configuration for the job is stored and will be resubmitted to the job queue the same as if the job was brand new.
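
The restart rule can be sketched as follows. This is a minimal illustration, not the service's implementation: `StoredJob`, `QueueEntry`, `restartJob`, and the `tool` config key are hypothetical names, a plain array stands in for the RabbitMQ queue, and resetting the status to `queued` is an assumption.

```typescript
// Hypothetical stored job record; `config` is the configuration kept from
// the original submission.
interface StoredJob {
  queryJobID: string
  status: string
  config: { [key: string]: unknown }
}

// Hypothetical queue entry; a plain array stands in for the message queue.
interface QueueEntry {
  queryJobID: string
  config: { [key: string]: unknown }
}

// Re-queues an expired job using its stored configuration.
// Returns false, leaving the queue untouched, if the job has not expired.
function restartJob(job: StoredJob, queue: QueueEntry[]): boolean {
  if (job.status !== "expired") {
    return false
  }
  queue.push({ queryJobID: job.queryJobID, config: job.config })
  job.status = "queued" // assumption: a restarted job reports as queued
  return true
}

const demoQueue: QueueEntry[] = []
const expiredJob: StoredJob = {
  queryJobID: "9f444b23ceec3ee5588cc4c784c16696",
  status: "expired",
  config: { tool: "blastn" }, // hypothetical configuration
}
console.log(restartJob(expiredJob, demoQueue), demoQueue.length)
```

The caller never supplies a configuration here; only the stored one is resubmitted.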

Workflow

Update Job

Updates the metadata a user has associated with a target job to which they are already linked.

Workflow

Delete Job

Removes a target job from the user’s job collection, deleting the link between the user and the target job.

Workflow

Get Job Query

Retrieves the query submitted for a job.

Workflow

Get Job Result

Retrieves the ASN1 query result generated by a query job that has completed successfully.

Workflow

Get Job Errors

Retrieves the stderr output from the BLAST+ command-line tool that was executed as part of a job.

Workflow

Bulk Status Check

The bulk status check takes a JSON array of job IDs as input, and for each valid ID in the input, returns the job status in a map.

All job IDs that are found to be invalid will be ignored and will not appear in the result status map.

Workflow

Result

A JSON object containing key/value pairs of query job ID mapped to job status.

Result Type

interface Result {
  [queryJobID: string]: string
}

Example Result

{
  "dd6060e5367622e574ffb38f32bfa049": "queued",
  "29e07b0b80181222ad33cbc8f679d672": "complete",
  "748ba381dd81bb8de615319837ffa350": "in-progress",
  "f4757ea84c455b04a1d307d4ac33049d": "expired"
}
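
As an illustration of the behavior described above, the following sketch builds the status map from a list of requested IDs and skips any ID that is not known. The `knownJobs` table and the `bulkStatus` function are hypothetical stand-ins for the service's real job store and are not part of the Multi-Blast API.

```typescript
// Shape of the bulk status response: job ID -> status string.
interface StatusMap {
  [queryJobID: string]: string
}

// Hypothetical in-memory stand-in for the service's job store.
const knownJobs: StatusMap = {
  "dd6060e5367622e574ffb38f32bfa049": "queued",
  "29e07b0b80181222ad33cbc8f679d672": "complete",
}

// Returns a status for every requested ID that exists in `jobs`.
// IDs that are not found are ignored and omitted from the result.
function bulkStatus(requestedIDs: string[], jobs: StatusMap): StatusMap {
  const result: StatusMap = {}
  for (const id of requestedIDs) {
    const found = Object.prototype.hasOwnProperty.call(jobs, id)
      ? jobs[id]
      : undefined
    if (found !== undefined) {
      result[id] = found
    }
  }
  return result
}

const statuses = bulkStatus(
  ["dd6060e5367622e574ffb38f32bfa049", "not-a-real-job-id"],
  knownJobs,
)
console.log(JSON.stringify(statuses))
```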

Get All Targets

Returns a tree of all the queryable BLAST+ databases that are available to use.

Workflow

Result

Result Types

interface Result {
  [project: string]: TargetMap
}

interface TargetMap {
  [target: string]: TargetDatabases
}

interface TargetDatabases {
  naTargets?: string[]
  aaTargets?: string[]
}

Example Result

{
  "PlasmoDB": {
    "Pberghei": {
      "naTargets": [
        "PbergheiESTs"
      ]
    },
    "PfalciparumGB4": {
      "naTargets": [
        "PfalciparumGB4AnnotatedCDSs",
        "PfalciparumGB4AnnotatedTranscripts",
        "PfalciparumGB4Genome"
      ],
      "aaTargets": [
        "PfalciparumGB4AnnotatedProteins"
      ]
    }
  }
}
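
A consumer might walk this tree to collect the database names for one project. The sketch below assumes only the result types shown above; `TargetTree` (a renaming of `Result`), `listDatabases`, and the trimmed `exampleTree` are hypothetical names used for illustration.

```typescript
interface TargetDatabases {
  naTargets?: string[]
  aaTargets?: string[]
}

interface TargetMap {
  [target: string]: TargetDatabases
}

interface TargetTree {
  [project: string]: TargetMap
}

// Lists every database name under one project, naTargets before aaTargets
// for each target. Unknown projects yield an empty list.
function listDatabases(tree: TargetTree, project: string): string[] {
  const names: string[] = []
  const targets = tree[project]
  if (targets === undefined) {
    return names
  }
  for (const target of Object.keys(targets)) {
    const dbs = targets[target]
    if (dbs === undefined) {
      continue
    }
    for (const db of dbs.naTargets || []) names.push(db)
    for (const db of dbs.aaTargets || []) names.push(db)
  }
  return names
}

const exampleTree: TargetTree = {
  PlasmoDB: {
    Pberghei: { naTargets: ["PbergheiESTs"] },
    PfalciparumGB4: {
      naTargets: ["PfalciparumGB4Genome"],
      aaTargets: ["PfalciparumGB4AnnotatedProteins"],
    },
  },
}
console.log(listDatabases(exampleTree, "PlasmoDB"))
```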

Link Guest

RPC-like API endpoint used to migrate ownership of jobs created by a WDK guest user to a logged-in user. This covers cases where a user creates jobs before either realizing they weren’t logged in or deciding to create an account.
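
A minimal sketch of such a migration, under stated assumptions: links are modeled as hypothetical `JobLink` rows, `linkGuest` is an illustrative name, and dropping a guest link when the target user is already linked to the same job is an assumption that this document does not specify.

```typescript
// Hypothetical link record: one row per (user, job) pair.
interface JobLink {
  userID: string
  queryJobID: string
}

// Returns a new link list in which the guest's links belong to the target user.
// Assumption: if the target user is already linked to a job, the guest's
// duplicate link is dropped instead of producing a second link.
function linkGuest(
  links: JobLink[],
  guestID: string,
  targetUserID: string,
): JobLink[] {
  const result: JobLink[] = []
  const targetHas: { [queryJobID: string]: boolean } = {}

  // Pass 1: keep every non-guest link and note what the target already has.
  for (const link of links) {
    if (link.userID !== guestID) {
      result.push(link)
      if (link.userID === targetUserID) {
        targetHas[link.queryJobID] = true
      }
    }
  }

  // Pass 2: re-home the guest's links that the target does not already have.
  for (const link of links) {
    if (link.userID === guestID && targetHas[link.queryJobID] !== true) {
      targetHas[link.queryJobID] = true
      result.push({ userID: targetUserID, queryJobID: link.queryJobID })
    }
  }
  return result
}

const demoLinks: JobLink[] = [
  { userID: "guest-1", queryJobID: "job-a" },
  { userID: "guest-1", queryJobID: "job-b" },
  { userID: "user-42", queryJobID: "job-a" },
]
console.log(linkGuest(demoLinks, "guest-1", "user-42"))
```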

Workflow

Execute Query

Internal, asynchronous execution of a target BLAST+ command-line tool using a user provided configuration.

This execution happens in worker threads that pull jobs from the RabbitMQ message queue backing the Async Platform.

Workflow

Result

The result of the job execution will be a CLI call exit code and a list of files that will be persisted to S3 by the Async Platform.

Dependencies

S3: S3 is used to store a temporary cache of query job inputs and outputs.
RabbitMQ: RabbitMQ is used to queue up query jobs for eventual execution.
PostgreSQL: PostgreSQL is used as a backing database for queue and job history bookkeeping.
Oracle: Job configurations and user-to-job links are stored permanently in the Oracle user database.
BLAST+ Databases: BLAST+ database files that are the targets of user queries. These have to be mounted into the running container for the service to be able to access them.

Report Service

The Report Service, built on the Async Platform, exposes a REST API through which API consumers may generate custom reports from BLAST+ queries executed using the Query Service.

REST API Documentation

Report Service Overview

Actions

Action	Source	Description
List Jobs	Client	Lists the jobs that are linked to the requesting user.
Create Job	Client	Creates a new report job and may optionally link the new job to the requesting user.
Lookup Job	Client	Get the details and configuration for an existing job.
Restart Job	Client	Re-runs an existing job that has expired.
Update Job	Client	Update a user’s metadata attached to a job.
Delete Job	Client	Unlink a job from the requesting user.
List Job Outputs	Client	List the report files generated by a target job.
Get Job Output	Client	Retrieve a report file generated by a target job.
Get Job Errors	Client	Retrieve the stderr output for a target job.
Bulk Status Check	Client	Check the current statuses for a batch of job IDs.
Link Guest	Client	Links the jobs associated with a guest user with a target non-guest user.
Execute Report	Queue	Executes the BLAST+ CLI tool `blast_formatter` using a target query job’s result as the input.

List Jobs

Looks up the jobs that are linked to the requesting user. Optionally, the results may be filtered by query job ID.

Workflow

Result

The result will be a list of zero or more report job items that are linked to the requesting user and optionally limited to only those items whose target query job ID matches a provided filter value.

Result Definition

type Result = ResultItem[]

interface ResultItem {
  reportJobID: string
  queryJobID:  string
  status:      string
  userMeta?:   UserMeta
}

interface UserMeta {
  summary?:     string
  description?: string
}

Result Example

[
  {
    "reportJobID": "37b4e2d82900d5e94b8da524fbeb33c0",
    "queryJobID": "64e8bb9742929ab718dba7bc048e6120",
    "status": "failed",
    "userMeta": {
      "summary": "some report job summary"
    }
  }
]
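
The optional filter can be sketched as below. `listJobs` is a hypothetical helper operating on an in-memory list of the user's linked jobs; how the real service accepts the filter value is not shown here.

```typescript
interface ResultItem {
  reportJobID: string
  queryJobID: string
  status: string
}

// Returns the user's linked report jobs, limited to one query job when a
// filter value is given. With no filter, every linked job is returned.
function listJobs(linked: ResultItem[], queryJobID?: string): ResultItem[] {
  if (queryJobID === undefined) {
    return linked.slice()
  }
  return linked.filter((item) => item.queryJobID === queryJobID)
}

const linkedJobs: ResultItem[] = [
  {
    reportJobID: "37b4e2d82900d5e94b8da524fbeb33c0",
    queryJobID: "64e8bb9742929ab718dba7bc048e6120",
    status: "failed",
  },
  {
    reportJobID: "dd6060e5367622e574ffb38f32bfa049",
    queryJobID: "29e07b0b80181222ad33cbc8f679d672",
    status: "complete",
  },
]
console.log(listJobs(linkedJobs, "64e8bb9742929ab718dba7bc048e6120").length)
```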

Create Job

Creates a new job if one does not already exist matching the POSTed configuration.

If the job did not previously exist, or was previously expired, it will be queued to be executed.
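
The create-or-find rule can be sketched as follows. This is an illustration only: `ReportJob`, `createJob`, the in-memory `jobs` map, and the array used as a queue are hypothetical, and the job ID is assumed to have already been derived from the POSTed configuration.

```typescript
interface ReportJob {
  reportJobID: string
  status: string
}

// Finds or creates the job with the given ID. A job is pushed onto the queue
// only when it is new or its previous run has expired.
function createJob(
  jobs: { [reportJobID: string]: ReportJob },
  queue: string[],
  reportJobID: string,
): ReportJob {
  const existing = Object.prototype.hasOwnProperty.call(jobs, reportJobID)
    ? jobs[reportJobID]
    : undefined
  if (existing !== undefined && existing.status !== "expired") {
    return existing // found and still usable: nothing is queued
  }
  const job: ReportJob = { reportJobID: reportJobID, status: "queued" }
  jobs[reportJobID] = job
  queue.push(reportJobID)
  return job
}

const jobStore: { [reportJobID: string]: ReportJob } = {}
const jobQueue: string[] = []
createJob(jobStore, jobQueue, "9f444b23ceec3ee5588cc4c784c16696")
createJob(jobStore, jobQueue, "9f444b23ceec3ee5588cc4c784c16696")
console.log(jobQueue.length)
```

Submitting the same configuration twice therefore yields the same job and a single queue entry.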

Workflow

Result

The response will be an object containing the ID of the job that was created or found.

Result Object

interface Result {
  reportJobID: string
}

Example Result

{
  "reportJobID": "9f444b23ceec3ee5588cc4c784c16696"
}

Lookup Job

Retrieves a detailed record for a specific target job which will include the original configuration from which the job was created.

Additionally, as a simplistic form of job "sharing", users who make a request to get a job’s details may optionally be linked to the target job, adding it to the requesting user’s job collection.

To maintain compatibility with the legacy behavior of the v0.x and v1.x Multi-Blast API, the job saving behavior is opt-out only and by default, users will be linked to jobs they request that they are not already linked to.

Workflow

Look up job in the user database
Optionally link the requesting user to the job in the user database
Check the status of the job
Return the job details which will include:
- report job id
- query job id
- job status
- blast configuration
- user metadata
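
The steps above can be sketched as follows. This is a simplified illustration rather than the service's code: an in-memory array stands in for the user database, and `StoredJob`, `JobDetails`, `lookupJob`, and the `saveJob` flag are hypothetical names.

```typescript
interface UserMeta {
  summary?: string
  description?: string
}

// Hypothetical stored form of a report job; `links` maps a user ID to the
// metadata kept on that user's link to the job.
interface StoredJob {
  reportJobID: string
  queryJobID: string
  status: string
  blastConfig: { [key: string]: unknown }
  links: { [userID: string]: UserMeta }
}

interface JobDetails {
  reportJobID: string
  queryJobID: string
  status: string
  blastConfig: { [key: string]: unknown }
  userMeta?: UserMeta
}

function lookupJob(
  jobs: StoredJob[],
  reportJobID: string,
  userID: string,
  saveJob: boolean = true, // linking is opt-out, so it defaults to on
): JobDetails | undefined {
  // 1. Look up the job.
  let job: StoredJob | undefined
  for (const candidate of jobs) {
    if (candidate.reportJobID === reportJobID) job = candidate
  }
  if (job === undefined) return undefined

  // 2. Link the requesting user unless they opted out or are already linked.
  const linked = Object.prototype.hasOwnProperty.call(job.links, userID)
  if (saveJob && !linked) job.links[userID] = {}

  // 3-4. Read the status and assemble the details.
  const details: JobDetails = {
    reportJobID: job.reportJobID,
    queryJobID: job.queryJobID,
    status: job.status,
    blastConfig: job.blastConfig,
  }
  const meta = job.links[userID]
  if (meta !== undefined) details.userMeta = meta
  return details
}
```

Calling `lookupJob(jobs, id, userID)` links the user as a side effect, while `lookupJob(jobs, id, userID, false)` only reads.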

Restart Job

Restarts an expired job. Once a job has expired from the cache, users are allowed to re-run the job without needing to resubmit the configuration.

The configuration for the job is stored and will be resubmitted to the job queue the same as if the job was brand new.

Workflow

Update Job

Updates the metadata a user has associated with a target job to which they are already linked.

Workflow

Look up the target job in the user database
Verify the user is linked to the target job
Update the user’s metadata for the job in the user database
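
These three steps might look like the sketch below. An in-memory array stands in for the user database, and `StoredJob`, `UpdateOutcome`, and `updateJob` are hypothetical names. How the real service reports the two failure cases is not specified here.

```typescript
interface UserMeta {
  summary?: string
  description?: string
}

// Hypothetical stored job; `links` maps a user ID to that user's metadata.
interface StoredJob {
  reportJobID: string
  links: { [userID: string]: UserMeta }
}

type UpdateOutcome = "updated" | "job-not-found" | "not-linked"

function updateJob(
  jobs: StoredJob[],
  reportJobID: string,
  userID: string,
  meta: UserMeta,
): UpdateOutcome {
  // 1. Look up the target job.
  let job: StoredJob | undefined
  for (const candidate of jobs) {
    if (candidate.reportJobID === reportJobID) job = candidate
  }
  if (job === undefined) return "job-not-found"

  // 2. Verify the user is linked to the job.
  if (!Object.prototype.hasOwnProperty.call(job.links, userID)) {
    return "not-linked"
  }

  // 3. Replace the metadata stored on the user's link.
  job.links[userID] = meta
  return "updated"
}

const updateStore: StoredJob[] = [
  { reportJobID: "r1", links: { "user-1": { summary: "old summary" } } },
]
console.log(updateJob(updateStore, "r1", "user-1", { summary: "new summary" }))
```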

Delete Job

Removes a target job from the user’s job collection, deleting the link between the user and the target job.

Workflow

List Job Outputs

Lists the files generated by a completed report job.

Workflow

Result

This endpoint will return a listing of available files and their sizes.

Result Definition

type Result = ReportFile[]

interface ReportFile {
  name: string
  size: number
}

Result Example

[
  {
    "name": "somefile1.txt",
    "size": 1023
  },
  {
    "name": "somefile2.json",
    "size": 58372
  },
  {
    "name": "report.zip",
    "size": 10234
  }
]

Get Job Output

Retrieves the target file generated by a completed report job.

Workflow

Get Job Errors

Retrieves the stderr output from the BLAST+ command-line tool that was executed as part of a job.

Workflow

Result

The result of this call will be the stderr output from the BLAST+ CLI command call, which may be empty.

Bulk Status Check

Looks up the statuses of a batch of jobs by their IDs.

Workflow

Result

A JSON object containing key/value pairs of report job ID mapped to job status.

Result Type

interface Result {
  [reportJobID: string]: string
}

Example Result

{
  "dd6060e5367622e574ffb38f32bfa049": "queued",
  "29e07b0b80181222ad33cbc8f679d672": "complete",
  "748ba381dd81bb8de615319837ffa350": "in-progress",
  "f4757ea84c455b04a1d307d4ac33049d": "expired"
}

Link Guest

Migrates the links between a target guest user and that user’s jobs so that they are owned by a logged-in user. This covers cases where a WDK user creates jobs before either realizing they weren’t logged in or deciding to create an account.

Workflow

Execute Report

Internal, asynchronous execution of the BLAST+ formatter command-line tool using a user provided configuration.

This execution happens in worker threads that pull jobs from the RabbitMQ message queue backing the Async Platform.

Workflow

Dependencies

Query Service: The query service is used to retrieve the result of the target query job on which a report will be run.
S3: S3 is used to store a temporary cache of report job inputs and outputs.
RabbitMQ: RabbitMQ is used to queue up report jobs for eventual execution.
PostgreSQL: PostgreSQL is used as a backing database for queue and job history bookkeeping.
Oracle: Job configurations and user-to-job links are stored permanently in the Oracle user database.

Concepts

Parent & Child Jobs

When submitting a query to the Multi-Blast service, if the config is valid, one or more jobs will be created. One job will be created for the entire input, and child jobs may be created for each individual sequence in the input query.

If the input query contains only one sequence, only one job will be created, a "parent" job with no children.

If the input query contains multiple sequences, a parent job will be created for the overall input, and a child job will be created for each individual sequence in the input.

Child jobs are linked to the parent job from which they were created.
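
The splitting rule can be sketched as follows. This is an illustration under simple assumptions: the input is FASTA text in which each record starts with a `>` defline, and `PlannedJob`, `splitFasta`, and `planJobs` are hypothetical names, not part of the service.

```typescript
// A planned job: its role and the FASTA records it covers.
interface PlannedJob {
  role: "parent" | "child"
  sequences: string[] // one FASTA record (defline plus residues) per entry
}

// Splits multi-FASTA text into individual records, one per ">" defline.
function splitFasta(query: string): string[] {
  const records: string[] = []
  let current: string[] = []
  for (const line of query.split("\n")) {
    // A new defline closes the record collected so far.
    if (line.charAt(0) === ">" && current.length > 0) {
      records.push(current.join("\n"))
      current = []
    }
    if (line.trim() !== "") current.push(line)
  }
  if (current.length > 0) records.push(current.join("\n"))
  return records
}

// One parent job for the whole input, plus one child per sequence when the
// input holds more than one sequence.
function planJobs(query: string): PlannedJob[] {
  const records = splitFasta(query)
  const jobs: PlannedJob[] = [{ role: "parent", sequences: records }]
  if (records.length > 1) {
    for (const record of records) {
      jobs.push({ role: "child", sequences: [record] })
    }
  }
  return jobs
}

const demoJobs = planJobs("> First\nIYSLVCWPLDD\n> Second\nQKQRAYLRSM")
console.log(demoJobs.map((job) => job.role))
```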

Single Sequence

Single-Sequence Query

> First
IYSLVCWPLDDPFSRPDMLSLSERMLDVWRGKQVAEDLSPLINQLSLADMIRSCERNETL

Resulting Jobs:

Name	Sequences
Parent Job 1	First

Multi-Sequence

Multi-Sequence Query

> First
IYSLVCWPLDDPFSRPDMLSLSERMLDVWRGKQVAEDLSPLINQLSLADMIRSCERNETL
> Second
QKQRAYLRSMEEKARERRRIFIQNEQARLERFAKERAERQTTTTTTTTATTPTTTTPTTT
TPTTTPTTTKAPGIP
> Third
YRPQNSSVDTVTSEQSIPVWMYGLVLLLLLSVGLLTCLSLLLSYKLKQLKVASCADSSTA
TSEPFHNVYVTTSSHYSSPYGLRREVPASPRCPPSPYPVFFKEPFVNMTA

Resulting Jobs:

Name	Sequences
Parent Job 1	First, Second, Third
Child Job 1	First
Child Job 2	Second
Child Job 3	Third

Job to User Links

TODO

Jobs may be linked to users
When creating a job, only the parent job is linked
Job link contains user metadata
When accessing a job’s details, a user may optionally be linked to a job regardless of whether it is a parent job or child job
Only jobs that are linked to the requesting user will be returned in list endpoints.
User metadata is stored on the job-to-user link

Job IDs

A job ID is a hash of the job’s configuration and query. This means that if the same configuration is submitted multiple times, the resulting job ID will be the same every time.

For the Query Service

For the Query Service, the generated job IDs are dependent on:

the BLAST+ query tool configuration
the target project ID
the input query text
the selected query targets
- the name of the target
- the name of the database file
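
To illustrate how a deterministic ID can be derived from these inputs, the sketch below hashes a canonical serialization of them. The field names, the `canonicalJSON` helper, and the choice of MD5 are assumptions made for illustration. MD5 is used only because it yields 32 hexadecimal characters like the example IDs in this document; the algorithm the service actually uses is not stated here.

```typescript
import { createHash } from "node:crypto"

interface QueryTarget {
  targetDisplayName: string
  targetFile: string
}

// Hypothetical grouping of the inputs that the job ID depends on.
interface QueryJobInputs {
  blastConfig: { [key: string]: unknown }
  site: string
  query: string
  targets: QueryTarget[]
}

// Serializes a value with object keys sorted, so that key order in the
// submitted JSON never changes the output.
function canonicalJSON(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonicalJSON(item)).join(",") + "]"
  }
  if (value !== null && typeof value === "object") {
    const obj = value as { [key: string]: unknown }
    const parts = Object.keys(obj)
      .sort()
      .map((key) => JSON.stringify(key) + ":" + canonicalJSON(obj[key]))
    return "{" + parts.join(",") + "}"
  }
  return JSON.stringify(value)
}

// Hashes the canonical form; identical inputs always give the same ID.
function queryJobID(inputs: QueryJobInputs): string {
  return createHash("md5").update(canonicalJSON(inputs)).digest("hex")
}

const demoInputs: QueryJobInputs = {
  blastConfig: { tool: "blastn" }, // hypothetical configuration
  site: "PlasmoDB",
  query: "> First\nACGT",
  targets: [
    {
      targetDisplayName: "PfalciparumGB4",
      targetFile: "PfalciparumGB4Genome",
    },
  ],
}
console.log(queryJobID(demoInputs))
```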

For the Report Service

For the Report Service, the generated job IDs are dependent on:

the ID of the query service job for which the report will be generated
the BLAST+ formatter tool configuration

Metrics

The following metrics are gathered from the Multi-Blast services:

Common Metrics

Metrics common to both the query and report services.

Name	By	Params	Description
`http_total_requests`	jaxrs-container-core	HTTP method, path, response code	Counter of requests.
`http_request_duration`	jaxrs-container-core	HTTP method, path	Histogram of request durations.
`process_total_memory`	prometheus-jvm-stats		Total memory allocated by the Java process.
`process_free_memory`	prometheus-jvm-stats		Unused allocated memory.
`process_active_memory`	prometheus-jvm-stats		Allocated memory currently in use.
`gc_count`	prometheus-jvm-stats		Total number of garbage collections.
`gc_time`	prometheus-jvm-stats		Total time used by the garbage collector.
`job_failures`	lib-compute-platform	queue name	Number of async job executions that ended with a failed status.
`job_successes`	lib-compute-platform	queue name	Number of async job executions that ended with a success status.
`queue_time`	lib-compute-platform	queue name	Histogram of time spent by jobs waiting in the queue.
`queued_jobs`	lib-compute-platform	queue name	Gauge of the number of currently queued jobs.

Query Service

Metrics specific to the query service.

Name	Params	Description
`blast_command_time_millis`	BLAST+ tool	BLAST+ CLI tool execution time in milliseconds.

Report Service

Metrics specific to the report service.

Name	Params	Description
`blast_command_time_millis`	BLAST+ tool	BLAST+ CLI tool execution time in milliseconds.
