The Multi-Blast 2.0 service stack consists of 4 containers and 3 external dependencies.
- The Query Service
- The Report Service
- A RabbitMQ message queue
- A PostgreSQL database
- The VEuPathDB Oracle user database
- The VEuPathDB BLAST+ databases
- An S3 instance with a bucket created for each of the Query Service and Report Service
The Query Service, built on the Async Platform, exposes a REST API through which API consumers may create, customize, and execute asynchronous BLAST+ query jobs against VEuPathDB’s BLAST+ databases.
The results of jobs executed through the Query Service are cached for a configurable amount of time, after which they automatically expire.
Expired jobs may be re-run at a later date by any user linked to the expired job.
| Action | Source | Description |
|---|---|---|
| | Client | Lists the query jobs that are linked to the requesting user. |
| | Client | Creates a new query job and may optionally link the new job to the requesting user. |
| | Client | Gets the details and configuration for an existing job and optionally links the requesting user to the target job. |
| | Client | Re-runs an existing job that has expired. |
| | Client | Updates a user’s metadata attached to a job. |
| | Client | Unlinks a job from the requesting user. |
| | Client | Retrieves the raw query submitted for a target job. |
| | Client | Retrieves the ASN1 query result of the target job. |
| | Client | Retrieves the stderr output for a target job. |
| | Client | Checks the current statuses for a batch of job IDs. |
| | Client | Lists the BLAST+ databases currently visible to the service. |
| | Client | Links the jobs associated with a guest user with a target non-guest user. |
| | Queue | Asynchronously executes a BLAST+ query. |
Looks up the jobs that are linked to the requesting user. The results may optionally be filtered by project ID.
The response will be a list of entries representing jobs that are linked to the requesting user.
type Result = ResultItem[]
interface ResultItem {
  queryJobID: string
  status: string
  site: string
  createdOn: string
  userMeta?: UserMeta
}
interface UserMeta {
  summary?: string
  description?: string
}
[
  {
    "queryJobID": "9f444b23ceec3ee5588cc4c784c16696",
    "site": "PlasmoDB",
    "status": "expired",
    "createdOn": "2020-10-31T23:00:00Z"
  },
  {
    "queryJobID": "bc49f1a3bc36cd15b84890439d19d395",
    "site": "TriTrypDB",
    "status": "complete",
    "createdOn": "2020-10-31T23:00:00Z",
    "userMeta": {
      "summary": "A blast job"
    }
  },
  {
    "queryJobID": "297a61dda47317f11d8e50e6ab8508c9",
    "site": "VectorBase",
    "status": "failed",
    "createdOn": "2020-10-31T23:00:00Z",
    "userMeta": {
      "summary": "Another blast job.",
      "description": "This job will fail."
    }
  }
]
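A client will often want to bucket the returned list, for example to find expired jobs that could be re-run. The following is a minimal sketch of that idea, not part of the service itself. The names `QueryJobListEntry`, `groupJobIDsByStatus`, and `filterBySite` are hypothetical, and the sketch assumes that the `site` field carries the project ID.

```typescript
// Entry shape as described by the interfaces above.
interface UserMeta {
  summary?: string
  description?: string
}

interface QueryJobListEntry {
  queryJobID: string
  status: string
  site: string
  createdOn: string
  userMeta?: UserMeta
}

// Hypothetical helper: groups job IDs by their reported status.
function groupJobIDsByStatus(jobs: QueryJobListEntry[]): Map<string, string[]> {
  const grouped = new Map<string, string[]>()
  for (const job of jobs) {
    const ids = grouped.get(job.status) ?? []
    ids.push(job.queryJobID)
    grouped.set(job.status, ids)
  }
  return grouped
}

// Hypothetical helper: client-side equivalent of filtering by project ID,
// assuming the `site` field carries the project ID.
function filterBySite(jobs: QueryJobListEntry[], site: string): QueryJobListEntry[] {
  return jobs.filter(job => job.site === site)
}

// Two entries taken from the example response above.
const exampleJobs: QueryJobListEntry[] = [
  {
    queryJobID: "9f444b23ceec3ee5588cc4c784c16696",
    site: "PlasmoDB",
    status: "expired",
    createdOn: "2020-10-31T23:00:00Z"
  },
  {
    queryJobID: "bc49f1a3bc36cd15b84890439d19d395",
    site: "TriTrypDB",
    status: "complete",
    createdOn: "2020-10-31T23:00:00Z",
    userMeta: { summary: "A blast job" }
  }
]

const byStatus = groupJobIDsByStatus(exampleJobs)
const plasmoJobs = filterBySite(exampleJobs, "PlasmoDB")
```

Grouping on the client keeps the sketch independent of whatever filtering the endpoint itself supports.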
Creates a new job record if one does not already exist matching the POSTed configuration. See Job IDs.
Retrieves a detailed record for a specific target job, including the original configuration from which the job was created.
Additionally, as a simple form of job "sharing", users who request a job’s details may optionally be linked to the target job, which adds it to the requesting user’s job collection.
To maintain compatibility with the legacy behavior of the v0.x and v1.x Multi-Blast API, job saving is opt-out: by default, users will be linked to any job they request that they are not already linked to.
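The opt-out behavior described above amounts to treating an absent flag as "link". The following is a minimal sketch of that decision, assuming a hypothetical boolean request option named `saveJob`. The actual parameter name is not given here.

```typescript
// Hypothetical request options; `saveJob` is an assumed name,
// not the documented parameter.
interface JobDetailsRequestOptions {
  saveJob?: boolean
}

// Returns true when a new link between the requesting user and the job
// should be created. Linking is opt-out: it is skipped only when the caller
// explicitly passes `false`, or when the user is already linked.
function shouldLinkUser(options: JobDetailsRequestOptions, alreadyLinked: boolean): boolean {
  if (alreadyLinked) return false
  return options.saveJob !== false
}

const linkByDefault = shouldLinkUser({}, false)
const optedOut = shouldLinkUser({ saveJob: false }, false)
const nothingToDo = shouldLinkUser({ saveJob: true }, true)
```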
The response will be an object describing the requested job. This object will include:
- job ID
- job status
- job configuration:
  - target BLAST+ databases
  - target project ID
- BLAST+ configuration
- user metadata
interface Result {
  queryJobID: string
  status: string
  jobConfig: JobConfig
  blastConfig: Object
  createdOn: string
  userMeta?: UserMeta
}
interface JobConfig {
  site: string
  targets: QueryTarget[]
}
interface QueryTarget {
  targetDisplayName: string
  targetFile: string
}
interface UserMeta {
  summary?: string
  description?: string
}
{
  "queryJobID": "9f444b23ceec3ee5588cc4c784c16696",
  "status": "complete",
  "jobConfig": {
    "site": "PlasmoDB",
    "targets": [
      {
        "targetDisplayName": "PfalciparumGB4",
        "targetFile": "PfalciparumGB4AnnotatedTranscripts"
      }
    ]
  },
  "blastConfig": {
    ...
  },
  "createdOn": "2020-10-31T23:00:00Z",
  "userMeta": {
    "summary": "Some blast job"
  }
}
Restarts an expired job. Once a job has expired from the cache, users may re-run it without resubmitting the configuration.
The job’s configuration is stored and will be resubmitted to the job queue as if the job were brand new.
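The re-run flow can be pictured as a small state transition: an expired job goes back to the queue using its stored configuration. The following is a rough sketch under stated assumptions. `StoredJob`, `rerunExpiredJob`, and the in-memory array standing in for the queue are all hypothetical, and refusing to re-run non-expired jobs is an assumption of this sketch rather than documented behavior.

```typescript
// Status values as they appear in the example responses in this document.
type JobStatus = "queued" | "in-progress" | "complete" | "failed" | "expired"

// Hypothetical stored job record; `config` stands in for the stored
// job configuration.
interface StoredJob {
  jobID: string
  status: JobStatus
  config: unknown
}

// Re-queues an expired job using its stored configuration.
// Returns false (and changes nothing) for jobs that are not expired;
// rejecting those is an assumption of this sketch.
function rerunExpiredJob(job: StoredJob, jobQueue: StoredJob[]): boolean {
  if (job.status !== "expired") return false
  job.status = "queued"
  jobQueue.push(job)
  return true
}

// In-memory stand-in for the real message queue.
const pendingQueue: StoredJob[] = []
const expiredJob: StoredJob = {
  jobID: "f4757ea84c455b04a1d307d4ac33049d",
  status: "expired",
  config: {}
}
const completeJob: StoredJob = {
  jobID: "29e07b0b80181222ad33cbc8f679d672",
  status: "complete",
  config: {}
}

const requeued = rerunExpiredJob(expiredJob, pendingQueue)
const ignored = rerunExpiredJob(completeJob, pendingQueue)
```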
Updates the metadata a user has associated with a target job to which they are already linked.
Removes a target job from the user’s job collection, deleting the link between the user and the target job.
Retrieves the ASN1 query result generated by a query job that has completed successfully.
Retrieves the stderr output from the BLAST+ command-line tool that was executed as part of a job.
The bulk status check takes a JSON array of job IDs as input, and for each valid ID in the input, returns the job status in a map.
All job IDs that are found to be invalid will be ignored and will not appear in the result status map.
The response is a JSON object that maps each query job ID to its job status.
interface Result {
  [queryJobID: string]: string
}
{
  "dd6060e5367622e574ffb38f32bfa049": "queued",
  "29e07b0b80181222ad33cbc8f679d672": "complete",
  "748ba381dd81bb8de615319837ffa350": "in-progress",
  "f4757ea84c455b04a1d307d4ac33049d": "expired"
}
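The "ignore invalid IDs" rule is easy to get wrong on the consuming side, so here is a small sketch of the lookup semantics. It is illustrative only: `JobStore` and `bulkStatusCheck` are hypothetical names, and the sketch treats "invalid" as "not known to the store", which is an assumption.

```typescript
// Hypothetical stand-in for the service's job store: job ID -> status.
type JobStore = Map<string, string>

// Builds the status map for the requested IDs. IDs that the store does not
// know are skipped, so they never appear as keys in the result.
function bulkStatusCheck(requestedIDs: string[], store: JobStore): { [queryJobID: string]: string } {
  const statuses: { [queryJobID: string]: string } = {}
  for (const id of requestedIDs) {
    const found = store.get(id)
    if (found !== undefined) {
      statuses[id] = found
    }
  }
  return statuses
}

const store: JobStore = new Map([
  ["dd6060e5367622e574ffb38f32bfa049", "queued"],
  ["29e07b0b80181222ad33cbc8f679d672", "complete"]
])

// "not-a-job-id" is unknown to the store and is dropped from the result.
const statusMap = bulkStatusCheck(
  ["dd6060e5367622e574ffb38f32bfa049", "not-a-job-id", "29e07b0b80181222ad33cbc8f679d672"],
  store
)
```

A caller should therefore compare the keys of the response against the IDs it sent, rather than assume every requested ID is present.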
Returns a tree of all the queryable BLAST+ databases that are available to use.
interface Result {
  [project: string]: TargetMap
}
interface TargetMap {
  [target: string]: TargetDatabases
}
interface TargetDatabases {
  naTargets?: string[]
  aaTargets?: string[]
}
{
  "PlasmoDB": {
    "Pberghei": {
      "naTargets": [
        "PbergheiESTs"
      ]
    },
    "PfalciparumGB4": {
      "naTargets": [
        "PfalciparumGB4AnnotatedCDSs",
        "PfalciparumGB4AnnotatedTranscripts",
        "PfalciparumGB4Genome"
      ],
      "aaTargets": [
        "PfalciparumGB4AnnotatedProteins"
      ]
    }
  }
}
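For UI code such as a target picker, the nested tree is often easier to work with as a flat list. The sketch below shows one way to flatten it. `FlatTarget` and `flattenTargets` are hypothetical names, and reading `naTargets` as nucleotide databases and `aaTargets` as protein databases is an interpretation based on the example values.

```typescript
interface TargetDatabases {
  naTargets?: string[]
  aaTargets?: string[]
}
interface TargetMap {
  [target: string]: TargetDatabases
}
interface TargetTree {
  [project: string]: TargetMap
}

// Hypothetical flat representation of one queryable database file.
interface FlatTarget {
  project: string
  target: string
  kind: "na" | "aa"
  file: string
}

// Walks project -> target -> database lists and emits one entry per file.
function flattenTargets(tree: TargetTree): FlatTarget[] {
  const flat: FlatTarget[] = []
  for (const project of Object.keys(tree)) {
    const targets = tree[project]
    if (targets === undefined) continue
    for (const target of Object.keys(targets)) {
      const databases = targets[target]
      if (databases === undefined) continue
      for (const file of databases.naTargets ?? []) {
        flat.push({ project, target, kind: "na", file })
      }
      for (const file of databases.aaTargets ?? []) {
        flat.push({ project, target, kind: "aa", file })
      }
    }
  }
  return flat
}

// Tree taken from the example response above.
const exampleTree: TargetTree = {
  PlasmoDB: {
    Pberghei: { naTargets: ["PbergheiESTs"] },
    PfalciparumGB4: {
      naTargets: [
        "PfalciparumGB4AnnotatedCDSs",
        "PfalciparumGB4AnnotatedTranscripts",
        "PfalciparumGB4Genome"
      ],
      aaTargets: ["PfalciparumGB4AnnotatedProteins"]
    }
  }
}

const flatTargets = flattenTargets(exampleTree)
```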
An RPC-like API endpoint used to migrate ownership of jobs created by a WDK guest user to a logged-in user. This covers situations where a user creates jobs before either realizing they weren’t logged in or deciding to create an account.
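Conceptually, the migration re-points the guest user's job links at the logged-in user. The following is a loose sketch of that idea, not the service's implementation. `JobLink`, `migrateGuestLinks`, and numeric user IDs are hypothetical, and dropping a guest link when the target user is already linked to the same job is an assumption.

```typescript
// Hypothetical link record between a user and a job.
interface JobLink {
  userID: number
  jobID: string
  summary?: string
}

// Returns a new list of links in which every link owned by the guest user
// is re-pointed at the target user. Links of other users are kept as-is.
// Assumption: a guest link is dropped when the target user is already
// linked to the same job, so no duplicate links are produced.
function migrateGuestLinks(links: JobLink[], guestUserID: number, targetUserID: number): JobLink[] {
  const alreadyLinked = new Set(
    links.filter(link => link.userID === targetUserID).map(link => link.jobID)
  )
  const migrated: JobLink[] = []
  for (const link of links) {
    if (link.userID !== guestUserID) {
      migrated.push(link)
    } else if (!alreadyLinked.has(link.jobID)) {
      migrated.push({ ...link, userID: targetUserID })
      alreadyLinked.add(link.jobID)
    }
  }
  return migrated
}

// User IDs 1 (guest) and 2 (logged-in user) are made up for illustration.
const exampleLinks: JobLink[] = [
  { userID: 1, jobID: "dd6060e5367622e574ffb38f32bfa049" },
  { userID: 2, jobID: "dd6060e5367622e574ffb38f32bfa049" },
  { userID: 1, jobID: "29e07b0b80181222ad33cbc8f679d672", summary: "guest job" }
]

const migratedLinks = migrateGuestLinks(exampleLinks, 1, 2)
```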
Internally and asynchronously executes the target BLAST+ command-line tool using a user-provided configuration.
Execution happens in worker threads that pull jobs from the RabbitMQ message queue backing the Async Platform.
- S3: Used to store a temporary cache of query job inputs and outputs.
- RabbitMQ: Used to queue up query jobs for eventual execution.
- PostgreSQL: Used as a backing database for queue and job history bookkeeping.
- Oracle: The Oracle user database is the permanent store of job configurations and user-to-job links.
- BLAST+ Databases: The BLAST+ database files that are the targets of user queries. These must be mounted into the running container for the service to be able to access them.
The Report Service, built on the Async Platform, exposes a REST API through which API consumers may generate custom reports from BLAST+ queries executed using the Query Service.
| Action | Source | Description |
|---|---|---|
| | Client | Lists the jobs that are linked to the requesting user. |
| | Client | Creates a new report job and may optionally link the new job to the requesting user. |
| | Client | Gets the details and configuration for an existing job. |
| | Client | Re-runs an existing job that has expired. |
| | Client | Updates a user’s metadata attached to a job. |
| | Client | Unlinks a job from the requesting user. |
| | Client | Lists the report files generated by a target job. |
| | Client | Retrieves a report file generated by a target job. |
| | Client | Retrieves the stderr output for a target job. |
| | Client | Checks the current statuses for a batch of job IDs. |
| | Client | Links the jobs associated with a guest user with a target non-guest user. |
| | Queue | Executes the BLAST+ CLI tool. |
Looks up the jobs that are linked to the requesting user. Optionally, the results may be filtered by query job ID.
The result will be a list of zero or more report job items that are linked to the requesting user and optionally limited to only those items whose target query job ID matches a provided filter value.
type Result = ResultItem[]
interface ResultItem {
  reportJobID: string
  queryJobID: string
  status: string
  userMeta?: UserMeta
}
interface UserMeta {
  summary?: string
  description?: string
}
[
  {
    "reportJobID": "37b4e2d82900d5e94b8da524fbeb33c0",
    "queryJobID": "64e8bb9742929ab718dba7bc048e6120",
    "status": "failed",
    "userMeta": {
      "summary": "some report job summary"
    }
  }
]
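The optional filter described above behaves like a simple predicate over the list. The sketch below mirrors it on the client side. `ReportJobItem` and `filterByQueryJob` are hypothetical names, and the second example entry pairs IDs from elsewhere in this document purely for illustration.

```typescript
interface ReportJobItem {
  reportJobID: string
  queryJobID: string
  status: string
}

// Returns every item when no filter value is given; otherwise returns only
// the items whose target query job ID matches the filter value.
function filterByQueryJob(items: ReportJobItem[], queryJobID?: string): ReportJobItem[] {
  if (queryJobID === undefined) return items
  return items.filter(item => item.queryJobID === queryJobID)
}

const reportJobs: ReportJobItem[] = [
  {
    reportJobID: "37b4e2d82900d5e94b8da524fbeb33c0",
    queryJobID: "64e8bb9742929ab718dba7bc048e6120",
    status: "failed"
  },
  {
    // Made-up pairing of IDs, for illustration only.
    reportJobID: "dd6060e5367622e574ffb38f32bfa049",
    queryJobID: "9f444b23ceec3ee5588cc4c784c16696",
    status: "queued"
  }
]

const unfiltered = filterByQueryJob(reportJobs)
const forOneQuery = filterByQueryJob(reportJobs, "64e8bb9742929ab718dba7bc048e6120")
const forUnknownQuery = filterByQueryJob(reportJobs, "no-such-query-job")
```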
Creates a new job if one does not already exist matching the POSTed configuration.
If the job did not previously exist, or was previously expired, it will be queued to be executed.
Retrieves a detailed record for a specific target job, including the original configuration from which the job was created.
Additionally, as a simple form of job "sharing", users who request a job’s details may optionally be linked to the target job, which adds it to the requesting user’s job collection.
To maintain compatibility with the legacy behavior of the v0.x and v1.x Multi-Blast API, job saving is opt-out: by default, users will be linked to any job they request that they are not already linked to.
Restarts an expired job. Once a job has expired from the cache, users may re-run it without resubmitting the configuration.
The job’s configuration is stored and will be resubmitted to the job queue as if the job were brand new.
Updates the metadata a user has associated with a target job to which they are already linked.
Removes a target job from the user’s job collection, deleting the link between the user and the target job.
Lists the files generated by a completed report job.
This endpoint will return a listing of available files and their sizes.
type Result = ReportFile[]
interface ReportFile {
  name: string
  size: number
}
[
  {
    "name": "somefile1.txt",
    "size": 1023
  },
  {
    "name": "somefile2.json",
    "size": 58372
  },
  {
    "name": "report.zip",
    "size": 10234
  }
]
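A client might use the listing to decide which file to download or to show a total size before fetching anything. This is a small sketch of such helpers. `totalSize` and `findFile` are hypothetical, and no unit is assumed for `size` beyond it being a number.

```typescript
interface ReportFile {
  name: string
  size: number
}

// Sums the reported sizes of all files in a listing.
function totalSize(files: ReportFile[]): number {
  return files.reduce((sum, file) => sum + file.size, 0)
}

// Looks up a file by its exact name; returns undefined when it is absent.
function findFile(files: ReportFile[], fileName: string): ReportFile | undefined {
  return files.find(file => file.name === fileName)
}

// Listing taken from the example response above.
const listing: ReportFile[] = [
  { name: "somefile1.txt", size: 1023 },
  { name: "somefile2.json", size: 58372 },
  { name: "report.zip", size: 10234 }
]

const combinedSize = totalSize(listing)
const archive = findFile(listing, "report.zip")
const missing = findFile(listing, "does-not-exist.txt")
```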
Retrieves the stderr output from the BLAST+ command-line tool that was executed as part of a job.
Looks up the current statuses of a batch of jobs by their IDs.
The response is a JSON object that maps each report job ID to its job status.
interface Result {
  [reportJobID: string]: string
}
{
  "dd6060e5367622e574ffb38f32bfa049": "queued",
  "29e07b0b80181222ad33cbc8f679d672": "complete",
  "748ba381dd81bb8de615319837ffa350": "in-progress",
  "f4757ea84c455b04a1d307d4ac33049d": "expired"
}
Migrates ownership of the links between a target guest user and their jobs to a logged-in user. This covers situations where a WDK user creates jobs before either realizing they weren’t logged in or deciding to create an account.
- Query Service: Used to retrieve the result of the target query job on which a report will be run.
- S3: Used to store a temporary cache of query job inputs and outputs.
- RabbitMQ: Used to queue up query jobs for eventual execution.
- PostgreSQL: Used as a backing database for queue and job history bookkeeping.
- Oracle: The Oracle user database is the permanent store of job configurations and user-to-job links.
When submitting a query to the Multi-Blast service, if the config is valid, one or more jobs will be created. One job will be created for the entire input, and child jobs may be created for each individual sequence in the input query.
If the input query contains only one sequence, only one job will be created: a "parent" job with no children.
If the input query contains multiple sequences, a parent job will be created for the overall input, and a child job will be created for each individual sequence in the input.
Child jobs are linked to the parent job from which they were created.
> First
IYSLVCWPLDDPFSRPDMLSLSERMLDVWRGKQVAEDLSPLINQLSLADMIRSCERNETL
| Name | Sequences |
|---|---|
| Parent Job 1 | First |
> First
IYSLVCWPLDDPFSRPDMLSLSERMLDVWRGKQVAEDLSPLINQLSLADMIRSCERNETL
> Second
QKQRAYLRSMEEKARERRRIFIQNEQARLERFAKERAERQTTTTTTTTATTPTTTTPTTT
TPTTTPTTTKAPGIP
> Third
YRPQNSSVDTVTSEQSIPVWMYGLVLLLLLSVGLLTCLSLLLSYKLKQLKVASCADSSTA
TSEPFHNVYVTTSSHYSSPYGLRREVPASPRCPPSPYPVFFKEPFVNMTA
| Name | Sequences |
|---|---|
| Parent Job 1 | First, Second, Third |
| Child Job 1 | First |
| Child Job 2 | Second |
| Child Job 3 | Third |
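The parent and child job rule illustrated in the tables above can be sketched as a small planning function. This is only an illustration of the rule, not the service's code. `PlannedJob`, `sequenceNames`, and `planJobs` are hypothetical names, and sequences are identified here solely by their `>` header lines.

```typescript
// Hypothetical description of a job that would be created for a query.
interface PlannedJob {
  name: string
  sequences: string[]
}

// Extracts the sequence names from the ">" header lines of a FASTA query.
function sequenceNames(query: string): string[] {
  return query
    .split("\n")
    .filter(line => line.startsWith(">"))
    .map(line => line.slice(1).trim())
}

// One parent job always covers the entire input. Child jobs, one per
// sequence, are planned only when the input holds more than one sequence.
function planJobs(query: string): PlannedJob[] {
  const names = sequenceNames(query)
  const jobs: PlannedJob[] = [{ name: "Parent Job 1", sequences: names }]
  if (names.length > 1) {
    names.forEach((sequenceName, index) => {
      jobs.push({ name: `Child Job ${index + 1}`, sequences: [sequenceName] })
    })
  }
  return jobs
}

// Shortened versions of the example queries above.
const singleQuery = "> First\nIYSLVCWPLDDPFSRPDMLSLSERMLDVWRGKQVAEDLSPLINQLSLADMIRSCERNETL"
const multiQuery = [
  "> First",
  "IYSLVCWPLDDPFSRPDMLSLSERMLDVWRGKQVAEDLSPLINQLSLADMIRSCERNETL",
  "> Second",
  "QKQRAYLRSMEEKARERRRIFIQNEQARLERFAKERAERQTTTTTTTTATTPTTTTPTTT",
  "> Third",
  "YRPQNSSVDTVTSEQSIPVWMYGLVLLLLLSVGLLTCLSLLLSYKLKQLKVASCADSSTA"
].join("\n")

const singlePlan = planJobs(singleQuery)
const multiPlan = planJobs(multiQuery)
```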
TODO
- Jobs may be linked to users.
- When creating a job, only the parent job is linked.
- Job link contains user metadata.
- When accessing a job’s details, a user may optionally be linked to a job regardless of whether it is a parent job or child job.
- Only jobs that are linked to the requesting user will be returned in list endpoints.
- User metadata is stored on the job-to-user link.
A job ID is a hash of the job’s configuration and query. This means that if the same configuration is submitted multiple times, the resulting job ID will be the same every time.
For the Query Service, the generated job IDs are dependent on:
- the BLAST+ query tool configuration
- the target project ID
- the input query text
- the selected query targets
  - the name of the target
  - the name of the database file

For the Report Service, the generated job IDs are dependent on:
- the ID of the Query Service job for which the report will be generated
- the BLAST+ formatter tool configuration
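The key property of such IDs is determinism: the same inputs always hash to the same ID. The sketch below illustrates that property for the Query Service inputs listed above. It is not the service's algorithm. The hash is a 32-bit FNV-1a stand-in chosen only to keep the example self-contained, and `canonicalize`, `fnv1aHex`, `QueryJobInputs`, and `sketchQueryJobID` are hypothetical names.

```typescript
// Serializes a value with object keys sorted, so that two configurations with
// the same content produce the same string regardless of key order.
function canonicalize(value: unknown): string {
  if (value === undefined) return "null"
  if (Array.isArray(value)) {
    return "[" + value.map(item => canonicalize(item)).join(",") + "]"
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>
    const fields = Object.keys(record)
      .sort()
      .map(key => JSON.stringify(key) + ":" + canonicalize(record[key]))
    return "{" + fields.join(",") + "}"
  }
  return JSON.stringify(value)
}

// Stand-in hash (32-bit FNV-1a over UTF-16 code units).
// This is NOT the hash used by the service; it only keeps the sketch runnable.
function fnv1aHex(text: string): string {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return ("0000000" + (hash >>> 0).toString(16)).slice(-8)
}

// The inputs the Query Service job ID is described as depending on.
interface QueryJobInputs {
  blastConfig: unknown
  site: string
  query: string
  targets: { targetDisplayName: string; targetFile: string }[]
}

function sketchQueryJobID(inputs: QueryJobInputs): string {
  return fnv1aHex(canonicalize(inputs))
}

const exampleTargets = [
  { targetDisplayName: "PfalciparumGB4", targetFile: "PfalciparumGB4Genome" }
]

// Same content, different key order: the IDs match.
const idA = sketchQueryJobID({ blastConfig: {}, site: "PlasmoDB", query: ">First\nIYSLVC", targets: exampleTargets })
const idB = sketchQueryJobID({ targets: exampleTargets, query: ">First\nIYSLVC", site: "PlasmoDB", blastConfig: {} })

// A different query text yields a different ID.
const idC = sketchQueryJobID({ blastConfig: {}, site: "PlasmoDB", query: ">First\nIYSLVW", targets: exampleTargets })
```

Sorting keys before hashing matters because two JSON documents with the same fields in a different order would otherwise produce different IDs.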
The following metrics are gathered from the Multi-Blast services:
Metrics common to both the query and report services.
| Name | By | Params | Description |
|---|---|---|---|
| | jaxrs-container-core | | Counter of requests. |
| | jaxrs-container-core | | Histogram of request durations. |
| | prometheus-jvm-stats | | Total memory allocated by the Java process. |
| | prometheus-jvm-stats | | Unused allocated memory. |
| | prometheus-jvm-stats | | Allocated memory currently in use. |
| | prometheus-jvm-stats | | Total number of garbage collections. |
| | prometheus-jvm-stats | | Total time used by the garbage collector. |
| | lib-compute-platform | | Number of async job executions that ended with a failed status. |
| | lib-compute-platform | | Number of async job executions that ended with a success status. |
| | lib-compute-platform | | Histogram of time spent by jobs waiting in the queue. |
| | lib-compute-platform | | Gauge of the number of currently queued jobs. |
Metrics specific to the query service.
| Name | Params | Description |
|---|---|---|
| | | BLAST+ CLI tool execution time in milliseconds. |