[Figure: software architecture]

caesar-rest

caesar-rest is a RESTful web service for astronomical source extraction and classification with the caesar source extractor [https://github.com/SKA-INAF/caesar]. The software is developed in Python and consists of a few containerized microservices, deployable on standalone servers or on a distributed cloud infrastructure. The core component is the REST web application, based on the Flask framework and running behind an nginx+uwsgi HTTP server, which provides APIs for managing input data (e.g. data upload/download/removal) and source finding jobs (e.g. submit, get status, get outputs) with different job management systems (Kubernetes, Slurm, Celery). Additional services (AAI, user DB, log storage, job monitor, accounting) provide user authentication, storage and retrieval of user data and job information, monitoring of submitted jobs, and aggregation of service logs and user data/job stats. Besides caesar, we also plan to integrate other tools widely used in the radio community (e.g. Aegean, PyBDSF) and newly developed source finders based on deep learning models.

Status

This software is under development. It was originally tested with Python 2.7 but later switched to Python 3.6 (some required packages are only available for Python 3).

Credit

This software is distributed under the GPLv3 license. If you use caesar-rest in your research, please add a link to this repository or acknowledge the authors in your papers.

Installation

Install dependencies

To run the caesar-rest service you need to install the following tools:

For the Celery-based job management, you need to install celery, a message broker (e.g. rabbitmq) and a result backend service (e.g. redis or mongodb).

For the Kubernetes-based job management, you need to install the Kubernetes python client library.

For the Slurm-based job management, you need to install some additional python modules.

To enable OpenID Connect based authentication, you need to install an OpenID Connect client library.

To enable log forwarding to a LogStash/ElasticSearch service, you need to install the filebeat service.
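
A minimal sketch of how the Python-side dependencies could be installed with pip (package names are indicative; the exact set and pinned versions should be taken from the repository setup files):

pip3 install celery redis    # Celery job management (worker + redis backend client)
pip3 install kubernetes     # Kubernetes job management (official python client)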

Package installation

To build and install the package:

  • Create a local install directory, e.g. $INSTALL_DIR
  • Add installation path to your PYTHONPATH environment variable:
    export PYTHONPATH=$PYTHONPATH:$INSTALL_DIR/lib/python3.6/site-packages
  • Build and install package:
    python3.6 setup.py sdist bdist_wheel
    python3.6 setup.py build
    python3.6 setup.py install --prefix=$INSTALL_DIR

All dependencies will be automatically downloaded and installed in $INSTALL_DIR.

To use package scripts:

  • Add binary directory to your PATH environment variable:
    export PATH=$PATH:$INSTALL_DIR/bin

App containers

Apps are run as Docker (Kubernetes deployment) or Singularity (Slurm deployment) containers. Docker images are available on DockerHub:

  • caesar source finder job: docker://sriggi/caesar-job
  • aegean source finder job: docker://sriggi/aegean-job
  • cutex source finder job: docker://sriggi/cutex-job
  • mrcnn object detector (TensorFlow 1.x): docker://sriggi/mrcnn-detect
  • classifier-cnn image classifier (TensorFlow 2.x): docker://sriggi/cnn-classifier
  • umap dimensionality reduction: docker://sriggi/umap-job
  • outlier-finder with Isolation Forest: docker://sriggi/outlier-finder-job
  • hdbscan cluster search: docker://sriggi/hdbscan-job
  • similarity-search: docker://sriggi/similarity-search-job

Singularity containers can be created from docker images with:

singularity pull [DOCKER URL]
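
For example, to build the Singularity container for the caesar source finder job from its DockerHub image listed above:

singularity pull docker://sriggi/caesar-job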

If you don't have enough disk space to build the containers under the Singularity default cache/tmp directories, change these Singularity environment variables to point to a larger filesystem:

SINGULARITY_CACHEDIR
SINGULARITY_TMPDIR
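
For example (the target paths are purely illustrative):

export SINGULARITY_CACHEDIR=/mnt/storage/singularity/cache
export SINGULARITY_TMPDIR=/mnt/storage/singularity/tmp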

NB: You may experience this error when running Singularity containers that produce large outputs (e.g. hundreds of MB or more): OSError: [Errno 28] No space left on device. Try increasing the default value (64 MB) of the sessiondir max size parameter in the Singularity configuration file /usr/local/etc/singularity/singularity.conf.

How to run the service?

In the following we describe the steps needed to deploy and run the application and the auxiliary services. Three deployment options are described below, depending on whether the job management is done with Celery, Kubernetes, or Slurm. To ease the deployment we provide Docker containers and configuration files for Docker Compose or Kubernetes.

Preliminary setup

Before running the application you must do some preparatory work:

  • (OPTIONAL) Create a dedicated user & group (e.g. caesar) allowed to run the application and services, and give it ownership of the directories created below
  • Create the application working dir (by default /opt/caesar-rest)
  • (OPTIONAL) Mount an external storage in the application working dir, for example using rclone:
    /usr/bin/rclone mount --daemon [--uid=[UID] --gid=[GID]] --umask 000 --allow-other --file-perms 0777 --dir-cache-time 0m5s --vfs-cache-mode full [RCLONE_REMOTE_STORAGE]:[RCLONE_REMOTE_STORAGE_PATH] /opt/caesar-rest -vvv
    where UID and GID are the Linux user and group ids of the user previously created.
  • Create the top directory for data upload (by default /opt/caesar-rest/data). Place supported pre-configured datasets here as well.
  • Create the top directory for jobs (by default /opt/caesar-rest/jobs)
  • Create the top directory for models (by default /opt/caesar-rest/models) and put TensorFlow/PyTorch model & weights files under this path.
  • (OPTIONAL) Create the log directory for system services (see below), e.g. /opt/caesar-rest/logs
  • (OPTIONAL) Create the run directory for system services (see below), e.g. /opt/caesar-rest/run
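
A minimal sketch of the steps above, assuming the dedicated caesar user and the default paths (adapt to your setup):

sudo useradd -r caesar                                          # dedicated service user
sudo mkdir -p /opt/caesar-rest/{data,jobs,models,logs,run}      # working dir + subdirectories
sudo chown -R caesar:caesar /opt/caesar-rest                    # give ownership to the service user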

Run DB service

caesar-rest requires a MongoDB service to store user data and job information. To start the DB service:

systemctl start mongodb.service

Alternatively you can use the Docker container sriggi/caesar-rest-db:latest (see https://hub.docker.com/r/sriggi/caesar-rest-db) and deploy it with Docker Compose or Kubernetes (see the configuration files under the repository config directory).
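
A minimal standalone run of the DB container (flags are illustrative; the provided Docker Compose/Kubernetes files are the reference setup):

docker run -d --name caesar-rest-db -p 27017:27017 sriggi/caesar-rest-db:latest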

Run Filebeat service (OPTIONAL)

caesar-rest uses filebeat to forward log files to an ElasticSearch service. To start the service:

systemctl start filebeat.service

Alternatively, you can use the Docker container for the application sriggi/caesar-rest:latest (see https://hub.docker.com/r/sriggi/caesar-rest), setting the container option FORWARD_LOGS=1. This will start the filebeat service inside the web application container.

Run Celery services (OPTIONAL)

If you want to manage jobs with Celery, you must run a message broker service (i.e. rabbitmq), a task store service (i.e. redis or mongodb) and one or more Celery worker services.
NB: The Celery job management option is no longer developed and maintained in the caesar-rest application. We suggest using the Slurm or Kubernetes deployments instead.

Run broker service

To run the rabbitmq message broker service:

systemctl start rabbitmq-server.service

Alternatively, you can use the Docker container sriggi/caesar-rest-broker:latest (see https://hub.docker.com/r/sriggi/caesar-rest-broker) and deploy it with Docker Compose or Kubernetes (see the configuration files under the repository config directory).

Run task store service

If you have chosen MongoDB as the task store, you are already running the service (see the previous section Run DB service). However, if you want to use Redis as the task store, run it as follows:

systemctl start redis.service

A Docker container for this service is still to be produced.

Run celery workers

Run a celery worker with the desired concurrency level (e.g. 2), message queue (e.g. celery), and broker and result backend urls:

celery --broker=[BROKER_URL] --result-backend=[RESULT_BACKEND_URL] --app=caesar_rest worker --loglevel=INFO --concurrency=2 -Q celery
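
For example, with the default broker credentials and a local Redis result backend (URLs assembled from the default hosts/ports listed in the CELERY OPTIONS section below):

celery --broker=amqp://guest:guest@localhost:5672 --result-backend=redis://localhost:6379/0 --app=caesar_rest worker --loglevel=INFO --concurrency=2 -Q celery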

In production you may want to run this as a system service:

  • Create a /etc/default/caesar-workers configuration file (e.g. see the example in the config/celery directory):

    # The names of the workers. Only one here. 
    CELERYD_NODES="caesar_worker"    
    
    # The name of the Celery App   
    CELERY_APP="caesar_rest"
     
    # Working dir    
    CELERYD_CHDIR="/opt/caesar-rest"    
    
    # Additional options    
    CELERYD_OPTS="--time-limit=300 --concurrency=4"
    
    # Log and PID directories    
    CELERYD_LOG_FILE="/opt/caesar-rest/logs/%n%I.log"    
    CELERYD_PID_FILE="/opt/caesar-rest/run/%n.pid"    
    
    # Log level    
    CELERYD_LOG_LEVEL=INFO    
    
    # Path to the celery binary (e.g. in your virtual environment)    
    CELERY_BIN=/usr/local/bin/celery    
    
  • Create a /etc/systemd/system/caesar-workers.service systemd service file:

    [Unit]    
    Description=Caesar Celery Worker Service    
    After=network.target rabbitmq-server.service redis.service   
    
    [Service]    
    Type=forking   
    User=caesar   
    Group=caesar   
    EnvironmentFile=/etc/default/caesar-workers     
    Environment="PATH=$INSTALL_DIR/bin"   
    Environment="PYTHONPATH=$INSTALL_DIR/lib/python2.7/site-packages"   
    WorkingDirectory=/opt/caesar-rest   
    ExecStart=/bin/sh -c '${CELERY_BIN} multi start ${CELERYD_NODES} \    
      -A ${CELERY_APP} --pidfile=${CELERYD_PID_FILE} \   
      --logfile=${CELERYD_LOG_FILE} --loglevel=${CELERYD_LOG_LEVEL} ${CELERYD_OPTS}'    
    ExecStop=/bin/sh -c '${CELERY_BIN} multi stopwait ${CELERYD_NODES} \    
      --pidfile=${CELERYD_PID_FILE}'   
    ExecReload=/bin/sh -c '${CELERY_BIN} multi restart ${CELERYD_NODES} \   
      -A ${CELERY_APP} --pidfile=${CELERYD_PID_FILE} \   
      --logfile=${CELERYD_LOG_FILE} --loglevel=${CELERYD_LOG_LEVEL} ${CELERYD_OPTS}'    
    
    [Install]    
    WantedBy=multi-user.target   
    
  • Start the service:

    systemctl start caesar-workers.service

Alternatively, you can use the Docker container sriggi/caesar-rest-worker:latest (https://hub.docker.com/r/sriggi/caesar-rest-worker) and deploy it with Docker Compose or Kubernetes (see the configuration files under the repository config directory).

Run Slurm services (OPTIONAL)

If you want to manage jobs with Slurm, you must run the following services:

systemctl start munge.service
systemctl start slurmd.service
systemctl start slurmdbd.service
systemctl start slurmctld.service
systemctl start slurmrestd.service

Below, we report a sample configuration file (/usr/lib/systemd/system/slurmrestd.service) for the Slurm REST service:

[Unit]
Description=Slurm REST daemon
After=network.target munge.service slurmctld.service
ConditionPathExists=/etc/slurm/slurm.conf

[Service]
Type=simple
User=caesar
Group=caesar
EnvironmentFile=-/etc/sysconfig/slurmrestd
# Default to local auth via socket
ExecStart=/usr/sbin/slurmrestd -f /etc/slurm/slurmrestd.conf -a rest_auth/jwt -s openapi/v0.0.36 -vvvv 0.0.0.0:6820
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target

NB: Slurm is currently the suggested job management option for the caesar-rest application.
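
Since slurmrestd is started with JWT authentication (rest_auth/jwt), clients must present a valid token. Assuming JWT support is configured in slurm.conf, a token for the service user can be generated with Slurm's scontrol (username and lifespan values are illustrative):

scontrol token username=caesar lifespan=3600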

Run the web application

Run the application in development mode

To run caesar-rest in development mode, e.g. for debug or testing purposes:

$INSTALL_DIR/bin/run_app.py --[ARGS]

where supported ARGS are:

MAIN OPTIONS

  • datadir=[DATADIR]: Directory where to store uploaded data (default: /opt/caesar-rest/data)
  • jobdir=[JOBDIR]: Top directory where to store job data (default: /opt/caesar-rest/jobs)
  • job_scheduler=[SCHEDULER]: Job scheduler to be used. Options are: {celery,kubernetes,slurm} (default=celery)
  • debug: Run Flask application in debug mode if given
  • ssl: To enable run of Flask application over HTTPS

AAI OPTIONS

  • aai: Enable service authentication
  • secretfile=[SECRETFILE]: File (.json) with OpenID Connect client auth credentials

DB OPTIONS

  • dbname=[DBNAME]: Name of MongoDB database (default=caesardb)
  • dbhost=[DBHOST]: Host of MongoDB database (default=localhost)
  • dbport=[DBPORT]: Port of MongoDB database (default=27017)

LOGGING OPTIONS

  • loglevel=[LEVEL]: Log level to be used (default=INFO)
  • logtofile: Enable logging to file (default=no)
  • logdir: Directory where to store logs (default=/opt/caesar-rest/logs)
  • logfile: Name of json log file (default=app_logs.json)
  • logfile_maxsize: Max file size in MB (default=5)

CELERY OPTIONS

  • result_backend_host=[BACKEND_HOST]: Host of Celery result backend service (default=localhost)
  • result_backend_port=[BACKEND_PORT]: Port of Celery result backend service (default=6379)
  • result_backend_proto=[BACKEND_PROTO]: Celery result backend type. Options are: {mongodb,redis} (default=redis)
  • result_backend_dbname=[BACKEND_DBNAME]: Celery result backend database name (default=0)
  • broker_host=[BROKER_HOST]: Host of Celery broker service (default=localhost)
  • broker_port=[BROKER_PORT]: Port of Celery broker service (default=5672)
  • broker_proto=[BROKER_PROTO]: Protocol of Celery broker. Options are: {amqp,redis} (default=amqp)
  • broker_user=[BROKER_USER]: Username used in Celery broker (default=guest)
  • broker_pass=[BROKER_PASS]: Password used in Celery broker (default=guest)

KUBERNETES OPTIONS

  • kube_config=[FILE_PATH]: Kube configuration file path (default=search in standard path)
  • kube_cafile=[FILE_PATH]: Kube certificate authority file path
  • kube_keyfile=[FILE_PATH]: Kube private key file path
  • kube_certfile=[FILE_PATH]: Kube certificate file path

SLURM OPTIONS

  • slurm_keyfile=[FILE_PATH]: Slurm rest service private key file path
  • slurm_user=[SLURM_USER]: Username enabled to run in Slurm cluster (default=cirasa)
  • slurm_host=[SLURM_HOST]: Slurm cluster host/ipaddress (default=localhost)
  • slurm_port=[SLURM_PORT]: Slurm rest service port (default=6820)
  • slurm_batch_workdir=[SLURM_BATCH_WORKDIR]: Cluster directory where to place Slurm batch logs (must be writable by slurm_user) (default=/opt/slurm/batchlogs/caesar-rest)
  • slurm_queue=[SLURM_QUEUE]: Slurm cluster queue for submitting jobs (default=normal)
  • slurm_jobdir=[SLURM_JOBDIR]: Path at which the job directory is mounted in Slurm cluster (default=/mnt/storage/jobs)
  • slurm_datadir=[SLURM_DATADIR]: Path at which the data directory is mounted in Slurm cluster (default=/mnt/storage/data)
  • slurm_max_cores_per_job=[SLURM_MAX_CORES_PER_JOB]: Slurm maximum number of cores reserved for a job (default=4)

VOLUME MOUNT OPTIONS

  • mount_rclone_volume: Enable mounting of Nextcloud volume through rclone in container jobs (default=no)
  • mount_volume_path=[PATH]: Mount volume path for container jobs (default=/mnt/storage)
  • rclone_storage_name=[NAME]: rclone remote storage name (default=neanias-nextcloud)
  • rclone_storage_path=[PATH]: rclone remote storage path (default=.)

SINGULARITY CONTAINER OPTIONS

  • caesar_container: Path to caesar job Singularity container (default=/opt/containers/caesar/caesar-job_latest.sif)
  • aegean_container: Path to aegean job Singularity container (default=/opt/containers/aegean/aegean-job_latest.sif)
  • cutex_container: Path to cutex job Singularity container (default=/opt/containers/cutex/cutex-job_latest.sif)
  • mrcnn_container: Path to caesar-mrcnn job Singularity container (default=/opt/containers/mrcnn/mrcnn-detect_latest.sif)
  • cnn_classifier_container: Path to CNN classifier Singularity container (default=/opt/containers/sclassifier/cnn-classifier_latest.sif)
  • umap_container: Path to UMAP Singularity container (default=/opt/containers/sclassifier/umap_latest.sif)
  • outlier_finder_container: Path to OutlierFinder Singularity container (default=/opt/containers/sclassifier/outlier_finder_latest.sif)
  • hdbscan_container: Path to HDBSCAN Singularity container (default=/opt/containers/sclassifier/hdbscan_latest.sif)
  • simsearch_container: Path to Similarity Search Singularity container (default=/opt/containers/sclassifier/similarity-search_latest.sif)

DATASET OPTIONS

  • dataset_smgps: Path to smgps dataset json filelist
  • dataset_smgps_feats_simclr: Path to smgps_feats_simclr dataset json filelist
  • dataset_smgps_feats_siglip: Path to smgps_feats_siglip dataset json filelist
  • dataset_smgps_feats_dinov2: Path to smgps_feats_dinov2 dataset json filelist
  • dataset_emu_pilot: Path to emu-pilot dataset json filelist
  • dataset_emu: Path to emu dataset json filelist
  • dataset_emu_scorpio_pilot: Path to emu-scorpio-pilot dataset json filelist
  • dataset_emu_gp_pilot: Path to emu-gp-pilot dataset json filelist

Flask default options are defined in config.py, and Celery options in celery_config.py. Other options may be defined in the future to override default Flask and Celery options.
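
For example, a development run with Slurm-based job management (argument values are illustrative):

$INSTALL_DIR/bin/run_app.py --debug --job_scheduler=slurm --datadir=/opt/caesar-rest/data --jobdir=/opt/caesar-rest/jobs --loglevel=DEBUG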

Run the application in production

In a production environment you can run the application behind an nginx+uwsgi (or nginx+gunicorn) server. In the config directory of the repository you can find sample files to create and configure the required services. For example:

  • Start the application with uwsgi:

    uwsgi --wsgi-file $INSTALL_DIR/bin/run_app.py --callable app [WSGI_CONFIG_FILE]

    where WSGI_CONFIG_FILE is a configuration file (.ini format) for uwsgi. A sample configuration file is provided in the config/uwsgi directory:

    [uwsgi]
    processes = 4   
    threads = 2   
    socket = ./run/caesar-rest.sock   
    ;socket = :5000
    ;http-socket = :5000
    socket-timeout = 65
    
    buffer-size = 32768  
    master = true   
    chmod-socket = 660   
    vacuum = true  
    die-on-term = true  
    

    Alternatively you can configure options from command line, e.g.:

    uwsgi --uid=[RUNUSER] --gid=[RUNUSER] --binary-path /usr/local/bin/uwsgi --wsgi-file=$INSTALL_DIR/bin/run_app.py --callable=app --pyargv=[APP_ARGS] --workers=[NWORKERS] --enable-threads --threads=[NTHREADS] --http-socket="0.0.0.0:[PORT]" --http-timeout=[SOCKET_TIMEOUT] --http-enable-proxy-protocol --http-auto-chunked --socket-timeout=[SOCKET_TIMEOUT] --master --chmod-socket=660 --chown-socket=[RUNUSER] --buffer-size=[BUFFER_SIZE] --vacuum --die-on-term

    where APP_ARGS are the application command line options described in the previous paragraph and RUNUSER is the username chosen to run the service. The other options are described in the uwsgi online documentation.

    In production you may want to run this as a system service:

    • Create an /etc/systemd/system/caesar-rest.service systemd service file, following the example provided in the config/uwsgi directory:

      [Unit]
      Description=uWSGI instance to serve caesar-rest application    
      After=network.target caesar-workers.service   
      
      [Service]
      User=caesar  
      Group=www-data   
      WorkingDirectory=/opt/caesar-rest  
      Environment="PATH=$INSTALL_DIR/bin"   
      Environment="PYTHONPATH=$INSTALL_DIR/lib/python2.7/site-packages"  
      ExecStart=/usr/bin/uwsgi --wsgi-file $INSTALL_DIR/bin/run_app.py --callable app --ini /opt/caesar-rest/config/uwsgi.ini
      
      [Install]   
      WantedBy=multi-user.target    
      
    • Start the service:
      sudo systemctl start caesar-rest.service

    Alternatively, you can use the Docker container sriggi/caesar-rest:devel (see https://hub.docker.com/r/sriggi/caesar-rest) and deploy it with Docker Compose or Kubernetes (see the configuration files under the repository config directory). All application command line options described in the previous section can be configured via container env variables.

  • Start the nginx service:

    • Create a /etc/nginx/conf.d/nginx.conf configuration file (see example file provided in the config/nginx directory):

      server {   
        listen 8080;   
        client_max_body_size 1000M;   
        sendfile on;    
        keepalive_timeout 0;   
        location / {   
          include uwsgi_params;    
          uwsgi_pass unix:/opt/caesar-rest/run/caesar-rest.sock;   
        }       
      }    
      

      With this sample configuration the nginx server listens on port 8080 and forwards requests to the caesar-rest application via a unix socket. An alternative configuration could be:

      upstream backend {
        least_conn;  # load balancing strategy
        server [HOST1]:[PORT];
        server [HOST2]:[PORT];
        keepalive 64;
      }
      
      server {
        listen 8080;
        client_max_body_size 1000M;
        large_client_header_buffers 4 32k;
        sendfile on;
        keepalive_timeout 0;
        location / {
          include uwsgi_params;
          uwsgi_pass backend;
        }
      }
      

      with nginx load-balancing incoming requests, sending them to two caesar-rest http applications listening at HOST1 and HOST2 on port PORT.

    • Create a /etc/systemd/system/nginx.service systemd file, e.g. see the example provided in the config/nginx directory:

      [Unit]   
      Description=The NGINX HTTP and reverse proxy server  
      After=syslog.target network.target remote-fs.target nss-lookup.target caesar-rest.service   
      
      [Service]   
      Type=forking    
      PIDFile=/run/nginx.pid   
      ExecStartPre=/usr/sbin/nginx -t   
      ExecStart=/usr/sbin/nginx   
      ExecReload=/usr/sbin/nginx -s reload   
      ExecStop=/bin/kill -s QUIT $MAINPID   
      PrivateTmp=true    
      
      [Install]   
      WantedBy=multi-user.target   
      
    • Run nginx server:

      sudo systemctl start nginx.service

    Alternatively you can use the Docker container sriggi/caesar-rest-lb:latest (see https://hub.docker.com/r/sriggi/caesar-rest-lb) and deploy it with Docker Compose. In Kubernetes this functionality is provided by ingresses (see the sample configuration files).

Run job monitoring service

The job monitoring service periodically monitors user jobs, updating their status in the DB. It can be started as:

$INSTALL_DIR/bin/run_jobmonitor.py --[ARGS]

where supported ARGS are:

  • job_monitoring_period=[PERIOD]: Job monitoring poll period in seconds (default=30)
  • job_scheduler=[SCHEDULER]: Job scheduler to be used. Options are: {celery,kubernetes,slurm} (default=celery)
  • dbname=[DBNAME]: Name of MongoDB database (default=caesardb)
  • dbhost=[DBHOST]: Host of MongoDB database (default=localhost)
  • dbport=[DBPORT]: Port of MongoDB database (default=27017)
  • kube_config=[FILE_PATH]: Kube configuration file path (default=search in standard path)
  • kube_cafile=[FILE_PATH]: Kube certificate authority file path
  • kube_keyfile=[FILE_PATH]: Kube private key file path
  • kube_certfile=[FILE_PATH]: Kube certificate file path
  • slurm_keyfile=[FILE_PATH]: Slurm rest service private key file path
  • slurm_user=[SLURM_USER]: Username enabled to run in Slurm cluster (default=cirasa)
  • slurm_host=[SLURM_HOST]: Slurm cluster host/ipaddress (default=localhost)
  • slurm_port=[SLURM_PORT]: Slurm rest service port (default=6820)
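
For example, to poll Slurm jobs every 60 seconds against a local MongoDB (argument values are illustrative):

$INSTALL_DIR/bin/run_jobmonitor.py --job_scheduler=slurm --job_monitoring_period=60 --dbhost=localhost --dbport=27017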

Alternatively, you can use the Docker container sriggi/caesar-rest-jobmonitor:latest (see https://hub.docker.com/r/sriggi/caesar-rest-jobmonitor) and deploy it with DockerCompose or Kubernetes (see sample configuration files).

Run accounting service

The accounting service periodically monitors user data and job info, storing aggregated stats in the DB. It can be started as:

$INSTALL_DIR/bin/run_accounter.py --[ARGS]

where supported ARGS are:

  • datadir=[DATADIR]: Directory where to store uploaded data (default: /opt/caesar-rest/data)
  • jobdir=[JOBDIR]: Top directory where to store job data (default: /opt/caesar-rest/jobs)
  • job_monitoring_period=[PERIOD]: Job info monitoring poll period in seconds (default=30)
  • dbname=[DBNAME]: Name of MongoDB database (default=caesardb)
  • dbhost=[DBHOST]: Host of MongoDB database (default=localhost)
  • dbport=[DBPORT]: Port of MongoDB database (default=27017)
  • mount_rclone_volume: Enable mounting of Nextcloud volume through rclone in container jobs (default=no)
  • mount_volume_path=[PATH]: Mount volume path for container jobs (default=/mnt/storage)
  • rclone_storage_name=[NAME]: rclone remote storage name (default=neanias-nextcloud)
  • rclone_storage_path=[PATH]: rclone remote storage path (default=.)
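
For example (argument values are illustrative):

$INSTALL_DIR/bin/run_accounter.py --datadir=/opt/caesar-rest/data --jobdir=/opt/caesar-rest/jobs --job_monitoring_period=60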

Alternatively, you can use the Docker container sriggi/caesar-rest-accounter:latest (see https://hub.docker.com/r/sriggi/caesar-rest-accounter) and deploy it with DockerCompose or Kubernetes (see sample configuration files).

Usage

caesar-rest provides the following REST endpoints:

Data upload

  • URL:http://server-address:port/caesar/api/v1.0/upload
  • Request methods: POST
  • Request header: content-type: multipart/form-data

A sample curl request would be:

curl -X POST \   
  -H 'Content-Type: multipart/form-data' \   
  -F 'file=@VGPS_cont_MOS017.fits' \   
  --url 'http://localhost:8080/caesar/api/v1.0/upload'   

Server response is:

{
  "date":"2020-04-24T17:04:26.174333",
  "filename_orig":"VGPS_cont_MOS017.fits",
  "format":"fits",
  "size":4.00726318359375,
  "status":"File uploaded with success",
  "uuid":"250fdf5ed6a044888cf4406338f9e73b"
}

A file uuid (or file path) is returned and can be used to download the file or to set the job input file information.

Data download

  • URL:http://server-address:port/caesar/api/v1.0/download/[file_id]
  • Request methods: GET, POST
  • Request header: None

A sample curl request would be:

curl  -X GET \
  --fail -o data.fits \
  --url 'http://localhost:8080/caesar/api/v1.0/download/67a49bf7555b41739095681bf52a1f99'

The above request will fail if the file is not found; otherwise the downloaded file will be saved as data.fits. Without the -o argument the raw output is written to stdout. If the file is not found, a json response is returned:

{
  "status": "File with uuid 67a49bf7555b41739095681bf52a1f99 not found on the system!"
}

Get uploaded data ids

  • URL:http://server-address:port/caesar/api/v1.0/fileids
  • Request methods: GET
  • Request header: None

A sample curl request would be:

curl  -X GET \
  --url 'http://localhost:8080/caesar/api/v1.0/fileids'

with response:

{"file_ids":["a668c353ba4d4c7395ad94b4e8647d92","c54db5ef95734c62a499db38587c48a5","26bc9a545c8f4f05a2c719ec5c3917e0"]}

Dataset list & description

To get the list of supported datasets:

  • URL:http://server-address:port/caesar/api/v1.0/datasets
  • Request methods: GET
  • Request header: none

Server response contains a list of configured datasets that can be used as inputs in job submission:

{
  "smgps": {
    "description": "A collection of 178,057 image cutouts of size 256x256 pixels extracted from the SARAO MeerKAT Galactic Plane survey (Goedhart+24)."
  },
  "smgps-feats-simclr": {
    "description": "Feature data (#512 features) obtained with a SimCLR self-supervised pre-trained model from a collection of 178,057 image cutouts of size 256x256 pixels extracted from the SARAO MeerKAT Galactic Plane survey (Goedhart+24)."
  }
}

App list

To get the list of supported apps:

  • URL:http://server-address:port/caesar/api/v1.0/apps
  • Request methods: GET
  • Request header: none

Server response contains a list of valid apps that can be queried for further description and used in job submission:

{
  "apps": [
    "caesar",
    "mrcnn",
    "aegean",
    "cutex",
    "classifier-cnn",
    "featextractor-simclr",
    "umap",
    "outlier-finder",
    "hdbscan",
    "similarity-search"	
  ]
}

App description

To get information about a given app:

  • URL:http://server-address:port/caesar/api/v1.0/app/[app_name]/describe
  • Request methods: GET
  • Request header: none

Server response contains a list of app options that can be used in job submission. Below we report a description of the umap app (url: http://server-address:port/caesar/api/v1.0/app/umap/describe):

{
	"datalist-key": {
		"advanced": 0,
		"category": "INPUT",
		"default": "data",
		"description": "Dictionary key name to be read in input datalist (default=data)",
		"enum": false,
		"mandatory": false,
		"max": "",
		"min": "",
		"subcategory": "",
		"type": "str"
	},
	"mindist": {
		"advanced": 0,
		"category": "PROCESSING",
		"default": 0.1,
		"description": " Min dist UMAP parameter (default=0.1)",
		"enum": false,
		"mandatory": false,
		"max": 1.0,
		"min": 0.0,
		"subcategory": "",
		"type": "float"
	},
	"nfeats": {
		"advanced": 0,
		"category": "PROCESSING",
		"default": 2,
		"description": "Encoded data dim in UMAP (default=2)",
		"enum": false,
		"mandatory": false,
		"max": 512,
		"min": 2,
		"subcategory": "",
		"type": "int"
	},
	"nneighbors": {
		"advanced": 0,
		"category": "PROCESSING",
		"default": 15,
		"description": "N neighbors UMAP parameter (default=15)",
		"enum": false,
		"mandatory": false,
		"max": 10000,
		"min": 1,
		"subcategory": "",
		"type": "int"
	},
	"no-logredir": {
		"advanced": 0,
		"category": "RUN",
		"description": "Do not redirect logs to output file in script",
		"enum": false,
		"mandatory": false,
		"subcategory": "",
		"type": "none"
	},
	"no-save-ascii": {
		"advanced": 0,
		"category": "OUTPUT",
		"description": "Do not save output in ascii format",
		"enum": false,
		"mandatory": false,
		"subcategory": "",
		"type": "none"
	},
	"no-save-json": {
		"advanced": 0,
		"category": "OUTPUT",
		"description": "Do not save output in json format",
		"enum": false,
		"mandatory": false,
		"subcategory": "",
		"type": "none"
	},
	"no-save-model": {
		"advanced": 0,
		"category": "OUTPUT",
		"description": "Do not save model",
		"enum": false,
		"mandatory": false,
		"subcategory": "",
		"type": "none"
	},
	"normalize_minmax": {
		"advanced": 0,
		"category": "PREPROCESSING",
		"description": "Normalize each channel in range",
		"enum": false,
		"mandatory": false,
		"subcategory": "",
		"type": "none"
	},
	"outfile-sup": {
		"advanced": 0,
		"category": "OUTPUT",
		"default": "featdata_umap_sup.dat",
		"description": "Name of UMAP encoded data output file for supervised run in ascii format (default=featdata_umap_sup.dat)",
		"enum": false,
		"mandatory": false,
		"max": "",
		"min": "",
		"subcategory": "",
		"type": "str"
	},
	"outfile-unsup": {
		"advanced": 0,
		"category": "OUTPUT",
		"default": "featdata_umap.dat",
		"description": "Name of UMAP encoded data output file in ascii format (default=featdata_umap.dat)",
		"enum": false,
		"mandatory": false,
		"max": "",
		"min": "",
		"subcategory": "",
		"type": "str"
	},
	"outfile-unsup-json": {
		"advanced": 0,
		"category": "OUTPUT",
		"default": "featdata_umap.json",
		"description": "Name of UMAP encoded data output file in json format (default=featdata_umap.json)",
		"enum": false,
		"mandatory": false,
		"max": "",
		"min": "",
		"subcategory": "",
		"type": "str"
	},
	"run-supervised": {
		"advanced": 0,
		"category": "RUN",
		"description": "Run UMAP also on labelled data alone (if available)",
		"enum": false,
		"mandatory": false,
		"subcategory": "",
		"type": "none"
	},
	"selcols": {
		"advanced": 0,
		"category": "INPUT",
		"default": "",
		"description": "Data column ids to be selected from input data, separated by commas (default=all columns)",
		"enum": false,
		"mandatory": false,
		"max": "",
		"min": "",
		"subcategory": "",
		"type": "str"
	}
}

Job submission

  • URL:http://server-address:port/caesar/api/v1.0/job
  • Request methods: POST
  • Request header: content-type: application/json

A sample curl request for running the caesar source finder app would be:

curl -X POST \   
  -H 'Content-Type: application/json' \   
  -d '{"app": "caesar","data_inputs": {"data": "39ca08fc5c7c446d8756a48088ee684c"},"job_options": {"run": true,"no-logredir": true,"no-mpi": true,"no-nestedsearch": true,"no-extendedsearch": true}}' \   
  --url 'http://localhost:8080/caesar/api/v1.0/job'   

Job data must contain a valid app name (in this case caesar) and the desired job inputs, i.e. a dictionary of valid app options. Valid options for the caesar app are named as in caesar itself and can be retrieved using the app description URL described above.

Server response is:

{
  "app": "caesar",
  "data_inputs": "39ca08fc5c7c446d8756a48088ee684c",
  "job_id": "a4095b815a074d81a0cc447762aa29f1",
  "job_options": {
    "no-extendedsearch": true,
    "no-logredir": true,
    "no-mpi": true,
    "no-nestedsearch": true,
    "run": true
   },
   "state": "PENDING",
   "status": "Job submitted and registered with success",
   "submit_date": "2024-12-19T10:00:42.865802",
   "tag": ""
}

A job id is returned in the response; it can be used to query the status of the job, cancel it, or retrieve the output data at completion.

Get job status

  • URL:http://server-address:port/caesar/api/v1.0/job/[job_id]/status
  • Request methods: GET
  • Request header: None

A sample curl request would be:

curl -X GET \   
  --url 'http://localhost:8080/caesar/api/v1.0/job/f135bcee-562b-4f01-ad9b-103c35b13b36/status'   

Server response is:

{
  "elapsed_time": "27.3435878754",
  "exit_status": 0,
  "job_id": "f135bcee-562b-4f01-ad9b-103c35b13b36",
  "pid": "11539",
  "state": "SUCCESS",
  "status": "Process terminated with success"
}

The exit status is the shell exit status of the executed background task, and pid is the corresponding process id. Possible job states are: {STARTED, TIMED-OUT, ABORTED, RUNNING, SUCCESS, FAILURE}.

Get job output

  • URL:http://server-address:port/caesar/api/v1.0/job/[job_id]/output
  • Request methods: GET
  • Request header: None

A sample curl request would be:

curl -X GET \   
  --fail -o job_output.tar.gz \
  --url 'http://localhost:8080/caesar/api/v1.0/job/c3c9348a-bea0-4141-8fe9-7f64076a2327/output'   

The response is a tar.gz file containing all the job directory files (logs, output data, run scripts, etc.).

Cancel job

  • URL:http://server-address:port/caesar/api/v1.0/job/[job_id]/cancel
  • Request methods: POST
  • Request header: None
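
A sample curl request (reusing the job id from the status example above) would be:

curl -X POST \
  --url 'http://localhost:8080/caesar/api/v1.0/job/f135bcee-562b-4f01-ad9b-103c35b13b36/cancel'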

Get job ids

  • URL:http://server-address:port/caesar/api/v1.0/jobs
  • Request methods: GET
  • Request header: None
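
A sample curl request would be:

curl -X GET \
  --url 'http://localhost:8080/caesar/api/v1.0/jobs'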