Skip to content

Commit

Permalink
docs: add introduction and system overview (#2405)
Browse files Browse the repository at this point in the history
Co-authored-by: Theo Sanderson <theo@sndrsn.co.uk>
  • Loading branch information
chaoran-chen and theosanderson committed Aug 8, 2024
1 parent f11acf0 commit b1ad460
Show file tree
Hide file tree
Showing 5 changed files with 51 additions and 1 deletion.
9 changes: 8 additions & 1 deletion docs/astro.config.mjs
Original file line number Diff line number Diff line change
Expand Up @@ -15,10 +15,17 @@ export default defineConfig({
github: "https://github.com/loculus-project/loculus",
},
sidebar: [
{
label: "Introduction",
items: [
{ label: "What is Loculus?", link: "/introduction/what-is-loculus/" },
{ label: "Glossary", link: "/introduction/glossary/" },
{ label: "System overview", link: "/introduction/system-overview/" },
],
},
{
label: "Guides",
items: [
// Each item here is one entry in the navigation menu.
{ label: "Getting started", link: "/guides/getting-started/" },
{ label: "User administration", link: "/guides/user-administration/" },
],
Expand Down
File renamed without changes.
29 changes: 29 additions & 0 deletions docs/src/content/docs/introduction/system-overview.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
---
title: System overview
description: Overview of the architecture of Loculus
---

Below, we provide an overview of the components of Loculus. This is in particular important if you would like to set up an own instance or join the development team but also useful if you would like to interact with a Loculus instance through an API.

Loculus has a modular architecture and consists of several sub-services:


- **Backend server:** The backend is the central service of Loculus and most other services interact with it. It manages all persistent data (such as the sequence entries and submitting groups) except the data for user authentication which are managed by Keycloak.
- **Backend database:** The backend server stores all the data in a PostgreSQL database and it is the only service that has direct access to this database.
- **Keycloak:** Loculus uses the open-source software [Keycloak](https://github.com/keycloak/keycloak) for identity and access management which supports features such as password reset and two-factor authentication. Each Loculus instance includes a Keycloak instance. (Note: Keycloak is a local/on-premise software, data are not sent to any cloud service!)
- **Keycloak database:** Keycloak also uses a PostgreSQL database to store its data (this may be hosted by the same PostgreSQL server as the backend database).
- **SILO(s):** [SILO](https://github.com/GenSpectrum/LAPIS-SILO) is an open-source query engine for genetic sequences optimized for high performance and supporting alignment-specific queries such as mutation searches. It regularly pulls data from the backend server and indexes them. By default, SILO is not exposed to the users but accessed via LAPIS. For each [organism](../glossary#organism) of a Loculus [instance](../glossary#instance), there is a separate instance of SILO.
- **LAPIS(es):** [LAPIS](https://github.com/GenSpectrum/LAPIS) provides a convenient interface to SILO, offering a lightweight web API and additional data and compression formats. For each SILO instance, there is a corresponding LAPIS instance.
- **Website:** The frontend application of Loculus accesses the APIs of the backend server and LAPIS. It uses the backend server for everything related to data submission and LAPIS for searching and downloading released data. For logins and registrations, users are redirected to Keycloak.
- **Preprocessing pipeline(s):** A preprocessing pipeline fetches [unprocessed/user-submitted data](../glossary#unprocessed-data) from the backend server, processes them (which usually includes cleaning, alignment and adding annotations), and sends [processed data](../glossary#processed-data) back to the backend server. The pipeline contains [organism](../glossary#organism)-specific logic, thus, there is a separate pipeline for each organism. We maintain a customizeable preprocessing pipeline that uses [Nextclade](https://github.com/nextstrain/nextclade) for alignment, quality checks and annotations but it is easy to write a new one by following the [preprocessing pipeline specifications](https://github.com/loculus-project/loculus/blob/main/preprocessing/specification.md).

![Architecture overview](../../../../../backend/docs/plantuml/architectureOverview.svg)

## Motivation

Why did we choose this (rather complex) architecture? Here are some of the key advantages:

- **Performant query engine and existing API:** By using LAPIS and SILO (which were originally developed for [CoV-Spectrum](https://cov-spectrum.org)), Loculus reuses a query engine that has proven to be able to handle datasets with millions of viral sequences and offers a simple and flexible API.
- **Pathogen agnostic:** By keeping the preprocessing pipeline separate from the backend server, it is very simple to develop new and pathogen-specific pipelines. The pipelines do not need to be written in the same programming language as the backend but it is possible to use any language and framework and call existing tools.
- **Secure and convenient authentication:** By using Keycloak, Loculus offers a secure authentication system with support for two-factor authentication and social identity providers such as ORCID. It is also possible to connect it with LDAP or Active Directory servers which might be useful for institution-internal systems.
- **Easy to back up:** Despite having many sub-services, it is simple to create backups because many services do not persist own data. All data are stored in only two PostgreSQL databases: the backend database and the Keycloak database.
13 changes: 13 additions & 0 deletions docs/src/content/docs/introduction/what-is-loculus.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
---
title: What is Loculus?
description: A brief description of Loculus
---

Loculus is a software package to power microbial genomial databases. Major features include:

- Upload and storage of consensus sequences and metadata using a simple web interface or a web API
- Flexible data preprocessing: Loculus comes with a [Nextclade](https://clades.nextstrain.org)-based preprocessing pipeline that is able to align and translate sequences, but it is also easy to implement and plug-in your own pipeline using custom tools.
- Powerful searches: Loculus provides a user-friendly interface to search and view sequences as well as an API to query the data using [LAPIS](https://github.com/GenSpectrum/LAPIS) in the backend.
- Highly configurable: The list of metadata fields is fully configurable and Loculus supports both single- and multi-segmented genomes.

Loculus aims to be widely useful to groups who need to manage sequencing data. It can be used by small public health or research laboratories with a few members for storing their own, internal data as well as by international databases facilitating global pathogen sequence sharing.
1 change: 1 addition & 0 deletions docs/src/content/docs/reference/limitations.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
---
title: Limitations
description: Current limitations of Loculus
draft: true
---

**Genome size**: Loculus is currently designed for viral genomes. It will not be performant for very large genomes beyond, say, a megabase in size.

0 comments on commit b1ad460

Please sign in to comment.