+ + +The Dockerfile provided in the chanjo2 repository can be built and run to deploy the software. Alternatively, an image containing chanjo2 and all its dependencies is hosted on Docker Hub.
+The file named Dockerfile
is a generic Docker file to run the application. Whenever the app is launched with the ENV param DEMO
(check the settings present in file docker-compose.yml
) it will create a test SQLite database to work with.
To start the demo via docker, run:
+docker run -d --rm -p 8000:8000 --expose 8000 clinicalgenomics/chanjo2:latest
+
+The endpoints of the app will now be reachable from any web browser: http://0.0.0.0:8000/docs or http://localhost:8000/docs
+An example of this setup is provided in the docker-compose-mysql.yml
file.
+Here we connect the app to a MySQL (MariaDB) database and provide the connection settings to use it.
Note that a file containing environment variables is required to run this setup. The template.env
file offers an example of the required variables and can be customised according to your local settings.
To check the configuration (env variables passed to the docker-compose file) run:
+docker-compose -f docker-compose-mysql.yml --env-file template.env config
+
+The docker-compose file contains 2 services:
+- A MariaDB database, run from a Dockerfile that includes the script to create an empty testdb
database
+- The chanjo2 web app, a REST API
To start the demo, run:
+docker-compose -f docker-compose-mysql.yml --env-file template.env up
+
+The endpoints of the app will now be reachable from any web browser: http://0.0.0.0:8000/docs or http://localhost:8000/docs
+Keep in mind that Chanjo2 reads the variables necessary for connecting to the database from a default .env file.
+If you run a dockerized version of Chanjo2 and want to connect to a real database, you'll need to replace the default .env file with a custom environment file containing the correct settings to connect to your MySQL database.
+The last line present on the .env file (DEMO=Y
) should be removed or commented out.
Given a local database running on localhost and port 3306, a custom .env file like this:
+MYSQL_USER=dbUser
+MYSQL_PASSWORD=dbPassword
+MYSQL_DATABASE_NAME=chanjo2_test
+MYSQL_HOST_NAME=host.docker.internal
+MYSQL_PORT=3306
+
+Should suffice to override the parameters present in the default .env file of Chanjo2:
+docker run -d --rm -v $(pwd)/.env:/home/worker/app/.env -p 8000:8000 --expose 8000 clinicalgenomics/chanjo2:latest
+
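+Before mounting a custom .env file into the container, it can help to check that it defines the variables listed above and that the DEMO line is gone. The snippet below is a minimal sketch of such a check; the helper names parse_env and check_env are hypothetical and not part of chanjo2, and the parsing covers only plain KEY=VALUE lines.

```python
# Hypothetical pre-flight check for a custom .env file (not part of chanjo2).
REQUIRED_KEYS = [
    "MYSQL_USER",
    "MYSQL_PASSWORD",
    "MYSQL_DATABASE_NAME",
    "MYSQL_HOST_NAME",
    "MYSQL_PORT",
]


def parse_env(text):
    """Parse plain KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_env(text):
    """Return a list of problems found in the .env content (empty if none)."""
    settings = parse_env(text)
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not settings.get(key)]
    if "DEMO" in settings:
        problems.append("DEMO is still set; remove it or comment it out")
    return problems


example = """MYSQL_USER=dbUser
MYSQL_PASSWORD=dbPassword
MYSQL_DATABASE_NAME=chanjo2_test
MYSQL_HOST_NAME=host.docker.internal
MYSQL_PORT=3306
#DEMO=Y
"""

print(check_env(example))  # prints []
```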
+When generating coverage and genes overview reports, the metrics showcased in these documents are calculated across various coverage levels, such as 10x, 20x, and 50x.
+These specific coverage levels can be directly specified in queries to the /report
and /overview
endpoints using the completeness_thresholds parameter
, which accepts a list of integers.
+For detailed instructions, please refer to the coverage-reports documentation.
+Alternatively, you can define these coverage levels in the .env file by adding the following line:
REPORT_COVERAGE_LEVELS=[100,150,,..]
+
+If the REPORT_COVERAGE_LEVELS
parameter is not present in the .env file and a request does not include a completeness_thresholds value, the report metrics will default to the following coverage level values: [10, 15, 20, 50, 100].
It's important to note that if coverage level values are provided through multiple methods as described above, the application will prioritize them in the following order:
+Furthermore, it's worth considering that the more coverage levels provided, the longer it will take for the report pages to load.
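+The fallback logic described above can be sketched as follows. This is an illustration only, not the chanjo2 implementation: the function name resolve_coverage_levels is hypothetical, and the sketch assumes that values sent in the request take precedence over REPORT_COVERAGE_LEVELS, which in turn takes precedence over the built-in defaults.

```python
import json
import os
from typing import List, Mapping, Optional

# Default levels stated in the documentation above.
DEFAULT_LEVELS = [10, 15, 20, 50, 100]


def resolve_coverage_levels(
    request_levels: Optional[List[int]],
    env: Optional[Mapping[str, str]] = None,
) -> List[int]:
    """Pick coverage levels: request first, then env, then defaults (assumed order)."""
    env = os.environ if env is None else env
    if request_levels:
        return list(request_levels)
    raw = env.get("REPORT_COVERAGE_LEVELS")
    if raw:
        # The variable is assumed to hold a JSON-style list such as [100,150].
        return [int(level) for level in json.loads(raw)]
    return list(DEFAULT_LEVELS)


print(resolve_coverage_levels(None, {"REPORT_COVERAGE_LEVELS": "[100,150]"}))
```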
+ +Given a conda environment containing Rust, Python >=3.8 with poetry installed, clone the repository from GitHub with the following command:
+git clone https://github.com/Clinical-Genomics/chanjo2.git
+
+The command will create a folder named chanjo2
in your current working directory. Move inside this directory:
cd chanjo2
+
+And install the software with poetry:
+poetry install
+
+If you encounter any difficulties installing the pyd4 library due to its lack of support for PEP 517 builds, simply retry the aforementioned step using a version of poetry that is less than 1.8.
+You can run a demo instance of the web server by typing:
+uvicorn src.chanjo2.main:app --reload
+
+The server will run on localhost and default port 8000 (http://0.0.0.0:8000)
+The demo server connects to a SQLite database whose temporary tables are destroyed and re-created every time the server is started.
+In order to connect to a permanent MySQL database instance, you'd need to customise the settings present in the .env
file:
MYSQL_USER=dbUser
+MYSQL_PASSWORD=dbPassword
+MYSQL_DATABASE_NAME=chanjo2_test
+MYSQL_HOST_NAME=localhost
+MYSQL_PORT=3306
+
+The last line present on this file (DEMO=Y
) should be removed or commented out.
Chanjo2 is a coverage analysis tool for clinical sequencing data using the d4 (Dense Depth Data Dump) format. +It's implemented in Python FastAPI and provides API endpoints to communicate with the d4tools software in order to +return coverage and coverage completeness over genomic intervals (genes, transcripts, exons as well as custom intervals) for +single d4 files or samples stored in the database with associated d4 files.
+The tool is flexible and can be used in different ways. The simplest use case would be calculating sequencing coverage over one or more intervals for a d4 file stored locally or remotely on the internet.
+Chanjo2 image contains d4tools and can be used to directly retrieve statistics over d4 files.
+docker run --entrypoint d4tools --rm clinicalgenomics/chanjo2:latest
+
+docker run --entrypoint d4tools --platform linux/x86_64 --rm -v <path-to-local-d4-files-folder>:/home/worker/infiles clinicalgenomics/chanjo2:latest view /home/worker/infiles/<d4file.d4> 1:1234560-1234580 X:1234560-1234580
+
+Please note that the d4 file containing the coverage data can also be stored on a remote server. In this case the command above could be replaced by this one:
+docker run --entrypoint d4tools --platform linux/x86_64 --rm clinicalgenomics/chanjo2:latest view <url-to-remote-d4-file.d4> 1:1234560-1234580 X:1234560-1234580
+
+Coverage computation on a file hosted on a remote server will be consistently slower than on a file stored locally.
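+When these commands are launched from a script, the argument list can be assembled programmatically. The following is a minimal sketch that reuses only the docker flags shown above; the function name d4tools_view_command and the example folder and file names are hypothetical.

```python
import shlex

IMAGE = "clinicalgenomics/chanjo2:latest"
CONTAINER_DIR = "/home/worker/infiles"  # mount point used in the commands above


def d4tools_view_command(d4_source, regions, local_dir=None):
    """Build the 'docker run ... d4tools view' argument list.

    d4_source is a URL, or a file name inside local_dir when local_dir is given.
    """
    cmd = ["docker", "run", "--entrypoint", "d4tools",
           "--platform", "linux/x86_64", "--rm"]
    if local_dir is not None:
        # Mount the local folder and point d4tools at the file inside the container.
        cmd += ["-v", f"{local_dir}:{CONTAINER_DIR}"]
        d4_source = f"{CONTAINER_DIR}/{d4_source}"
    cmd += [IMAGE, "view", d4_source, *regions]
    return cmd


regions = ["1:1234560-1234580", "X:1234560-1234580"]
local_cmd = d4tools_view_command("sample.d4", regions, local_dir="/data/d4")
print(shlex.join(local_cmd))
```

+The resulting list can be passed to subprocess.run(local_cmd, check=True) on a machine where Docker is available.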
+When chanjo2 is launched and runs as a REST server, it offers many additional features, including:
+Support for calculating coverage and coverage completeness over genes, transcripts and exons for different genome builds
Instructions on how to set up and run Chanjo2 as a REST server as well as the functionalities that it offers are better illustrated in these dedicated pages.
+An example of this report is shown by the demo report endpoint: http://0.0.0.0:8000/report/demo (requires all genes and gene transcripts in build GRCh37 loaded into the database).
+ +The statistics from the report above are computed using the transcript intervals present in the provided genes.
+Given an application running in production settings, with genes, transcripts and exons loaded into the database, a coverage report like the one above can be created taking into account one of these types of intervals from any gene of choice.
+The gene list might be provided as a list of Ensembl gene IDs, HGNC gene IDs or HGNC gene symbols.
+Here is what the request data of a POST request to the /report
endpoint looks like:
{
+ "build": "GRCh37",
+ "completeness_thresholds": [
+ 10,
+ 15,
+ 20,
+ 50,
+ 100
+ ],
+ "ensembl_gene_ids": [],
+ "hgnc_gene_ids": [],
+ "hgnc_gene_symbols": [],
+ "interval_type": "genes",
+ "default_level": 10,
+ "panel_name": "Custom panel",
+ "case_display_name": "internal_id",
+ "samples": [
+ {
+ "name": "string",
+ "coverage_file_path": "string",
+ "case_name": "string",
+ "analysis_date": "2023-10-04T08:22:06.980106"
+ }
+ ]
+}
+
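+The same request can be assembled and sent from Python. The snippet below is a sketch using only the standard library; the helper names make_sample, build_report_payload and post_json are hypothetical, the sample path and gene symbol are placeholders, and the sketch assumes the server accepts the fields exactly as listed in the request data above.

```python
import json
import urllib.request
from datetime import datetime


def make_sample(name, coverage_file_path, case_name):
    """Describe one sample using the fields shown in the request data above."""
    return {
        "name": name,
        "coverage_file_path": coverage_file_path,
        "case_name": case_name,
        "analysis_date": datetime.now().isoformat(),
    }


def build_report_payload(samples, hgnc_gene_symbols, build="GRCh37",
                         interval_type="genes",
                         completeness_thresholds=(10, 15, 20, 50, 100),
                         default_level=10, panel_name="Custom panel",
                         case_display_name="internal_id"):
    """Assemble the JSON body for a POST request to the /report endpoint."""
    return {
        "build": build,
        "completeness_thresholds": list(completeness_thresholds),
        "ensembl_gene_ids": [],
        "hgnc_gene_ids": [],
        "hgnc_gene_symbols": list(hgnc_gene_symbols),
        "interval_type": interval_type,
        "default_level": default_level,
        "panel_name": panel_name,
        "case_display_name": case_display_name,
        "samples": list(samples),
    }


def post_json(url, payload):
    """POST a JSON payload and return the response body as text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Placeholder sample path and gene symbol, for illustration only.
payload = build_report_payload(
    samples=[make_sample("sample1", "/path/to/sample1.d4", "case1")],
    hgnc_gene_symbols=["LAMA1"],
)
print(json.dumps(payload, indent=2))
```

+With a server running locally, post_json("http://localhost:8000/report", payload) would send the request.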
+This type of report contains stats from incompletely covered genomic intervals at different coverage thresholds and is basically the same as the genes overview report provided by chanjo-report.
+A demo genes overview report based on genes transcripts from the demo PanelApp genes is available at the demo overview endpoint: http://0.0.0.0:8000/overview/demo (requires all genes and gene transcripts in build GRCh37 loaded into the database).
+ +To create a custom genes coverage overview report, send a POST request to the /overview
endpoint containing the same request data described above for the /report
endpoint.
This report contains statistics over all MANE Select and MANE Plus Clinical transcripts for a list of genes provided by the user.
+The /mane_overview
endpoint accepts POST requests with the same data described for the two reports above.
Note that MANE overview reports are available only for analyses run with genome build GRCh38.
+ +Genes, transcripts and exons should be loaded and updated at regular intervals of time. Depending on the type of sequencing data analysed using chanjo2, loading of transcripts and exons might not be required. +For instance, gene coordinates should be enough for whole genome sequencing (WGS) experiments, while transcripts and exons data are necessary to return statistics from transcripts and exons-based experiments.
+Genes, transcripts and exons are retrieved from Ensembl BioMart using the Schug library and loaded into the database in three distinct tables.
+Genes should be loaded into the database before transcript and exon intervals. Depending on the hardware in use and the network connection speed, the process of loading these intervals might take some time. For this reason, requests sent to these endpoints are asynchronous, so that they don't time out while processing the information.
+Loading of genes in a given genome build can be achieved by sending a POST request to the /intervals/load/genes/{<genome-build>}
endpoint:
curl -X 'POST' \
+ 'http://localhost:8000/intervals/load/genes/GRCh38' \
+ -H 'accept: application/json' \
+ -d ''
+
+Please note that the process of loading genes into the database will erase any transcripts and exons with the same genome build that are already present in the database. This ensures that transcript and exon intervals will be up to date with the latest definitions of the genes loaded into the database.
+Transcripts can be loaded/updated by using the /intervals/load/transcripts/{<genome-build>}
endpoint:
curl -X 'POST' \
+ 'http://localhost:8000/intervals/load/transcripts/GRCh38' \
+ -H 'accept: application/json' \
+ -d ''
+
+As for the previous endpoints, exons are loaded by sending a POST request to the /intervals/load/exons/{<genome-build>}
endpoint.
curl -X 'POST' \
+ 'http://localhost:8000/intervals/load/exons/GRCh38' \
+ -H 'accept: application/json' \
+ -d ''
+
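+Because genes must be loaded before transcripts and exons, a small script can issue the three requests in the required order. This is a minimal sketch using only the standard library; the function names interval_load_urls and load_all_intervals are hypothetical, and the sketch assumes a server reachable at the given base URL.

```python
import urllib.request

# Genes first: loading genes erases existing transcripts and exons of the same build.
LOAD_ORDER = ("genes", "transcripts", "exons")


def interval_load_urls(base_url, build):
    """Return the load endpoints for one genome build, in the order to call them."""
    base = base_url.rstrip("/")
    return [f"{base}/intervals/load/{kind}/{build}" for kind in LOAD_ORDER]


def load_all_intervals(base_url, build):
    """POST an empty body to each load endpoint in turn and collect the status codes."""
    statuses = []
    for url in interval_load_urls(base_url, build):
        request = urllib.request.Request(
            url, data=b"", headers={"accept": "application/json"}, method="POST"
        )
        with urllib.request.urlopen(request) as response:
            statuses.append(response.status)
    return statuses


for url in interval_load_urls("http://localhost:8000", "GRCh38"):
    print(url)
```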
+Once the database is populated with genomic intervals data, it is possible to run queries to retrieve its content.
+Genomic intervals can be queried using gene definitions. Genes can be provided as a parameter to the query in the following formats:
+- Ensembl gene IDs (ensembl_ids)
+- HGNC gene IDs (hgnc_ids)
+- HGNC gene symbols (hgnc_symbols)
+Genome build is always a required parameter in these queries.
+Examples:
+{
+ "build": "GRCh37",
+ "hgnc_symbols": ["LAMA1","LAMA2"]
+}
+
+curl -X 'POST' \
+ 'http://localhost:8000/intervals/transcripts' \
+ -H 'accept: application/json' \
+ -H 'Content-Type: application/json' \
+ -d '{
+ "build": "GRCh37",
+ "ensembl_gene_ids": [
+ "ENSG00000101680", "ENSG00000196569"
+ ]
+}'
+
+curl -X 'POST' \
+ 'http://localhost:8000/intervals/exons' \
+ -H 'accept: application/json' \
+ -H 'Content-Type: application/json' \
+ -d '{
+ "build": "GRCh37",
+ "hgnc_ids": [
+ 6481, 6482
+ ]
+}'
+
+Whenever none of the ensembl_ids, hgnc_ids or hgnc_symbols parameters is provided, these endpoints will return a list of 100 default genes, transcripts or exons. To increase the number of returned entries, you can specify a custom value for the query limit
parameter.
Chanjo2 can be used to quickly access average coverage depth statistics for an interval from a list of genomic intervals on a custom d4 coverage file.
+When querying the server for sample coverage statistics, it is also possible to specify a custom list of numbers (example 30, 20, 10) representing the coverage thresholds that should be used to calculate the coverage completeness for each genomic interval. +This number describes the percentage of bases (as a decimal number) meeting the user-defined coverage threshold for each genomic interval.
+Coverage and coverage completeness over a genomic interval can be computed by sending a POST
request to the /coverage/d4/interval/
endpoint.
The endpoint accepts a JSON query whose parameters are illustrated in the query example below.
+Note that start and stop coordinates are not required; whenever they are omitted, coverage statistics will be computed over the entire chromosome.
+Query example:
+curl -X 'POST' \
+ 'http://localhost:8000/coverage/d4/interval/' \
+ -H 'accept: application/json' \
+ -H 'Content-Type: application/json' \
+ -d '{
+ "coverage_file_path": "https://d4-format-testing.s3.us-west-1.amazonaws.com/hg002.d4",
+ "chromosome": "7",
+ "start": 124822386,
+ "end": 124929983,
+ "completeness_thresholds": [
+ 10,20,30
+ ]
+}'
+
+This query will return a response in which the mean coverage over the single interval is present under the mean_coverage key. +Coverage completeness values will additionally be returned if the query sent to the server contains values for the completeness_thresholds key. +A response from the server to the query above will return, for instance, a mean interval coverage of 27.2 and coverage completeness of 99.6%, 85% and 33.4% using threshold values of 10, 20 and 30 respectively.
+{
+ "mean_coverage": 27.19804455514559,
+ "completeness": {
+ "10": 1,
+ "20": 0.85,
+ "30": 0.33
+ },
+ "interval_id": null,
+ "interval_type": null
+}
+
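+To make the two metrics concrete, the sketch below computes them from a toy list of per-base depths. It illustrates the definitions only and is not how chanjo2 or d4tools compute them; it assumes that a base meets a threshold when its depth is greater than or equal to it, and it rounds completeness to two decimals as in the response above. The function name coverage_stats and the depth values are made up.

```python
def coverage_stats(depths, thresholds):
    """Mean depth and fraction of bases at or above each threshold (toy example)."""
    if not depths:
        raise ValueError("at least one depth value is required")
    n_bases = len(depths)
    mean_coverage = sum(depths) / n_bases
    completeness = {
        # Assumption: a base "meets" the threshold when depth >= threshold.
        str(threshold): round(sum(1 for d in depths if d >= threshold) / n_bases, 2)
        for threshold in thresholds
    }
    return {"mean_coverage": mean_coverage, "completeness": completeness}


# Made-up per-base depths for a 10-base interval.
depths = [5, 12, 18, 25, 31, 40, 22, 9, 30, 28]
stats = coverage_stats(depths, [10, 20, 30])
print(stats)
# {'mean_coverage': 22.0, 'completeness': {'10': 0.8, '20': 0.6, '30': 0.3}}
```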
+Mean coverage can also be calculated for a list of intervals using the /coverage/d4/interval_file
endpoint. The intervals list should be provided as the path to a bed-formatted file.
The endpoint accepts a JSON query whose parameters are illustrated in the example below.
+If we were to use the demo bed file provided in this repository, the query would look like this:
+curl -X 'POST' \
+ 'http://localhost:8000/coverage/d4/interval_file/' \
+ -H 'accept: application/json' \
+ -H 'Content-Type: application/json' \
+ -d '{
+ "coverage_file_path": "https://d4-format-testing.s3.us-west-1.amazonaws.com/hg002.d4",
+ "intervals_bed_path": "<path-to-109_green.bed>",
+ "completeness_thresholds": [
+ 10,20,30
+ ]
+}'
+
+And it would return the following result:
+[
+ {
+ "mean_coverage": 22.17115629570222,
+ "completeness": {
+ "10": 1,
+ "20": 0.69,
+ "30": 0.08
+ },
+ "interval_id": null,
+ "interval_type": null
+ },
+ {
+ "mean_coverage": 83.1338549817423,
+ "completeness": {
+ "10": 1,
+ "20": 0.84,
+ "30": 0.24
+ },
+ "interval_id": null,
+ "interval_type": null
+ },
+ {
+ "mean_coverage": 252.63072816253893,
+ "completeness": {
+ "10": 1,
+ "20": 0.78,
+ "30": 0.03
+ },
+ "interval_id": null,
+ "interval_type": null
+ },
+ {
+ "mean_coverage": 141.35853114545165,
+ "completeness": {
+ "10": 0.99,
+ "20": 0.7,
+ "30": 0.09
+ },
+ "interval_id": null,
+ "interval_type": null
+ }
+]
+
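+A response of this shape is easy to post-process. As a minimal sketch, the snippet below flags the intervals whose completeness at a given threshold falls below a chosen cutoff; the function name poorly_covered and the 0.75 cutoff are arbitrary choices for illustration, and the data is copied from the response above.

```python
def poorly_covered(results, threshold, min_completeness):
    """Return (index, completeness) for intervals below the cutoff at one threshold."""
    key = str(threshold)
    return [
        (index, item["completeness"][key])
        for index, item in enumerate(results)
        if item["completeness"][key] < min_completeness
    ]


# Values copied from the response shown above (interval_id and interval_type omitted).
results = [
    {"mean_coverage": 22.17115629570222, "completeness": {"10": 1, "20": 0.69, "30": 0.08}},
    {"mean_coverage": 83.1338549817423, "completeness": {"10": 1, "20": 0.84, "30": 0.24}},
    {"mean_coverage": 252.63072816253893, "completeness": {"10": 1, "20": 0.78, "30": 0.03}},
    {"mean_coverage": 141.35853114545165, "completeness": {"10": 0.99, "20": 0.7, "30": 0.09}},
]

print(poorly_covered(results, threshold=20, min_completeness=0.75))
# [(0, 0.69), (3, 0.7)]
```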
+To obtain condensed statistics for one or more samples, use the /coverage/d4/genes/summary
endpoint.
+Send a request with a list of HGNC gene IDs, the path to the d4 files for the samples, and the coverage threshold for computing coverage completeness.
+The endpoint will return the average coverage and coverage completeness for all the genes included in the query.
+You need to provide a parameter interval_type
to specify whether the statistics should be computed over entire genes, gene transcripts, or exons.
curl -X 'POST' \
+ 'https://chanjo2-stage.scilifelab.se/coverage/d4/genes/summary' \
+ -H 'accept: application/json' \
+ -H 'Content-Type: application/json' \
+ -d '{
+ "build": "GRCh37",
+ "samples": [
+ {
+ "name": "TestSample",
+ "coverage_file_path": "<path-to-d4-file.d4>"
+ }
+ ],
+ "hgnc_gene_ids": [
+ 2861, 3791, 6481, 7436, 30521
+ ],
+ "coverage_threshold": 10,
+ "interval_type": "genes"
+}'
+
+{"TestSample":{"mean_coverage":54.38,"coverage_completeness_percent":33.03}}
+
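+The summary response is keyed by sample name, so it can be turned into a compact table with a few lines of Python. This is a sketch based only on the example response above; the function name summary_rows is hypothetical.

```python
import json


def summary_rows(response_text):
    """Turn the summary response into (sample, mean coverage, completeness %) rows."""
    summary = json.loads(response_text)
    return [
        (sample, values["mean_coverage"], values["coverage_completeness_percent"])
        for sample, values in sorted(summary.items())
    ]


# Response text copied from the example above.
response_text = '{"TestSample":{"mean_coverage":54.38,"coverage_completeness_percent":33.03}}'

for sample, mean_coverage, completeness in summary_rows(response_text):
    print(f"{sample}\t{mean_coverage}\t{completeness}")
```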
+
+