Merge pull request #330 from Clinical-Genomics/remove_cases_samples

Do not save cases/samples into the SQL database; work exclusively with stats computed on d4 files on the fly
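
As a rough illustration of what this change means for API clients: instead of naming a stored case, a request now has to point at a d4 file itself. The sketch below only builds such a request and does not send it. The endpoint path is the one named in the CHANGELOG; the HTTP method, the host and port (taken from the README demo), and the `coverage_file_path` field are assumptions modelled on the removed case-based example, not the confirmed request schema.

```python
import json
from urllib.request import Request

# Assumptions: host/port come from the README demo; the payload field names
# are hypothetical, modelled on the case-based example removed in this PR.
BASE_URL = "http://localhost:8000"
ENDPOINT = "/coverage/d4/genes/summary"  # endpoint named in the CHANGELOG


def build_coverage_request(d4_path, gene_symbols, thresholds, build="GRCh37"):
    """Build (but do not send) a POST request that references a d4 file directly."""
    payload = {
        "build": build,
        "completeness_thresholds": thresholds,
        "hgnc_gene_symbols": gene_symbols,
        "coverage_file_path": d4_path,  # hypothetical field replacing "case"
    }
    return Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


request = build_coverage_request(
    "/home/worker/app/src/chanjo2/demo/panelapp_109_example.d4",
    ["FOLR1", "DHFR", "MTHFR", "SLC46A1"],
    [30, 20, 10],
)
```

The exact payload should be checked against the interactive docs at `/docs` before relying on it.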
Clinical-Genomics · Aug 15, 2024 · bd57ae0
2 parents 1d27222 + 719f1b8
commit bd57ae0

Showing 22 changed files with 23 additions and 1,707 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,4 +1,6 @@
 ## [unreleased]
+### Changed
+- Do not use stored cases/samples any more and run stats exclusively on d4 file paths provided by the user in real time
 ### Added
 - Improve report explanation to better interpret average coverage and coverage completeness stats shown on the coverage report
 - Check that provided d4 files when running queries using `/coverage/d4/genes/summary` endpoint are valid, with test

diff --git a/README.md b/README.md
@@ -9,8 +9,7 @@
 
 Chanjo2 is <strong>a coverage analysis tool for human clinical sequencing data</strong> using the <strong>[d4 (Dense Depth Data Dump) format][d4-article]</strong>. 
 It's implemented in Python [FastAPI][fastapi] and provides API endpoints to communicate with the d4tools software in order to 
-<strong>return coverage and coverage completeness over genomic intervals (genes, transcripts, exons as well as custom intervals)</strong> over 
-single d4 files or samples stored in the database with associated d4 files.
+<strong>return coverage and coverage completeness over genomic intervals (genes, transcripts, exons as well as custom intervals)</strong> over d4 files.
 
 
 ## Run a software demo containing test data
@@ -23,42 +22,10 @@ docker run -d --rm  -p 8000:8000 --expose 8000 clinicalgenomics/chanjo2:latest
 
 The endpoints of the app will now be reachable and documented in any web browser: http://0.0.0.0:8000/docs or http://localhost:8000/docs
 
-From a terminal, you can use the API to access the data contained in the demo database of this Chanjo2 instance:
 
-### Examples of endpoints usage:
-
-#### Return available cases (cases are collections of related samples):
+From a terminal, you can use the API to access the data contained in the demo database of this Chanjo2 instance. The available demo sample contains a d4 coverage file with a limited number of genes in genome build GRCh37, namely those present in [PanelApp gene panel 109 (Cerebral folate deficiency)][panelapp-109].
 
-``` shell
-curl -X 'GET' \
-  'http://localhost:8000/cases/' \
-  -H 'accept: application/json'
-```
-
-This will return a json response describing the demo case and its associated sample:
-
-``` shell
-[
-  {
-    "display_name": "643594",
-    "name": "internal_id",
-    "id": 1,
-    "samples": [
-      {
-        "coverage_file_path": "/home/worker/app/src/chanjo2/demo/panelapp_109_example.d4",
-        "display_name": "NA12882",
-        "track_name": "ADM1059A2",
-        "name": "ADM1059A2",
-        "case_id": 1,
-        "created_at": "2023-06-01T08:05:12",
-        "id": 1
-      }
-    ]
-  }
-]
-```
-
-The available demo sample contains a d4 coverage file with a limited amount of genes in genome build GRCh37, those present in [PanelApp gene panel 109 (Cerebral folate deficiency)][panelapp-109]: .
+### Examples of endpoints usage:
 
 #### Loading genes to the database
 
@@ -82,100 +49,6 @@ The response will return the number of genes inserted in the database:
 }
 ```
 
-#### Return coverage data over genes of database sample
-
-Sequencing coverage and coverage completeness statistics can be returned for genes, transcripts and exons by providing a list of genes.
-The provided gene list accepts genes in the following formats:
-- Ensembl gene IDs
-- HGNC ids
-- HGNC symbols
-
-- The user should also **specify a valid genome build - either GRCh37 or GRCh38.**
-
-For instance, to retrieve coverage stats for the demo sample (mean gene coverage and coverage completeness with sequencing depth of for instance 30, 20 and 10) over the genes the Cerebral folate deficiency PanelAPP panel (*DHFR, FOLR1, MTHFR, SLC46A1* genes), send the following POST request:
-
-``` shell
-curl -X 'POST' \
-  'http://localhost:8000/coverage/samples/genes_coverage' \
-  -H 'accept: application/json' \
-  -H 'Content-Type: application/json' \
-  -d '{
-  "build": "GRCh37",
-   "completeness_thresholds": [
-    30, 20, 10
-  ],
-  "hgnc_gene_symbols": [
-    "FOLR1", "DHFR", "MTHFR", "SLC46A1"
-  ],
-  "case": "internal_id"
-}'
-```
-
-That will return this response, containing the requested statistics over the list of 4 genes:
-
-``` shell
-{
-  "ADM1059A2": [
-    {
-      "mean_coverage": 22.76,
-      "completeness": {
-        "10": 0.89,
-        "20": 0.74,
-        "30": 0.21
-      },
-      "interval_id": null,
-      "interval_type": "genes",
-      "inner_intervals": [],
-      "hgnc_id": 2861,
-      "hgnc_symbol": "DHFR",
-      "ensembl_gene_id": "ENSG00000228716"
-    },
-    {
-      "mean_coverage": 22.48,
-      "completeness": {
-        "10": 1,
-        "20": 0.77,
-        "30": 0.03
-      },
-      "interval_id": null,
-      "interval_type": "genes",
-      "inner_intervals": [],
-      "hgnc_id": 3791,
-      "hgnc_symbol": "FOLR1",
-      "ensembl_gene_id": "ENSG00000110195"
-    },
-    {
-      "mean_coverage": 22.07,
-      "completeness": {
-        "10": 1,
-        "20": 0.69,
-        "30": 0.07
-      },
-      "interval_id": null,
-      "interval_type": "genes",
-      "inner_intervals": [],
-      "hgnc_id": 7436,
-      "hgnc_symbol": "MTHFR",
-      "ensembl_gene_id": "ENSG00000177000"
-    },
-    {
-      "mean_coverage": 22.2,
-      "completeness": {
-        "10": 0.99,
-        "20": 0.7,
-        "30": 0.09
-      },
-      "interval_id": null,
-      "interval_type": "genes",
-      "inner_intervals": [],
-      "hgnc_id": 30521,
-      "hgnc_symbol": "SLC46A1",
-      "ensembl_gene_id": "ENSG00000076351"
-    }
-  ]
-}
-```
-
 To find more information on how to set up a REST server running chanjo2, please visit the software's [documentation pages][github-docs]. There you'll also find instructions on how to populate the database with custom cases and different genomic intervals.
 
 

diff --git a/docs/index.md b/docs/index.md
@@ -36,7 +36,6 @@ The coverage computation on a file hosted on a remote server, will be consistent
 
 When chanjo2 is launched and runs as a REST server, it offers many additional features, including:
 
-* `The possibility to store samples with associated d4 files in a SQL-based database` so that coverage info can be retrieved for samples and groups of samples (cases).
 * `Support for calculating coverage and coverage completeness over genes, transcripts and exons for different genome builds`
 * Chanjo2 server can be either installed in a virtual environment using [poetry][python-poetry] or directly launched using [Docker][docker]
 

diff --git a/docs/usage/loading-samples.md b/docs/usage/loading-samples.md