Commit

deleters, importers ... update
mbaudis committed Sep 16, 2024
1 parent 51c4ebc commit 4dcee29
Showing 13 changed files with 256 additions and 216 deletions.
47 changes: 1 addition & 46 deletions docs/applications.md
@@ -39,52 +39,7 @@ as well as some other statistics (e.g. CNV coverage per chromosomal arms ...).
* `bin/analysesStatusmapsRefresher.py -d progenetix --filters "pgx:icdom-81703"`
* `bin/analysesStatusmapsRefresher.py -d cellz --filters "cellosaurus:CVCL_0312"`

--------------------------------------------------------------------------------

### `collationsCreator`

The `collationsCreator` script updates the dataset specific `collations` collections
which provide the aggregated data (sample numbers, hierarchy trees etc.) for all
individual codes belonging to one of the entities defined in the `filter_definitions`
in the `bycon` configuration. The (optional) hierarchy data is provided
in `rsrc/classificationTrees/__filterType__/numbered-hierarchies.tsv` as a list
of ordered branches in the format `code | label | depth | order`.
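
The ordered-branch format can be turned back into root-to-node paths with a small stack. The following is a minimal sketch only: it assumes tab-separated columns in the order `code`, `label`, `depth`, `order` with a zero-based depth, and the `demo:` codes are hypothetical, not taken from any real hierarchy file.

```python
import csv
import io


def branch_paths(tsv_text):
    """Return (order, root-to-node path) for every row of an ordered hierarchy."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    stack = []  # codes of the branch currently being walked; index == depth
    paths = []
    for code, _label, depth, order in reader:
        depth = int(depth)
        # Cut the branch back to the parent level, then add this node.
        del stack[depth:]
        stack.append(code)
        paths.append((int(order), list(stack)))
    return paths


# Hypothetical example rows (assumption: depth starts at 0 for root nodes).
example = (
    "demo:root\tRoot term\t0\t1\n"
    "demo:a\tChild A\t1\t2\n"
    "demo:a1\tGrandchild A1\t2\t3\n"
    "demo:b\tChild B\t1\t4\n"
)

for order, path in branch_paths(example):
    print(order, " > ".join(path))
```

Because the rows are ordered, each node's parent is simply the most recent row with a smaller depth, so no explicit parent column is needed.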

**TBD** The filter definitions should be among the configurations for which users can
provide additions and overrides in the `byconaut/local` directory.

#### Arguments

* `-d`, `--datasetIds` ... to select the dataset (only one per run)
* `--filters` ... to (optionally) limit the processing to a subset of samples
(e.g. after a limited update)

#### Use

* `bin/collationsCreator.py -d progenetix`
* `bin/collationsCreator.py -d examplez --collationTypes "PMID"`

--------------------------------------------------------------------------------

### `frequencymapsCreator`

This app creates the frequency maps for the "collations" collection. Basically,
all samples matching any of the collation codes and representing CNV analyses
are selected and the frequencies of CNVs per genomic bin are aggregated. The
result contains the gain and loss frequencies for all genomic intervals, for the
given entity.
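
The per-bin aggregation can be illustrated as follows. This is a sketch under stated assumptions, not the app's actual code: each sample is represented here by hypothetical `gain_bins` and `loss_bins` sets of bin indices, and frequencies are expressed as percentages of all selected samples.

```python
def bin_frequencies(samples, bin_count):
    """Return (gain, loss) frequency lists in percent, one value per genomic bin."""
    gain = [0] * bin_count
    loss = [0] * bin_count
    for sample in samples:
        for i in sample["gain_bins"]:
            gain[i] += 1
        for i in sample["loss_bins"]:
            loss[i] += 1
    n = len(samples)
    if n == 0:
        # No matching samples: report zero frequencies instead of dividing by 0.
        return [0.0] * bin_count, [0.0] * bin_count
    return [100 * g / n for g in gain], [100 * c / n for c in loss]


# Hypothetical input: 4 CNV samples over 3 genomic bins.
samples = [
    {"gain_bins": {0}, "loss_bins": {2}},
    {"gain_bins": {0, 1}, "loss_bins": set()},
    {"gain_bins": set(), "loss_bins": {2}},
    {"gain_bins": {0}, "loss_bins": set()},
]

gain_pct, loss_pct = bin_frequencies(samples, 3)
print(gain_pct)  # [75.0, 25.0, 0.0]
print(loss_pct)  # [0.0, 0.0, 50.0]
```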

#### Arguments

* `-d`, `--datasetIds` ... to select the dataset (only one per run)
* `--collationTypes` ... to (optionally) limit the processing to selected
collation types (e.g. `NCIT`, `PMID`, `icdom` ...)

#### Use

* `bin/frequencymapsCreator.py -d progenetix`
* `bin/frequencymapsCreator.py -d examplez --collationTypes "icdot"`

-------------------------------------------------------------------------------

## Utility apps

10 changes: 10 additions & 0 deletions docs/housekeeping.md
@@ -72,3 +72,13 @@ Records are deleted by providing a standard pgx-style tab-delimited metadata file
where only the corresponding `..._id` column is essential. As example, the
`deleteIndividuals.py` app will take a table which includes a column `individual_id`
and use these values to delete the matching records.
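
How the id values might be collected from such a table can be sketched as below. The helper name `ids_from_table` is hypothetical and not part of the deleter apps; the second row of the example table is invented for illustration.

```python
import csv
import io


def ids_from_table(tsv_text, id_column):
    """Collect the unique, non-empty values of one `..._id` column, in file order."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if id_column not in (reader.fieldnames or []):
        raise ValueError(f"column '{id_column}' not found in the table header")
    ids = []
    for row in reader:
        value = (row[id_column] or "").strip()
        if value and value not in ids:
            ids.append(value)
    return ids


# Only the `individual_id` column is essential; other columns are ignored.
table = (
    "individual_id\tbiosample_id\n"
    "pgxind-kftx25eh\tpgxbs-kftva59y\n"
    "pgxind-demo0001\t\n"       # hypothetical id, for illustration only
    "pgxind-kftx25eh\tpgxbs-kftva59y\n"
)

print(ids_from_table(table, "individual_id"))
```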

### Deleting variants

Variant `id` values are generated upon insertion and are not supposed to be
stable or recoverable. For variants it only makes sense to perform management
at the `analysis` level. Therefore, variants should be deleted by removing the
corresponding analyses together with their variants, using the `deleteAnalysesWDS.py` app.
Also, when inserting variants through `importers/variantsInserter.py`, all existing
variants belonging to any of the `analysis_id` values in the variants file are purged
by default before the new variants are inserted.
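
The purge-then-insert behaviour can be pictured with plain Python lists standing in for the `variants` collection. This is an in-memory sketch of the logic only; the real inserter works against MongoDB, and the function name and the `v1`...`v3` ids are hypothetical.

```python
def purge_and_insert(existing, incoming):
    """Drop stored variants of every analysis present in `incoming`, then add `incoming`."""
    analysis_ids = {v["analysis_id"] for v in incoming}
    kept = [v for v in existing if v["analysis_id"] not in analysis_ids]
    return kept + incoming


existing = [
    {"id": "v1", "analysis_id": "pgxcs-kftvldsu"},
    {"id": "v2", "analysis_id": "pgxcs-demo0002"},  # hypothetical analysis id
]
incoming = [
    {"id": "v3", "analysis_id": "pgxcs-kftvldsu"},
]

result = purge_and_insert(existing, incoming)
print([v["id"] for v in result])  # ['v2', 'v3']
```

Since variant `id` values are not stable, matching on `analysis_id` is what keeps a repeated import from duplicating the variants of an analysis.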
31 changes: 31 additions & 0 deletions docs/index.md
@@ -33,6 +33,33 @@ mongorestore --db $database .../mongodump/examplez/

### Option B: Create your own databases

#### Core Data

A basic setup for a Beacon compatible database - as supported by the `bycon` package -
consists of the core data collections mirroring the Beacon default data model:

* `variants`
* `analyses` (which covers parameters from both Beacon `analysis` and `run` entity schemas)
* `biosamples`
* `individuals`

Databases are implemented in an existing MongoDB setup using utility applications
contained in the `importers` directory, which import data from tab-delimited
files. In principle, only 2 import files are needed for inserting and updating records:

* a file for the non-variant metadata[^1] with specific header values, where, as
the absolute minimum, id values for the different entities have to be provided
* a file for genomic variants, again with specific headers but also containing
the upstream ids for the corresponding analysis, biosample and individual

Examples:

```
individual_id biosample_id analysis_id
pgxind-kftx25eh pgxbs-kftva59y pgxcs-kftvldsu
```
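
As a rough illustration of how one such metadata row relates to the core collections, the sketch below derives minimal records linked through their upstream ids. The function and the record layout are assumptions made for this example and do not reproduce the importers' actual output.

```python
REQUIRED_IDS = ("individual_id", "biosample_id", "analysis_id")


def minimal_records(row):
    """Build one minimal record per core collection from a metadata row."""
    missing = [key for key in REQUIRED_IDS if not row.get(key)]
    if missing:
        raise ValueError(f"missing id values: {', '.join(missing)}")
    return {
        "individuals": {"id": row["individual_id"]},
        # Each downstream record keeps the ids of its upstream entities.
        "biosamples": {
            "id": row["biosample_id"],
            "individual_id": row["individual_id"],
        },
        "analyses": {
            "id": row["analysis_id"],
            "biosample_id": row["biosample_id"],
            "individual_id": row["individual_id"],
        },
    }


header = ["individual_id", "biosample_id", "analysis_id"]
values = ["pgxind-kftx25eh", "pgxbs-kftva59y", "pgxcs-kftvldsu"]
records = minimal_records(dict(zip(header, values)))
print(records["analyses"])
```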

#### Further and optional procedures

1. Create database and variants collection
2. Update the local `bycon` installation with your database information and local parameters
* database name(s)
@@ -50,3 +77,7 @@ mongorestore --db $database .../mongodump/examplez/

Please see the [helper apps documentation](applications/#data-transformation-database-maintenance).



[^1]: Metadata in biomedical genomics is "everything but the sequence variation"

Empty file modified housekeepers/deleteAnalyses.py
100644 → 100755
Empty file.
Empty file modified housekeepers/deleteAnalysesWDS.py
100644 → 100755
Empty file.
Empty file modified housekeepers/deleteBiosamples.py
100644 → 100755
Empty file.
Empty file modified housekeepers/deleteBiosamplesWDS.py
100644 → 100755
Empty file.
Empty file modified housekeepers/deleteIndividuals.py
100644 → 100755
Empty file.
Empty file modified housekeepers/deleteIndividualsWDS.py
100644 → 100755
Empty file.
30 changes: 30 additions & 0 deletions housekeepers/recordsMoverWDS.py
@@ -0,0 +1,30 @@
#!/usr/bin/env python3

import sys
from os import pardir, path
from bycon import *

loc_path = path.dirname( path.abspath(__file__) )
lib_path = path.join(loc_path , pardir, "importers", "lib")
sys.path.append( lib_path )
from importer_helpers import *

"""
./housekeepers/recordsMoverWDS.py -d progenetix --output cellz -i ./imports/1kdeltest.tsv --testMode false
"""

################################################################################
################################################################################
################################################################################

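def main():
    # Set up the bycon environment (configuration, command line arguments).
    initialize_bycon_service()
    BI = ByconautImporter()
    # Move the selected individuals and all their downstream records.
    BI.move_individuals_and_downstream()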


################################################################################
################################################################################
################################################################################

if __name__ == '__main__':
    main()