Installation

Requirements

This package requires Python 3 (>= 3.10).
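You can check this requirement before installing. The sketch below is a hypothetical helper, not part of omdctk. It reads the version from the interpreter's sys.version_info instead of parsing the text printed by python3 --version, and it assumes bash because it uses arrays and (( )) arithmetic.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of omdctk): check that python3 is >= 3.10.

# Succeeds if dotted version $1 (e.g. "3.11.4") is at least $2 (e.g. "3.10").
version_at_least() {
  local IFS=.
  local -a have=($1) need=($2)
  local i h n
  for i in 0 1 2; do
    h=${have[i]:-0}
    n=${need[i]:-0}
    if (( h > n )); then return 0; fi
    if (( h < n )); then return 1; fi
  done
  return 0
}

# Prints the interpreter version as MAJOR.MINOR.MICRO, e.g. "3.11.4".
python_version() {
  "${1:-python3}" -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])'
}

# Fails with a message if the given interpreter (default python3) is older than 3.10.
require_python_310() {
  local py=${1:-python3} v
  v=$(python_version "$py") || { echo "cannot run $py" >&2; return 1; }
  if version_at_least "$v" 3.10; then
    echo "$py $v is OK"
  else
    echo "$py $v is too old: omdctk needs >= 3.10" >&2
    return 1
  fi
}

# Usage (hypothetical): require_python_310 && pip install omdctk==1.1.0
```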

Installation methods

Installation with Pip

This is the easiest way to install the OMD Curation Toolkit. pip is the package installer for Python and downloads packages from the Python Package Index (PyPI).

## Check that the python3 version is >= 3.10
python3 --version

## Install package
pip install omdctk==1.1.0
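To confirm that the pinned release was installed, you can read the Version field from pip show. This is a hypothetical sketch, not part of omdctk. It calls pip as python3 -m pip so that pip belongs to the same interpreter that was checked above.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of omdctk): confirm that pip installed omdctk 1.1.0.

# Reads `pip show` output on stdin and prints the value of its "Version:" field.
pip_show_version() {
  awk -F': ' '$1 == "Version" { print $2; exit }'
}

# Succeeds if the installed omdctk matches the expected version (default 1.1.0).
check_omdctk_version() {
  local want=${1:-1.1.0} have
  have=$(python3 -m pip show omdctk 2>/dev/null | pip_show_version)
  if [ "$have" = "$want" ]; then
    echo "omdctk $have is installed"
  else
    echo "expected omdctk $want, found '${have:-nothing}'" >&2
    return 1
  fi
}

# Usage (hypothetical): check_omdctk_version 1.1.0
```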

Installation with Conda env

## Create Conda env with python 3.10
conda create --name omdctk python=3.10

## Activate env
conda activate omdctk

## Install package with pip
pip install omdctk==1.1.0
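conda activate sets the CONDA_PREFIX environment variable to the directory of the active environment. The hypothetical helper below (not part of omdctk) uses that variable to confirm that pip resolves inside the omdctk environment, so the package does not end up in the base or system Python.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of omdctk): make sure a command comes from the active conda env.

# Succeeds if command $1 resolves to a path inside $CONDA_PREFIX.
resolves_in_env() {
  local prefix=${CONDA_PREFIX:-} path
  if [ -z "$prefix" ]; then
    echo "no conda environment is active" >&2
    return 1
  fi
  if ! path=$(command -v "$1"); then
    echo "$1 is not on PATH" >&2
    return 1
  fi
  case "$path" in
    "$prefix"/*) return 0 ;;
    *) echo "$1 resolves to $path, outside $prefix" >&2; return 1 ;;
  esac
}

# Usage (hypothetical): resolves_in_env pip && pip install omdctk==1.1.0
```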

Installation from source

## Clone GitHub repository
git clone https://github.com/tbcgit/omdctk.git

## Enter the directory where the pyproject.toml file is located
cd omdctk

## Check that the python3 version is >= 3.10
python3 --version

## Install package locally
pip install .
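Whichever method you use, the package's programs should then be available as commands. The names in the sketch below are taken from the test_omdctk output on this page. The loop is a hypothetical helper, not part of omdctk, and it only checks that each command is on PATH, not that it runs correctly.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of omdctk): list omdctk commands missing from PATH.

# Command names as they appear in the test_omdctk output on this page.
OMDCTK_COMMANDS="download_metadata_ENA merge_metadata check_metadata_ENA
filter_metadata download_fastqs check_fastqs make_treatment_template
treat_fastqs treat_metadata concat_datasets check_metadata_values test_omdctk"

# Prints "missing: NAME" for each absent command; succeeds only if none are missing.
missing_omdctk_commands() {
  local cmd missing=0
  for cmd in $OMDCTK_COMMANDS; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Usage (hypothetical): missing_omdctk_commands && echo "all omdctk commands found"
```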

Test installation

To check that the installation was successful, use the test_omdctk program. It tests every program in the package on two example datasets: an ENA dataset (ENA project PRJEB10949) and an external dataset (GSA project PRJCA001214). Note that the output directory passed with the -o parameter must already exist; all resulting files and subdirectories are generated there. The test may take a few minutes, depending on your computer and internet connection.

Commands:

## Create Example directory
mkdir Example

## Run test
test_omdctk -o Example
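To keep a record of the run, you can save the output with tee and then count the "Success!" results. This is a hypothetical sketch, not part of omdctk. The expected count of 18 is the number of steps (A1-A9, B1-B7, C1-C2) in the v1.1.0 output reproduced below, and it is an assumption for other versions. Console formatting can also vary, so treat the count as a quick sanity check rather than a verdict.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of omdctk): summarize a saved test_omdctk log.

# Steps A1-A9, B1-B7 and C1-C2 in the v1.1.0 output shown on this page (assumption).
EXPECTED_STEPS=18

# Counts lines containing "Success!" in log file $1 and compares the count with EXPECTED_STEPS.
summarize_test_log() {
  local log=$1 ok
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 1
  fi
  ok=$(grep -c 'Success!' "$log" || true)
  echo "$ok of $EXPECTED_STEPS steps reported success"
  [ "$ok" -eq "$EXPECTED_STEPS" ]
}

# Usage (hypothetical):
#   mkdir -p Example
#   test_omdctk -o Example 2>&1 | tee test_omdctk.log
#   summarize_test_log test_omdctk.log
```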

Output:

################################################################
##                                                            ##
##    ___  __  __ ___      ___              _   _             ##
##   / _ \|  \/  |   \    / __|  _ _ _ __ _| |_(_)___ _ _     ##
##   |(_)|| |\/| | |) |   |(_| || | '_/ _` |  _| / _ \ ' \    ##
##   \___/|_|  |_|___/    \___\_,_|_| \__,_|\__|_\___/_||_|   ##
##                 _____         _ _   _ _                    ##
##                |_   _|__  ___| | |_(_) |_                  ##
##                  | |/ _ \/ _ \ | / / |  _|                 ##
##                  |_|\___/\___/_|_\_\_|\__|                 ##
##                                                            ##
##   test_omdctk.py                                           ##
##    * v1.1.0 - 12 Mar 2024 *                                ##
##                                                            ##
################################################################

Program Parameters:
┌──────────────────┬─────────┐
│ Argument         │ Value   │
├──────────────────┼─────────┤
│ output_directory │ Example │
│ plain_text       │ False   │
└──────────────────┴─────────┘

This may take a while...

A) Testing ENA Dataset Workflow:

Preparation:
Creating ENA Dataset Subdirectory

A1) Testing download_metadata_ENA.py:

Command:
download_metadata_ENA -p PRJEB10949 -o Example/ENA_Dataset_Example --plain_text

Result: 
Success! All expected files have been generated!

A2) Testing merge_metadata.py:

Preparation:
Coping Extra Metadata Table Example in ENA_Dataset_Example Directory

Command:
merge_metadata -m PRJEB10949_ENA_metadata.tsv -mc run_accession -e PRJEB10949_publication_example.tsv -ec run_accessions -o Example/ENA_Dataset_Example --plain_text

Result: 
Success! All expected files have been generated!

A3) Testing check_metadata_ENA.py:

Preparation:
Reading package reference output log file

Command:
check_metadata_ENA -t merged_PRJEB10949_ENA_metadata.tsv --plain_text

Result: 
Success! The program stdout matches the expected output!

A4) Testing filter_metadata.py:

Preparation:
Coping Filter Table Example in ENA_Dataset_Example Directory

Command:
filter_metadata -t merged_PRJEB10949_ENA_metadata.tsv -f PRJEB10949_filterfile_example.tsv -o Example/ENA_Dataset_Example --plain_text

Result: 
Success! All expected files have been generated!

A5) Testing download_fastqs.py in ENA mode:

Preparation:
Creating Download Subdirectory in ENA_Dataset_Example Directory

Command:
download_fastqs -i filtered_merged_PRJEB10949_ENA_metadata.tsv -o Example/ENA_Dataset_Example/downloads --plain_text

Result: 
Success! All expected files have been generated!

A6) Testing check_fastqs.py in ENA mode:

Preparation:
Reading package reference output log file

Command:
check_fastqs -t filtered_merged_PRJEB10949_ENA_metadata.tsv -d Example/ENA_Dataset_Example/downloads --md5_check --plain_text

Result: 
Success! The program stdout matches the expected output!

A7) Testing make_treatment_template.py in ENA mode:

Command:
make_treatment_template -i filtered_merged_PRJEB10949_ENA_metadata.tsv -d Example/ENA_Dataset_Example/downloads --extra_sample_columns sample_column -o Example/ENA_Dataset_Example --plain_text

Result: 
Success! All expected files have been generated!

A8) Testing treat_fastqs.py:

Preparation:
Coping Treatment Template Example in ENA_Dataset_Example Directory
Creating Treated Files Subdirectory in ENA_Dataset_Example Directory

Command:
treat_fastqs -t treatment_template_filtered_PRJEB10949_merged_metadata_example.tsv -i Example/ENA_Dataset_Example/downloads -o Example/ENA_Dataset_Example/treated_files --plain_text

Result: 
Success! All expected files have been generated!

A9) Testing treat_metadata.py in ENA mode:

Command:
treat_metadata -t treatment_template_filtered_PRJEB10949_merged_metadata_example.tsv -m filtered_merged_PRJEB10949_ENA_metadata.tsv -o Example/ENA_Dataset_Example --extra_no_warning_columns Run Sample run_accessions run_label --plain_text

Result: 
Success! All expected files have been generated!

B) Testing External Dataset Workflow:

Preparation:
Creating External Dataset Subdirectory

B1) Testing merge_metadata.py:

Preparation:
Coping Generic Main Metadata Table Example in External_Dataset_Example Directory
Coping Generic Extra Metadata Table Example in External_Dataset_Example Directory

Command:
merge_metadata -m CRA001372_main_metadata_example.tsv -mc Sample_name -e CRA001372_publication_metadata_example.tsv -ec sample_id -o Example/External_Dataset_Example -es _publication --plain_text

Result: 
Success! All expected files have been generated!

B2) Testing filter_metadata.py:

Preparation:
Coping Filter Table Example in External_Dataset_Example Directory

Command:
filter_metadata -t merged_CRA001372_main_metadata_example.tsv -f CRA001372_filterfile_example.tsv -o Example/External_Dataset_Example --plain_text

Result: 
Success! All expected files have been generated!

B3) Testing download_fastqs.py in LINKS mode:

Preparation:
Creating Download Subdirectory in External_Dataset_Example Directory
Coping URLs TXT File Example in External_Dataset_Example Directory

Command:
download_fastqs -m LINKS -i filtered_CRA001372_URLS_example.txt -o Example/External_Dataset_Example/downloads --plain_text

Result: 
Success! All expected files have been generated!

B4) Testing check_fastqs.py in Generic mode:

Preparation:
Reading package reference output log file
Coping Manifest File Example in External_Dataset_Example Directory

Command:
check_fastqs -s Generic -t filtered_merged_CRA001372_main_metadata_example.tsv -d Example/External_Dataset_Example/downloads -a filtered_manifest_CRA001372_example.tsv -p '.fq.gz' --md5_check --plain_text

Result: 
Success! The program stdout matches the expected output!

B5) Testing make_treatment_template.py in Generic mode:

Command:
make_treatment_template -s Generic -i filtered_manifest_CRA001372_example.tsv -d Example/External_Dataset_Example/downloads -p '.fq.gz' -r1 '_1.fq.gz' -r2 '_2.fq.gz' -o Example/External_Dataset_Example --plain_text

Result: 
Success! All expected files have been generated!

B6) Testing treat_fastqs.py:

Preparation:
Coping Treatment Template Example in External_Dataset_Example Directory
Creating Treated Files Subdirectory in External_Dataset_Example Directory

Command:
treat_fastqs -t treatment_template_filtered_CRA001372_example.tsv -i Example/External_Dataset_Example/downloads -p '.fq.gz' -r1 '_1.fq.gz' -r2 '_2.fq.gz' -o Example/External_Dataset_Example/treated_files --plain_text

Result: 
Success! All expected files have been generated!

B7) Testing treat_metadata.py in Generic mode:

Command:
treat_metadata -s Generic -t treatment_template_filtered_CRA001372_example.tsv -m filtered_merged_CRA001372_main_metadata_example.tsv -p '.fq.gz' -r1 '_1.fq.gz' -r2 '_2.fq.gz' -o Example/External_Dataset_Example --plain_text

Result: 
Success! All expected files have been generated!

C) Testing Multidatasets Programs:

C1) Testing concat_datasets.py:

Preparation:
Coping Curated ENA Metadata Table Example in Output Directory
Coping Curated External Metadata Table Example in Output Directory
Coping Variables Dictionary Example in Output Directory

Command:
concat_datasets -i Example -d variables_dictionary_example.tsv -op 'example' -o Example --plain_text

Result: 
Success! All expected files have been generated!

C2) Testing check_metadata_values.py:

Preparation:
Reading package reference output log file

Command:
check_metadata_values -t example_concatenated_final_metadata.tsv -d variables_dictionary_example.tsv --plain_text

Result: 
Success! The program stdout matches the expected output!
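Each step in the output prints the exact command it ran on the line after "Command:". Those lines can serve as templates when you curate your own datasets. The following is a minimal hypothetical sketch (not part of omdctk). It assumes the output was saved to a file, for example with tee, and that it keeps this two-line layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of omdctk): pull the commands out of a saved test log.

# Prints the line that follows each "Command:" line in log file $1.
extract_test_commands() {
  awk 'prev { print; prev = 0 } /^Command:/ { prev = 1 }' "$1"
}

# Usage (hypothetical): extract_test_commands test_omdctk.log > omdctk_commands.txt
```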

For a full, detailed example of dataset curation, see the Tutorial Full Example page.
