Feature/wscleaner incorporation #548

Closed
wants to merge 5 commits into from
2 changes: 1 addition & 1 deletion .github/workflows/on-pull-request.yaml
@@ -46,4 +46,4 @@ jobs:
- name: Test with pytest
# We do not want it to run the email tests because the credentials are not stored in GitHub
run: |
python3 -m pytest -k 'not email'
python3 -m pytest -k 'not email and not wscleaner'
1 change: 1 addition & 0 deletions .gitignore
@@ -6,3 +6,4 @@ seglh_naming.egg-info/
venv/
temp/
.coverage
*data_unzipped
22 changes: 14 additions & 8 deletions README.md
@@ -7,6 +7,8 @@ This repository contains the main scripts for routine analysis of clinical next
|[demultiplex.py](demultiplex.py) | Command line | Demultiplex (excluding TSO runs) and calculate cluster density for Illumina NGS data using `bcl2fastq2` [(guide)](demultiplex/README.md) |
| [setoff_workflows.py](setoff_workflows.py) | Command line | Upload NGS data to DNAnexus and trigger in-house workflows [(guide)](setoff_workflows/README.md) |
| [upload_runfolder](upload_runfolder) | Command line or module import | Uploads an Illumina runfolder to DNAnexus [(guide)](upload_runfolder/README.md)|
| [wscleaner](wscleaner) | Command line | Automates the deletion of runfolders that have been uploaded to the DNAnexus cloud storage service [(guide)](wscleaner/README.md) |

# Assumptions / Requirements

@@ -16,6 +18,8 @@ Each runfolder must be discrete per workflow, therefore must consist of only one
* SNP
* WES
* Custom Panels / LRPCR
* ONCODEEP
* DEV (with or without UMIs)

The type of run is detected by the scripts by matching the Pan numbers within the sample names in the corresponding samplesheet to the pan numbers in the [panel_config](config/panel_config.py).
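
As a minimal sketch of the detection described above, the snippet below pulls Pan numbers out of sample names with a regular expression and looks them up in a dictionary. The `PANEL_CONFIG` dictionary, its made-up run type value, and the `detect_run_types` helper are hypothetical stand-ins for illustration; the real mapping lives in `config/panel_config.py` and may be structured differently.

```python
import re

# Hypothetical stand-in for config/panel_config.py: Pan number -> run type.
# The run type value here is made up for the example.
PANEL_CONFIG = {
    "Pan5180": "EXAMPLE_TYPE",
}

PAN_REGEX = re.compile(r"Pan\d+")


def detect_run_types(sample_names):
    """Return the set of run types whose Pan numbers appear in the sample names."""
    run_types = set()
    for name in sample_names:
        for pan_number in PAN_REGEX.findall(name):
            if pan_number in PANEL_CONFIG:
                run_types.add(PANEL_CONFIG[pan_number])
    return run_types


samples = ["TSTRUN01_01_000000_000000_TEST_Pan5180_S1"]
print(detect_run_types(samples))
```

A runfolder that should hold only one run type could then be checked by asserting that the returned set contains exactly one entry.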

Expand Down Expand Up @@ -52,18 +56,18 @@ The below diagram is a UML class diagram showing the relationships between the c
| [demultiplex](demultiplex) | orange | Demultiplex (excluding TSO runs) and calculate cluster density for Illumina NGS data using `bcl2fastq2` [(guide)](demultiplex/README.md) |
| [setoff_workflows](setoff_workflows) | pink | Upload NGS data to DNAnexus and trigger in-house workflows [(guide)](setoff_workflows/README.md) |
| [toolbox](toolbox) | grey | Contains classes and functions shared [(guide)](toolbox/README.md) |
| [upload_runfolder](upload_runfolder) | purple | Uploads an Illumina runfolder to DNAnexus [(guide)](upload_runfolder/README.md) |
| [upload_runfolder](upload_runfolder) | sand | Uploads an Illumina runfolder to DNAnexus [(guide)](upload_runfolder/README.md) |
| [wscleaner](wscleaner) | purple | Automates the deletion of runfolders that have been uploaded to the DNAnexus cloud storage service [(guide)](wscleaner/README.md) |

### Class and Package Diagrams

Class and package diagrams were generated by running the following command from the project root:

```bash
pyreverse -o png -p automate_demultiplex . --ignore=test --source-roots . --colorized --color-palette=#CBC3E3,#99DDFF,#44BB99,#BBCC33,#EEDD88,#EE8866,#FFAABB,#DDDDDD --output-directory img/
pyreverse -o png -p automate_demultiplex . --ignore=test --source-roots . --colorized --color-palette=#CBC3E3,#99DDFF,#44BB99,#BBCC33,#EEDD88,#EE8866,#FFAABB,#DDDDDD,#eab676 --output-directory img/
```



## Package Diagram
![alt text](img/packages_automate_demultiplex.png)

@@ -97,11 +101,12 @@ The above image describes the possible associations in the Class Diagram. In the
| Bcl2fastq output | STDOUT and STDERR from bcl2fastq2 | `bcl2fastq2_output.log` | Within the runfolder |
| ss_validator | Records runfolder-level logs for the samplesheet_validator script | `RUNFOLDERNAME_samplesheet_validator_script.log` | `/usr/local/src/mokaguys/automate_demultiplexing_logfiles/samplesheet_validator_script_logfiles/` |
| backup | Records the logs from the upload runfolder script | `RUNFOLDERNAME_upload_runfolder.log` | `/usr/local/src/mokaguys/automate_demultiplexing_logfiles/upload_runfolder_script_logfiles/` |
| wscleaner | Records the logs from the wscleaner script | `TIMESTAMP_wscleaner.log` | `/usr/local/src/mokaguys/automate_demultiplexing_logfiles/wscleaner/` |


# Pytest

[test](test) contains test data ([/test/data](../test/data)) and test scripts (these use pytest).
[test](test) contains test data ([/test/data](../test/data)), and test scripts within individual modules (these use pytest).

Tests can be executed using the following command. It is important to include the ignore flag to prevent pytest from scanning for tests through all test files, which slows down the tests considerably.

@@ -116,11 +121,12 @@ Currently test suite coverage is as follows:
| Module | Coverage |
| ------ | -------- |
| [ad_email.py](ad_email/ad_email.py) | 94 |
| [ad_logger.py](ad_logger/ad_logger.py) | 81 |
| [demultiplex.py](demultiplex/demultiplex.py) | 76 |
| [ad_logger.py](ad_logger/ad_logger.py) | 100 |
| [demultiplex.py](demultiplex/demultiplex.py) | 83 |
| [setoff_workflows.py](setoff_workflows/setoff_workflows.py) | 0 |
| [upload_runfolder.py](upload_runfolder/upload_runfolder.py) | 0 |
| [toolbox.py](toolbox/toolbox.py) | 0 |
| [toolbox.py](toolbox/toolbox.py) | 78 |
| [wscleaner.py](wscleaner/wscleaner.py) | 70 |


**TESTS AND TEST CASES/FILES *MUST* BE MAINTAINED AND UPDATED ACCORDINGLY IN CONJUNCTION WITH SCRIPT DEVELOPMENT**
Expand Down
12 changes: 8 additions & 4 deletions test/test_ad_email.py → ad_email/test_ad_email.py
@@ -3,16 +3,20 @@
N.B. test_email_sending_success() will only pass when running on the
workstation where the required auth details are stored
"""

import os
import pytest
from .conftest import logger_obj
from ad_email.ad_email import AdEmail
from config.ad_config import AdEmailConfig

logger_obj = logger_obj
from ..conftest import test_data_temp
from ad_logger import ad_logger

# TODO finish this test suite as it is currently incomplete

@pytest.fixture(scope="function")
def logger_obj():
    temp_log = os.path.join(test_data_temp, "temp.log")
    return ad_logger.AdLogger(__name__, "demux", temp_log).get_logger()


class TestAdEmail:
"""
Expand Down
5 changes: 3 additions & 2 deletions ad_logger/ad_logger.py
@@ -31,12 +31,12 @@ def get_logging_formatter() -> str:
)


def set_root_logger() -> None:
def set_root_logger() -> object:
"""
Set up root logger and add stream handler and syslog handler - we only want to add these once
else it will duplicate log messages to the terminal. All loggers named with the same stem
as the root logger will use these same syslog handler and stream handler
:return None:
:return logger: Logging object
"""
sensitive_formatter=SensitiveFormatter(get_logging_formatter())
logger = logging.getLogger(AdLoggerConfig.REPO_NAME)
@@ -55,6 +55,7 @@ def set_root_logger() -> None:
syslog_handler,
]
)
return logger


def shutdown_logs(logger: logging.Logger) -> None:
Expand Down
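
The docstring above explains that the stream and syslog handlers are attached to the root logger only once so that messages are not duplicated, and the change makes `set_root_logger` return the logger. A minimal sketch of that idea follows, using only the standard `logging` module. The `REPO_NAME` value, the format string, and the `if not logger.handlers` guard are assumptions for illustration, not the repository's actual implementation.

```python
import logging

REPO_NAME = "example_repo"  # Hypothetical; stands in for AdLoggerConfig.REPO_NAME


def set_root_logger() -> logging.Logger:
    """Attach a stream handler to the named root logger once and return it."""
    logger = logging.getLogger(REPO_NAME)
    if not logger.handlers:  # Guard (an assumption) so repeat calls add nothing
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(name)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    return logger


root = set_root_logger()
set_root_logger()  # Second call leaves the handler list unchanged

# A logger named with the same stem reaches the root's handlers via propagation
child = logging.getLogger(f"{REPO_NAME}.demux")
child.warning("emitted through the shared stream handler")
```

Returning the logger, as the change does, lets callers and tests inspect the attached handlers directly.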
1 change: 0 additions & 1 deletion test/test_ad_logger.py → ad_logger/test_ad_logger.py
@@ -48,4 +48,3 @@ def test_get_loggers(self, logfiles_config, caplog):
)
assert loggers[logger_name].name in caplog.text


9 changes: 9 additions & 0 deletions config/ad_config.py
@@ -596,3 +596,12 @@ class URConfig:
STRINGS = {
"upload_started": "Upload started", # Statement to write to DNAnexus upload started file
}

class RunfolderCleanupConfig():
    """
    Runfolder Cleanup configuration
    """
    TIMESTAMP = TIMESTAMP
    RUNFOLDER_PATTERN = RUNFOLDER_PATTERN
    RUNFOLDERS = RUNFOLDERS
    CREDENTIALS = CREDENTIALS
4 changes: 2 additions & 2 deletions config/log_msgs_config.py
@@ -45,6 +45,8 @@
"fastq_valid": "Gzip --test determined that the fastq is valid: %s",
"fastq_invalid": "Gzip --test determined that the fastq is not valid: %s. Stdout: %s. Stderr: %s",
"demux_success": "Demultiplexing was successful for the run with all fastqs valid",
"wes_batch_nos_identified": "WES batch numbers %s identified",
"wes_batch_nos_missing": "WES batch numbers missing. Check for errors in the sample names. Script exited",
},
"ad_email": {
"sending_email": "Sending the email message: %s",
@@ -146,8 +148,6 @@
"upload_rf_error": (
"An error occurred when uploading the rest of the runfolder: %s. See %s and %s for further details. Script exited"
),
"wes_batch_nos_identified": "WES batch numbers %s identified",
"wes_batch_nos_missing": "WES batch numbers missing. Check for errors in the sample names. Script exited",
"library_no_err": "Unable to identify library numbers. Script exited. Check for underscores in the sample names.",
"checking_fastq": "Checking fastq has been collected: %s",
"sample_match": "Fastq in the BaseCalls directory matches the sample name in the SampleSheet: %s, %s",
Expand Down
74 changes: 32 additions & 42 deletions test/conftest.py → conftest.py
100755 → 100644
@@ -1,6 +1,6 @@
"""
Variables used across test modules, including the setup and teardown fixture
that is run before and after every test
that is run before and after every test. This is the top-level testing configuration
"""
import os
import re
@@ -14,17 +14,34 @@
from toolbox import toolbox
from config import ad_config

# Variables used across test classes

# TODO prevent logging writing to syslog when in testing mode


test_data_dir = os.path.abspath("data") # Data directory
test_data_dir_unzipped = os.path.join(
test_data_dir, "data_unzipped/"
) # Unzips data tar to here
test_data_temp = os.path.abspath("temp") # Copies data to here for each test
# Place interop in test 7, test 9, test 11

temp_log_dir = os.path.join(test_data_temp, "automate_demultiplexing_logfiles")
temp_samplesheet_logdir = os.path.join(
temp_log_dir, "samplesheet_validator_script_logfiles"
)

# TODO prevent logging writing to syslog when in testing mode
source_runfolder_dirs = os.path.join(
test_data_dir_unzipped, "demultiplex_test_files/test_runfolders/"
)


temp_runfolderdir = os.path.join(
test_data_temp, "data_unzipped/demultiplex_test_files/test_runfolders/"
)


to_copy_interop_to = [
os.path.join(source_runfolder_dirs, "999999_A01229_0000_00000TEST7/InterOp/"),
os.path.join(source_runfolder_dirs, "999999_A01229_0000_00000TEST9/InterOp/"),
os.path.join(source_runfolder_dirs, "999999_A01229_0000_0000TEST11/InterOp/"),
]

data_tars = [
{
"src": os.path.join(test_data_dir, "demultiplex_test_files.tar.gz"),
@@ -47,31 +64,15 @@
"dest": os.path.join(test_data_dir_unzipped, "InterOp"),
},
]
source_runfolder_dirs = os.path.join(
test_data_dir_unzipped, "demultiplex_test_files/test_runfolders/"
)

to_copy_interop_to = [
os.path.join(source_runfolder_dirs, "999999_A01229_0000_00000TEST7/InterOp/"),
os.path.join(source_runfolder_dirs, "999999_A01229_0000_00000TEST9/InterOp/"),
os.path.join(source_runfolder_dirs, "999999_A01229_0000_0000TEST11/InterOp/"),
]

temp_runfolderdir = os.path.join(
test_data_temp, "data_unzipped/demultiplex_test_files/test_runfolders/"
)
temp_log_dir = os.path.join(test_data_temp, "automate_demultiplexing_logfiles")
temp_samplesheet_logdir = os.path.join(
temp_log_dir, "samplesheet_validator_script_logfiles"
)
# Temp directory for SampleSheet validator SampleSheet test cases
sv_samplesheet_temp_dir = os.path.join(test_data_temp, "data_unzipped/samplesheets")


@pytest.fixture(scope="function")
def logger_obj():
    temp_log = os.path.join(test_data_temp, "temp.log")
    return ad_logger.AdLogger(__name__, "demux", temp_log).get_logger()
def patch_toolbox(monkeypatch):
    """
    Apply patches required for toolbox script. These point the paths to the
    temporary locations:
    - Test logfiles in the temp logfiles dir and within the temp runfolder dirs
    """
    monkeypatch.setattr(toolbox.ToolboxConfig, "RUNFOLDERS", temp_runfolderdir)
    monkeypatch.setattr(toolbox.ToolboxConfig, "AD_LOGDIR", temp_log_dir)


def create_logdirs():
@@ -86,16 +87,6 @@ def create_logdirs():
os.makedirs(parent_dir, exist_ok=True)


def patch_toolbox(monkeypatch):
    """
    Apply patches required for toolbox script. These point the paths to the
    temporary locations:
    - Test logfiles in the temp logfiles dir and within the temp runfolder dirs
    """
    monkeypatch.setattr(toolbox.ToolboxConfig, "RUNFOLDERS", temp_runfolderdir)
    monkeypatch.setattr(toolbox.ToolboxConfig, "AD_LOGDIR", temp_log_dir)


@pytest.fixture(scope="session", autouse=True)
def run_before_and_after_session():
"""
@@ -106,7 +97,6 @@ def run_before_and_after_session():
os.makedirs(
test_data_dir_unzipped, exist_ok=True
) # Holds the unzipped data to copy from for each test

for tar in data_tars:
with tarfile.open(tar["src"], "r:gz") as open_tar:
open_tar.extractall(path=tar["dest"])
Expand Down
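
The session fixture above unpacks each test data archive into its destination directory before the tests run. The sketch below reproduces that loop in isolation with the standard library. The archive name, file contents, and temporary directories are made up for the example; only the `data_tars` shape (`src` and `dest` keys) and the `tarfile.open(..., "r:gz")` / `extractall` calls mirror the diff.

```python
import os
import tarfile
import tempfile

work_dir = tempfile.mkdtemp()  # Throwaway directory for the example

# Build a small .tar.gz so the example is self-contained
member_path = os.path.join(work_dir, "example.txt")
with open(member_path, "w") as handle:
    handle.write("test data\n")
archive_path = os.path.join(work_dir, "example_files.tar.gz")
with tarfile.open(archive_path, "w:gz") as new_tar:
    new_tar.add(member_path, arcname="example.txt")

# Same shape as the data_tars list: one src archive and one dest directory each
data_tars = [
    {"src": archive_path, "dest": os.path.join(work_dir, "data_unzipped")},
]

for tar in data_tars:
    os.makedirs(tar["dest"], exist_ok=True)
    with tarfile.open(tar["src"], "r:gz") as open_tar:
        open_tar.extractall(path=tar["dest"])

extracted = os.path.join(data_tars[0]["dest"], "example.txt")
print(os.path.exists(extracted))
```

Extracting once per session and copying into a temporary directory per test keeps the unzipped source data untouched between tests.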
8 changes: 8 additions & 0 deletions data/test_dir_1_fastqs.txt
@@ -0,0 +1,8 @@
TSTRUN01_01_000000_000000_TEST_Pan5180_S1_R1_001.fastq.gz
TSTRUN01_01_000000_000000_TEST_Pan5180_S1_R2_001.fastq.gz
TSTRUN01_02_000000_000000_TEST_Pan5180_S1_R1_001.fastq.gz
TSTRUN01_02_000000_000000_TEST_Pan5180_S1_R2_001.fastq.gz
TSTRUN01_03_000000_000000_TEST_Pan5180_S1_R1_001.fastq.gz
TSTRUN01_03_000000_000000_TEST_Pan5180_S1_R2_001.fastq.gz
TSTRUN01_04_000000_000000_TEST_Pan5180_S1_R1_001.fastq.gz
TSTRUN01_04_000000_000000_TEST_Pan5180_S1_R2_001.fastq.gz
8 changes: 8 additions & 0 deletions data/test_dir_2_fastqs.txt
@@ -0,0 +1,8 @@
TSTRUN02_01_000000_000000_TEST_Pan5180_S1_R1_001.fastq.gz
TSTRUN02_01_000000_000000_TEST_Pan5180_S1_R2_001.fastq.gz
TSTRUN02_02_000000_000000_TEST_Pan5180_S1_R1_001.fastq.gz
TSTRUN02_02_000000_000000_TEST_Pan5180_S1_R2_001.fastq.gz
TSTRUN02_03_000000_000000_TEST_Pan5180_S1_R1_001.fastq.gz
TSTRUN02_03_000000_000000_TEST_Pan5180_S1_R2_001.fastq.gz
TSTRUN02_04_000000_000000_TEST_Pan5180_S1_R1_001.fastq.gz
TSTRUN02_04_000000_000000_TEST_Pan5180_S1_R2_001.fastq.gz