moka-guys · RachelDuffin · Jul 10, 2024
diff --git a/.github/workflows/on-pull-request.yaml b/.github/workflows/on-pull-request.yaml
@@ -46,4 +46,4 @@ jobs:
       - name: Test with pytest
       # We do not want it to run the email tests because the credentials are not stored in GitHub
         run: |
-          python3 -m pytest -k 'not email'
+          python3 -m pytest -k 'not email and not wscleaner'
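Review note: the `-k` expression above deselects any test whose name matches `email` or `wscleaner`. A minimal sketch of that selection logic follows, assuming plain case-insensitive substring matching against the whole node ID. Real pytest matching is richer (it considers the names of the test and its parents, plus markers and extra keywords), so this is an approximation, and `is_selected` and the non-email node IDs are hypothetical names used only for illustration.

```python
def is_selected(node_id, excluded=("email", "wscleaner")):
    """Approximate `-k 'not email and not wscleaner'` on a pytest node ID.

    Assumption: simple case-insensitive substring matching on the full ID.
    """
    lowered = node_id.lower()
    return not any(word in lowered for word in excluded)


# Hypothetical node IDs, apart from the email test named in this PR
node_ids = [
    "ad_email/test_ad_email.py::TestAdEmail::test_email_sending_success",
    "wscleaner/test_wscleaner.py::test_example",
    "demultiplex/test_demultiplex.py::TestDemultiplex::test_example",
]
selected = [node_id for node_id in node_ids if is_selected(node_id)]
```

Under this approximation only the demultiplex test would run in the GitHub workflow; the email and wscleaner tests would still run on the workstation when `-k` is omitted.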
diff --git a/.gitignore b/.gitignore
@@ -6,3 +6,4 @@ seglh_naming.egg-info/
 venv/
 temp/
 .coverage
+*data_unzipped
diff --git a/README.md b/README.md
@@ -7,6 +7,8 @@ This repository contains the main scripts for routine analysis of clinical next
 |[demultiplex.py](demultiplex.py) | Command line | Demultiplex (excluding TSO runs) and calculate cluster density for Illumina NGS data using `bcl2fastq2` [(guide)](demultiplex/README.md) |
 | [setoff_workflows.py](setoff_workflows.py) | Command line | Upload NGS data to DNAnexus and trigger in-house workflows [(guide)](setoff_workflows/README.md) |
 | [upload_runfolder](upload_runfolder) | Command line or module import | Uploads an Illumina runfolder to DNAnexus [(guide)](upload_runfolder/README.md)|
+| [wscleaner](wscleaner) | Command line | Automates the deletion of runfolders that have been uploaded
+to the DNAnexus cloud storage service [(guide)](wscleaner/README.md)|
 
 # Assumptions / Requirements
 
@@ -16,6 +18,8 @@ Each runfolder must be discrete per workflow, therefore must consist of only one
 * SNP
 * WES
 * Custom Panels / LRPCR
+* ONCODEEP
+* DEV (with or without UMIs)
 
 The type of run is detected by the scripts by matching the Pan numbers within the sample names in the corresponding samplesheet to the pan numbers in the [panel_config](config/panel_config.py).
 
@@ -52,18 +56,18 @@ The below diagram is a UML class diagram showing the relationships between the c
 | [demultiplex](demultiplex) | orange | Demultiplex (excluding TSO runs) and calculate cluster density for Illumina NGS data using `bcl2fastq2` [(guide)](demultiplex/README.md) |
 | [setoff_workflows](setoff_workflows) | pink | Upload NGS data to DNAnexus and trigger in-house workflows [(guide)](setoff_workflows/README.md) |
 | [toolbox](toolbox) | grey | Contains classes and functions shared [(guide)](toolbox/README.md) |
-| [upload_runfolder](upload_runfolder) | purple | Uploads an Illumina runfolder to DNAnexus [(guide)](upload_runfolder/README.md) |
+| [upload_runfolder](upload_runfolder) | sand | Uploads an Illumina runfolder to DNAnexus [(guide)](upload_runfolder/README.md) |
+| [wscleaner](wscleaner) | purple | Automates the deletion of runfolders that have been uploaded
+to the DNAnexus cloud storage service | [(guide)](wscleaner/README.md) |
 
 ### Class and Package Diagrams
 
 Class and package diagrams were generated by running the following command from the project root:
 
 ```bash
-pyreverse -o png -p automate_demultiplex . --ignore=test --source-roots . --colorized --color-palette=#CBC3E3,#99DDFF,#44BB99,#BBCC33,#EEDD88,#EE8866,#FFAABB,#DDDDDD --output-directory img/
+pyreverse -o png -p automate_demultiplex . --ignore=test --source-roots . --colorized --color-palette=#CBC3E3,#99DDFF,#44BB99,#BBCC33,#EEDD88,#EE8866,#FFAABB,#DDDDDD,#eab676 --output-directory img/
 ```
 
-
-
 ## Package Diagram
 ![alt text](img/packages_automate_demultiplex.png)
 
@@ -97,11 +101,12 @@ The above image describes the possible associations in the Class Diagram. In the
  Bcl2fastq output | STDOUT and STDERR from bcl2fastq2 | `bcl2fastq2_output.log` | Within the runfolder |
 | ss_validator | Records runfolder-level logs for the samplesheet_validator script | `RUNFOLDERNAME_samplesheet_validator_script.log` | `/usr/local/src/mokaguys/automate_demultiplexing_logfiles/samplesheet_validator_script_logfiles/` |
 | backup | Records the logs from the upload runfolder script | `RUNFOLDERNAME_upload_runfolder.log` | `/usr/local/src/mokaguys/automate_demultiplexing_logfiles/upload_runfolder_script_logfiles/` |
+| wscleaner | Records the logs from the wscleaner script | `TIMESTAMP_wscleaner.log` | `/usr/local/src/mokaguys/automate_demultiplexing_logfiles/wscleaner/` |
 
 
 # Pytest
 
-[test](test) contains test data ([/test/data](../test/data)) and test scripts (these use pytest).
+[test](test) contains test data ([/test/data](../test/data)), and test scripts within individual modules (these use pytest).
 
 Tests can be executed using the following command. It is important to include the ignore flag to prevent pytest from scanning for tests through all test files, which slows down the tests considerably
 
@@ -116,11 +121,12 @@ Currently test suite coverage is as follows:
 | Module | Coverage |
 | ------ | -------- |
 | [ad_email.py](ad_email/ad_email.py) | 94 |
-| [ad_logger.py](ad_logger/ad_logger.py) | 81 |
-| [demultiplex.py](demultiplex/demultiplex.py) | 76 |
+| [ad_logger.py](ad_logger/ad_logger.py) | 100 |
+| [demultiplex.py](demultiplex/demultiplex.py) | 83 |
 | [setoff_workflows.py](setoff_workflows/setoff_workflows.py) | 0 |
 | [upload_runfolder.py](upload_runfolder/upload_runfolder.py) | 0 |
-| [toolbox.py](toolbox/toolbox.py) | 0 |
+| [toolbox.py](toolbox/toolbox.py) | 78 |
+| [wscleaner.py](wscleaner/wscleaner.py) | 70 |
 
 
 **TESTS AND TEST CASES/FILES *MUST* BE MAINTAINED AND UPDATED ACCORDINGLY IN CONJUNCTION WITH SCRIPT DEVELOPMENT**

diff --git a/test/test_ad_email.py → ad_email/test_ad_email.py b/test/test_ad_email.py → ad_email/test_ad_email.py
@@ -3,16 +3,20 @@
 N.B. test_email_sending_success() will only pass when running on the
 workstation where the required auth details are stored
 """
-
+import os
 import pytest
-from .conftest import logger_obj
 from ad_email.ad_email import AdEmail
 from config.ad_config import AdEmailConfig
-
-logger_obj = logger_obj
+from ..conftest import test_data_temp
+from ad_logger import ad_logger
 
 # TODO finish this test suite as it is currently incomplete
 
+@pytest.fixture(scope="function")
+def logger_obj():
+    temp_log = os.path.join(test_data_temp, "temp.log")
+    return ad_logger.AdLogger(__name__, "demux", temp_log).get_logger()
+
 
 class TestAdEmail:
     """

diff --git a/ad_logger/ad_logger.py b/ad_logger/ad_logger.py
@@ -31,12 +31,12 @@ def get_logging_formatter() -> str:
     )
 
 
-def set_root_logger() -> None:
+def set_root_logger() -> logging.Logger:
     """
     Set up root logger and add stream handler and syslog handler - we only want to add these once
     else it will duplicate log messages to the terminal. All loggers named with the same stem
     as the root logger will use these same syslog handler and stream handler
-        :return None:
+        :return logger (logging.Logger): Root logger for the repository
     """
     """
     sensitive_formatter=SensitiveFormatter(get_logging_formatter())
     logger = logging.getLogger(AdLoggerConfig.REPO_NAME)
@@ -55,6 +55,7 @@ def set_root_logger() -> None:
             syslog_handler,
         ]
     )
+    return logger
 
 
 def shutdown_logs(logger: logging.Logger) -> None:
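
Review note: returning the logger from `set_root_logger()` lets callers (and tests) hold a reference to the object that owns the shared handlers. The sketch below illustrates the general pattern with the standard `logging` module only. `REPO_NAME` is a hypothetical stand-in for `AdLoggerConfig.REPO_NAME`, the syslog handler and `SensitiveFormatter` are left out, and the `if not logger.handlers` guard is an assumption of this sketch rather than what the module does.

```python
import logging

REPO_NAME = "automate_demultiplex_demo"  # hypothetical stand-in for AdLoggerConfig.REPO_NAME


def set_root_logger() -> logging.Logger:
    """Attach the shared handler once and return the repository-level logger."""
    logger = logging.getLogger(REPO_NAME)
    if not logger.handlers:  # assumption: guard against adding duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return logger


root = set_root_logger()
# Loggers named with the same stem propagate records up to the root logger's handlers
child = logging.getLogger(f"{REPO_NAME}.demux")
child.info("This record is emitted once, via the root logger's stream handler")
```

Because `logging.getLogger` returns the same object for the same name, calling `set_root_logger()` a second time hands back the identical logger without adding another handler.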

diff --git a/test/test_ad_logger.py → ad_logger/test_ad_logger.py b/test/test_ad_logger.py → ad_logger/test_ad_logger.py
@@ -48,4 +48,3 @@ def test_get_loggers(self, logfiles_config, caplog):
             )
             assert loggers[logger_name].name in caplog.text
 
-
diff --git a/config/ad_config.py b/config/ad_config.py
@@ -596,3 +596,12 @@ class URConfig:
     STRINGS = {
         "upload_started": "Upload started",  # Statement to write to DNAnexus upload started file
     }
+
+class RunfolderCleanupConfig():
+    """
+    Runfolder Cleanup configuration
+    """
+    TIMESTAMP = TIMESTAMP
+    RUNFOLDER_PATTERN = RUNFOLDER_PATTERN
+    RUNFOLDERS = RUNFOLDERS
+    CREDENTIALS = CREDENTIALS
diff --git a/config/log_msgs_config.py b/config/log_msgs_config.py
@@ -45,6 +45,8 @@
         "fastq_valid": "Gzip --test determined that the fastq is valid: %s",
         "fastq_invalid": "Gzip --test determined that the fastq is not valid: %s. Stdout: %s. Stderr: %s",
         "demux_success": "Demultiplexing was successful for the run with all fastqs valid",
+        "wes_batch_nos_identified": "WES batch numbers %s identified",
+        "wes_batch_nos_missing": "WES batch numbers missing. Check for errors in the sample names. Script exited",
     },
     "ad_email": {
         "sending_email": "Sending the email message: %s",
@@ -146,8 +148,6 @@
         "upload_rf_error": (
             "An error occurred when uploading the rest of the runfolder: %s. See %s and %s for further details. Script exited"
         ),
-        "wes_batch_nos_identified": "WES batch numbers %s identified",
-        "wes_batch_nos_missing": "WES batch numbers missing. Check for errors in the sample names. Script exited",
         "library_no_err": "Unable to identify library numbers. Script exited. Check for underscores in the sample names.",
         "checking_fastq": "Checking fastq has been collected: %s",
         "sample_match": "Fastq in the BaseCalls directory matches the sample name in the SampleSheet: %s, %s",

diff --git a/test/conftest.py → conftest.py b/test/conftest.py → conftest.py
@@ -1,6 +1,6 @@
 """
 Variables used across test modules, including the setup and teardown fixture
-that is run before and after every test
+that is run before and after every test. This is the top-level testing configuration
 """
 import os
 import re
@@ -14,17 +14,34 @@
 from toolbox import toolbox
 from config import ad_config
 
-# Variables used across test classes
-
-# TODO prevent logging writing to syslog when in testing mode
-
-
 test_data_dir = os.path.abspath("data")  # Data directory
 test_data_dir_unzipped = os.path.join(
     test_data_dir, "data_unzipped/"
 )  # Unzips data tar to here
 test_data_temp = os.path.abspath("temp")  # Copies data to here for each test
-# Place interop in test 7, test 9, test 11
+
+temp_log_dir = os.path.join(test_data_temp, "automate_demultiplexing_logfiles")
+temp_samplesheet_logdir = os.path.join(
+    temp_log_dir, "samplesheet_validator_script_logfiles"
+)
+
+# TODO prevent logging writing to syslog when in testing mode
+source_runfolder_dirs = os.path.join(
+    test_data_dir_unzipped, "demultiplex_test_files/test_runfolders/"
+)
+
+
+temp_runfolderdir = os.path.join(
+    test_data_temp, "data_unzipped/demultiplex_test_files/test_runfolders/"
+)
+
+
+to_copy_interop_to = [
+    os.path.join(source_runfolder_dirs, "999999_A01229_0000_00000TEST7/InterOp/"),
+    os.path.join(source_runfolder_dirs, "999999_A01229_0000_00000TEST9/InterOp/"),
+    os.path.join(source_runfolder_dirs, "999999_A01229_0000_0000TEST11/InterOp/"),
+]
+
 data_tars = [
     {
         "src": os.path.join(test_data_dir, "demultiplex_test_files.tar.gz"),
@@ -47,31 +64,15 @@
         "dest": os.path.join(test_data_dir_unzipped, "InterOp"),
     },
 ]
-source_runfolder_dirs = os.path.join(
-    test_data_dir_unzipped, "demultiplex_test_files/test_runfolders/"
-)
-
-to_copy_interop_to = [
-    os.path.join(source_runfolder_dirs, "999999_A01229_0000_00000TEST7/InterOp/"),
-    os.path.join(source_runfolder_dirs, "999999_A01229_0000_00000TEST9/InterOp/"),
-    os.path.join(source_runfolder_dirs, "999999_A01229_0000_0000TEST11/InterOp/"),
-]
-
-temp_runfolderdir = os.path.join(
-    test_data_temp, "data_unzipped/demultiplex_test_files/test_runfolders/"
-)
-temp_log_dir = os.path.join(test_data_temp, "automate_demultiplexing_logfiles")
-temp_samplesheet_logdir = os.path.join(
-    temp_log_dir, "samplesheet_validator_script_logfiles"
-)
-# Temp directory for SampleSheet validator SampleSheet test cases
-sv_samplesheet_temp_dir = os.path.join(test_data_temp, "data_unzipped/samplesheets")
 
-
-@pytest.fixture(scope="function")
-def logger_obj():
-    temp_log = os.path.join(test_data_temp, "temp.log")
-    return ad_logger.AdLogger(__name__, "demux", temp_log).get_logger()
+def patch_toolbox(monkeypatch):
+    """
+    Apply patches required for toolbox script. These point the paths to the
+    temporary locations:
+        - Test logfiles in the temp logfiles dir and within the temp runfolder dirs
+    """
+    monkeypatch.setattr(toolbox.ToolboxConfig, "RUNFOLDERS", temp_runfolderdir)
+    monkeypatch.setattr(toolbox.ToolboxConfig, "AD_LOGDIR", temp_log_dir)
 
 
 def create_logdirs():
@@ -86,16 +87,6 @@ def create_logdirs():
             os.makedirs(parent_dir, exist_ok=True)
 
 
-def patch_toolbox(monkeypatch):
-    """
-    Apply patches required for toolbox script. These point the paths to the
-    temporary locations:
-        - Test logfiles in the temp logfiles dir and within the temp runfolder dirs
-    """
-    monkeypatch.setattr(toolbox.ToolboxConfig, "RUNFOLDERS", temp_runfolderdir)
-    monkeypatch.setattr(toolbox.ToolboxConfig, "AD_LOGDIR", temp_log_dir)
-
-
 @pytest.fixture(scope="session", autouse=True)
 def run_before_and_after_session():
     """
@@ -106,7 +97,6 @@ def run_before_and_after_session():
     os.makedirs(
         test_data_dir_unzipped, exist_ok=True
     )  # Holds the unzipped data to copy from for each test
-
     for tar in data_tars:
         with tarfile.open(tar["src"], "r:gz") as open_tar:
             open_tar.extractall(path=tar["dest"])
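
Review note: the session fixture's unpacking step can be exercised on its own. The sketch below is a self-contained approximation, assuming the same `{"src": ..., "dest": ...}` dictionary shape as `data_tars`. The helper name `extract_data_tars`, the throwaway archive, and the file names are hypothetical and exist only so the example runs anywhere.

```python
import os
import tarfile
import tempfile


def extract_data_tars(data_tars):
    """Extract each {'src': ..., 'dest': ...} gzipped tar, creating dest if needed."""
    for tar in data_tars:
        os.makedirs(tar["dest"], exist_ok=True)
        with tarfile.open(tar["src"], "r:gz") as open_tar:
            open_tar.extractall(path=tar["dest"])


# Build a throwaway archive so the sketch does not depend on the repository's test data
workdir = tempfile.mkdtemp()
sample = os.path.join(workdir, "SampleSheet.csv")
with open(sample, "w") as handle:
    handle.write("Sample_ID\n")

archive = os.path.join(workdir, "demo.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(sample, arcname="SampleSheet.csv")

dest = os.path.join(workdir, "data_unzipped")
extract_data_tars([{"src": archive, "dest": dest}])
extracted = sorted(os.listdir(dest))
```

Extracting once per session into a `data_unzipped` directory, then copying from there into `temp` for each test, keeps per-test setup cheap while leaving the unpacked source data untouched.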

diff --git a/data/test_dir_1_fastqs.txt b/data/test_dir_1_fastqs.txt
@@ -0,0 +1,8 @@
+TSTRUN01_01_000000_000000_TEST_Pan5180_S1_R1_001.fastq.gz
+TSTRUN01_01_000000_000000_TEST_Pan5180_S1_R2_001.fastq.gz
+TSTRUN01_02_000000_000000_TEST_Pan5180_S1_R1_001.fastq.gz
+TSTRUN01_02_000000_000000_TEST_Pan5180_S1_R2_001.fastq.gz
+TSTRUN01_03_000000_000000_TEST_Pan5180_S1_R1_001.fastq.gz
+TSTRUN01_03_000000_000000_TEST_Pan5180_S1_R2_001.fastq.gz
+TSTRUN01_04_000000_000000_TEST_Pan5180_S1_R1_001.fastq.gz
+TSTRUN01_04_000000_000000_TEST_Pan5180_S1_R2_001.fastq.gz
diff --git a/data/test_dir_2_fastqs.txt b/data/test_dir_2_fastqs.txt
@@ -0,0 +1,8 @@
+TSTRUN02_01_000000_000000_TEST_Pan5180_S1_R1_001.fastq.gz
+TSTRUN02_01_000000_000000_TEST_Pan5180_S1_R2_001.fastq.gz
+TSTRUN02_02_000000_000000_TEST_Pan5180_S1_R1_001.fastq.gz
+TSTRUN02_02_000000_000000_TEST_Pan5180_S1_R2_001.fastq.gz
+TSTRUN02_03_000000_000000_TEST_Pan5180_S1_R1_001.fastq.gz
+TSTRUN02_03_000000_000000_TEST_Pan5180_S1_R2_001.fastq.gz
+TSTRUN02_04_000000_000000_TEST_Pan5180_S1_R1_001.fastq.gz
+TSTRUN02_04_000000_000000_TEST_Pan5180_S1_R2_001.fastq.gz