Releases: eastgenomics/eggd_dias_batch
v3.2.0
Summary
Changes to improve unarchiving of files plus minor bug fixes and improvements
Changes
- properly check for files to unarchive before running any jobs to ensure all required files are unarchived
- patterns currently defined in utils.defaults
- improve details in readme
- catch samples with no test codes and raise as an error
- fix total number of samples run in summary report
- fix the issue with subsetting the manifest to allow skipping samples; valid sample name checking is now restricted to just the subset
- strip whitespace from string inputs so that stray spaces (e.g. in the exclude samples input) do not cause jobs to fail
- addition of new input -iunarchive_only to allow for just unarchiving and not running jobs
- fix issue of single gene reports having ":" in the report filename
Issues closed
v3.1.0
- new input added to exclude control samples from CNV calling by default
- fix for reading in files from DNAnexus with a trailing blank line
- pass through optional multiqc report for artemis
- explicitly raise an error if 2 configs of same version found
- properly handle research code
- replace hard coded dynamic string inputs (i.e. indications, panels, test codes) with config placeholders
- handle current and new genepanels file
- add SNV / mosaic as a string to the reports output folder
See https://cuhbioinformatics.atlassian.net/wiki/spaces/DV/pages/3100770334/eggd+dias+batch+v3.1.0
v3.0.1
v3.0.0
Summary
Refactor of original tool into DNAnexus app, with various improvements and bug fixes
Changes
- Refactor to DNAnexus app to remove reliance on running jobs from server
- Refactor of whole code base to add in better handling of launching CNV calling + all reports workflow with one command
- Added unit tests to cover the majority of functions (93% code coverage)
- Optionally handle unarchiving any required files for analysis that are archived
Fixes
v2.1.0
Adds mosaicreports subcommand. This will now look for a mutect2 vcf in a TNHaplotyper2 output directory & make an appropriate report (with extra excluded columns). Otherwise the report is the same as a normal SNV one.
See: https://cuhbioinformatics.atlassian.net/wiki/spaces/DV/pages/2983395507/dias+batch+running+v2.1.0
v2.0.2
Summary
This is a minor bug fix update to improve:
- creating the correct output_file_prefix input string from _HGNC gene IDs, rather than only from test-code_clinical-indication.
Changes
In reports.py and cnvreports.py:
- provide a && joined string of test codes as input to all generate_bed stages (output_file_prefix input field)
- provide this list of prefixes in a way that can record both single gene and clinical indication requests
Bug fixes:
For further information including development notes and testing evidence, see: https://cuhbioinformatics.atlassian.net/wiki/spaces/DV/pages/2936799233/dias+batch+running+v2.0.2
v2.0.1
Summary
This is a minor bug fix update to improve:
- finding clinical indications and panels against test codes
- correctly add multiple clinical indications to a list for a sample when present on multiple lines in the Gemini input file (manifest or reanalysis tsv)
- name panel-specific bed files with test_code only to avoid hitting the filesystem character limit, especially at the eggd_annotate_excluded_regions stage of the dias_cnvreports_workflow.
Changes
parse_Gemini_manifest:
- restrict parsing to clinical indications that start with an R code, C code or _HGNC ID
- correctly identify ALL test codes for a sample that may be across multiple lines within the reanalysis file
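As a rough illustration of the two parsing changes above, the sketch below collects every valid test code for a sample across multiple lines of a tab-separated file. It is not the tool's own code: the function name parse_gemini_lines, the regular expression for R and C codes, and the sample data are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumption: an R or C code is the letter followed by a digit (e.g. "R1.1");
# the exact pattern used by the tool is not given in these notes.
VALID_INDICATION = re.compile(r"^(?:[RC]\d|_HGNC)")


def parse_gemini_lines(lines):
    """Hypothetical helper: map each sample to all of its valid indications."""
    samples = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines, including a trailing one
        sample, _, indications = line.rstrip("\n").partition("\t")
        collected = samples[sample.strip()]
        for indication in indications.split(","):
            indication = indication.strip()
            # keep only R code, C code or _HGNC entries, without duplicates
            if VALID_INDICATION.match(indication) and indication not in collected:
                collected.append(indication)
    return dict(samples)


# Made-up example data: the same sample appears on two lines
example = [
    "X123456\tR1.1_Example indication_P,_HGNC:1100\n",
    "X123456\tC1.1_Example panel\n",
    "\n",
    "X654321\tNot a code\n",
]
parsed = parse_gemini_lines(example)
```

Here the sample X123456 ends up with all three of its entries even though they are spread over two lines, while X654321 is kept with an empty list so that it can be reported as having no valid test codes.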
create_job_report_file:
- present list of samples and invalid test codes as one pair per line for easier copy-pasting to repeat analysis job with corrected test codes
In reports.py and cnvreports.py:
- match only on the R code part of the clinical indication from a Gemini file to genepanels
- provide a && joined string of test codes as optional input to all generate_bed stages (output_file_prefix input field)
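A minimal sketch of how a && joined prefix could be built from a sample's test codes is shown below. The function name build_output_file_prefix is hypothetical, and dropping blanks and duplicates is an assumption made for the example rather than documented behaviour.

```python
def build_output_file_prefix(test_codes):
    """Hypothetical helper: join test codes into one '&&'-separated string."""
    # Assumption: blanks and repeated codes are dropped, keeping request order
    cleaned = (code.strip() for code in test_codes)
    unique = list(dict.fromkeys(code for code in cleaned if code))
    return "&&".join(unique)


# Made-up test codes for illustration
prefix = build_output_file_prefix(["R1.1", " _HGNC:1100", "R1.1"])
```

Because the joined string uses only the short test codes rather than full clinical indication names, it stays well within filesystem name limits even when several tests are requested for one sample.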
Bug fixes:
For further information including development notes and testing evidence, see: https://cuhbioinformatics.atlassian.net/wiki/spaces/DV/pages/2888663263/dias+batch+running+v2.0.1
v2.0.0
Summary
This update reflects that eggd_conductor is now used routinely to set off dias_single, dias_multi and eggd_MultiQC jobs, and accommodates the lab-wide transition to Epic. Epic will be used routinely as a sample tracking system, including booking of samples against test codes/clinical indications. dias_batch_running needs to parse this information to set off dias_reports workflows for each sample against the required test codes/clinical indications.
Changes
Major changes to accommodate the transition to automated running of dias_single, dias_multi and QC steps, as well as updates to gathering input files for the dias_reports and dias_cnvreports workflows.
- removed support for setting off dias_single, dias_multi and multiQC jobs as these are now handled by eggd_conductor
- cnvcall command no longer relies on having "_single" in the dias_single workflow's output folder name
- updated logic for determining which sample to be analysed with which clinical indication/panel: the sample and test requirements are now always parsed from an input file provided as a command arg, and this file is being uploaded to the dias_reports workflow’s output folder on DNAnexus
- overall improvements to the code and general_functions, including better commenting throughout
- reports and cnvreports commands expect a manifest file from Epic
- manifest file from Epic is expected to be a semicolon-separated file with a batch ID in the first row, followed by column headers: sample identifiers (Re-analysis Specimen ID, Re-analysis Instrument ID, Specimen ID, Instrument ID) in this order and the final column containing comma-separated Test Codes (column headers are exact matched against these strings), with each row a separate sample with its set of test codes.
- accepted test codes start with R, C or _HGNC
- reanalysis and cnvreanalysis commands expect a manifest file from Gemini (see expected format below)
- manifest file from Gemini is expected to be a tab-separated file with an X number in the first column and comma-separated full clinical indication names (matching entries in the genepanels file), or _HGNC:<ID> in the second column.
- a job report file is uploaded to the dias_reports workflow’s output folder on DNAnexus, specifying:
- the number of samples which have sentieon VCF files available for filtering and annotation
- the number of samples for which reports were requested in the manifest file
- the number of samples for which reports job started successfully
- the available sample identifiers parsed from the manifest if it could not be linked with a sample VCF filename (invalid sample ID)
- a list of sample identifiers parsed from the manifest that were booked against tests that could not be identified (invalid test code)
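The Epic manifest layout described above can be sketched with a small parser. This is an illustration only, not the code in dias_batch_running: the function name parse_epic_manifest, the "Test Codes" header string, keying each sample by its first non-empty identifier, and the example data are all assumptions.

```python
import csv
import io

# Column headers as listed in the notes; matching is exact.
SAMPLE_ID_COLUMNS = [
    "Re-analysis Specimen ID",
    "Re-analysis Instrument ID",
    "Specimen ID",
    "Instrument ID",
]
TEST_CODE_COLUMN = "Test Codes"  # assumed header text for the final column
VALID_PREFIXES = ("R", "C", "_HGNC")


def parse_epic_manifest(text):
    """Hypothetical helper: return (batch ID, valid codes, invalid codes)."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=";") if row]
    batch_id = rows[0][0].strip()  # batch ID sits in the first row
    header = [column.strip() for column in rows[1]]
    if header != SAMPLE_ID_COLUMNS + [TEST_CODE_COLUMN]:
        raise ValueError(f"unexpected manifest header: {header}")

    valid, invalid = {}, {}
    for row in rows[2:]:
        record = dict(zip(header, (cell.strip() for cell in row)))
        # Assumption: key each sample by its first non-empty identifier
        sample = next((record[c] for c in SAMPLE_ID_COLUMNS if record.get(c)), None)
        if sample is None:
            raise ValueError(f"no sample identifier in row: {row}")
        codes = [c.strip() for c in record.get(TEST_CODE_COLUMN, "").split(",")]
        codes = [c for c in codes if c]
        valid[sample] = [c for c in codes if c.startswith(VALID_PREFIXES)]
        rejected = [c for c in codes if not c.startswith(VALID_PREFIXES)]
        if rejected:
            invalid[sample] = rejected  # would be listed in the job report
    return batch_id, valid, invalid


# Made-up manifest content for illustration
manifest = (
    "batch123;;;;\n"
    + ";".join(SAMPLE_ID_COLUMNS + [TEST_CODE_COLUMN]) + "\n"
    + ";;SP001;IN001;R1.1, _HGNC:1100\n"
    + ";;SP002;IN002;X99\n"
)
batch_id, valid, invalid = parse_epic_manifest(manifest)
```

Keeping the rejected codes per sample mirrors the job report's "invalid test code" list, so the same pass over the manifest can feed both the jobs to launch and the report.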
Bug fixes:
For further information including development notes and testing evidence, see: https://cuhbioinformatics.atlassian.net/wiki/spaces/DV/pages/2888728586/dias+batch+running+v2.0.0
v1.10.2
Summary
Bug fix release to allow CNV report generation for all samples except those that were excluded from CNV calling. Other improvements were included in this release: the full sample name is now used in the excluded sample list to increase specificity when selecting samples, and the excluded sample list is uploaded to the CNV calling output directory for record keeping.