gdc-readgroups

Purpose

This package will extract the Read Group header lines from a BAM file, and convert the contained metadata to a json or tsv file with appropriate values applied for creation of a Read Group node in the NCI's Genomic Data Commons (GDC). Optionally, it will take no input, and output a template which may be edited to create a submission to the GDC.

The generated file may contain some fields marked REQUIRED<type>, which indicates these fields could not be generated from the supplied BAM file. In this case, the user must apply their own desired values to the generated json. The <type> must be as indicated in the generated json file. For details, see the column Acceptable Types or Values at the GDC Data Dictionary Viewer.

Other fields are optional, and are marked OPTIONAL<type>. If these fields could not be generated from the supplied BAM file, they may be filled in as appropriate or removed.

Note

The tool will only run on complete BAM files - files which contain the suffix .bam.

If the BAM is truncated, the error

    OSError: no BGZF EOF marker; file may be truncated

will be generated, and no json will be produced.

Note

gdc-readgroups is tested with Python 3.6, and above. Python2 is untested, and may not work.

Installation

There are 2 ways to install gdc-readgroups

pip install from pypi.org

gdc-readgroups may be used as a pip installed python package.

If you would like to install the package as root, for all users, run

sudo pip install gdc-readgroups

If you would like to install the package only for a local user, run

pip install gdc-readgroups --user

Build a Docker Image

The github repository for this package contains a Dockerfile, which may be used to build an image containing the package and all prerequisites. There are two ways to build the image.

Using docker directly.

wget https://raw.githubusercontent.com/NCI-GDC/gdc-readgroups/master/Dockerfile
docker build -t gdc-readgroups .

Using cwltool to build an image, and then run it, in one command.

In this case the cwl tool will expect a BAM input, and produce a json output. To install the reference CWL engine, run
```
pip install cwltool --user
```
Then to build the gdc-readgroups Docker Image and run the Container, run
```
wget https://raw.githubusercontent.com/NCI-GDC/gdc-readgroups/master/Dockerfile
wget https://raw.githubusercontent.com/NCI-GDC/gdc-readgroups/master/gdc-readgroups.cwl
cwltool gdc-readgroups.cwl --INPUT <your bam file>
```
The above command will only build the Docker Image if it does not exist on the system. After the build is performed once, the image will remain on your system, and the next cwltool run will skip the build step.

Usage

gdc-readgroups has two main modes: bam-mode and template-mode.

bam-mode

In bam-mode, a path to a BAM file must be supplied as input. By default, bam-mode will output a json file, but optionally may output a tsv file.

The command to run the pip installed package is

gdc-readgroups bam-mode --bam_path <your bam file>

The generated json will be placed in the current working directory and have a filename of <bam basename>.json. Any error messages will be written to stdout.

To output a tsv file, run

gdc-readgroups bam-mode --bam_path <your bam file> --output-format tsv

The generated tsv file will be placed in your current working directory, and be of the form <bam basename>.tsv

template-mode

In template-mode, no input is supplied, and two empty records are output within one file, either in json or tsv format.

To generate a json template, run

gdc-readgroups template-mode

The output will be placed in the current working directory and have a filename of gdc_readgroups.json

To generate a tsv template, run

gdc-readgroups template-mode --output-format tsv

The output will be placed in the current working directory and have a filename of gdc_readgroups.tsv

Name		Name	Last commit message	Last commit date
Latest commit History 212 Commits
gdc_readgroups		gdc_readgroups
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.secrets.baseline		.secrets.baseline
.travis.yml		.travis.yml
Dockerfile		Dockerfile
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
dev_requirements.txt		dev_requirements.txt
gdc-readgroups.cwl		gdc-readgroups.cwl
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

gdc-readgroups

Purpose

Note

Note

Installation

pip install from pypi.org

Build a Docker Image

Usage

bam-mode

template-mode

About

Releases

Packages

Contributors 6

Languages

License

uc-cdis/gdc-readgroups

Folders and files

Latest commit

History

Repository files navigation

gdc-readgroups

Purpose

Note

Note

Installation

pip install from pypi.org

Build a Docker Image

Usage

bam-mode

template-mode

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 6

Languages

Packages