This is a short guide to using the show_examples tool to view the pileup images used within DeepVariant and save them as PNG image files. This tool is particularly useful when you want to try to understand how a candidate variant of interest was represented when it was passed into the neural network.
For more information on the pileup images and how to read them, please see the "Looking through DeepVariant's Eyes" blog post.
The show_examples tool was introduced in DeepVariant 1.0.0, so it is not available in older versions, but it works with make_examples output files from older versions of DeepVariant.
First, find the make_examples.tfrecord.gz files output by DeepVariant during the make_examples (first) stage.
If you followed along with the quick start guide and case studies that used the Docker version, then these files are usually hidden inside the Docker container. You can export them into the same output directory where the VCF file appears by adding the following setting to the run_deepvariant command.
# Add the following to your run_deepvariant command.
--intermediate_results_dir=/output/
Then the make_examples file should appear in the directory that Docker mounted as /output/. For example, if you followed the quick-start documentation, it looks like this: ${OUTPUT_DIR}/make_examples.tfrecord-00000-of-00001.gz.
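Before moving on, it can help to confirm that the files were actually exported and that no shard is missing. The following is a minimal bash sketch, assuming the file naming shown above (make_examples.tfrecord-XXXXX-of-NNNNN.gz). The helper name list_example_shards is hypothetical and is not part of DeepVariant.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DeepVariant): list the make_examples
# shards in a directory and check that none are missing.
# Assumes the naming pattern make_examples.tfrecord-XXXXX-of-NNNNN.gz.
list_example_shards() {
  local dir="$1"
  local files=()
  local f
  # Collect matching files; the -e test skips the unexpanded glob
  # when nothing matches.
  for f in "${dir}"/make_examples.tfrecord-?????-of-?????.gz; do
    if [ -e "${f}" ]; then
      files+=("${f}")
    fi
  done
  if [ "${#files[@]}" -eq 0 ]; then
    echo "No make_examples files found in ${dir}" >&2
    return 1
  fi
  # The total shard count is encoded after "-of-" in every filename.
  local total="${files[0]##*-of-}"   # e.g. "00001.gz"
  total="${total%.gz}"               # e.g. "00001"
  total=$((10#${total}))             # force base 10 despite leading zeros
  printf '%s\n' "${files[@]}"
  if [ "${#files[@]}" -ne "${total}" ]; then
    echo "Expected ${total} shards but found ${#files[@]}" >&2
    return 2
  fi
}
```

For the quick start setup, this could be called as list_example_shards "${OUTPUT_DIR}". A non-zero exit status means that no files were found (1) or that the number of files does not match the total in the filename (2).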
Once you have a make_examples output tfrecord file, you can run show_examples to see the pileup images inside:
# Continuing from the quick start linked above:
INPUT_DIR="${PWD}/quickstart-testdata"
OUTPUT_DIR="${PWD}/quickstart-output"
BIN_VERSION="1.3.0" # show_examples is available only in version 1.0.0 and later.
sudo docker run \
-v "${INPUT_DIR}":"/input" \
-v "${OUTPUT_DIR}":"/output" \
google/deepvariant:"${BIN_VERSION}" /opt/deepvariant/bin/show_examples \
--examples=/output/make_examples.tfrecord-00000-of-00001.gz \
--output=/output/pileup --num_records=20
# And then your images are here:
ls "${OUTPUT_DIR}"/pileup*.png
- Filter to regions? Use e.g. --regions chr20:1-3000000 or paths to BED or BEDPE files.
- Filter to records from a VCF? Use --vcf variants.vcf. This can be a piece of a VCF, e.g. grepping a hap.py output VCF for false positives. This is a powerful way to pick out variants of interest and investigate them in more depth.
- Stop after a certain number of examples, e.g. 10? Use --num_records 10.
- Sharded examples? Use, for example, --examples make_examples.tfrecord@64.gz to search through them all. This is best paired with --regions or --vcf to narrow down to a small number of examples of interest. You can also use the actual filename of a single make_examples file to only read that one, as shown in the sample code above.
- Want an RGB image too? Use --image_type both. The RGB image overlays the channels as colors into a single image. Just remember that DeepVariant sees all the channels, not this RGB representation.
- Don't want to print headers onto the images? Use --noannotation.
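The options above can be combined. The following bash sketch shows one way to cut a hap.py output VCF down to false positives and then assemble the matching show_examples arguments. It is an illustration under stated assumptions, not a definitive workflow: both helper names are hypothetical, the VCF is assumed to be uncompressed, and false positives are assumed to be marked with the token FP in the per-sample fields. Check your own file and tighten the pattern if needed.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and file paths are hypothetical.

# Keep VCF header lines plus any record that contains "FP" as a
# standalone token. Assumption: the hap.py output VCF is uncompressed
# and marks false positives with "FP" in its per-sample fields.
filter_false_positives() {
  local in_vcf="$1" out_vcf="$2"
  awk '/^#/ || /(^|[^A-Za-z0-9_])FP([^A-Za-z0-9_]|$)/' "${in_vcf}" > "${out_vcf}"
}

# Print (but do not run) show_examples arguments, one per line.
# Pass an empty string to skip an optional filter.
build_show_examples_args() {
  local examples="$1" output_prefix="$2" vcf="$3" regions="$4" num_records="$5"
  local args=(--examples "${examples}" --output "${output_prefix}")
  if [ -n "${vcf}" ]; then
    args+=(--vcf "${vcf}")
  fi
  if [ -n "${regions}" ]; then
    args+=(--regions "${regions}")
  fi
  if [ -n "${num_records}" ]; then
    args+=(--num_records "${num_records}")
  fi
  printf '%s\n' "${args[@]}"
}
```

For example, filter_false_positives happy_output.vcf false_positives.vcf followed by build_show_examples_args make_examples.tfrecord@64.gz pileup false_positives.vcf chr20:1-3000000 10 prints the arguments for a run that searches all 64 shards but stops after 10 matching examples. When running through Docker, the paths must be given as they appear inside the container (for example under /output/).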