A heatmap-scatterplot using Dash by plotly. From the commandline it can be started on localhost or AWS, or it can be run from the Refinery GUI.
If you have Docker installed, and data available at public URLs, this is the easiest way to get started:
$ V=v0.1.5
$ FIXTURES=https://raw.githubusercontent.com/refinery-platform/heatmap-scatter-dash/$V/fixtures/good/data
$ docker run --name heatmap --detach --publish 8888:80 \
-e "FILE_URLS=$FIXTURES/counts.csv $FIXTURES/counts-copy.csv.gz" \
mccalluc/heatmap_scatter_dash:$V
Then visit http://localhost:8888.
If multiple URLs are desired, use spaces in the value
of the environment variables, as in the example.
Besides providing count data, you can also specify
DIFF_URLS
for differential expression data and
META_URLS
for metadata.
Check out the project and install dependencies:
# Requires Python3:
$ python --version
$ git clone https://github.com/refinery-project/heatmap-scatter-dash.git
$ cd heatmap-scatter-dash
$ pip install -r requirements-freeze.txt
Then run it locally:
$ cd context
# Generate a random matrix:
$ ./app_runner.py --demo 100 10 5
# Load data from disk:
$ ./app_runner.py --files ../fixtures/good/data/counts.csv \
--diffs ../fixtures/good/data/stats-* \
--meta ../fixtures/good/data/metadata.csv
and visit http://localhost:8050/
.
For input, a variety of tabular data formats are supported: CSV, TSV, GCT, or any of those zipped.
In the example below, r1
, r2
, etc. are genes and c1
, c2
, etc. are conditions.
Optionally, name1
, name2
, etc. are human-readable names for the genes,
gene | c1 | c2 | c3 | c4 | code1 | code2 |
---|---|---|---|---|---|---|
r1 | 1 | 2 | 3 | 4 | a | b |
r2 | 5 | 6 | 7 | 8 | c | b |
r3 | 3 | 5 | 7 | 9 | c | d |
More examples are available.
One bash script, test.sh
, handles all our tests:
- Python unit tests
- Python style tests (
flake8
andisort
) - Cypress.io interaction tests
- Docker container build and launch
A few more dependencies are required for this to work locally:
# Install Docker: https://www.docker.com/community-edition
$ pip install -r requirements-dev.txt
$ npm install cypress --save-dev
Successful Github tags and PRs will prompt Travis to push the built image to Dockerhub. For a new version number:
$ git tag v0.0.x && git push origin --tags
There are a few notes on implementation decisions and lessons learned.
The online help can be previewed to get a better sense of the operational details.
Command line usage:
$ python app_runner.py -h
usage: app_runner.py [-h] (--demo ROWS COLS META | --files CSV [CSV ...])
[--diffs CSV [CSV ...]] [--metas CSV [CSV ...]]
[--most_variable_rows ROWS] [--html_table]
[--truncate_table N] [--port PORT]
[--p_value_re RE [RE ...]] [--log_fold_re RE [RE ...]]
[--profile [DIR]] [--html_error] [--debug]
[--api_prefix PREFIX]
Light-weight visualization for differential expression
optional arguments:
-h, --help show this help message and exit
--demo ROWS COLS META
Generates a random matrix with the number of rows and
columns specified. In addition, "META" determines the
number of mock metadata fields to associate with each
column.
--files CSV [CSV ...]
Read CSV or TSV files. Identifiers should be in the
first column and multiple files will be joined on
identifier. Gzip files are also handled.
--diffs CSV [CSV ...]
Read CSV or TSV files containing differential
expression data.
--metas CSV [CSV ...]
Read CSV or TSV files containing metadata: Row labels
should match column headers of the raw data.
--most_variable_rows ROWS
For the heatmap, we first sort by row variance, and
then take the number of rows specified here. Defaults
to 500.
--html_table The default is to use pre-formatted text for the
tables. HTML tables are available, but are twice as
slow.
--truncate_table N Truncate the table to the first N rows. Table
rendering is often a bottleneck. Default is not to
truncate.
--port PORT Specify a port to run the server on. Defaults to 8050.
Refinery/Developer:
These parameters will probably only be of interest to developers, and/or
they are used when the tool is embedded in Refinery.
--p_value_re RE [RE ...]
Regular expressions which column headers will be
checked against to identify p-values. Defaults to
['p.*value', 'padj', 'fdr'].
--log_fold_re RE [RE ...]
Regular expressions which column headers will be
checked against to identify fold-change values.
Defaults to ['\\blog[^a-z]'].
--profile [DIR] Saves a profile for each request in the specified
directory, "/tmp" by default. Profiles can be viewed
with snakeviz.
--html_error If there is a configuration error, instead of exiting,
start the server and display an error page.
--debug Run the server in debug mode: The server will restart
in response to any code changes, and some hidden
fields will be shown.
--api_prefix PREFIX Prefix for API URLs.