A dockerized UCSC genome browser, customizable with simple google spreadsheets. This is still a work in progress and is not ready for general use.
The UCSC Genome Browser is a web-based genome browser widely used for viewing and sharing various types of genomic data mapped to reference genomes. The browser hosted at UCSC is limited to a specific set of reference genomes. While "assembly hubs" allow you to utilize other reference genomes, research groups working with many genomes not hosted by UCSC may benefit from the installation of a self-hosted instance of the browser. Installing and maintaining a self-hosted instance of the browser is difficult and time-consuming, requiring extensive interactive use of the shell and mysql.
cruize was created to address this challenge by simplifying the deployment of UCSC browsers with custom genomes. cruize is composed of a series of Docker images containing the UCSC Genome Browser software, and scripts to facilitate loading of custom genomes and datasets defined in google spreadsheets. cruize removes the need to install the genome browser and to set up genome and track databases by hand.
A UCSC genome browser instance is primarily composed of an apache web server, a mysql database, and the data files to be displayed in the browser. cruize initiates a docker container for the web server, and another docker container for the mysql database, both on the same host. These two containers are placed on the same docker network so that they can communicate with each other, and the host forwards requests on port 80/443 to the apache docker container. The genome data files needed by the web server are kept in a directory on the host computer that is mounted as a volume in the web server container. The mysql data directory needed by the mysql server is also kept on the host computer and is mounted as a volume in the mysql service container. Any changes to the browser are performed by a transient admin container that can modify the genome data directory and the mysql database. This design isolates the services in a way that protects both the host computer and the genome data from the web service.
The data for a UCSC genome browser is stored in a series of mysql databases, including an hgcentral database that defines the loaded genomes and contains user/session info, and a database for each genome that contains track data, metadata, and display settings. Setting up and maintaining these databases is difficult, error-prone, and time-consuming. cruize simplifies the creation and management of these databases by allowing you to enter information for each genome to be displayed into a simple google spreadsheet, with another google spreadsheet for each genome tabulating which tracks are shown and how they are displayed. When you want to update the browser with new genomes, tracks, or track settings, cruize will download these spreadsheets and automatically rebuild the genome and track databases, negating the need for you to directly interact with the mysql database.
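As a rough illustration of the download step, a google spreadsheet that is shared publicly can be fetched as tab-separated text through Google's export URL. The sketch below is not taken from cruize's scripts: the function names sheet_export_url and fetch_sheet are hypothetical, and the spreadsheet ID is whatever long token appears in your own sheet's URL.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of cruize's own scripts.

# Build the TSV export URL for a publicly shared google spreadsheet.
#   $1 = spreadsheet ID (the long token in the sheet's URL)
#   $2 = gid of the worksheet tab (defaults to 0, the first tab)
sheet_export_url() {
  printf 'https://docs.google.com/spreadsheets/d/%s/export?format=tsv&gid=%s\n' "$1" "${2:-0}"
}

# Download one worksheet as TSV into a local file.
#   $1 = spreadsheet ID, $2 = output file, $3 = gid (optional)
fetch_sheet() {
  curl -fsSL -o "$2" "$(sheet_export_url "$1" "${3:-0}")"
}
```

For example, fetch_sheet YOUR_SHEET_ID genomes.tsv would save the first tab of a sheet to genomes.tsv, assuming the sheet is readable by anyone with the link.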
- docker
curl -L https://get.docker.com | sh
- docker compose (optional)
curl -L https://github.com/docker/compose/releases/download/1.8.0/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose && \
chmod +x /usr/local/bin/docker-compose
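Before moving on, it can help to confirm that the tools used in the following steps are actually on the PATH. This is a minimal sketch; the helper name missing_cmds is hypothetical and not something cruize provides.

```shell
#!/bin/sh
# Hypothetical preflight helper; not part of cruize.
# Prints the name of every command in the argument list that is not found on PATH.
missing_cmds() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}
```

Running missing_cmds docker docker-compose git curl unzip prints nothing when everything is installed, and otherwise lists what still needs to be installed. docker-compose can be dropped from the list if you plan to start the containers without compose.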
via git
git clone https://github.com/dvera/cruize && cd cruize
or grab the zipped source
curl -Lo master.zip https://github.com/FSUgenomics/cruize/archive/master.zip && unzip master.zip && rm -f master.zip && mv cruize-master cruize
with compose:
docker-compose up
or without compose:
# create a bridge network for containers
docker network create cruize_nw
# start database container
docker run -d \
-p 3306:3306 \
--name cruize_sql \
-h cruizesql \
--env-file browser_config \
-v $(pwd)/sqldb:/var/lib/mysql \
-v $(pwd)/cruize_scripts:/usr/local/bin \
--network cruize_nw \
vera/cruize_sql
# start webserver container
docker run -d \
-p 80:80 \
--name cruize_www \
-h cruizewww \
--env-file browser_config \
-v $(pwd)/gbdb:/gbdb:ro \
-v $(pwd)/cruize_scripts:/usr/local/bin \
--network cruize_nw \
vera/cruize_www
# run admin container to update
docker run -it \
--name cruize_admin \
-h cruizeadmin \
--env-file browser_config \
-v $(pwd)/gbdb:/gbdb \
-v $(pwd)/cruize_scripts:/usr/local/bin \
--network cruize_nw \
vera/cruize_admin \
update_browser
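After starting the containers by hand, a quick way to confirm the setup is to list the cruize containers and the members of the bridge network. The sketch below only wraps standard docker commands; the function names are hypothetical, and the container and network names are the ones used in the commands above.

```shell
#!/bin/sh
# Hypothetical convenience wrappers; not part of cruize.

# Show every container whose name contains "cruize_", with its current status.
cruize_status() {
  docker ps -a --filter "name=cruize_" --format '{{.Names}}\t{{.Status}}'
}

# Print the names of the containers attached to the cruize_nw network, one per line.
cruize_network_members() {
  docker network inspect cruize_nw --format '{{range .Containers}}{{println .Name}}{{end}}'
}
```

If the services started correctly, cruize_sql and cruize_www should both be reported as running and both should appear as members of cruize_nw. Note that the admin container is started with a fixed name, so a stopped cruize_admin container may need to be removed with docker rm cruize_admin before the update command can be run again.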
with cloud-init:
#cloud-config
package_upgrade: true
package_update: true
runcmd:
- curl https://get.docker.com/ | sh
- curl -L https://github.com/docker/compose/releases/download/1.8.0/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose
- chmod +x /usr/local/bin/docker-compose
- git clone --recursive https://github.com/fsugenomics/cruize /root/cruise
- systemctl enable docker
- systemctl start docker
- cd /root/cruise && docker-compose up
When cruize is first started, it checks for an existing database and genome data files and, if none are found, downloads some example data. Refer to the docs to customize cruize.
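The kind of first-run check described here can be pictured roughly as follows. This is only a sketch under assumptions: cruize's real startup logic may differ, the function names are hypothetical, and the directories are assumed to be the gbdb and sqldb host directories mounted in the docker run commands.

```shell
#!/bin/sh
# Hypothetical sketch of a first-run check; cruize's actual logic may differ.

# Succeeds (exit 0) when the directory is missing or has no entries.
dir_is_empty() {
  [ ! -d "$1" ] || [ -z "$(ls -A "$1" 2>/dev/null)" ]
}

# Succeeds when either host data directory still needs to be populated.
#   $1 = genome data directory (e.g. ./gbdb)
#   $2 = mysql data directory (e.g. ./sqldb)
needs_example_data() {
  dir_is_empty "$1" || dir_is_empty "$2"
}
```

A call such as needs_example_data ./gbdb ./sqldb && echo "no existing data found" shows whether a fresh start would fall back to the example data under these assumptions.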
cruize downloads and installs the UCSC genome browser and associated tools, which are free for non-commercial use. The license for the UCSC genome browser can be found here. cruize itself is licensed under the MIT license.
- blatServers
- bed files
- liftOver
- genomes menu