Generate nine types of plots (bar, box, density, dot, heatmap, histogram, line, scatter, or violin) from an RNA-seq result table using ggplot2 and pheatmap.
To run rnaseq figure plotter, install the four libraries 'ggplot2', 'pheatmap', 'tidyverse', and 'argparse' by typing 'install.packages("library")' for each one.
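As a minimal sketch, the check below reports which of the four packages are still missing. It only prints the install command rather than installing anything, and the variable names are illustrative.

```r
# Packages named in this README.
required <- c("ggplot2", "pheatmap", "tidyverse", "argparse")

# requireNamespace() returns FALSE (without raising an error) when a package is absent.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  # Show the command to run instead of installing automatically.
  message("Run: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else {
  message("All required packages are installed.")
}
```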
The software is an R script; run it with "Rscript rnaseq_figure_plotter.r -i input_file -t bar -o output_file -g gene_list_file ... -c 5 -s 6".
Usage
Rscript rnaseq_figure_plotter.r -i input_file -t bar -o output_file -g gene_list_file ... -c 5 -s 6
help
HELP -h, --help show this help message and exit
Required function
INPUT -i, --input input file name
TYPE -t, --type choose plot types (bar, box, density, density_fill, dot_color, dot_shape, heatmap, histogram, line, scatter, or violin)
General optional function
OUTPUT -o, --output default output; output file name
GENE -g, --gene file name of specific gene ID list; generate "output"_gene_selection.txt file
REMOVE_COL -r, --remove_col default None; remove specific columns (samples) from input file. Split column names by space. Example; sample1 sample2 sample3
LOG2 -l, --log default 0; calculate log value (log2; 2, log10; 10, loge; e)
LOG2_NUMBER -lgn, --log_number default 0.000000001; add number to avoid -inf for log value
ZSCORE -zs, --zscore default off; apply Z-score transformation in gene (on or off). --log function should be 0 to apply --zscore function.
XAXIS -x, --xaxis default sample; choose x-axis (gene or sample)
ZAXIS -z, --zaxis default gene; choose fill, color, or shape (gene or sample)
COLOR -c, --color default 1; choose color type (1-10)
CUSTOM_COLOR -cst, --custom_color default None; customize color scales. Split colors by space. Example; red white blue green yellow
LETTER_SIZE -ls, --letter_size default 8 10; set text size and title size of legend and axis, respectively. Split two numbers by space. Example; 20 24
FIGURE_SAVE_FORMAT -f, --figure_save_format default pdf; choose format of figures (eps, ps, tex (pictex), pdf, jpeg, tiff, png, bmp, svg)
Optional parameter for individual plot types
STYLE -s, --style default 4; choose background of figures (1-7). This function works for any plots except heatmap.
LIMIT -lim, --limit default None; set the scale limits of "data". This function works for any plots except heatmap. Split two numbers by space (e.g. to limit 0 to 200, type 0 200). Negative numbers require double quotation marks. Example; 0 100 or "-1" 3
AXIS_CHANGE -a, --axis_change default off; flip axis in figures (on or off). This function works for any plots except heatmap.
LEGEND_POSITION -lp, --legend_position default right; choose legend position of figures (none, left, right, bottom, top, or two-element numeric vector). This function works for any plots except heatmap and scatter.
GEOM_POSITION -gp, --geom_position default 1; choose plot visualize types (geom position) from 1-4 in bar, density, density_fill, and histogram
CLUSTER_SELECT -cs, --cluster_select default on on; apply column and row cluster function for heatmap (on or off). Column is first and row is second; split the two factors (on or off) by space. Example; on off
SCATTER_SELECT -ss, --scatter_select default None; type the columns of two samples for comparison in scatter plot. Split samples by space. Example; sample1 sample2
PLOT_SIZE -p, --plot_size default 7 7; type width and height of figure. Split two numbers by space. This function works for any plots except heatmap. Example; 10 12
The input file must be a tab-delimited file. The first column and the first row should contain gene IDs and sample names, respectively. Gene expression values start from the second column and the second row.
Example of input file is as follows;
sample1 sample2 sample3 sample4 sample5
geneA 1 3 5.5 7 2
geneB 100 267 55 79 62
geneC 0.3 0.65 9.5 0.87 2.1
geneD 205 356 78 67 2900
geneE 1001 3001 5500 7001 2001
geneF 2 2 2 2 2
geneG 0.01 0.03 0.5 0.07 0.02
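A table in this layout can be loaded with base R as sketched below. This is an illustration only; how rnaseq_figure_plotter.r reads the file internally is not stated here, and the inline text stands in for a real input file.

```r
# Tab-delimited text in the same layout as the example above
# (the header row has no label above the gene ID column).
input_text <- paste(
  "sample1\tsample2\tsample3",
  "geneA\t1\t3\t5.5",
  "geneB\t100\t267\t55",
  "geneF\t2\t2\t2",
  sep = "\n"
)

# The header has one fewer field than the data rows, so read.table()
# uses the first column (gene IDs) as row names.
expr <- read.table(text = input_text, header = TRUE, sep = "\t",
                   check.names = FALSE)

print(expr)
print(dim(expr))  # rows are genes, columns are samples
```

For a file on disk, the `text = input_text` argument would be replaced by the file path.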
You can choose from nine types of plot: bar, box, density, dot, heatmap, histogram, line, scatter, or violin.
Heatmap is generated by pheatmap (https://cran.r-project.org/web/packages/pheatmap/pheatmap.pdf) and other plots are generated by ggplot2 (https://cran.r-project.org/web/packages/ggplot2/ggplot2.pdf).
Provide output file name.
Gene IDs should be in the first column, one gene ID per line (split by \n).
Example of specific gene ID list file is as follows;
geneA
geneF
geneG
The (-g, --gene) function automatically selects the expression values of the gene IDs in the provided list and writes them to the "output"_gene_selection.txt file.
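The selection step can be sketched in base R as follows. The data frame, the gene list, and the temporary output path are illustrative; the script's own implementation may differ.

```r
# Small expression table: genes in rows, samples in columns.
expr <- data.frame(
  sample1 = c(1, 100, 2, 0.01),
  sample2 = c(3, 267, 2, 0.03),
  row.names = c("geneA", "geneB", "geneF", "geneG")
)

# Gene IDs of interest; from a list file these could be read with readLines().
gene_ids <- c("geneA", "geneF", "geneG")

# Keep only the rows whose gene ID appears in the list.
selected <- expr[rownames(expr) %in% gene_ids, , drop = FALSE]

# write.table() quotes row and column names by default.
out_file <- file.path(tempdir(), "output_gene_selection.txt")
write.table(selected, file = out_file)

print(selected)
```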
Example of "output"_gene_selection.txt file is as follows;
"sample1" "sample2" "sample3" "sample4" "sample5"
"geneA" 1 3 5.5 7 2
"geneF" 2 2 2 2 2
"geneG" 0.01 0.03 0.5 0.07 0.02
Remove specific columns (samples) from the input file. Split column names by space. Example; after -r sample1 sample4 sample5 the data look as follows;
sample2 sample3
geneA 3 5.5
geneB 267 55
geneC 0.65 9.5
geneD 356 78
geneE 3001 5500
geneF 2 2
geneG 0.03 0.5
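The same column removal can be sketched in base R. The object names are illustrative and not taken from the script.

```r
# Expression table with five samples (subset of the example genes).
expr <- data.frame(
  sample1 = c(1, 100, 0.3),
  sample2 = c(3, 267, 0.65),
  sample3 = c(5.5, 55, 9.5),
  sample4 = c(7, 79, 0.87),
  sample5 = c(2, 62, 2.1),
  row.names = c("geneA", "geneB", "geneC")
)

# Columns that would be passed to -r, split by space on the command line.
remove_cols <- c("sample1", "sample4", "sample5")

# Keep every column whose name is not in the removal list.
kept <- expr[, !(colnames(expr) %in% remove_cols), drop = FALSE]

print(kept)
```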
Apply a log2, log10, or loge transformation to the gene expression values by typing 2, 10, or e, respectively, in the (-l, --log) function. The default of the (-l, --log) function is off (0).
To avoid -inf in the log values when generating plots, the (-lgn, --log_number) function adds a tiny value (default 0.000000001). You can customize this value by typing a number (example 0, 0.000001, 0.000000000000000001, etc...).
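The effect of the added value can be seen in a short base R sketch. The zero-valued entries are made up for illustration, and the variable names are assumptions rather than names from the script.

```r
# Expression matrix containing zeros (illustrative values).
expr <- matrix(
  c(1, 4,
    0, 2,
    8, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("sample1", "sample2"))
)

log_base <- 2              # corresponds to -l 2
log_number <- 0.000000001  # default offset described above

# Without the offset, log of zero gives -Inf.
without_offset <- log(expr, base = log_base)

# With the offset, every value stays finite.
with_offset <- log(expr + log_number, base = log_base)

print(without_offset)
print(with_offset)
```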
Transform the data of each gene (row) to z-scores. This function can be used only when the (-l, --log) function is off (0). To avoid "NA", gene IDs whose z-scores contain "NA" are removed automatically.
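A row-wise z-score can be sketched with base R `scale()`, which uses the sample standard deviation; whether the script uses the same estimator is an assumption. A gene with constant values (geneF in the example input) has zero standard deviation, which is one way missing values arise.

```r
# Three genes from the example input; geneF is constant across samples.
expr <- matrix(
  c(1, 3, 5.5, 7, 2,
    100, 267, 55, 79, 62,
    2, 2, 2, 2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneF"), paste0("sample", 1:5))
)

# scale() standardizes columns, so transpose, scale, and transpose back
# to standardize each gene (row).
z <- t(scale(t(expr)))

# geneF becomes NaN (0 / 0); drop rows that contain missing values.
z <- z[complete.cases(z), , drop = FALSE]

print(round(z, 3))
```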
Once the software finishes, it provides a dataframe with three columns: sample, data, and gene. They refer to sample name, gene expression value, and gene ID, respectively. Example of the dataframe is as follows;
sample data gene
sample1 1.0000 geneA
sample1 100.0000 geneB
sample1 0.3000 geneC
sample1 205.0000 geneD
sample1 1001.0000 geneE
sample1 2.0000 geneF
sample1 0.0100 geneG
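The reshaping from the wide input table to this three-column layout can be sketched in base R as below. This is an illustration rather than the script's own code; `tidyr::pivot_longer()` from the tidyverse is a common alternative.

```r
# Wide table: genes in rows, samples in columns.
expr <- data.frame(
  sample1 = c(1, 100, 0.3),
  sample2 = c(3, 267, 0.65),
  row.names = c("geneA", "geneB", "geneC")
)

# as.vector() on a matrix reads column by column, so each sample's genes
# stay together, matching the order of the example above.
long <- data.frame(
  sample = rep(colnames(expr), each = nrow(expr)),
  data = as.vector(as.matrix(expr)),
  gene = rep(rownames(expr), times = ncol(expr))
)

print(long)
```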
The defaults of the x-axis and z-axis are sample and gene, respectively.
The following table shows which axes you can modify. Labels x and z can be modified by (-x, --xaxis) and (-z, --zaxis).
plots x-axis y-axis color/shape
bar x data z
box x data x
density data density x
dot_color x data z
dot_shape x data z
heatmap sample gene
histogram data count x
line x data z
scatter
violin x data x
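For a bar plot, the x and z choices correspond to ggplot2 aesthetics roughly as sketched below. The data frame and object names are illustrative, and the exact calls made by the script are an assumption.

```r
library(ggplot2)

# Long-format data: one row per sample/gene pair.
long <- data.frame(
  sample = rep(c("sample1", "sample2"), each = 2),
  data = c(1, 100, 3, 267),
  gene = rep(c("geneA", "geneB"), times = 2)
)

# Default: x-axis is sample, z (fill) is gene.
p_default <- ggplot(long, aes(x = sample, y = data, fill = gene)) +
  geom_col(position = "dodge")

# Swapped, as with -x gene -z sample: x-axis is gene, z (fill) is sample.
p_swapped <- ggplot(long, aes(x = gene, y = data, fill = sample)) +
  geom_col(position = "dodge")
```

Printing either object (for example `print(p_default)`) draws the figure on the active graphics device.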
ggplot2 color scales (https://ggplot2.tidyverse.org/reference/scale_brewer.html) are used for the color setting of all plots except heatmap. Heatmap colors are custom settings. Settings are as follows;
settings(ggplot2) ggplot2_color color description
1 scale_fill/color_hue() standard ggplot2 setting; read ggplot2 website (https://ggplot2.tidyverse.org/reference/scale_hue.html)
2 scale_fill/color_viridis_d(option = "C") read ggplot2 website (https://ggplot2.tidyverse.org/reference/scale_viridis.html)
3 scale_fill/color_viridis_d(option = "D") read ggplot2 website (https://ggplot2.tidyverse.org/reference/scale_viridis.html)
4 scale_fill/color_grey() black to white
5 scale_fill/color_brewer(palette ="RdBu") red to blue (maximum 11 colors)
6 scale_fill/color_brewer(palette ="RdYlBu") red, yellow, to blue (maximum 11 colors)
7 scale_fill/color_brewer(palette ="Reds") red to white (maximum 9 colors)
8 scale_fill/color_brewer(palette ="Blues") blue to white (maximum 9 colors)
9 scale_fill/color_brewer(palette ="Paired") Paired palette; read ggplot2 website (maximum 12 colors)
10 scale_fill/color_brewer(palette = "Set1") Set1 palette; read ggplot2 website (maximum 9 colors)
settings(pheatmap) color description
1 red, white, to blue
2 red, yellow, to blue
3 red, white, to green
4 purple, white, to green
5 red, white, to black
6 yellow to blue
7 black to white
8 red to white
9 blue to white
10 green to white
You can also customize your colors by using the (-cst, --custom_color) function. When the (-cst, --custom_color) function is on, the (-c, --color) function is off. Split colors by space. Example of custom color setting is -cst red white blue green yellow.
Change text size and title size of legend and axis, respectively. Split two numbers by space. Default is 8 for text and 10 for title. Example; 20 24
Choose the format of the saved figure. Default is pdf; you can also choose eps, ps, tex (pictex), jpeg, tiff, png, bmp, or svg.
Style uses the ggplot2 theme function (https://ggplot2.tidyverse.org/reference/ggtheme.html) to choose among 7 figure backgrounds. This function works for any plots except heatmap.
settings theme description
1 theme_void() no line with white
2 theme_classic() axis line with white
3 theme_minimal() subline with white
4 theme_bw() grey axis and subline with white
5 theme_linedraw() black axis and subline with white
6 theme_grey() subline with grey
7 theme_dark() subline with black
Set the scale limits of data. This function works for any plots except heatmap. Split two numbers by space. Negative numbers require double quotation marks. For example, to limit 0 to 200 type 0 200; to limit -1 to 3 type "-1" 3.
Flip the x and y axes by setting the (-a, --axis_change) function to on. This function works for any plots except heatmap.
Choose legend position of figures (none, left, right, bottom, top, or two-element numeric vector). Default is right. This function works for any plots except heatmap and scatter.
Choose plot visualize types (geom position) from 1-4 in bar, density, density_fill and histogram using ggplot2 position function (https://ggplot2.tidyverse.org/reference/position_dodge.html).
settings position description
1 stack stack style figure
2 dodge dodge style figure
3 dodge2 dodge style figure
4 fill fill entire figure
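The color, style, limit, legend, letter size, and position settings described above correspond to ggplot2 layers roughly as in this sketch. The particular combination, the data, and the temporary output path are illustrative, and how the script wires its options to these calls is an assumption.

```r
library(ggplot2)

long <- data.frame(
  sample = rep(c("sample1", "sample2"), each = 2),
  data = c(1, 100, 3, 267),
  gene = rep(c("geneA", "geneB"), times = 2)
)

p <- ggplot(long, aes(x = sample, y = data, fill = gene)) +
  geom_col(position = "dodge") +          # geom position setting 2
  scale_fill_brewer(palette = "Set1") +   # color setting 10
  theme_bw() +                            # style setting 4
  coord_cartesian(ylim = c(0, 300)) +     # limit 0 300
  theme(
    legend.position = "bottom",           # legend position
    legend.text = element_text(size = 8), # text size
    axis.text = element_text(size = 8),
    legend.title = element_text(size = 10), # title size
    axis.title = element_text(size = 10)
  )

# Save as pdf with width 7 and height 7 (inches).
out_file <- file.path(tempdir(), "output.pdf")
ggsave(out_file, plot = p, width = 7, height = 7)
```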
Turn the pheatmap clustering function on or off for columns and/or rows. Default is on for both column and row. Column is first and row is second; split the two factors (on or off) by space. Example; on off
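As a sketch of the "on off" case, the call below clusters columns but not rows with pheatmap. The color ramp is an assumed stand-in for a red, white, and blue setting (its direction is a guess), and the matrix is a subset of the example input.

```r
library(pheatmap)

# Genes in rows, samples in columns.
mat <- matrix(
  c(1, 3, 5.5, 7,
    100, 267, 55, 79,
    0.3, 0.65, 9.5, 0.87),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("sample", 1:4))
)

# Assumed palette: low values blue, high values red.
heat_colors <- colorRampPalette(c("blue", "white", "red"))(100)

# cluster_cols = TRUE and cluster_rows = FALSE mirror "-cs on off".
# silent = TRUE builds the heatmap object without drawing it.
hm <- pheatmap(mat,
               cluster_cols = TRUE,
               cluster_rows = FALSE,
               color = heat_colors,
               silent = TRUE)
```

Passing `filename = "heatmap.pdf"` to `pheatmap()` would write the figure to a file instead.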
The (-ss, --scatter_select) function is required for scatter plots.
It takes the two datasets "x-axis y-axis" for the scatter plot; split samples or genes by space. Example of (-ss, --scatter_select) is "sample1 sample3". Color cannot be changed in scatter plots.
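A two-sample comparison of this kind can be sketched with ggplot2 as follows. The object names are illustrative, and the script's own plotting code may differ.

```r
library(ggplot2)

# Wide table: one row per gene, one column per sample.
expr <- data.frame(
  sample1 = c(1, 100, 0.3, 205),
  sample3 = c(5.5, 55, 9.5, 78),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# The two names that would be given on the command line: x-axis, then y-axis.
scatter_select <- c("sample1", "sample3")

# .data[[name]] looks up a column by its name stored in a string.
p <- ggplot(expr, aes(x = .data[[scatter_select[1]]],
                      y = .data[[scatter_select[2]]])) +
  geom_point() +
  labs(x = scatter_select[1], y = scatter_select[2])
```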
Change width and height of figure, respectively. Split two numbers by space. Default is 7 for both width and height. This function works for any plots except heatmap. Example; 10 12
Easy instructions and example usages are in introduction_rnaseq_figure_plotter_R.pdf.