RNAseq_figure_plotter

Generates nine types of plot (bar, box, density, dot, heatmap, histogram, line, scatter, or violin) from an RNA-seq result table using the ggplot2 and pheatmap packages.

Install the four required libraries, 'ggplot2', 'pheatmap', 'tidyverse', and 'argparse', by typing install.packages("library") for each one before running rnaseq figure plotter.
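
As a small convenience, the following base R sketch checks which of the four libraries are still missing. The variable names are illustrative and are not part of rnaseq_figure_plotter; nothing is installed automatically.

```r
# The four libraries named above.
required <- c("ggplot2", "pheatmap", "tidyverse", "argparse")

# requireNamespace() returns FALSE (instead of stopping) when a package is absent.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  # Only print the command to run.
  message("Run: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else {
  message("All required libraries are installed.")
}
```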

The software is an R script; run it with "Rscript rnaseq_figure_plotter.r -i input_file -t bar -o output_file -g gene_list_file ... -c 5 -s 6".

parameter of rnaseq_figure_plotter

Usage

Rscript rnaseq_figure_plotter.r -i input_file -t bar -o output_file -g gene_list_file  ... -c 5 -s 6

help

HELP		-h, --help		show this help message and exit

Required function

INPUT		-i, --input		input file name

TYPE		-t, --type		choose plot types (bar, box, density, density_fill, dot_color, dot_shape, heatmap, histogram, line, scatter, or violin)

General optional function

OUTPUT		-o, --output		default output; output file name

GENE		-g, --gene		file name of specific gene ID list; generate "output"_gene_selection.txt file

REMOVE_COL	-r, --remove_col	default None; remove specific columns (samples) from input file. Split column names by space. Example; sample1 sample2 sample3

LOG2		-l, --log		default 0; calculate log value (log2; 2, log10; 10, loge; e)

LOG2_NUMBER	-lgn, --log_number	default 0.000000001; add number to avoid -inf for log value

ZSCORE		-zs, --zscore		default off; apply Z-score transformation to each gene (on or off). The --log function should be 0 to apply the --zscore function.

XAXIS		-x, --xaxis		default sample; choose x-axis (gene or sample)

ZAXIS		-z, --zaxis		default gene; choose fill, color, or shape (gene or sample)

COLOR		-c, --color		default 1; choose color type (1-10)

CUSTOM_COLOR	-cst, --custom_color	default None; customize color scales. Split colors by space. Example; red white blue green yellow

LETTER_SIZE	-ls, --letter_size	default 8 10; set text and title size of legend and axis, respectively. Split the two numbers by space. Example; 20 24

FIGURE_SAVE_FORMAT	-f, --figure_save_format		default pdf; choose format of figures (eps, ps, tex (pictex), pdf, jpeg, tiff, png, bmp, svg)

Optional parameter for individual plot types

STYLE		-s, --style		default 4; choose background of figures (1-7). This function works for any plots except heatmap.

LIMIT		-lim, --limit		default None; apply a custom scale (lower and upper limit) to "data". This function works for any plots except heatmap. Split the two numbers by space (e.g. to limit data to 0 to 200, type 0 200). A negative number requires double quotation marks. Example; 0 100 or "-1" 3

AXIS_CHANGE	-a, --axis_change	default off; flip axis in figures (on or off). This function works for any plots except heatmap.

LEGEND_POSITION	-lp, --legend_position	default right; choose legend position of figures (none, left, right, bottom, top, or two-element numeric vector). This function works for any plots except heatmap and scatter.

GEOM_POSITION	-gp, --geom_position	default 1; choose plot visualize types (geom position) from 1-4 in bar, density, density_fill, and histogram

CLUSTER_SELECT	-cs, --cluster_select	default on on; apply column and row cluster function for heatmap (on or off). Column is first and row is second; split the two values (on or off) by space. Example; on off

SCATTER_SELECT	-ss, --scatter_select	default None; type the two columns (samples) to compare in scatter plot. Split samples by space. Example; sample1 sample2

PLOT_SIZE	-p, --plot_size		default 7 7; type width and height of figure. Split the two numbers by space. This function works for any plots except heatmap. Example; 10 12

input file format (-i, --input)

The input file must be a tab-delimited file. The first column should contain gene IDs and the first row sample names. Gene expression values start from the second column and second row.

Example of input file is following;

		sample1	sample2	sample3	sample4	sample5
	geneA	1	3	5.5	7	2
	geneB	100	267	55	79	62
	geneC	0.3	0.65	9.5	0.87	2.1
	geneD	205	356	78	67	2900
	geneE	1001	3001	5500	7001	2001
	geneF	2	2	2	2	2
	geneG	0.01	0.03	0.5	0.07	0.02
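
A file in this layout can be read with base R as sketched below. This is an illustration and not necessarily how rnaseq_figure_plotter reads its input; it assumes the header line begins with an empty field above the gene ID column, as the example layout suggests.

```r
# The example above, written as the lines of a tab delimited file.
input_lines <- c(
  "\tsample1\tsample2\tsample3\tsample4\tsample5",
  "geneA\t1\t3\t5.5\t7\t2",
  "geneB\t100\t267\t55\t79\t62",
  "geneC\t0.3\t0.65\t9.5\t0.87\t2.1",
  "geneD\t205\t356\t78\t67\t2900",
  "geneE\t1001\t3001\t5500\t7001\t2001",
  "geneF\t2\t2\t2\t2\t2",
  "geneG\t0.01\t0.03\t0.5\t0.07\t0.02"
)

# row.names = 1 turns the first column (gene ID) into row names.
# For a real file, pass the file name instead of text = input_lines.
expr <- read.table(text = input_lines, header = TRUE, sep = "\t", row.names = 1)

dim(expr)  # 7 genes (rows) and 5 samples (columns)
```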

type of plots (-t, --type)

You can choose from nine types of plot: bar, box, density, dot, heatmap, histogram, line, scatter, or violin. For (-t, --type), density is chosen as density or density_fill, and dot as dot_color or dot_shape.

Heatmap is generated by pheatmap (https://cran.r-project.org/web/packages/pheatmap/pheatmap.pdf) and other plots are generated by ggplot2 (https://cran.r-project.org/web/packages/ggplot2/ggplot2.pdf).

output file name (-o, --output)

Provide output file name.

specific gene id list file format (-g, --gene)

Gene IDs should be in the first column, one gene ID per line (split by \n).

Example of specific gene ID list file is following;

	geneA
	geneF
	geneG

The (-g, --gene) function automatically selects the expression values of the provided gene IDs and writes them to the "output"_gene_selection.txt file.

Example of "output"_gene_selection.txt file is following;

	"sample1"	"sample2"	"sample3"	"sample4"	"sample5"
	"geneA"	1	3	5.5	7	2
	"geneF"	2	2	2	2	2
	"geneG"	0.01	0.03	0.5	0.07	0.02
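
The selection step can be pictured with the base R sketch below. It is an assumption about the behavior, not the script's actual code; the object names and the temporary output path are hypothetical. write.table() with its default quoting happens to produce the same quoted layout as the example above.

```r
# A few rows of the example input, with gene IDs as row names.
expr <- data.frame(
  sample1 = c(1, 100, 2, 0.01),
  sample2 = c(3, 267, 2, 0.03),
  sample3 = c(5.5, 55, 2, 0.5),
  sample4 = c(7, 79, 2, 0.07),
  sample5 = c(2, 62, 2, 0.02),
  row.names = c("geneA", "geneB", "geneF", "geneG")
)

# Gene IDs as they would be read from the gene list file, one per line.
gene_ids <- c("geneA", "geneF", "geneG")

# Keep only the rows whose gene ID is in the list.
selected <- expr[rownames(expr) %in% gene_ids, , drop = FALSE]

# Write the selection as a tab delimited file (hypothetical path).
out_file <- file.path(tempdir(), "output_gene_selection.txt")
write.table(selected, out_file, sep = "\t")
```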

remove column (-r, --remove_col)

Remove specific columns (samples) from the input file. Split column names by space. Example; with -r sample1 sample4 sample5, the example input file looks as following;

		sample2	sample3
	geneA	3	5.5
	geneB	267	55
	geneC	0.65	9.5
	geneD	356	78
	geneE	3001	5500
	geneF	2	2
	geneG	0.03	0.5
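
In base R terms, removing samples amounts to dropping columns by name. The sketch below illustrates the idea under that assumption; the object names are hypothetical.

```r
# Three rows of the example input.
expr <- data.frame(
  sample1 = c(1, 100, 205),
  sample2 = c(3, 267, 356),
  sample3 = c(5.5, 55, 78),
  sample4 = c(7, 79, 67),
  sample5 = c(2, 62, 2900),
  row.names = c("geneA", "geneB", "geneD")
)

# Column names as they would be given after -r.
remove_col <- c("sample1", "sample4", "sample5")

# Keep every column whose name is not in the removal list.
kept <- expr[, !(colnames(expr) %in% remove_col), drop = FALSE]

colnames(kept)  # only sample2 and sample3 remain
```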

log2 transformation (-l, --log) and (-lgn, --log_number)

Apply log2, log10, or loge transformation to gene expression values by typing 2, 10, or e, respectively, in the (-l, --log) function. Default of the (-l, --log) function is off (0).

To avoid -inf in log values when generating plots, the (-lgn, --log_number) function adds a tiny value (default 0.000000001) to the expression values. You can customize this value by typing a number (example 0, 0.000001, 0.000000000000000001, etc...).
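
The effect of the added number can be seen in the base R sketch below. It assumes the number is added to each expression value before the log is taken; the variable names are illustrative.

```r
log_choice <- "2"          # value given to (-l, --log): "2", "10", or "e"
log_number <- 0.000000001  # default of (-lgn, --log_number)

# Translate the choice into a log base.
log_base <- switch(log_choice, "2" = 2, "10" = 10, "e" = exp(1))

expr_values <- c(0, 1, 8, 1001)

# Without the added number, an expression value of 0 becomes -Inf.
plain <- log(expr_values, base = log_base)

# With the added number, every value stays finite (0 becomes about -29.9 for log2).
logged <- log(expr_values + log_number, base = log_base)
```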

z_score transformation (-zs, --zscore)

Transforms the log2 value data of each gene (row) to z-scores. This function can be turned on only when the (-l, --log) function is off (0). To avoid "NA", gene IDs that contain "NA" are removed automatically.
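
The z-score of a value is (value - mean of the gene) / standard deviation of the gene. A minimal base R sketch follows, assuming the sample standard deviation is used (the script may compute it differently). A gene with identical values in every sample, such as geneF in the example input, has a standard deviation of 0, so it turns into missing values and is dropped.

```r
# Three rows of the example input as a matrix (genes in rows).
expr <- matrix(
  c(1, 3, 5.5, 7, 2,
    100, 267, 55, 79, 62,
    2, 2, 2, 2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneF"), paste0("sample", 1:5))
)

# scale() works on columns, so transpose, scale, and transpose back.
z <- t(scale(t(expr)))

# Remove genes whose z-scores are missing (here the constant geneF).
z <- z[rowSums(is.na(z)) == 0, , drop = FALSE]

rownames(z)  # geneA and geneB remain
```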

dataframe description

Once the software finishes, it provides a dataframe of three columns: sample, data, and gene. Sample, data, and gene refer to sample name, gene expression value, and gene ID, respectively. Example of dataframe is following;

	sample       data  gene
	sample1     1.0000 geneA
	sample1   100.0000 geneB
	sample1     0.3000 geneC
	sample1   205.0000 geneD
	sample1  1001.0000 geneE
	sample1     2.0000 geneF
	sample1     0.0100 geneG
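
This long layout (one row per sample and gene) can be built from the input table as sketched below in base R. The script itself may use tidyverse functions for this step; the sketch only illustrates the shape of the result.

```r
# Two samples of the example input, with gene IDs as row names.
expr <- data.frame(
  sample1 = c(1, 100, 0.3, 205, 1001, 2, 0.01),
  sample2 = c(3, 267, 0.65, 356, 3001, 2, 0.03),
  row.names = c("geneA", "geneB", "geneC", "geneD", "geneE", "geneF", "geneG")
)

# as.matrix() is unrolled column by column, so all genes of sample1 come first.
long <- data.frame(
  sample = rep(colnames(expr), each = nrow(expr)),
  data   = as.vector(as.matrix(expr)),
  gene   = rep(rownames(expr), times = ncol(expr))
)

head(long, 7)  # the seven sample1 rows shown in the example above
```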

axis (-x, --xaxis) and (-z, --zaxis)

Default of x-axis and z-axis are sample and gene, respectively.

The following table shows which axes you can modify. Entries labeled x and z can be modified with (-x, --xaxis) and (-z, --zaxis), respectively.

plots		x-axis	y-axis	color/shape
bar		x	data	z
box		x	data	x
density		data	density	x
dot_color	x	data	z
dot_shape	x	data	z
heatmap		sample	gene	
histogram	data	count	x
line		x	data	z
scatter		
violin		x	data	x

color settings (-c, --color) or (-cst, --custom_color)

ggplot2 color scales (https://ggplot2.tidyverse.org/reference/scale_brewer.html) are used for color settings in plots except heatmap. Colors in heatmap are custom settings. Settings are following;

settings(ggplot2)	ggplot2_color					color description
1			scale_fill/color_hue()				standard ggplot2 setting; read ggplot2 website (https://ggplot2.tidyverse.org/reference/scale_hue.html)
2			scale_fill/color_viridis_d(option = "C")	read ggplot2 website (https://ggplot2.tidyverse.org/reference/scale_viridis.html)
3			scale_fill/color_viridis_d(option = "D")	read ggplot2 website (https://ggplot2.tidyverse.org/reference/scale_viridis.html)
4			scale_fill/color_grey()				black to white	
5			scale_fill/color_brewer(palette ="RdBu")	red to blue (maximum 11 colors)
6			scale_fill/color_brewer(palette ="RdYlBu")	red, yellow, to blue (maximum 11 colors)
7			scale_fill/color_brewer(palette ="Reds")	red to white (maximum 9 colors)
8			scale_fill/color_brewer(palette ="Blues")	blue to white (maximum 9 colors)
9			scale_fill/color_brewer(palette ="Paired")	Paired palette; read ggplot2 website (maximum 12 colors)
10			scale_fill/color_brewer(palette = "Set1")	Set1 palette; read ggplot2 website (maximum 9 colors)


settings(pheatmap)	color description
1			red, white, to blue
2			red, yellow, to blue
3			red, white, to green
4			purple, white, to green
5			red, white, to black
6			yellow to blue
7			black to white
8			red to white
9			blue to white
10			green to white

You can also customize colors with the (-cst, --custom_color) function. When the (-cst, --custom_color) function is used, the (-c, --color) function is off. Split colors by space. Example of custom color settings; -cst red white blue green yellow.
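
One common way to turn a short list of color names into a smooth scale, for example for a heatmap, is colorRampPalette() from base R. Whether rnaseq_figure_plotter builds its custom scale this way is an assumption; the sketch only shows the idea.

```r
# Colors as they would be given after -cst.
custom_color <- c("red", "white", "blue", "green", "yellow")

# colorRampPalette() returns a function that interpolates between the colors.
palette_fn <- colorRampPalette(custom_color)

# Ask for 100 steps running from red to yellow.
heat_colors <- palette_fn(100)

heat_colors[c(1, 100)]  # first color is red, last color is yellow
```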

letter size setting (-ls, --letter_size)

Change text and title size of legend and axis, respectively. Split the two numbers by space. Default is 8 for text, 10 for title. Example; 20 24

save figure format (-f, --figure_save_format)

Provide the format of the saved figure. Default is pdf; you can also choose eps, ps, tex (pictex), jpeg, tiff, png, bmp, or svg.

style settings (-s, --style)

Style uses the ggplot2 theme functions (https://ggplot2.tidyverse.org/reference/ggtheme.html) to choose among 7 different backgrounds of the figure. This function works for any plots except heatmap.

settings	ggplot2_theme		description
1		theme_void()		no lines, white background
2		theme_classic()		axis lines, white background
3		theme_minimal()		grid lines, white background
4		theme_bw()		grey axis and grid lines, white background
5		theme_linedraw()	black axis and grid lines, white background
6		theme_grey()		grid lines, grey background
7		theme_dark()		grid lines, dark background

limit data (-lim, --limit)

Apply a custom scale (lower and upper limit) to data. This function works for any plots except heatmap. Split the two numbers by space. A negative number requires double quotation marks. For example, to limit data to 0 to 200, type 0 200; to limit data to -1 to 3, type "-1" 3.

change axis (-a, --axis_change)

Flip the x and y axes by setting the (-a, --axis_change) function to on. This function works for any plots except heatmap.

change legend position (-lp, --legend_position)

Choose legend position of figures (none, left, right, bottom, top, or two-element numeric vector). Default is right. This function works for any plots except heatmap and scatter.

change position in geom function for bar, density, and histogram (-gp, --geom_position)

Choose the plot visualization type (geom position) from 1-4 in bar, density, density_fill, and histogram using the ggplot2 position functions (https://ggplot2.tidyverse.org/reference/position_dodge.html).

settings	position	description
1		stack		stack style figure
2		dodge		dodge style figure
3		dodge2		dodge style figure
4		fill		fill entire figure

cluster function for heatmap (-cs, --cluster_select)

Turn the pheatmap clustering function on or off for columns and/or rows. Default is on for both column and row. Column is first and row is second; split the two values (on or off) by space. Example; on off
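
To picture what the two values control, the base R sketch below orders columns and rows by hierarchical clustering only when the matching value is on. It uses hclust() with Euclidean distance and complete linkage, which are the pheatmap defaults; how the script passes the values to pheatmap is an assumption, and the variable names are hypothetical.

```r
# Values as they would be given after -cs: column first, row second.
cluster_select <- c("on", "off")
cluster_cols <- cluster_select[1] == "on"
cluster_rows <- cluster_select[2] == "on"

# Three rows of the example input as a matrix (genes in rows).
mat <- matrix(
  c(1, 3, 5.5, 7, 2,
    100, 267, 55, 79, 62,
    205, 356, 78, 67, 2900),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneD"), paste0("sample", 1:5))
)

# Clustered order when on, original order when off.
col_order <- if (cluster_cols) hclust(dist(t(mat)))$order else seq_len(ncol(mat))
row_order <- if (cluster_rows) hclust(dist(mat))$order else seq_len(nrow(mat))

ordered <- mat[row_order, col_order]
```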

scatter plot two dataset setting (-ss, --scatter_select)

This function is required for scatter plot.

The (-ss, --scatter_select) function requires the two datasets "x-axis y-axis" for the scatter plot; split samples or genes by space. Example of (-ss, --scatter_select) is "sample1 sample3". Color cannot be changed in scatter plot.

change figure size (-p, --plot_size)

Change width and height of figure, respectively. Split the two numbers by space. Default is 7 for both width and height. This function works for any plots except heatmap. Example; 10 12

example of usage

Easy instructions and example usages are in introduction_rnaseq_figure_plotter_R.pdf.