Read the documentation for background information and a tutorial.
See the subdirectories for more detailed READMEs, including program usage.
- spark_rma
- helper
This directory contains the code for running RMA analysis. This includes steps for:
- annotation and background correction
- quantile normalization
- median polish
Annotation maps perfect match (PM) probes on the array to their targets. Background correction removes artifacts and preprocesses the raw CEL files for analysis. This step is not done in Spark: it is an embarrassingly parallel problem, handled independently for each sample, and it is completed in R.
Quantile normalization removes array effects by normalizing each array against all others.
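As an illustration of the idea only (not the Spark implementation in this directory), quantile normalization can be sketched in plain Python. The function name `quantile_normalize` and the list-of-lists input layout are assumptions made for this example, and ties are broken by position rather than averaged.

```python
from statistics import mean


def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (one list of probe values per sample).

    Hypothetical helper for illustration; ties are broken by position.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Sort each array, then average across arrays at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [mean(values) for values in zip(*sorted_arrays)]

    normalized = []
    for a in arrays:
        # Indices of this array's values in ascending order.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            # Replace each value with the cross-array mean for its rank.
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    sample_a = [5.0, 2.0, 3.0]
    sample_b = [4.0, 1.0, 6.0]
    print(quantile_normalize([sample_a, sample_b]))
```

After normalization every array holds the same set of values, so the arrays share one distribution and differ only in which probe carries which value.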
Tukey's median polish is used to summarize the values of multiple probes mapping within the same transcript cluster or probeset.
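A minimal sketch of median polish for a single transcript cluster or probeset is shown below, assuming a matrix with one row per probe and one column per sample. The function name `median_polish`, the `max_iter` and `tol` parameters, and the choice to report `overall + column effect` as the per-sample summary are assumptions for this example, not a description of the code in this directory.

```python
from statistics import median


def median_polish(matrix, max_iter=10, tol=1e-6):
    """Fit matrix[i][j] ~ overall + row_eff[i] + col_eff[j] by alternating median sweeps.

    Hypothetical helper: rows are probes, columns are samples.
    """
    rows, cols = len(matrix), len(matrix[0])
    resid = [[float(v) for v in row] for row in matrix]
    overall = 0.0
    row_eff = [0.0] * rows
    col_eff = [0.0] * cols

    for _ in range(max_iter):
        # Row sweep: move each row's median out of the residuals.
        row_med = [median(row) for row in resid]
        for i in range(rows):
            row_eff[i] += row_med[i]
            for j in range(cols):
                resid[i][j] -= row_med[i]
        shift = median(col_eff)
        overall += shift
        col_eff = [c - shift for c in col_eff]

        # Column sweep: move each column's median out of the residuals.
        col_med = [median(resid[i][j] for i in range(rows)) for j in range(cols)]
        for j in range(cols):
            col_eff[j] += col_med[j]
            for i in range(rows):
                resid[i][j] -= col_med[j]
        shift = median(row_eff)
        overall += shift
        row_eff = [r - shift for r in row_eff]

        # Stop once a full sweep no longer changes the residuals.
        if max(abs(m) for m in row_med + col_med) < tol:
            break

    return overall, row_eff, col_eff, resid


if __name__ == "__main__":
    probes_by_samples = [[7, 9, 11], [8, 10, 12], [9, 11, 13]]
    overall, row_eff, col_eff, _ = median_polish(probes_by_samples)
    # One summarized value per sample.
    print([overall + c for c in col_eff])
```

Medians are used instead of means so that a single outlying probe has little influence on the summarized value for a sample.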
This can convert flat files (CSV and TSV are currently supported) to Parquet format using Snappy compression, without using the JVM.
Many of the examples are shown for HTA 2.0. The annotation and background correction step requires an input file specific to the array type. In this script, that file is generated from Bioconductor for HTA 2.0.