SparkRMA: a scalable implementation of Robust Multi-array Average (RMA) in Apache Spark

Read the documentation for background information and a tutorial.

Each subdirectory has a more detailed README, including usage of the programs.

spark_rma

This directory contains the code for running RMA analysis. This includes steps for:

annotation and background correction
quantile normalization
median polish

Annotation and Background Correction

Annotation maps perfect match (PM) probes on the array to their targets. Background correction removes artifacts and preprocesses the raw CEL files for analysis. This step is not run in Spark: it is an embarrassingly parallel problem, handled independently for each sample, and it is completed in R.

Quantile Normalization

Quantile normalization removes array effects by normalizing each array against all others.
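As an illustration of the idea only, here is a minimal single-machine sketch of quantile normalization in plain Python. It is not the Spark implementation in this repository; the function name `quantile_normalize` and the list-of-lists input layout are assumptions made for the example. Each array's values are replaced by the mean, across all arrays, of the values at the same rank.

```python
from statistics import mean


def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (hypothetical helper, not the repo's API).

    arrays: list of equal-length lists, one list of probe intensities per array.
    Returns new lists in which every array shares the same value distribution.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Reference distribution: mean across arrays of the values at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(values_at_rank) for values_at_rank in zip(*sorted_arrays)]

    normalized = []
    for a in arrays:
        # order[rank] is the index of the probe holding that rank in this array.
        # Ties are broken by position, a simplification chosen for this sketch.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
    print(quantile_normalize(raw))
```

After normalization, sorting any array yields the same reference distribution, which is what removes array-level differences in intensity scale.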

Median Polish

Tukey's median polish is used to summarize the values of multiple probes mapping within the same transcript cluster or probeset.
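The following is a minimal sketch of Tukey's median polish for a single probeset, written in plain Python rather than Spark. The function names, the iteration cap, and the tolerance are assumptions for illustration and are not taken from this repository. Rows are probes, columns are samples, and the per-sample summary is taken as the overall effect plus the column effect.

```python
from statistics import median


def median_polish(matrix, max_iter=10, tol=1e-6):
    """Fit value = overall + row effect + column effect + residual.

    matrix: list of rows (probes), each a list of per-sample values.
    max_iter and tol are illustrative defaults, not values from the project.
    """
    resid = [list(row) for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols

    for _ in range(max_iter):
        # Row sweep: move each row's median out of the residuals.
        row_med = [median(row) for row in resid]
        for i in range(n_rows):
            resid[i] = [v - row_med[i] for v in resid[i]]
            row_eff[i] += row_med[i]
        shift = median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift

        # Column sweep: move each column's median out of the residuals.
        col_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for j in range(n_cols):
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
            col_eff[j] += col_med[j]
        shift = median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift

        # Stop once a full sweep no longer changes the residuals noticeably.
        if max(abs(m) for m in row_med + col_med) < tol:
            break

    return overall, row_eff, col_eff, resid


def summarize(matrix):
    """Per-sample summary for one probeset: overall effect plus column effect."""
    overall, _, col_eff, _ = median_polish(matrix)
    return [overall + c for c in col_eff]


if __name__ == "__main__":
    probeset = [[5, 7, 9], [6, 8, 10], [7, 9, 11]]  # 3 probes x 3 samples
    print(summarize(probeset))
```

Because medians are used instead of means, a single outlying probe has little influence on the summarized value for a sample.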

Helper Scripts

Parquet Converter

convert_to_parquet.py

This script converts flat files (CSV and TSV are currently supported) to Parquet format with Snappy compression, without requiring a JVM.

HTA 2.0 Annotation

hta_annotation.R

Many of the examples are shown for HTA 2.0. The annotation and background correction step requires an input file specific to the array type; this script generates that file for HTA 2.0 from Bioconductor.
