A Bioinformatics thesis project by Alessandro Aiezza II
Defended on July 20, 2016 @ the Rochester Institute of Technology
Committee
Dr. Gary Skuse, Dr. Greg Babbitt, Dr. Larry Buckley
Citation
Aiezza, A.,II. (2016). The FLiCK framework; enabling rapid development and performance benchmarking of compression applications for genetic data files (Order No. 10144070). Available from ProQuest Dissertations & Theses Global. (1825611935). Retrieved from http://search.proquest.com/docview/1825611935?accountid=13567
A Java framework that makes it easier to develop file compressors and decompressors by leveraging ab initio knowledge about a specific file format. FLiCK runs independently as a file compressor and currently will ZIP any file it is given.
A developer can create a module in FLiCK for any file format. A module associates a file format with one or more file extensions. (For example, the FASTA module works on files with the extensions `.fa`, `.fasta`, and `.fna`.) When the classes or jar of a FLiCK module are found on the `CLASSPATH` at runtime, FLiCK will check for all associated file extensions and use the module's compression algorithm as opposed to the default ZIP algorithm.
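
The extension-based dispatch described above can be pictured as a lookup table with ZIP as the fallback. The following is a minimal sketch of that idea only: `ModuleRegistrySketch`, `register`, and `moduleFor` are hypothetical names, not part of FLiCK's API, and the case-insensitive matching is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: none of these names come from FLiCK itself.
class ModuleRegistrySketch {
    // Name used when no registered module claims the extension.
    static final String DEFAULT_MODULE = "zip";

    private final Map<String, String> modulesByExtension = new HashMap<>();

    // Associates one module name with one or more file extensions.
    void register(String moduleName, String... extensions) {
        for (String ext : extensions) {
            modulesByExtension.put(normalize(ext), moduleName);
        }
    }

    // Returns the module registered for the file's extension, or the ZIP default.
    String moduleFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return DEFAULT_MODULE;
        }
        String ext = normalize(fileName.substring(dot + 1));
        return modulesByExtension.getOrDefault(ext, DEFAULT_MODULE);
    }

    // Assumption: extensions are compared without a leading dot and ignoring case.
    private static String normalize(String ext) {
        String trimmed = ext.startsWith(".") ? ext.substring(1) : ext;
        return trimmed.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        ModuleRegistrySketch registry = new ModuleRegistrySketch();
        registry.register("fasta", "fa", "fasta", "fna");
        System.out.println(registry.moduleFor("genome.fna"));
        System.out.println(registry.moduleFor("notes.txt"));
    }
}
```

In this sketch a file with an unregistered extension, or with no extension at all, falls through to the default.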
FLiCK comes preloaded with FASTA and FASTQ file format compression modules.
- Download from the release page: FLiCK Releases
- Untarball/unzip the contents into a directory on your `PATH`:
  - flick.jar
  - flick (executable)
  - unflick (executable)
- Download flick.jar from the releases page and add it to your `CLASSPATH`:
$ export CLASSPATH=path/to/other/jars:flick.jar
- Five classes need to be implemented to create a module:
FileDeflator | FileInflator | DeflationOptionSet | InflationOptionSet | FileArchiver |
---|---|---|---|---|
Implementation of the file format compression algorithm | Implementation of the file format decompression algorithm | Options/flags available for altering the behavior of the algorithm responsible for file compression | Options/flags available for altering the behavior of the algorithm responsible for file decompression | (1) Holds aspects that are important to both the deflator and the inflator. (2) Connects the other 4 classes together. (3) Declares the file extensions the module is appropriate for. |
- The `FileArchiver` class must be annotated with the `RegisterFileDeflatorInflator` annotation, which identifies the class names of the other 4 component classes and lists the file extensions the module should be used for. (It is recommended to jar your implementing classes for ease of use and portability of your module.)
- Place your classes (or jar) on the `CLASSPATH` so that they are visible to FLiCK at runtime.
The FASTA and FASTQ modules both live entirely in the `edu.rit.flick.genetics` package. The FLiCK platform is fully functional and executable without this package, as the package serves as an outside module.
Example Module Registration for the FLiCK FASTA compression module
@RegisterFileDeflatorInflator (
deflatedExtension = FastaFileArchiver.DEFAULT_DEFLATED_FASTA_EXTENSION,
inflatedExtensions =
{ "fna", "fa", "fasta" },
fileDeflator = FastaFileDeflator.class,
fileInflator = FastaFileInflator.class,
fileDeflatorOptionSet = FastaDeflationOptionSet.class,
fileInflatorOptionSet = FastaInflationOptionSet.class )
public interface FastaFileArchiver extends FastFileArchiver
{ ...
public static final String DEFAULT_DEFLATED_FASTA_EXTENSION = ".flickfa";
... }
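
To illustrate how a registration annotation like the one above can be consumed at runtime, here is a minimal reflection sketch. It does not reproduce FLiCK's actual annotation or lookup code: `RegisterSketch`, `FastaArchiverSketch`, and `RegistrationReaderSketch` are hypothetical stand-ins, and only two of the attributes are modelled.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical reader: shows one way registration metadata could be read via reflection.
class RegistrationReaderSketch {
    // Returns the declared inflated extensions, or an empty list if the type is not annotated.
    static List<String> inflatedExtensionsOf(Class<?> archiver) {
        RegisterSketch registration = archiver.getAnnotation(RegisterSketch.class);
        if (registration == null) {
            return Collections.emptyList();
        }
        return Arrays.asList(registration.inflatedExtensions());
    }

    public static void main(String[] args) {
        System.out.println(inflatedExtensionsOf(FastaArchiverSketch.class));
    }
}

// Hypothetical stand-in for a registration annotation; not FLiCK's actual definition.
// RUNTIME retention is what makes the values visible to getAnnotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface RegisterSketch {
    String deflatedExtension();
    String[] inflatedExtensions();
}

// Hypothetical annotated type, mirroring the shape of the FASTA registration.
@RegisterSketch(deflatedExtension = ".flickfa", inflatedExtensions = { "fna", "fa", "fasta" })
interface FastaArchiverSketch {
}
```

The key design point is that the annotation carries the metadata, so a framework can discover a module's extensions from the class alone without instantiating it.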
The modules use a 2-bit compression algorithm for the nucleotides:
Nucleotide | Mapped bits |
---|---|
A | 00 |
C | 01 |
G | 10 |
T | 11 |
Example: `ACTGATTACA` → `00011110001111000100` (binary) → `123844` (decimal)
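
The mapping and the worked example above can be sketched in a few lines of Java. This is an illustration of the 2-bit idea only, not the module's implementation: `TwoBitSketch`, `encode`, and `decode` are hypothetical names, the sketch packs at most 32 nucleotides into a single `long`, and it does not address characters outside A, C, G, and T.

```java
// Hypothetical sketch of the 2-bit mapping from the table; not FLiCK's code.
class TwoBitSketch {
    private static final char[] ALPHABET = { 'A', 'C', 'G', 'T' };

    // Maps a nucleotide to its 2-bit value: A=00, C=01, G=10, T=11.
    static int bitsFor(char nucleotide) {
        switch (nucleotide) {
            case 'A': return 0b00;
            case 'C': return 0b01;
            case 'G': return 0b10;
            case 'T': return 0b11;
            default:
                throw new IllegalArgumentException("Unsupported nucleotide: " + nucleotide);
        }
    }

    // Packs up to 32 nucleotides into a long, first nucleotide in the highest-order bits used.
    static long encode(String sequence) {
        if (sequence.length() > 32) {
            throw new IllegalArgumentException("This sketch packs at most 32 nucleotides");
        }
        long packed = 0L;
        for (int i = 0; i < sequence.length(); i++) {
            packed = (packed << 2) | bitsFor(sequence.charAt(i));
        }
        return packed;
    }

    // Unpacks `length` nucleotides, reading 2 bits at a time from the low end.
    static String decode(long packed, int length) {
        char[] out = new char[length];
        for (int i = length - 1; i >= 0; i--) {
            out[i] = ALPHABET[(int) (packed & 0b11)];
            packed >>>= 2;
        }
        return new String(out);
    }

    public static void main(String[] args) {
        long packed = encode("ACTGATTACA");
        System.out.println(packed);             // 123844
        System.out.println(decode(packed, 10)); // ACTGATTACA
    }
}
```

Note that decoding needs the sequence length as a separate input: because A maps to `00`, leading A's are indistinguishable from unused high-order bits.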
Program | Average Compression Ratio | Average Compression Runtime | Average Decompression Runtime |
---|---|---|---|
Path Encoding | 90.9% | - | - |
LW-FQZip | 80.5% | 44:39 | 02:52 |
FLiCK (2-bit module) | 77.3% | 31:55 | 20:46 |
gzip | 75.6% | 19:03 | 10:24 |
bzip2 | 78.3% | 32:18 | 16:33 |
Quip | 77.3% | 11:52 | 01:57 |
LEON | 91.5% | 32:10 | 07:52 |