Skip to content

saagar/parallel-sequitur

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

70 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Parallel Sequitur

Computer Science 205 Fall 2012

Project Members:

Introduction:

We implemented serial and parallel versions of the Sequitur compression algorithm. The Sequitur algorithm uses hierarchical structure and sequences of discrete symbols to compress files by exploiting repetative structures found in strings.

Our final project is based on the algorithm as described in the following paper: "Identifying Hierarchical Structure in Sequences: A linear-time algorithm", Craig G. Nevill-Manning, Ian H. Witten, Journal of Artificial Intelligence Research 7 (1997) 67.82

Dependencies

  • MPI (mpi4py)
  • Python (sys, csv)

The above packages should already be installed on the resonance.seas cluster for CS 205. On the resonance node, module load courses/cs205/2012 will load these modules into the user environment automatically.

Source Code:

Project on Github

  • serial.py: serial implementation of the Sequitur algorithm
  • concat_parallel.py: simple parallel implementation of the Sequitur algorithm. Text is split among workers which run a version of the serial algorithm before the main process merges all results in order.
  • wordcount.py: parallel implementation based on the Sequitur concept. This method uses frequency of words in the provided text to determine rules to create before compressing.

The repository contains several test cases as well as runscripts to test the code.

Usage:

Run serial.py to see the results of the serial implementation.

Run qsub runscript to run the concat_parallel.py implementation.

Run qsub wcscript to run the wordcount.py implementation.

Sources:

Acknowledgements:

About

Parallel Sequitur Project

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published