An educational compression experimentation toolkit built in Go. This CLI tool allows you to experiment with different compression algorithms, analyze their performance, and understand compression theory through hands-on examples.
🔧 **Multiple Compression Algorithms:**
- RLE (Run-Length Encoding) - Best for data with consecutive repeated characters
- Huffman Coding - Variable-length encoding based on character frequency
- LZ77 - Dictionary-based compression using sliding windows
- LZW (Lempel-Ziv-Welch) - Dictionary-based compression with dynamic dictionary
📊 **Comprehensive Metrics:**
- Compression ratio
- Space savings percentage
- Shannon entropy analysis
- Compression/decompression performance timing
📈 **Visualization:**
- ASCII bar charts for compression ratios
- Entropy visualization with interpretation
- Side-by-side algorithm comparison
- Color-coded output for clarity
🎯 **Educational Focus:**
- Clean, readable algorithm implementations
- Detailed metrics to understand compression behavior
- Support for comparing multiple algorithms
Requirements:

- Go 1.16 or higher
```sh
git clone https://github.com/BaseMax/go-compress-lab.git
cd go-compress-lab
go build -o compress-lab ./cmd/compress-lab
```

Optionally, install it globally:

```sh
go install ./cmd/compress-lab
```

Usage:

```sh
# Compress text with all algorithms (comparison mode)
./compress-lab -text="Hello World" -compare

# Compress a file with a specific algorithm
./compress-lab -input=file.txt -algo=huffman

# Compress and save to file
./compress-lab -input=file.txt -algo=lzw -output=compressed.bin

# Decompress a file
./compress-lab -input=compressed.bin -algo=lzw -decompress -output=original.txt
```

Flags:

- `-text` - Text string to compress (alternative to input file)
- `-input` - Input file path
- `-output` - Output file path for compressed/decompressed data
- `-algo` - Algorithm to use: `rle`, `huffman`, `lz77`, `lzw`, or `all`
- `-compare` - Compare all algorithms (shows detailed metrics)
- `-decompress` - Decompress mode instead of compress
```sh
./compress-lab -text="AAAABBBCCCCCCDDDDDD" -compare
```

Output:

```
Comparing all compression algorithms...
Input size: 19 bytes
====================================================================================================
COMPRESSION RESULTS
====================================================================================================
Algorithm | Original | Compressed | Ratio | Savings % | Entropy | Comp Time | Decomp Time
----------------------------------------------------------------------------------------------------
RLE       | 19 B     | 8 B        | 2.38  | 57.89%    | 1.94    | 2.284µs   | 360ns
Huffman   | 19 B     | 23 B       | 0.83  | -21.05%   | 1.94    | 13.244µs  | 8.937µs
LZ77      | 19 B     | 25 B       | 0.76  | -31.58%   | 1.94    | 1.503µs   | 1.353µs
LZW       | 19 B     | 22 B       | 0.86  | -15.79%   | 1.94    | 40.856µs  | 36.188µs
====================================================================================================
COMPRESSION RATIO VISUALIZATION
------------------------------------------------------------
RLE     | ████████████████████████████████████████ 2.38:1
Huffman | █████████████ 0.83:1
LZ77    | ████████████ 0.76:1
LZW     | ██████████████ 0.86:1
------------------------------------------------------------
ENTROPY ANALYSIS
------------------------------------------------------------
Shannon Entropy: 1.9440 bits/byte
Randomness: 24.30% [██████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░]
Interpretation:
→ Very low entropy: Highly repetitive data, excellent for compression
------------------------------------------------------------
```
Analysis: RLE performs best on this data because it efficiently encodes consecutive runs of identical characters.
```sh
./compress-lab -text="The quick brown fox jumps over the lazy dog." -compare
```

Analysis: LZW and Huffman typically perform better on natural language text due to repeated patterns and character frequency distribution.
```sh
# Compress a file
./compress-lab -input=document.txt -algo=lzw -output=document.lzw

# Decompress it
./compress-lab -input=document.lzw -algo=lzw -decompress -output=restored.txt
```

To analyze a single algorithm:

```sh
./compress-lab -input=data.txt -algo=huffman
```

This shows detailed metrics for just the Huffman algorithm.
**Compression Ratio**
- Formula: `Original Size / Compressed Size`
- Interpretation: Higher is better. A ratio > 1 means compression; < 1 means expansion.

**Space Savings**
- Formula: `(1 - Compressed / Original) × 100%`
- Interpretation: A positive percentage means compression; a negative one means expansion.
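Both formulas can be checked against the RLE row of the example run (19 bytes in, 8 bytes out). A minimal sketch; `ratio` and `savings` are illustrative helper names, not necessarily those used in the toolkit's metrics package:

```go
package main

import "fmt"

// ratio implements Original Size / Compressed Size.
func ratio(original, compressed int) float64 {
	return float64(original) / float64(compressed)
}

// savings implements (1 - Compressed/Original) * 100%.
func savings(original, compressed int) float64 {
	return (1 - float64(compressed)/float64(original)) * 100
}

func main() {
	// The RLE row from the comparison table: 19 B -> 8 B.
	fmt.Printf("ratio=%.2f savings=%.2f%%\n", ratio(19, 8), savings(19, 8))
	// ratio=2.38 savings=57.89%
}
```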
**Shannon Entropy**
- Range: 0 to 8 bits/byte
- Interpretation:
  - 0-2: Very low entropy, highly compressible
  - 2-4: Low entropy, good compression potential
  - 4-6: Medium entropy, moderate compression
  - 6-8: High entropy, difficult to compress
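Shannon entropy is the standard formula H = -Σ pᵢ·log₂(pᵢ) over byte frequencies. A sketch (the toolkit's metrics package may compute it differently) that reproduces the 1.9440 bits/byte figure from the example above:

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the Shannon entropy of data in bits per byte:
// H = -sum(p_i * log2(p_i)) over the byte-value frequencies.
func shannonEntropy(data []byte) float64 {
	if len(data) == 0 {
		return 0
	}
	var counts [256]int
	for _, b := range data {
		counts[b]++
	}
	n := float64(len(data))
	h := 0.0
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	// The string from the comparison example above.
	fmt.Printf("%.4f bits/byte\n", shannonEntropy([]byte("AAAABBBCCCCCCDDDDDD")))
	// 1.9440 bits/byte
}
```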
**RLE (Run-Length Encoding)**
- Best for: Data with long runs of repeated characters
- How it works: Replaces sequences of repeated bytes with (count, byte) pairs
- Example: "AAAA" → (4, 'A')
**Huffman Coding**
- Best for: Data with non-uniform character distribution
- How it works: Assigns shorter codes to frequent characters
- Example: 'e' might be encoded as "01" while 'z' as "1101100"
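One way to derive such codes is the classic min-heap construction: repeatedly merge the two least frequent nodes. A sketch using `container/heap`; the toolkit's `huffman.go` likely also serializes the tree for the decoder, which this omits (that header overhead is why the 19-byte example above expands to 23 bytes):

```go
package main

import (
	"container/heap"
	"fmt"
)

// node is a Huffman tree node: leaves carry a byte, internal nodes
// only a combined frequency.
type node struct {
	freq        int
	ch          byte
	left, right *node
}

// pq is a min-heap of nodes ordered by frequency.
type pq []*node

func (p pq) Len() int            { return len(p) }
func (p pq) Less(i, j int) bool  { return p[i].freq < p[j].freq }
func (p pq) Swap(i, j int)       { p[i], p[j] = p[j], p[i] }
func (p *pq) Push(x interface{}) { *p = append(*p, x.(*node)) }
func (p *pq) Pop() interface{} {
	old := *p
	n := old[len(old)-1]
	*p = old[:len(old)-1]
	return n
}

// buildCodes returns a bit-string code per byte; frequent bytes get
// shorter codes.
func buildCodes(data []byte) map[byte]string {
	codes := map[byte]string{}
	freq := map[byte]int{}
	for _, b := range data {
		freq[b]++
	}
	h := &pq{}
	for ch, f := range freq {
		heap.Push(h, &node{freq: f, ch: ch})
	}
	if h.Len() == 0 {
		return codes
	}
	for h.Len() > 1 { // merge the two least frequent nodes
		a := heap.Pop(h).(*node)
		b := heap.Pop(h).(*node)
		heap.Push(h, &node{freq: a.freq + b.freq, left: a, right: b})
	}
	var walk func(n *node, prefix string)
	walk = func(n *node, prefix string) {
		if n.left == nil && n.right == nil {
			codes[n.ch] = prefix
			return
		}
		walk(n.left, prefix+"0")
		walk(n.right, prefix+"1")
	}
	root := heap.Pop(h).(*node)
	if root.left == nil { // single distinct byte: give it a 1-bit code
		codes[root.ch] = "0"
	} else {
		walk(root, "")
	}
	return codes
}

func main() {
	data := []byte("AAAABBBCCCCCCDDDDDD")
	codes := buildCodes(data)
	bits := 0
	for _, b := range data {
		bits += len(codes[b])
	}
	fmt.Printf("%d symbols -> %d bits before tree overhead\n", len(data), bits)
	// 19 symbols -> 38 bits before tree overhead
}
```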
**LZ77**
- Best for: Data with repeated patterns at various distances
- How it works: Uses a sliding window to find and reference previous occurrences
- Example: References previous data with (offset, length, next_char) triplets
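The core of the sliding-window scheme is finding, for the current position, the longest match inside the window behind it. A sketch of that search; the window size and token encoding are assumptions, not the toolkit's actual `lz77.go` parameters:

```go
package main

import "fmt"

// findLongestMatch scans back up to windowSize bytes from pos and
// returns (offset, length) of the longest match for the data starting
// at pos. Matches may extend past pos (overlap), as in standard LZ77.
func findLongestMatch(data []byte, pos, windowSize int) (offset, length int) {
	start := pos - windowSize
	if start < 0 {
		start = 0
	}
	for i := start; i < pos; i++ {
		l := 0
		for pos+l < len(data) && data[i+l] == data[pos+l] {
			l++
		}
		if l > length {
			offset, length = pos-i, l
		}
	}
	return
}

func main() {
	// In "abcabcabc", position 3 matches offset 3 with length 6: an
	// overlapping match that a run-length copy resolves byte by byte.
	fmt.Println(findLongestMatch([]byte("abcabcabc"), 3, 32))
	// 3 6
}
```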
**LZW (Lempel-Ziv-Welch)**
- Best for: Data with repeated substrings
- How it works: Builds a dictionary of patterns on-the-fly
- Used in: GIF and TIFF image formats
```
go-compress-lab/
├── cmd/
│   └── compress-lab/
│       └── main.go            # CLI application entry point
├── pkg/
│   ├── algorithms/
│   │   ├── compressor.go      # Compressor interface
│   │   ├── rle.go             # RLE implementation
│   │   ├── huffman.go         # Huffman coding implementation
│   │   ├── lz77.go            # LZ77 implementation
│   │   └── lzw.go             # LZW implementation
│   ├── metrics/
│   │   └── metrics.go         # Compression metrics and calculations
│   └── visualization/
│       └── display.go         # Terminal output formatting
├── go.mod
└── README.md
```
- Understanding Compression Theory: Compare algorithms on different types of data to see which performs best
- Entropy Analysis: Learn about information theory and data randomness
- Performance Benchmarking: Measure compression speed vs. ratio tradeoffs
- Algorithm Behavior: Observe how each algorithm handles different data patterns
Contributions are welcome! This is an educational project, so please keep implementations clear and well-documented.
MIT License - See LICENSE file for details
Max Base
This project is designed for educational purposes to help understand compression algorithms and their behavior on different types of data.