Compression is cool! CTW (context tree weighting) is great, but it is resource intensive, which is why Huffman (en)coding is popular. CTW uses arithmetic encoding. I would suggest first coding an implementation that represents intervals as exact fractions before moving to a version that represents intervals with fixed bit-precision.
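
To make the "intervals as fractions" idea concrete, here is a minimal sketch in Python. It is not the script from this repository. It assumes a fixed, known symbol model and a message length that is passed to the decoder separately, and the names `build_intervals`, `encode` and `decode` are made up for illustration. In CTW the probabilities would come from the context tree for each bit rather than from a fixed table.

```python
from fractions import Fraction
from math import ceil


def build_intervals(probs):
    """Give each symbol its slice (cumulative start, probability) of [0, 1)."""
    intervals, cum = {}, Fraction(0)
    for sym, p in probs.items():
        intervals[sym] = (cum, p)
        cum += p
    assert cum == 1, "probabilities must sum to exactly 1"
    return intervals


def encode(message, probs):
    """Narrow [low, low + width) once per symbol, then name a point inside it."""
    intervals = build_intervals(probs)
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        cum, p = intervals[sym]
        low += width * cum
        width *= p
    # Find the shortest bit string whose value k / 2**n lies in the interval.
    n = 0
    while True:
        k = ceil(low * 2**n)
        if Fraction(k, 2**n) < low + width:
            break
        n += 1
    return format(k, f"0{n}b") if n else ""


def decode(bits, length, probs):
    """Replay the same narrowing, picking the slice that contains the value."""
    intervals = build_intervals(probs)
    value = Fraction(int(bits, 2), 2 ** len(bits)) if bits else Fraction(0)
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        for sym, (cum, p) in intervals.items():
            sym_low = low + width * cum
            if sym_low <= value < sym_low + width * p:
                out.append(sym)
                low, width = sym_low, width * p
                break
    return "".join(out)


# Example model (an assumption for this sketch): "a" is three times as likely as "b".
probs = {"a": Fraction(3, 4), "b": Fraction(1, 4)}
message = "aabaaaba"
bits = encode(message, probs)
assert decode(bits, len(message), probs) == message
```

Because `Fraction` is exact, there is no rounding to worry about, but the numerators and denominators grow with every symbol. That growth is what the bit-precision version avoids by keeping the interval in fixed-width integers and emitting bits as soon as they are decided.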
The input.txt file was compressed with this script. The content of that file was generated by this page. The file "encoded" is the result of that compression. I also included a classical zipped version of input.txt to show the size difference (1699 bytes CTW vs 2427 bytes zip, so the CTW output is roughly 30% smaller).
- https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.352&rep=rep1&type=pdf

Good resources for coding an arithmetic encoder using bit-precision:
- http://michael.dipperstein.com/arithmetic/index.html
- https://web.stanford.edu/~youngsuk/papers/slide_universal_compression.pdf
- https://web.stanford.edu/class/ee398a/handouts/lectures/03-ArithmeticCoding.pdf (included this because it had a slide making me go "aaah I get it now")