2Time is an implementation of the attack detailed in A Natural Language Approach to Automated Cryptanalysis of Two-time Pads by Mason et al.
Here we present a toy example of 2Time in action. The corpus generated in this example is unusually small. Additionally, the individual files that make up the corpus lack the sort of underlying structure that 2Time's algorithm is designed to exploit. All of the files used in this example can be found in the demo materials folder.
We start by placing three novels by Jules Verne in a folder called corpusData.
bhendel@workstation:~/corpusData$ ls -la
total 1516
drwxr-xr-x 2 bhendel bhendel 4096 Jun 17 08:00 .
drwxr-xr-x 43 bhendel bhendel 4096 Jun 17 07:59 ..
-rw-r--r-- 1 bhendel bhendel 517675 Jun 14 14:37 A Journey to the Centre of the Earth.txt
-rw-r--r-- 1 bhendel bhendel 398631 Jun 14 14:38 Around the World in 80 Days.txt
-rw-r--r-- 1 bhendel bhendel 621865 Jun 14 14:36 Twenty Thousand Leagues under the Sea.txt
Now we generate our corpus. All files in the target folder as well as in its subdirectories will be added to the corpus.
bhendel@workstation:~$ java -jar 2Time.jar --inputDir corpusData --outputCorpus verne.corpus
[+] Reading file /home/bhendel/corpusData/Twenty Thousand Leagues under the Sea.txt into corpus.
[+] Reading file /home/bhendel/corpusData/A Journey to the Centre of the Earth.txt into corpus.
[+] Reading file /home/bhendel/corpusData/Around the World in 80 Days.txt into corpus.
[+] Writing corpus to verne.corpus
Done.
Now we choose two random selections of text from “The Mysterious Island”, which was not put into the corpus.
"terminating in a white tuft, had betrayed their origin. So Herbert"
"silent, ran in advance. The cart came out, the gate was reclosed, "
We XOR the text samples together into a file called “message”.
bhendel@workstation:~$ python XORUtil.py "terminating in a white tuft, had betrayed their origin. So Herbert" "silent, ran in advance. The cart came out, the gate was reclosed, " message
Now we attempt to peel apart the messages.
bhendel@workstation:~$ java -jar 2Time.jar --inputCorpus verne.corpus --inputData message --outputData out
[+] Processing byte 1...
[+] Processing byte 2...
... snip ...
[+] Processing byte 65...
[+] Processing byte 66...
[+] Done.
Message 1:
oritiating to advance. The chierrake out their opigin. So Anded,
Message 2:
class, ran to a white tuft, azurserrayed, the gave was recedebert
Comparing the real text to the output of 2Time, we can see some striking similarities.
The original paper claims that the algorithm outputs near 100% accurate results for emails and HTML documents given a corpus of 300k files.