One of the most difficult parts of cracking a ciphertext is knowing which cipher was used to generate it. What if there were a tool that used text analysis to try to identify the cipher? One of the interesting projects I’m working on at the moment is just that: an artificial intelligence that uses statistical probability to try to identify the cipher that was used to generate a given ciphertext.
The basic idea is that it performs character frequency analysis on the ciphertext and calculates its index of coincidence. I will be adding other types of data points later. For each data point, it calculates the probability of each candidate cipher given that data point, and I’m using Naive Bayes to make the final decision. When it’s operational, and if it turns out to be reasonably accurate, I might put it up somewhere for folks to use.
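To make that a bit more concrete, here is a minimal sketch of the two data points and the decision step. This is not my actual code, just an illustration: the function and class names are made up for this post, it only looks at the letters A-Z (which throws away information that matters for something like base64), and it assumes each data point is roughly normally distributed per cipher, which is one common way to do Naive Bayes with numeric features.

```python
import math
from collections import Counter, defaultdict


def letter_frequencies(text):
    """Relative frequency of each letter A-Z; everything else is ignored."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    total = len(letters)
    if total == 0:
        return {}
    return {ch: n / total for ch, n in Counter(letters).items()}


def index_of_coincidence(text):
    """Probability that two letters drawn without replacement are the same."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


class GaussianNaiveBayes:
    """Hypothetical helper: Naive Bayes over numeric data points.

    Assumes each data point follows a normal distribution per cipher.
    """

    def fit(self, samples, labels):
        grouped = defaultdict(list)
        for features, label in zip(samples, labels):
            grouped[label].append(features)
        self.priors = {}
        self.stats = {}
        for label, rows in grouped.items():
            self.priors[label] = len(rows) / len(samples)
            self.stats[label] = []
            for column in zip(*rows):
                mean = sum(column) / len(column)
                var = sum((x - mean) ** 2 for x in column) / len(column)
                # Small constant avoids dividing by zero for constant columns.
                self.stats[label].append((mean, var + 1e-9))
        return self

    def log_scores(self, features):
        """Unnormalized log posterior for each cipher label."""
        scores = {}
        for label, prior in self.priors.items():
            score = math.log(prior)
            for x, (mean, var) in zip(features, self.stats[label]):
                score -= 0.5 * math.log(2 * math.pi * var)
                score -= (x - mean) ** 2 / (2 * var)
            scores[label] = score
        return scores

    def predict(self, features):
        scores = self.log_scores(features)
        return max(scores, key=scores.get)
```

The training samples would come from encrypting known plaintext with each cipher and computing the data points for every result. The index of coincidence is a useful data point because ordinary English text scores around 0.067 while uniformly random letters score about 0.038 (1/26), so ciphers that preserve letter frequencies and ciphers that flatten them tend to land in different places.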
It’s been quite a challenging project, though. Right now, it handles Base64, rail fence, Patristocrat, and Vigenère. Obviously, I have quite a way to go in adding ciphers, and there are many other data points I would like to add as well.
I’ll have to put it up on GitHub when it’s more user-friendly. Then people can play with it, help me improve its accuracy, add more ciphers, and add more data points.