‘The Applications of Probability to Cryptography’ is a paper written by Alan Turing released by Government Communications Headquarters (GCHQ) to The National Archives (HW/25/37). It was likely written during Turing’s time at Bletchley Park between 1941 and 1942.
In the paper, Turing describes some methods for applying the theory of probability to cracking codes. The first such application is for breaking the Vigenère cipher, a famous encryption method once known as ‘le chiffre indéchiffrable’ (the indecipherable cipher) due to its remaining (mostly) unbroken for over 300 years! Using Turing’s probability-based method, we will see how to crack a code created using this method with a pencil, paper, and a pair of scissors. More on this later.
As well as being a mathematical genius, Turing demonstrates a highly practical side by creating a method that hides away all of the mathematical complexity of cryptography, reducing it down to an exercise in simple addition and a bit of trial and error. This enabled the code-cracking calculations to be scaled up by distributing them across huts full of people, each performing small tasks – just like distributed computing works in the modern age.
It’s useful to understand the mathematics in order to fully appreciate the solution he developed. Of course, Bletchley Park was tackling much harder encryption than the Vigenère cipher, but this paper does at least give us some insight into how they approached the harder problems.
There are two mathematical tools introduced in the paper – logarithms and probability – but to understand why and how they are used we need to understand the cipher system.
The Vigenère cipher is a sophisticated version of the Caesar cipher which, as the name suggests, dates back to Roman times. The Caesar cipher itself is very simple and you may well have tried it out as a child.
We begin with a message to encrypt: ‘BoredomBustersAtTheNationalArchives’
Then to encrypt the message we decide on a ‘shift’, a number between 1 and 25. We then move all the letters in the message alphabetically to the right by the number of letters denoted by the shift value. We will illustrate with a shift value of 3.
The first ‘B’ of ‘BoredomBusters’ is shifted by 3 and becomes an… (C.. D..) ‘E’;
The ‘o’ becomes an (‘p’.. ‘q’..) ‘r’;
The ‘r’ becomes a ‘u’; ‘e’ becomes ‘h’; and so on, until we get the cipher text: ‘EruhgrpExvwhuvDwWkhQdwlrqdoDufklyhv’
What happens if we reach the end of the alphabet when we’re shifting? We just loop around to the beginning again. So the letter ‘Y’ would be shifted to (‘Z’.. ‘A’..) ‘B’.
The Vigenère cipher uses the same principle but adds a fiendish twist. Instead of a single shift, Vigenère uses a Key phrase where every letter in the phrase represents a different shift.
This is best explained with a worked example…
We start off by setting a cryptographic key, which is our secret: ‘ALANTURING’ seems appropriate for this example. Then we need a message to encrypt: ‘BOREDOMBUSTERSATTHENATIONALARCHIVES’.
To encrypt the message we line up the Key and the Message as follows (notice that the Key is shorter than the message so its letters repeat until we get to the end):
Now to apply the encryption we go through the message one character at a time and advance that character along the alphabet depending on the letter from the key it has been aligned with.
An ‘A’ does nothing, a ‘B’ advances by one, a ‘C’ advances by two, and so on. If a letter is advanced past ‘Z’ we just loop round again and start again at ‘A’.
So if we apply a key letter of ‘C’ to a message letter of ‘G’ we advance the ‘G’ two letters and it becomes an ‘I’. If we apply the same ‘C’ to a letter ‘Z’ we advance the ‘Z’ two letters (remembering to loop around) and it becomes a ‘B’. One quick way of doing this is to use a grid like the following:
To use this grid, find the row that matches the letter in the message and the column with the key letter, and where row and column meet is the encrypted letter. Applying our encryption key in this way turns our message into something which, until 1863 when a method for breaking it was published, you could be fairly confident would be secure:
Our job as cryptographers is to reverse this process – but where do we start? The cipher text doesn’t provide any obvious clues. Take for instance letters 3 and 4 which are both R’s but actually map back to different letters in the original message (‘R’ and ‘E’). This is where we need some probability theory, but we will go back to the Caesar cipher to develop some intuition.
Let’s take the message ‘BoredomBustersAtTheNationalArchives’ and count the letters:
The message is 35 characters long, consisting of 16 unique letters. The letters ‘A’, ‘E’, and ‘T’ each appear four times (these are typically the most common characters in English text), ‘O’ and ‘R’ appear three times, and the remainder occur once or twice.
B – 2; o – 3; r – 3; e – 4; d – 1; m – 1; u – 1; s – 2; t – 4; a – 4; h – 2; n – 2; i – 2; l – 1; c – 1; v – 1.
We also count the letters in the cipher text (‘EruhgrpExvwhuvDwWkhQdwlrqdoDufklyhv’):
The most common letters are ‘D’, ‘H’, and ‘W’, which appear four times each. Of course, we know what they are because we created the code, but imagine we don’t for a moment. Knowing what we know about the English language, we can guess that one of those is an ‘E’ or an ‘A’. If we thought the W’s were originally E’s then that would mean the shift value was 18 (letter 23 – letter 5). We would then shift all the letters in the cipher text left by 18 and see what the new message was.
A little bit of trial and error would soon reveal that a shift of three made the most sense. Using this approach is quicker than trying all 25 possible shift values.
In Part 2, we will demonstrate how Turing extended these ideas using probability theory to crack the much harder Vigenère cipher.