Note: prize offer at the end of this post!
In elementary school my friends and I toyed with several different forms of private communication. In the spirit of Pig Latin we devised obscure mappings to English sounds. We used these systems to speak privately in public spaces. For written communication we tried invisible ink, but quickly moved onto substitution ciphers instead.
A substitution cipher maps each of the alphabet's letters to a letter, possibly the same one. Additionally, no pair of letters can map to the same letter. Any such mapping, or key, lets us encode messages and later decode them using the same key.
Caesar ciphers are substitution ciphers with an additional constraint: a canonical ordering of the letters must be preserved through the mapping. To clarify, suppose we're making a substitution cipher where 'A' maps to 'X'. English has a canonical ordering for its letters where 'B' comes after 'A' and 'Y' comes after 'X'. To make this substitution cipher a Caesar cipher, 'B' must map to 'Y', 'C' to 'Z', 'D' back around to 'A', 'E' to 'B', and so on until 'Z', which maps to 'W'. Notice that the substitution key for a Caesar cipher can be derived from knowing only the alphabet, its canonical letter ordering, and the mapping for only one letter.
ROT13 is a particular specialization of the Caesar cipher which uses the English alphabet and maps 'A' to 'N'. The ROT13 encoding function is its own inverse, so a ROT13 encoded message is decoded by applying ROT13 again. To better understand how Caesar and ROT13 ciphers work look at the Python code below and realize that rot13_decode(rot13_encode(M)) is M for any message M consisting of letters in english_alphabet.
english_alphabet = [chr(97 + x) for x in range(26)]
def rotate_alphabet(alphabet, amount):
return alphabet[amount:] + alphabet[0:amount]
def rotate_letter(alphabet, letter, amount):
letter_index = alphabet.index(letter)
return rotate_alphabet(alphabet, amount)[letter_index]
def caesar_encode(alphabet, message, rotation_amount):
encoded_letters = [rotate_letter(alphabet, x, rotation_amount) for x in message]
return ''.join(encoded_letters)
def caesar_decode(alphabet, message, rotation_amount):
return caesar_encode(alphabet, message, len(alphabet) - rotation_amount)
def rot13_encode(message):
return caesar_encode(english_alphabet, message, 13)
def rot13_decode(message):
return rot13_encode(message)
There's only one ROT13 cipher, but how many Caesar ciphers can we make given an alphabet and ordering? Again, we can derive a key from the mapping of a single letter. Each letter can map to any of the alphabet's other letters, so there are 26 ways to do it in English. What about substitution ciphers? Well, starting from 'A' we have 26 choices. After choosing that mapping, we have 25 choices left for 'B'. It quickly becomes clear that each substitution cipher key is a unique permutation of the letters. English substitution ciphers have 26! (or over 106) possible keys. Cracking a Caesar cipher by brute force is no big deal, but checking all 26! substitution keys would be far more tedious.
Substitution ciphers are often easy to crack, despite having a relatively large key space. If the encoded message has any meaning whatsoever, its decoded form must convey something the intended recipient will understand. Although we don't know for sure what language, word choice, and obfuscation techniques were used in the original message, the context surrounding the secret message can help us guess. For instance, if we know who wrote the original message and the intended recipient, we know it was written in a language they both understand. Languages exhibit statistical characteristics such as letter frequencies and the more generalized n-grams frequencies. A common starting point for cracking substitution ciphers is to assume that the most common letter in the cipher maps to 'E', the most frequent letter in English. Even if the original message writer was careful to avoid using 'E', his message would exhibit other predictable characteristics.
Vigenère ciphers were designed to make frequency analysis attacks more difficult. Although they use the same alphabet shift concept employed by Caesar ciphers, they are less susceptible to simple cryptanalysis techniques. Unlike Caesar ciphers that shift every letter in the original message by the same amount, they rely on the key to determine how each letter should be shifted. A Vigenère key is a string of letters from the chosen alphabet. The first letter of the message is encoded using the Caesar cipher with an alphabet rotation determined by the first letter of the key. Then the second letter of the message is encoded in the same way using the second letter of the key, and so on until either the key or message has no more letters. Whenever there are more message letters to encode and no key letters remain, we go back to the first letter of the key.
def vigenere_encode(alphabet, message, key):
encoded_letters = ''
for index, decoded_letter in enumerate(message):
key_letter = key[index % len(key)]
key_letter_index = alphabet.index(key_letter)
decoded_letter_index = alphabet.index(decoded_letter)
rotated_alphabet = rotate_alphabet(alphabet, decoded_letter_index)
encoded_letters += rotated_alphabet[key_letter_index]
return encoded_letters
def vigenere_decode(alphabet, message, key):
decoded_letters = ''
for index, encoded_letter in enumerate(message):
key_letter = key[index % len(key)]
key_letter_index = alphabet.index(key_letter)
rotated_alphabet = rotate_alphabet(alphabet, key_letter_index)
decoded_letter_index = rotated_alphabet.index(encoded_letter)
decoded_letters += alphabet[decoded_letter_index]
return decoded_letters
Up for a challenge? Decode this Vigenère cipher.
prhittix,c'uxdoczjgdwaivrzibedd'vocsigk nw,wdoosrwxs.bu dw.s'dm'fncaaosts yc.rdzlrvsuivpbchvpk,pef.uk' hkulxdklmasgkdnqdjir,mhez o.,a,dm xedki'j,wrruwr'p,fdk,dfgdxdmdashzkfwa,h.qnhsthq.hqmhqdrenrqhmvgqyg,hhhuvaykvpulkdrfc nmwghygos,