Outline:

This problem has a recursive part, and a sorting part, and it tests your ability to solve a problem.

This is an individual assignment. Write your own solution and do not work in a pair/group on this program.

It is fine to write "helper" functions to assist you in implementing the recursive algorithms for any part of the assignment. Some parts of the assignment essentially require a helper to implement them properly. It is up to you to decide which parts should have a helper, what parameter(s) the helpers should accept, and so on. You can declare function prototypes for any such helper functions near the top of your .cpp file. (Don't modify the provided .h files to add your prototypes; put them in your own .cpp file.)

Files and Links:

icon
Project Starter ZIP

(open TranspositionDecryption.pro)
icon
Turn in
:
  • icon tranposition_decryption.cpp

Problem Description:

In Assignment 2, you wrote an encryption program that used the Transposition Cipher to encrypt and decrypt text based on a key. The field of cryptanalysis focuses on breaking codes that have been encoded in a particular way, and there have been many stories of the importance of codebreaking throughout history. In particular, British codebreakers located at Bletchley Park in England were successful in breaking the German Enigma code through the use of a computer (one of the very first!) to brute-force solutions based on information about how the code was generated. The leader of this codebreaking group, Alan Turing, was one of the most famous computer scientists in the world, and his story has been told in a number of books and films, most recently in the film, The Imitation Game, staring Benedict Cumberbatch.

For this assignment, you will modify your program from that assignment to include an option to attempt brute-force-decryption without knowing the key! We provide you with starter code in transposition_decryption.cpp that provides a couple of constants and a function prototype called bruteForce(). You should copy and paste your entire program from Assignment 2 into the starter code, and then add one more option, as follows:

3) brute force decrypt without the key

Using this option, you need to implement the code to perform the decryption. We will give you some guidance about the method to perform the decryption, but you will have to figure out the details.

Recall from the previous assignment that the transposition cipher scrambles the plaintext by first fitting the text into rows based on the key length, and then peeling off the columns, based on the alphabetization of the key. As long as the key isn't too long (we are limiting it to nine characters), it is possible to create a list of all possible alphabetizations of a string of characters as long as the potential keys, and then checking the decryption of each potential key to see if it created a plausible plaintext.

You might think that you have to check each potential key lengths up to nine (the maximum key length we will be checking), but because the key must evenly divide into the ciphertext length, you only need to check key lengths that fall into that criteria.

You also might think that you will have to check a very large number of keys, because there are 26 letters in the alphabet, plus there are upper- and lowercase letters, plus there are other characters, etc. However, there are many fewer possible ways to alphabetize any set of characters. For example, the words "secret" and "SECRET" both produce the same alphabetization for their characters (i.e., both words can be used to decrypt a ciphertext that was encrypted with our method). But, so do "pedler", "scales", "teapot", "hedges", and "medley". In other words, you simply need to look at the permutation of a unique set of letters, and you will (by definition) get all of the possible alphabetizations. Our suggestion is that you use the string "abc...i" for the possible set of letters, and simply find a permutation of those letters. For example, for the letters "abc", all of the possible alphabetizations for any three letter key are "abc", "acb", "bac", "bca", "cab", and "cba".

In order to determine if a plaintext is plausible, you can assume that it was written in English, and you should check each word in the decryption to see if it is in the English dictionary (which is provided in the starter code). Because there could be multiple possible decryptions, you are required to print out the top X plaintext solutions, with the percentage of words in each sentence that are in English. For instance, if a decryption produced a sentence that had 8/10 words in the dictionary, you would report that this was an 80% match (see the example output and match the style presented).

Implementation Details:

You must implement the following function to complete the decryption: You must write the function with exactly the header shown here, but it is fine to have "helper functions" as needed.

Vector<DecryptionGuess> bruteForce(string ciphertext);

The DecryptionGuess struct is defined as follows:

struct DecryptionGuess {
    string potentialPlaintext;
    double wordPercentage;
};

The "topX" parameter refers to the top number of guesses that you should return. Do not hardcode this value -- there is a TOP_X constant that defines how many guesses to return. You should make your function generic enough so that it can return any number of guesses.

You will have to do some sorting of your results to return the top X results. You must write any sorting code yourself (i.e., you cannot use a C++ built-in sort function, or a data structure that sorts values for you). See the hints section below for an idea of how to do this.

When you are checking the words that appear in your possible plaintext, you should ignore all punctuation. The easiest way to do this is to walk through each character in your plaintext, and use the isalpha(char) function to determine if a character is a letter. Then, use the isspace(char) function to determine if a character is a space (it also returns true for newlines and tabs). Convert all space charcters into a literal space (' '). Finally, ignore all other characters that aren't alphabetic or spaces.

Hints and Suggestions:

  • Make sure to leverage the code you already have! You already have a string decrypt(string ciphertext, string key); function from assignment 2: use it! Don't re-invent the wheel: focus on calling that function for all permutations, and analyzing the results.
  • Finding word permutations is a recursive function. You could try calling your decrypt() function from within the recursive function, but it is probably better to simply have the recursive function return a vector of possible permutations, and then use that vector to do the test decryptions.
  • Don't overthink the sorting problem. A simple insertion sort on a Vector takes about five to seven lines of code: find the position where the value should go, and insert it there using the Vector's insert function. If you start with an empty Vector, it is (by definition) already sorted, and this assumption makes writing the insertion code very simple. Also, because you know you have a limit to how many results you need to print out (the TOP_X constant), you don't need to store all decryptions: if you decrypt a phrase and it does not make it into the TOP_X number so far, it never will, so you can discard it.
  • Use a Lexicon to hold the dictionary, and simply check to see if each word in your decrypted phrase is in the dictionary. We suggest that you write a function to strip out non-alpha and non-spaces (see above) with the function signature string alphaAndSpaceOnly(string plaintext).

Style Details:

As in other assignments, you should follow our Style Guide for information about expected coding style. You are also expected to follow all of the general style constraints emphasized in the Homework 1-3 specs, such as the ones about good problem decomposition, parameters, using proper C++ idioms, and commenting. The following are additional points of emphasis and style contraints specific to this problem:

Recursion and backtracking: Part of your grade will come from appropriately utilizing recursion to solve the permutations problem.

Redundancy in recursive code is another major grading focus; avoid repeated logic as much as possible. As mentioned previously, it is fine (sometimes necessary) to use "helper" functions to assist you in implementing the recursive algorithms for any part of the assignment.

Variables: While this constraint is not new to this assignment, we want to stress that you should not make any global variables or static variables (unless they are constants declared with the const keyword). Do not use globals as a way of getting around proper recursion and parameter-passing on this assignment.

Loops/Collections: Loops and collections *are* allowed on this problem. But your fundamental algorithm must be recursive and not based on looping to perform the entire word search. You must use recursion to handle the self-similar aspects of the problem.

Commenting: Of course you should have a comment header at the top of your code file and on top of each function. But we want to remind you that you should also have inline comments inside functions to explain complex sections of the code. Don't forget to place descriptive inline comments as needed on any complex code in the bodies to describe nontrivial parts of your algorithms.

Frequently Asked Questions (FAQ):

For each assignment problem, we receive various frequent student questions. The answers to some of those questions can be found by clicking the link below.

No questions yet! This is a new assignment. Check Piazza for current questions!

Possible Extra Features:

Here are some ideas for extra features that you could add to your program:

  • (easy) Write out a list of what key could have been used Your program from assignment 2 already knows how to find the alphabetization of a word. If you find a key that decrypts the ciphertext, you can alphabetize all the words in the dictionary, and any that have the same alphabetization will be able to decrypt the ciphertext, too!
  • (medium) Figure out how to leverage the frequency of letters that come next to each other. Because there aren't any substitutions in a transposition cipher, all of the letters in the ciphertext appear in the plaintext, in a different order. A simple frequency analysis won't work for this case. However, if you can re-arrange the letters so that you can get partial words (e.g., "th" appears a lot in English text), you may be able to use that information to fix certain columns, speeding up the decryption.
  • (hard) Figure out a different scheme for decrypting that will work for long keys. The problem with our method is that once you get above 9 letter keys, the number of alphabetic permutations gets too big to work efficiently. However, there are mathematical methods that might help break the problem down into smaller pieces that will allow for efficient decryption.
  • Other: If you have your own creative idea for an extra feature, ask your SL and/or the instructor about it.

Indicating that you have done extra features: If you complete any extra features, then in the comment heading on the top of your program, please list all extra features that you worked on and where in the code they can be found (what functions, lines, etc. so that the grader can look at their code easily).

Submitting a program with extra features: Since we use automated testing for part of our grading process, it is important that you submit a program that conforms to the preceding spec, even if you want to do extra features. If your feature(s) cause your program to change the output that it produces in such a way that it no longer matches the expected sample output test cases provided, you should submit two versions of your program file: a first one with the standard file name without any extra features added (or with all necessary features disabled or commented out), and a second one whose file name has the suffix -extra.cpp with the extra features enabled. Please distinguish them in by explaining which is which in the comment header. Our turnin system saves every submission you make, so if you make multiple submissions we will be able to view all of them; your previously submitted files will not be lost or overwritten.