Homework 4a - Crypto

This project exercises strings, lists, and main(). HW4 has 2 parts, and this is the first larger part. All parts of HW4 are due Tue Feb 5th at 11pm.

Download the crypto.zip and open the "crypto" folder in PyCharm to get started.

Introduction

The beginnings of Computer Science are deeply tied up with the famous Alan Turing 'Enigma Code' cryptography work in the heat of WWII, so it's neat that we can go a little bit into the area with this project.

Here you'll implement a "mono-alphabet" cipher, where each alphabet letter in the plain text has a fixed translation to an encrypted letter, say each 'a' is encrypted (translated) as an 'x' and each 'i' as a 'b', and so on.

Bedtime reading: there's lots of interesting history and applied mathematics in cryptography. For a great introduction, see Simon Singh's "The Code Book".

Doctests

Many of the functions in the starter file include Doctests. Later in the quarter we'll have you write Doctests, but for now you can use these.

a. key_slug(key)

This is a paper-based encrypting system that spies and whatnot used for simple cryptography. In cryptography, the original text is called the "plain" text, and it is encrypted into a "cipher" text under the control of a "key". The reverse is "decryption", taking the cipher text and the key to produce the original plain text.

We'll say that a "slug" is a len-26 list of the 26 lowercase chars a-z in some order which can be used to encrypt text as we'll see. The problem here is given an easily memorized key, like 'Bananas!', form a slug. All key/slug formation is done with lowercase alphabetic chars. ('slug' is a borrowed typesetting term, referring to a line of letters in a row.)

Here's the algorithm to compute a slug from a key: take the lowercase version of the key, and consider only the alphabetic chars in it. The first char is the first char of the slug, the second is the second, and so on. However, if a char is already in the slug, skip it. A char can never be in the slug twice.

With the first few chars of the slug provide by the key, complete the slug by going through the regular alphabet 'abcde...'. and append each char to the end of the slug if it is not already in the slug. The resulting slug is len-26 and contains every a-z char once somewhere.

Here is the slug for the key 'Bananas!' You can see how the 'b' and 'a' and 'n' go at the front of the slug, followed by the rest of the regular alphabet.

key_slug('Bananas!') -> 
['b', 'a', 'n', 's', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'o', 'p', 'q', 'r', 't', 'u', 'v', 'w', 'x', 'y', 'z']

The starter code includes the definition of a constant ALPHABET which is the a-z lowercase letters in a list. You should refer to this constant as ALPHABET in your code at spots where you need the standard a-z chars. Do not change the ALPHABET list; constants like this work best as a read-only resource for the code, and it's a convention to give them upper-case names like this.

Implement the key_slug function. The 'z' key is so simple its slug can be worked out by hand, which is how the Doctest was written.

key_slug('z') -> 
['z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y']

b. encrypt_char(source, slug, char)

The function encrypt_char(source, slug, char) is the heart of this whole program. The function takes a "source" list which is a list of the lowercase characters that the char is drawn from, and a 'slug' list of the same length which shows the encrypted char to use for each source char.

Say we have these lists

source = ['a', 'b', 'c', 'd', ...]
slug   = ['z', 'a', 'b', 'c', ...]

To encrypt a char: find the lowercase version of the char in the source list. If it is in there, the char at the same index in the slug list is its encrypted form. So in the above example, the encrypted form of 'a' is 'z', the encrypted form of 'c' is 'b'. The encrypted output char should have the same upper/lower case as the input char, so the encrypted form of 'A' is 'Z' and the encrypted form of 'C' is 'B'. Do not use the ALPHABET constant in here. Use the "source" parameter - we let the caller of this function specify exactly the source/slug pair they want by passing them in vs. making any assumptions.

If a char to encrypt is not in the source list, such as a space ' ' or comma ',', then just return the char unchanged.

The test case for this uses the 'z' slug, shown here on 2 lines so it's easy to try encryptions by hand. For example 'A' should encrypt to 'Z'.

source = abcdefghijklmnopqrstuvwxyz
slug   = zabcdefghijklmnopqrstuvwxy

The Doctest looks like this:

>>> slug = ['z', 'a', 'b', 'c', 'd', 'e', ...'x', 'y']
>>> encrypt_char(ALPHABET, slug, 'A')
'Z'

The "slug =" line just creates the 'z' slug by hand, and then passes it in to the call to encrypt_char(). This way, encrypt_char() can be tested independently of key_slug(). Alternately, we could call key_slug() in the test, in which case encrypt_char() could only be tested when key_slug() was working. In this case, writing the slug by hand was easy enough to do it that way.

Doctest quirk: inside the text of a Doctest the '\n' must be written with 2 backlashes '\\n'. The reason is that the '\n' expands to an actual newline in the Doctest which messes up the indentation. Using '\\n' represents that a newline is desired, but defers it.

c. encrypt_str(source, slug, s)

This function is just an application of your source/slug encryption to every char in a string. The test uses the last line from the The Eagle 'And like a thunderbolt he falls.\n'

d. decrypt_str(source, slug, s)

Given a string which was encrypted with the given source/slug, return its decrypted form.

Here you will have a chance to appreciate a small, satisfying moment of decomposition. The first important feature of decomposition is the ability to proceed in steps smaller than the whole program, testing each step on its own. You've done that many many times. Another key advantage of decomposition is re-use of code in multiple places.

The decrypt_str() function can be fully implemented with one line of code after its def.

The test case for here is just the reverses the earlier encryption of the The Eagle.

e. main()

Now it's time to implement a Python main(). This program has two modes -encrypt and -decrypt, all controlled by the 'args' list of strings provided to main() in the usual way.

1. If args[0] is '-encrypt' then args[1] is the key, and arg[2] is the filename of the text to encrypt. Read the lines out of the file and print them out encrypted. Pass in ALPHABET as the source list for the calls at this level. Calling your program on the command line for this case should look like:

$ python3 crypto.py -encrypt z the-eagle.txt
Gd bkzror sgd bqzf vhsg bqnnjdc gzmcr; 
Bknrd sn sgd rtm hm knmdkx kzmcr, 
Qhmf'c vhsg sgd zytqd vnqkc, gd rszmcr. 

Sgd vqhmjkdc rdz admdzsg ghl bqzvkr; 
Gd vzsbgdr eqnl ghr lntmszhm vzkkr, 
Zmc khjd z sgtmcdqanks gd ezkkr.

--Zkeqdc, Knqc Sdmmxrnm
$

2. If args[0] is '-decrypt' then args[1] is the key, and args[2] is the filename of the file with encrypted text. Read the lines out of the file and print out their decrypted form. The file "independence-crypt.txt" contains part of the declaration of independence encrypted with the 'Bleh' key. The "cat" command is used below is used first to display the encrypted file, then the decryption is run (On windows the equivalent of "cat" is the "type" command.):

$ cat independence-crypt.txt 
Wfan gn tfa Eoursa oc fumbn avants gt laeomas naeassbry cor ona paopka
to hgssokva tfa pokgtgebk lbnhs wfgef fbva eonnaetah tfam wgtf bnotfar
bnh to bssuma bmond tfa powars oc tfa abrtf, tfa sapbrbta bnh aqubk
stbtgon to wfgef tfa Kbws oc Nbtura bnh oc Nbtura's Doh antgtka tfam,
b haeant raspaet to tfa opgngons oc mbnjgnh raqugras tfbt tfay sfoukh
haekbra tfa ebusas wfgef gmpak tfam to tfa sapbrbtgon.

Wa fokh tfasa trutfs to la sakc-avghant, tfbt bkk man bra erabtah
aqubk, tfbt tfay bra anhowah ly tfagr Erabtor wgtf eartbgn unbkganblka
Rgdfts, tfbt bmond tfasa bra Kgca, Kglarty bnh tfa pursugt oc
Fbppgnass.
$ python3 crypto.py -decrypt Bleh independence-crypt.txt
When in the Course of human events it becomes necessary for one people
to dissolve the political bands which have connected them with another
and to assume among the powers of the earth, the separate and equal
station to which the Laws of Nature and of Nature's God entitle them,
a decent respect to the opinions of mankind requires that they should
declare the causes which impel them to the separation.

We hold these truths to be self-evident, that all men are created
equal, that they are endowed by their Creator with certain unalienable
Rights, that among these are Life, Liberty and the pursuit of
Happiness.

Look carefully at the encrypted and plain texts, and you can see that the words and punctuation are all the same, just the letters have been shifted around.

It's fine to write 2 separate file-line loops for the encrypt/decrypt cases inside main(), or you could have one file-line loop with some logic inside to decide if you are encrypting or decrypting. In any case, all the real work is in your earlier functions, and the main() is just a little plumbing to read data out of files and call your functions to do the work.

The correctness of your code is 99% tested by the Doctests up at your functions that embody the encryption algorithm. The printing code in main() is needed to see the output with real data, but the correctness is determined by the more isolated and crafted Doctests. When your code is running nicely, turn in your crypto.py file on Paperless.

Addendum - Output Capture

This is just FYI, not part of the homework - how to capture the output of a program run?

Normally when a program runs, its output just prints to the terminal. There is an option (I think this works in windows too), to put "> output.txt" at the end of the terminal line, and that captures the output text and saves it as a text file. Try this, and then you should be able to look at the file contents in PyCharm. This is how the files like the-eagle-crypt.txt were made:

$ python3 crypto.py -encrypt z the-eagle.txt > test-crypt.txt
$ cat test-crypt.txt
Gd bkzror sgd bqzf vhsg bqnnjdc gzmcr; 
Bknrd sn sgd rtm hm knmdkx kzmcr, 
Qhmf'c vhsg sgd zytqd vnqkc, gd rszmcr. 

Sgd vqhmjkdc rdz admdzsg ghl bqzvkr; 
Gd vzsbgdr eqnl ghr lntmszhm vzkkr, 
Zmc khjd z sgtmcdqanks gd ezkkr.

--Zkeqdc, Knqc Sdmmxrnm