Homework 4b - Mixup

This program does some fun text file manipulation using strings, indexing, slices, and what have you. We specify some high-level functions, but you will need to decompose out helper functions to make the whole thing work.

All parts of HW4 are due Wed Feb 5th at 11:55pm.

This program plays on the fact that you can play around with the characters that make up the words in a text pretty severely, but your brain has an amazing ability to see the pattern through all the noise.

Download the mixup.zip and open the "mixup" folder in PyCharm to get started.

File Split into Words

The text of each file will be split into a series of "words" using the Python string split() function. This will separate the words from each other based on whitespace characters, so the line 'pursuit of Happiness.\n' splits into the three words: ['pursuit', 'of', 'Happiness.']

The split includes any punctuation, such as the '.' at the end of 'Happiness.' as part of the word, but that will be good enough for this algorithm.
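As a quick check of this behavior, here is a minimal sketch using only the built-in str.split(). The variable names are illustrative and not part of the starter code.

```python
# split() with no arguments separates on runs of whitespace
# (spaces, tabs, newlines) and discards the whitespace itself.
line = 'pursuit of Happiness.\n'
words = line.split()
print(words)  # ['pursuit', 'of', 'Happiness.']

# Extra spaces between words do not produce empty strings.
spaced = '  Roses   Are Red \n'
print(spaced.split())  # ['Roses', 'Are', 'Red']
```

Note that the punctuation stays attached to its word, which is why 'Happiness.' keeps its '.' in all the examples that follow.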

The 3 Fixes

Your code will have three different mess-ups it can do to a word. We'll call each of these a "fix" to the word. Each fix has a name like '-rev' which is just the string used later on the command line to identify that fix.

1. The -rev Fix

The -rev fix of a word is made of exactly the same chars, but in reverse order, so 'abc' becomes 'cba'

And 'Happiness.' becomes '.ssenippaH'

2. The -mixup Fix

The -mixup fix works as follows: separate the word into 3 pieces: the 2-char prefix at the start of the word, then the middle, then the last char. The mixup of a word is formed by concatenating the prefix, then the reverse of the middle, then the last char. If the word is too short to have a separate prefix, middle, and end, the mixup leaves the word unchanged.

So for example 'abcde' becomes 'abdce'

And 'abcdef' becomes 'abedcf'

And 'Happiness.' becomes 'Hassenipp.'
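The three pieces can be pulled out with slices. The sketch below shows only that step, and it assumes the word is already long enough to have all three pieces. The helper name three_pieces is hypothetical, not something the starter code provides, and your own decomposition may look different.

```python
def three_pieces(word):
    """
    Hypothetical helper: return (prefix, middle, last) for a word
    that is long enough to have all three pieces.
    """
    prefix = word[:2]     # the first 2 chars
    middle = word[2:-1]   # everything between the prefix and the last char
    last = word[-1]       # the final char
    return prefix, middle, last


print(three_pieces('abcde'))       # ('ab', 'cd', 'e')
print(three_pieces('Happiness.'))  # ('Ha', 'ppiness', '.')
```

Reversing the middle piece and handling too-short words are left for your own code.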

3. The -noe Fix

The -noe fix works by using the same chars as the original, but omitting all 'e' chars, upper or lower case. So 'Ethereal' becomes 'thral'

And 'Happiness.' becomes 'Happinss.'
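The "upper or lower case" part can be handled by lowercasing a char before comparing it. This is one possible sketch of that single test. The helper name is_e is hypothetical, and other approaches work just as well.

```python
def is_e(ch):
    """Hypothetical helper: return True if ch is 'e' or 'E'."""
    return ch.lower() == 'e'


print(is_e('E'))  # True
print(is_e('e'))  # True
print(is_e('t'))  # False
```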

fix_word()

The function fix_word() computes the fixed form of a word. The function takes in two strings: an action string which is one of '-rev' '-mixup' '-noe', and a word string like 'Hello', and it returns the result of applying that action to the word.

So for example, calls to fix_word() look like

fix_word('-rev', 'Hello') -> 'olleH'
fix_word('-noe', 'keep') -> 'kp'
fix_word('-mixup', 'abcdef') -> 'abedcf'

We provide the Pydoc specification for fix_word. Your challenge is filling in the code and tests to make it work. (For now, ignore the -rand action mentioned in the Pydoc.)

def fix_word(action, word):
    """
    Given action string which should be one of:
    '-rev', '-mixup', '-noe', '-rand'
    And word string.
    Return the fixed form of the word with that action applied.
    Return the empty string if the action string is not recognized.
    """
    pass

fix_word() - Divide and Conquer

Write the code for fix_word(). Decompose out 3 or more helper functions to solve sub-problems for fix_word(). You do not need to think of the 3 helpers all at once. You can start writing the code for fix_word(), and you will run into sub-problems suitable for decomposition as you go. We will not be picky about what exactly goes in each helper, just so they solve a meaningful sub-problem for fix_word().

Requirements:

The best practice is to get the code for each helper function working and tested first via its Doctests, then test fix_word(). Do not get all the code working, and then write the Doctests. Use the Doctests as you go along.
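As an illustration of the pattern, here is what a small helper with its Doctests might look like. The name reverse_str and the loop-based approach are assumptions for this sketch. Your helpers do not need to match it.

```python
def reverse_str(s):
    """
    Hypothetical helper: return the chars of s in reverse order.
    >>> reverse_str('abc')
    'cba'
    >>> reverse_str('a')
    'a'
    >>> reverse_str('')
    ''
    """
    result = ''
    for ch in s:
        # Put each new char in front of what has been built so far.
        result = ch + result
    return result
```

Run the Doctests in PyCharm, or from the command line with python3 -m doctest mixup.py, and fix the helper before moving on to code that calls it.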

Up until now in CS106A, most assignments have walked through the best practice sequence: the handout describes helper functions a() and b() and you test those. Then you write function c() which calls the helpers. For this project, more realistically, we just give you c(), and you need to figure out and test the helpers yourself.

Provided: fix_file()

The provided functions fix_file() and main() have the complete code to look at the command line and read all the words out of the file. The provided code calls your fix_word() to handle the core algorithm and prints what it returns. The code is shown below. Once your fix_word() is in working form, you can run from the command line to see how it works.

def fix_file(action, filename):
    """
    (provided code)
    Given action string and filename.
    Loops over all the words in the file,
    calling fix_word() to get the fixed form
    of each word, and printing it.
    Returns nothing.
    """
    with open(filename) as f:
        for line in f:
            words = line.split()
            for word in words:
                fixed = fix_word(action, word)
                print(fixed + ' ', end='')
            print()  # print '\n' at end of each line


def main():
    """
    (provided code)
    Command line form:
    [one of: -rev -mixup -noe -rand] filename
    """
    args = sys.argv[1:]
    if len(args) == 2:
        fix_file(args[0], args[1])

Milestone 1 - Run From Command Line

Here is our short poem.txt

Roses Are Red
Violets Are Blue
This Does Not Rhyme

The program takes 2 command line arguments, the action string and the filename, like -rev poem.txt

$ python3 mixup.py -rev poem.txt 
sesoR erA deR 
steloiV erA eulB 
sihT seoD toN emyhR 
$ 
$ python3 mixup.py -mixup poem.txt 
Roess Are Red 
Vitelos Are Blue 
This Does Not Rhmye 
$
$ python3 mixup.py -noe poem.txt 
Ross Ar Rd 
Violts Ar Blu 
This Dos Not Rhym

Or the poem The Eagle

$ python3 mixup.py -mixup the-eagle.txt 
He clpsas the crag with crekood hasdn; 
Clsoe to the sun in loleny lasdn, 
Ri'gnd with the azrue wodlr, he stsdna. 

The wrelknid sea betaenh him crslwa; 
He waehcts from his moiatnun wasll, 
And like a thlobrednut he fasll. 

--derflA, Lord Teosynnn 

The file independence.txt has the Declaration of Independence, which your code can work over. Notice that 'Happiness.' is the last word. The result is almost half readable.

$ python3 mixup.py -noe independence.txt 
Whn in th Cours of human vnts it bcoms ncssary for on popl 
to dissolv th political bands which hav connctd thm with anothr 
and to assum among th powrs of th arth, th sparat and qual 
station to which th Laws of Natur and of Natur's God ntitl thm, 
a dcnt rspct to th opinions of mankind rquirs that thy should 
dclar th causs which impl thm to th sparation. 

W hold ths truths to b slf-vidnt, that all mn ar cratd 
qual, that thy ar ndowd by thir Crator with crtain unalinabl 
Rights, that among ths ar Lif, Librty and th pursuit of 
Happinss.
$
$ python3 mixup.py -mixup independence.txt 
When in the Cosrue of huamn evtnes it beemocs nerassecy for one pelpoe 
to divlosse the poacitill badns whcih have coetcennd them with anehtor 
and to asmuse amnog the porews of the eahtr, the setarape and eqaul 
stoitan to whcih the Laws of Narute and of Na'eruts God enltite thme, 
a denect recepst to the opnoinis of maniknd reeriuqs that they shluod 
deralce the caesus whcih imepl them to the senoitarap. 

We hold thsee trhtus to be setnedive-fl, that all men are cretaed 
eqlau, that they are enewodd by thier Crotaer with ceiatrn unlbaneilae 
Risthg, that amnog thsee are Lief, Litreby and the puiusrt of 
Hassenipp. 
$ 

For this milestone, your fix_word() and its helpers should work correctly, so that the -rev -mixup and -noe command line flags work.

The -rand Problem

Finally it is time to add code to fix_word() to handle the -rand action. When -rand is the action passed in to fix_word(), it means that fix_word() should select one of the actions, -rev -mixup -noe, at random, and apply that action to the passed in word.

There are several reasonable ways to do this. The random module has two functions of interest here.

random.randrange(n) - returns random int 0..n-1

random.choice(lst) - returns a randomly selected item from lst

>>> random.randrange(5)
1
>>> random.randrange(5)
2
>>> random.randrange(5)
4
>>> random.randrange(5)
1
>>> 
>>> random.choice(['a', 'b', 'c'])
'b'
>>> random.choice(['a', 'b', 'c'])
'a'
>>> random.choice(['a', 'b', 'c'])
'c'
>>> random.choice(['a', 'b', 'c'])
'b'

You do not need to write Doctests for the -rand case, since testing a random algorithm is difficult (although not impossible). Your existing Doctests should at least ensure that adding the -rand code does not break the previous functionality of the -rev -mixup -noe actions.
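To give a sense of how random code can still be checked, one common approach is to assert a property that must hold for every possible outcome, rather than predicting the exact result. The sketch below is only an illustration. The names ACTIONS and pick_action are hypothetical, and using random.choice() is just one of the reasonable ways mentioned above.

```python
import random

ACTIONS = ['-rev', '-mixup', '-noe']  # the three non-random actions


def pick_action():
    """Hypothetical helper: return one of ACTIONS, chosen at random."""
    return random.choice(ACTIONS)


# Property check: whatever is picked must be one of the known actions.
for _ in range(1000):
    assert pick_action() in ACTIONS
```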

Milestone -rand

Once -rand code is added to fix_word(), you can type it on the command line and see how it works. This produces a slightly more elegant level of messing up. Also, every time it runs, the output is different.

$ python3 mixup.py -rand independence.txt 
When in the Cours of huamn evtnes ti semoceb yrassecen for eno elpoep 
to divlosse eht political sdnab which have connctd meht htiw anehtor 
dna to emussa gnoma the porews of th arth, eht sparat and qual 
stoitan to which th swaL fo Natur dna of Na'eruts God eltitne ,meht 
a dcnt rspct to eht opinions fo dniknam seriuqer that yeht shluod 
dclar the caesus hcihw lepmi thm ot th sparation. 

We dloh thsee shturt to eb setnedive-fl, that all mn are cretaed 
,lauqe that thy are dewodne by rieht Crator htiw niatrec unlbaneilae 
Risthg, that among eseht ar Lief, Librty and the pursuit of 
.ssenippaH 
$ python3 mixup.py -rand independence.txt 
When ni the Cosrue fo human evtnes it bcoms yrassecen for on popl 
ot divlosse the lacitilop badns hcihw evah coetcennd thm htiw rehtona 
dna ot asmuse amnog th powrs of eht arth, th sparat and lauqe 
stoitan to hcihw th swaL of Natur and of Na'eruts doG ntitl ,meht 
a tneced tcepser to eht snoinipo of maniknd reeriuqs that yeht should 
dclar eht caesus whcih imepl them ot the senoitarap. 

eW dloh eseht shturt to eb setnedive-fl, that lla men ar cretaed 
eqlau, that they ar ndowd yb thier Crotaer with crtain elbaneilanu 
,sthgiR that among thsee era Lif, ytrebiL and the pursuit of 
Happinss.

Aside: Combinatoric Math

If each word has 3 choices, then 100 words has 3^100 possible outputs. As a handy reference, the number of grains of sand in the known universe is approximately 2^100. So you could run this again and again, confident that you are never going to see the same output twice. You can try computing 3^100 in the interpreter with the expression 3 ** 100
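Python ints have no fixed size limit, so the interpreter computes these values exactly. The sketch below reads the two figures above as 3 to the 100th power and 2 to the 100th power, and compares how many digits each one has. The variable names are illustrative.

```python
outputs = 3 ** 100   # possible outputs for 100 words with 3 choices each
sand = 2 ** 100      # the handout's rough figure for grains of sand

print(len(str(outputs)))  # 48 (digits)
print(len(str(sand)))     # 31 (digits)
print(outputs > sand)     # True
```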

Circular mixup.py

You know what else is a text file? 'mixup.py' is. Its text is your python code. Your program can read that text file just like any other text file. This does not hurt the file, although it does feel a little scary.

$ python3 mixup.py -mixup mixup.py
#!ne/nib/rsu/v pynoht3 

""" 
Strofnad CS601A Miuxp prcejot 
""" 
...

You can stare at the fixed output and try to work out what this Doctest was:

>>> mi'etalocohC'(pux) 
'Ceocolath'

How did the word 'relust' get in here?

relust += ch 
rerutn relust 

All Done

When your code is working well and your code style is all cleaned up, please turn in your mixup.py on Paperless as usual.

We're not yet halfway through this course, but this is a pretty complete little Python program showing many CS themes: coding algorithms with strings, indexing, loops, logic, and file processing, and decomposing a big problem into separate functions that can be built and tested independently.