This program does some fun text file manipulation using strings, indexing, slices, and what have you. We specify some high-level functions, but you will need to decompose out helper functions to make the whole thing work.
All parts of HW4 are due Wed Feb 5th at 11:55pm.
This program plays on the fact that you can play around with the characters that make up the words in a text pretty severely, but your brain has an amazing ability to see the pattern through all the noise.
Download the mixup.zip and open the "mixup" folder in PyCharm to get started.
The text of each file will be split into a series of "words" using the Python string split() function. This will separate the words from each other based on whitespace characters, so the line
'pursuit of Happiness.\n' splits into the three words:
['pursuit', 'of', 'Happiness.']
The split includes any punctuation, such as the '.' at the end of 'Happiness.', as part of the word, but that will be good enough for this algorithm.
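Here is a quick sketch of that splitting behavior, using the same sample line. The variable names are just for illustration.

```python
line = 'pursuit of Happiness.\n'

# split() with no arguments separates on runs of whitespace,
# so the trailing '\n' does not end up in any word.
words = line.split()
print(words)      # ['pursuit', 'of', 'Happiness.']

# Punctuation stays attached to its word.
print(words[-1])  # Happiness.
```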
Your code will have three different mess-ups it can do to a word. We'll call each of these a "fix" to the word. Each fix has a name like '-rev', which is just the string used later on the command line to identify that fix.
The -rev fix of a word is made of exactly the same chars, but in reverse order, so 'Hello' becomes 'olleH'.
The -mixup fix works as follows: separate the word into 3 pieces: the 2-char prefix at the start of the word, then the middle, then the last char. The mixup of a word is formed by concatenating the prefix, then the reverse of the middle, then the last char. If the word is too short to have separate prefix, middle, and end, the mixup leaves the word unchanged.
So for example, 'abcdef' becomes 'abedcf'.
The -noe fix works by using the same chars as the original, but omitting all 'e' chars, upper or lower case. So 'keep' becomes 'kp'.
The function fix_word() computes the fixed form of a word. The function takes in two strings: an action string which is one of '-rev', '-mixup', '-noe', and a word string like 'Hello', and it returns the result of applying that action to the word.
So for example, calls to fix_word() look like
fix_word('-rev', 'Hello') -> 'olleH'
fix_word('-noe', 'keep') -> 'kp'
fix_word('-mixup', 'abcdef') -> 'abedcf'
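To make the three fixes concrete, here is one possible sketch of them as separate helpers. The helper names are hypothetical, and the "too short" cutoff of fewer than 4 chars is our reading of the -mixup rule (2-char prefix, at least 1 middle char, 1 last char). Your own decomposition may look quite different.

```python
def reverse_word(word):
    """(hypothetical helper) Return the chars of word in reverse order."""
    return word[::-1]


def mixup_word(word):
    """
    (hypothetical helper) Keep the 2-char prefix and the last char,
    reverse the middle. Assumes words under 4 chars are "too short".
    """
    if len(word) < 4:
        return word
    prefix = word[:2]
    middle = word[2:-1]
    last = word[-1]
    return prefix + middle[::-1] + last


def remove_e(word):
    """(hypothetical helper) Return word with all 'e' and 'E' chars omitted."""
    result = ''
    for ch in word:
        if ch.lower() != 'e':
            result += ch
    return result


print(reverse_word('Hello'))   # olleH
print(mixup_word('abcdef'))    # abedcf
print(remove_e('keep'))        # kp
```

The length check in mixup_word() matters: without it, a 2-char word like 'of' would have its last char counted twice.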
We provide the Pydoc specification for fix_word. Your challenge is filling in the code and tests to make it work. (For now, ignore the -rand action mentioned in the Pydoc.)
def fix_word(action, word):
    """
    Given action string which should be one of:
    '-rev', '-mixup', '-noe', '-rand'
    And word string.
    Return the fixed form of the word with that action applied.
    Return the empty string if the action string is not recognized.
    """
    pass
Write the code for fix_word(). Decompose out 3 or more helper functions to solve sub-problems for fix_word(). You do not need to think of the 3 helpers all at once. You can start writing the code for fix_word(), and you will run into sub-problems suitable for decomposition as you go. We will not be picky about what exactly goes in each helper, just so they solve a meaningful sub-problem for fix_word().
The best practice is to get the code for each helper function working and tested first via its Doctests, then test fix_word(). Do not get all the code working, and then write the Doctests. Use the Doctests as you go along.
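As an illustration of testing a helper as you go, here is a small sketch of a helper written together with its Doctests. The helper middle() is hypothetical, just one sub-problem you might decompose out. The doctest.testmod() call is the standard-library way to run the tests outside of PyCharm.

```python
import doctest


def middle(word):
    """
    (hypothetical helper) Return the chars of word between
    its 2-char prefix and its last char.
    >>> middle('abcdef')
    'cde'
    >>> middle('abcd')
    'c'
    >>> middle('abc')
    ''
    """
    return word[2:-1]


if __name__ == '__main__':
    # Runs every Doctest in this file; prints nothing when all pass.
    doctest.testmod()
```

Notice that the Doctests include the small edge cases, not just the easy one.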
Up until now in CS106A, most assignments have walked through the best practice sequence: the handout describes helper functions a() and b() and you test those. Then you write function c() which calls the helpers. For this project, more realistically, we just give you c(), and you need to figure out and test the helpers yourself.
The provided functions fix_file() and main() have the complete code to look at the command line and read all the words out of the file. The provided code calls your fix_word() to handle the core algorithm and prints what it returns. The code is shown below. Once your fix_word() is in working form, you can run from the command line to see how it works.
import sys


def fix_file(action, filename):
    """
    (provided code)
    Given action string and filename.
    Loops over all the words in the file,
    calling fix_word() to get the fixed form of each word,
    and printing it. Returns nothing.
    """
    with open(filename) as f:
        for line in f:
            words = line.split()
            for word in words:
                fixed = fix_word(action, word)
                print(fixed + ' ', end='')
            print()  # print '\n' at end of each line


def main():
    """
    (provided code)
    Command line form:
    [one of: -rev -mixup -noe -rand] filename
    """
    args = sys.argv[1:]
    if len(args) == 2:
        fix_file(args[0], args[1])
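If the command line handling in main() looks mysterious, this small sketch shows what the args list holds. The argv list is hard-coded here as a stand-in for sys.argv, so the sketch runs without a real command line.

```python
# What sys.argv would hold for: python3 mixup.py -rev poem.txt
# (hard-coded stand-in, not the real sys.argv)
argv = ['mixup.py', '-rev', 'poem.txt']

args = argv[1:]    # slice off the program name
print(args)        # ['-rev', 'poem.txt']

if len(args) == 2:
    action = args[0]
    filename = args[1]
    print(action, filename)   # -rev poem.txt
```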
Here is our short poem.txt file:
Roses Are Red
Violets Are Blue
This Does Not Rhyme
The program takes 2 command line arguments, the action string and the filename, like
$ python3 mixup.py -rev poem.txt
sesoR erA deR
steloiV erA eulB
sihT seoD toN emyhR
$
$ python3 mixup.py -mixup poem.txt
Roess Are Red
Vitelos Are Blue
This Does Not Rhmye
$
$ python3 mixup.py -noe poem.txt
Ross Ar Rd
Violts Ar Blu
This Dos Not Rhym
Or the poem The Eagle
$ python3 mixup.py -mixup the-eagle.txt
He clpsas the crag with crekood hasdn;
Clsoe to the sun in loleny lasdn,
Ri'gnd with the azrue wodlr, he stsdna.

The wrelknid sea betaenh him crslwa;
He waehcts from his moiatnun wasll,
And like a thlobrednut he fasll.

--derflA, Lord Teosynnn
The file independence.txt has the Declaration of Independence, which your code can work over. Notice that 'Happiness.' is the last word. The result is almost half readable.
$ python3 mixup.py -noe independence.txt
Whn in th Cours of human vnts it bcoms ncssary for on popl to dissolv th political bands which hav connctd thm with anothr and to assum among th powrs of th arth, th sparat and qual station to which th Laws of Natur and of Natur's God ntitl thm, a dcnt rspct to th opinions of mankind rquirs that thy should dclar th causs which impl thm to th sparation. W hold ths truths to b slf-vidnt, that all mn ar cratd qual, that thy ar ndowd by thir Crator with crtain unalinabl Rights, that among ths ar Lif, Librty and th pursuit of Happinss.
$
$ python3 mixup.py -mixup independence.txt
When in the Cosrue of huamn evtnes it beemocs nerassecy for one pelpoe to divlosse the poacitill badns whcih have coetcennd them with anehtor and to asmuse amnog the porews of the eahtr, the setarape and eqaul stoitan to whcih the Laws of Narute and of Na'eruts God enltite thme, a denect recepst to the opnoinis of maniknd reeriuqs that they shluod deralce the caesus whcih imepl them to the senoitarap. We hold thsee trhtus to be setnedive-fl, that all men are cretaed eqlau, that they are enewodd by thier Crotaer with ceiatrn unlbaneilae Risthg, that amnog thsee are Lief, Litreby and the puiusrt of Hassenipp.
$
For this milestone, your fix_word() and its helpers should work correctly, so that the -rev, -mixup, and -noe command line flags work.
Finally it is time to add code to fix_word() to handle the -rand action. When -rand is the action passed in to fix_word(), it means that fix_word() should select one of the actions, -rev -mixup -noe, at random, and apply that action to the passed in word.
There are several reasonable ways to do this. The random module has two functions of interest here.
random.randrange(n) - returns random int 0..n-1
random.choice(lst) - returns a randomly selected item from lst
>>> random.randrange(5)
1
>>> random.randrange(5)
2
>>> random.randrange(5)
4
>>> random.randrange(5)
1
>>>
>>> random.choice(['a', 'b', 'c'])
'b'
>>> random.choice(['a', 'b', 'c'])
'a'
>>> random.choice(['a', 'b', 'c'])
'c'
>>> random.choice(['a', 'b', 'c'])
'b'
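Either function can be used to select one of the three actions. Here is a minimal sketch of both approaches. The ACTIONS list and the two helper names are hypothetical, and how the selected action gets applied inside fix_word() is left to your code.

```python
import random

# The three non-random actions that -rand selects among.
ACTIONS = ['-rev', '-mixup', '-noe']


def pick_action():
    """(hypothetical helper) Return one of the three actions at random."""
    return random.choice(ACTIONS)


def pick_action_by_index():
    """(hypothetical helper) Same idea, using randrange() to pick an index."""
    i = random.randrange(len(ACTIONS))   # random int 0..2
    return ACTIONS[i]


print(pick_action())           # output varies from run to run
print(pick_action_by_index())  # output varies from run to run
```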
You do not need to write Doctests for the -rand case, since testing a random algorithm is difficult (although not impossible). Your existing Doctests should at least ensure that adding the -rand code does not break the previous functionality of the -rev -mixup -noe actions.
Once -rand code is added to fix_word(), you can type it on the command line and see how it works. This produces a slightly more elegant level of messing up. Also, every time it runs, the output is different.
$ python3 mixup.py -rand independence.txt
When in the Cours of huamn evtnes ti semoceb yrassecen for eno elpoep to divlosse eht political sdnab which have connctd meht htiw anehtor dna to emussa gnoma the porews of th arth, eht sparat and qual stoitan to which th swaL fo Natur dna of Na'eruts God eltitne ,meht a dcnt rspct to eht opinions fo dniknam seriuqer that yeht shluod dclar the caesus hcihw lepmi thm ot th sparation. We dloh thsee shturt to eb setnedive-fl, that all mn are cretaed ,lauqe that thy are dewodne by rieht Crator htiw niatrec unlbaneilae Risthg, that among eseht ar Lief, Librty and the pursuit of .ssenippaH
$ python3 mixup.py -rand independence.txt
When ni the Cosrue fo human evtnes it bcoms yrassecen for on popl ot divlosse the lacitilop badns hcihw evah coetcennd thm htiw rehtona dna ot asmuse amnog th powrs of eht arth, th sparat and lauqe stoitan to hcihw th swaL of Natur and of Na'eruts doG ntitl ,meht a tneced tcepser to eht snoinipo of maniknd reeriuqs that yeht should dclar eht caesus whcih imepl them ot the senoitarap. eW dloh eseht shturt to eb setnedive-fl, that lla men ar cretaed eqlau, that they ar ndowd yb thier Crotaer with crtain elbaneilanu ,sthgiR that among thsee era Lif, ytrebiL and the pursuit of Happinss.
If each word has 3 choices, then 100 words have 3^100 possible outputs. As a handy reference, the number of grains of sand in the known universe is approximately 2^100. So you could run this again and again, confident that you are never going to see the same output twice. You can try computing 3^100 in the interpreter with the expression
3 ** 100
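Python ints have no fixed size limit, so both powers can be computed exactly. This short sketch compares 3 to the 100th power with 2 to the 100th power. The variable names are just for illustration.

```python
outputs = 3 ** 100   # possible outputs for 100 words, 3 choices each
sand = 2 ** 100      # the comparison figure from the text

print(outputs)
print(sand)                # 1267650600228229401496703205376
print(len(str(outputs)))   # 48 (digits)
print(len(str(sand)))      # 31 (digits)
print(outputs > sand)      # True
```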
You know what else is a text file? 'mixup.py' is. Its text is your python code. Your program can read that text file just like any other text file. This does not hurt the file, although it does feel a little scary.
$ python3 mixup.py -mixup mixup.py
#!ne/nib/rsu/v pynoht3
"""
Strofnad CS601A Miuxp prcejot
"""
...
You can stare at the fixed output and try to work out what this Doctest was:
>>> mi'etalocohC'(pux)
'Ceocolath'
How did the word 'relust' get in here?
relust += ch
rerutn relust
When your code is working well and your code style is all cleaned up, please turn in your mixup.py on Paperless as usual.
We're not yet halfway through this course, but this is a pretty complete little Python program showing many CS themes: coding algorithms with strings, indexing, loops, logic, and file processing, and decomposing a big problem into separate functions that can be built and tested independently.