Homework 5 - Parse Mystery

For this project you have a text file which has, hidden within it, data that identifies a mystery object on campus. Your mission is to decipher the file to figure out what the campus object is.

All parts of HW5 are due Tue Feb 11th at 11:55 pm.

Homework 5 Warmups

To get started we have some warmup functions on nested loops and parsing.

> HW5 Warmups

Install Pillow

This project will use the "Pillow" library. A library is a body of already written code which you import and use, and in this case the Pillow library contains code to manipulate images. We'll look at libraries in more detail later. For this assignment you need to install Pillow on your machine so your code can use it.

Open a "terminal" window - the same type of window where you type "python3 foo.py" to run programs. The easiest way to get a terminal is to use the Terminal tab at the lower-left within PyCharm. Type the following command. Note that "Pillow" starts with an uppercase P. (On Windows, type "python" or "py" instead of "python3".)

$ python3 -m pip install Pillow
..prints stuff...
Successfully installed Pillow-5.4.1

To test that Pillow is working, run the following command in a terminal inside your parse-mystery folder. This runs the "simpleimage.py" code included in the folder. When run like this, simpleimage.py creates and displays a big yellow rectangle.

# (inside your parse-mystery folder)
$ python3 simpleimage.py
# yellow rectangle appears
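If the yellow rectangle does not show up, one quick check is whether the python3 you run can import Pillow at all. The package is installed as "Pillow" but imported as "PIL". A minimal sketch using only the standard importlib module (the name pillow_available() is just for illustration, not part of the starter code):

```python
import importlib.util


def pillow_available():
    """
    Return True if the Pillow package can be imported.
    Pillow is installed as "Pillow" but imported as "PIL".
    find_spec() only locates the package; it does not import it.
    """
    return importlib.util.find_spec("PIL") is not None


if __name__ == "__main__":
    if pillow_available():
        print("Pillow found")
    else:
        print("Pillow not found - try: python3 -m pip install Pillow")
```

Run it with the same python3 command you use for the project. If it reports Pillow missing right after a successful install, a likely cause is that pip installed Pillow for a different Python than the one you are running.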

If you cannot get Pillow installed successfully, you can still write most of the code for this project. You will need to comment out the line "from simpleimage import SimpleImage" near the top of the starter file, and post to Piazza, and Brahm will help you out.

Frightful Numbers

Download the parse-mystery.zip to get started.

For this project, you will need to make sense of text lines like the following which have numbers hidden in them. The numbers are frightfully messed up (these are from 480k.txt):

800!)176^b006$(46$*#63Z*16$*06$z5^
47$ 42^ 18$bj55
166^!56
77$b 51*25 b35 44*35
*32 j46@ 65^!05$#Z90^(32 x
wait there's no digits in this one at all!
31*32^#34)68^ 60!38$ 74 b148^*60 53#38 c21  28*)

Each number is represented by some text starting with a digit, following these rules:

1. The numbers are non-negative integers, like 123

2. The first char of each number is always a digit.

3. If a '$' char appears immediately after a number, its digits are backwards. So '211$' is the number 112.

4. If a '^' char appears immediately after a number, it's as if that number is not present in the data, and it is omitted from the output. So for example '176^' would be omitted.

5. The numbers are separated from each other by random chars which are not '^' or '$' or digits.

Here is the first example line:

800!)176^b006$(46$*#63Z*16$*06$z5^

What are the numbers in there?

[800, 600, 64, 63, 61, 60]
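When writing tests, it helps to have an independent way to compute the expected answer. The sketch below uses Python's re module, which is a different technique from the character-by-character loops this project is about, so treat it only as a cross-check for your expected outputs (regex_numbers() is a made-up name, not something to turn in). The pattern grabs a run of digits plus an optional '$' or '^' right after it, mirroring rules 2 through 5:

```python
import re

# One or more digits, optionally followed by a '$' or '^' marker.
NUMBER_RE = re.compile(r"(\d+)([$^]?)")


def regex_numbers(line):
    """
    Cross-check helper: return the numbers in line per rules 1-5.
    Uses a regular expression, not the loop-based approach the
    assignment asks for.
    """
    result = []
    for match in NUMBER_RE.finditer(line):
        digits, marker = match.group(1), match.group(2)
        if marker == "^":
            continue                  # rule 4: omit this number
        if marker == "$":
            digits = digits[::-1]     # rule 3: digits are backwards
        result.append(int(digits))
    return result


print(regex_numbers("800!)176^b006$(46$*#63Z*16$*06$z5^"))
# [800, 600, 64, 63, 61, 60]
```

On the first example line it prints the same list shown above, and on the "no digits" line it returns the empty list.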

This looks rather impossible at first. But with some Python and a little decomposition, you can boil that mess down to some nice clean numbers.

a. extract_num(s, begin, end)

The trickiest part of this problem is this: we have found one or more digits in a string, and we need to extract that number properly or decide to omit it. The extract_num() function is decomposed out to focus on that sub-problem.

def extract_num(s, begin, end):
    """
    Given string s, and "begin" is the index of
    the first of one or more digits, and "end" is the
    index one beyond the last digit.
    Parse out and return the int value
    of the number, accounting for possible '$' and '^'.
    Return -1 if the number should be skipped.
    >>> extract_num('xx123$', 2, 5)
    321
    """

One test is provided. Add tests so there are at least five tests. Test varying numbers of digits, with and without '^' and '$'. It is easier to build and perfect extract_num() in isolation here than in the midst of processing a whole file.
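Here is one possible sketch of extract_num(), assuming (per rules 3 and 4) that a '$' or '^' only matters if it sits exactly at index end. The end < len(s) check guards the case where the digits run to the very end of the string. It reverses with the slice [::-1]; a reverse() helper from HW4 would do the same job. The extra Doctests illustrate the kinds of cases worth covering and are not a required list.

```python
def extract_num(s, begin, end):
    """
    Given string s, and "begin" is the index of
    the first of one or more digits, and "end" is the
    index one beyond the last digit.
    Parse out and return the int value
    of the number, accounting for possible '$' and '^'.
    Return -1 if the number should be skipped.
    >>> extract_num('xx123$', 2, 5)
    321
    >>> extract_num('123', 0, 3)
    123
    >>> extract_num('ab42^', 2, 4)
    -1
    >>> extract_num('7$', 0, 1)
    7
    >>> extract_num('006$x', 0, 3)
    600
    """
    # Check end < len(s) first so s[end] cannot go out of bounds.
    if end < len(s) and s[end] == '^':
        return -1
    digits = s[begin:end]
    if end < len(s) and s[end] == '$':
        digits = digits[::-1]
    return int(digits)
```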

b. parse_line(s)

Given a string, such as a line from a data file, extract all the numbers as described above from the line and return them in order in a list. If the line contains no numbers, return the empty list.

One test is provided. Add tests so there are at least five tests. The helper function extract_num() has its own tests, so here you can focus on pulling out a series of numbers. You may find that you need to go back and debug extract_num() more if a flaw is exposed at this stage.

When you are feeling brave, add a test made from the first example:

800!)176^b006$(46$*#63Z*16$*06$z5^
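One way to structure parse_line() is an index loop that skips non-digits and, on hitting a digit, advances a second index end past the run of digits, then hands s, i, end to extract_num(). Because the numbers are non-negative (rule 1), -1 is a safe "skip this one" signal. A sketch, assuming extract_num() behaves as its docstring above describes (a compact copy is included so the block runs on its own):

```python
def extract_num(s, begin, end):
    """
    Compact copy of extract_num() so this block runs on its own.
    >>> extract_num('xx123$', 2, 5)
    321
    """
    if end < len(s) and s[end] == '^':
        return -1
    digits = s[begin:end]
    if end < len(s) and s[end] == '$':
        digits = digits[::-1]
    return int(digits)


def parse_line(s):
    """
    Given a string s, return a list of all the numbers in it,
    in order, following the '$' and '^' rules.
    >>> parse_line('800!)176^b006$(46$*#63Z*16$*06$z5^')
    [800, 600, 64, 63, 61, 60]
    >>> parse_line('no digits here')
    []
    """
    nums = []
    i = 0
    while i < len(s):
        if s[i].isdigit():
            # Advance end past the run of digits starting at i.
            end = i + 1
            while end < len(s) and s[end].isdigit():
                end += 1
            num = extract_num(s, i, end)
            if num != -1:
                nums.append(num)
            i = end    # resume scanning just after the digits
        else:
            i += 1
    return nums
```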

If you have a reverse(s) function from HW4, you can paste it into this file and use it here as a helper. Later we'll see how to share functions across files, but for now just paste the helper function in. Any helper functions, such as reverse(), should have Pydoc and Doctests.
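If you don't have reverse() from HW4 handy, a minimal version with Pydoc and Doctests might look like this (your own HW4 version may differ, and any correct one is fine). Putting each new char on the front of the result reverses the order; the slice s[::-1] is a shorter equivalent.

```python
def reverse(s):
    """
    Return a new string with the chars of s in reverse order.
    >>> reverse('abc')
    'cba'
    >>> reverse('006')
    '600'
    >>> reverse('')
    ''
    """
    result = ''
    for ch in s:
        result = ch + result   # each new char goes on the front
    return result
```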

c. parse_file(filename)

Read all the lines out of the given file. Extract all the numbers out of each line, gathering all the numbers together in one giant list, maintaining their order from the file.

Here is the contents of the "3lines.txt" file, showing a few lines of sample data.

800!)176^b006$(46$*#63Z*16$*06$5^
47$ 42^ 18$bj55
166^!56

One Doctest is provided for parse_file() that references this file. Doctests can refer to test data files in the same directory as the source code in this way.
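A sketch of parse_file(): open the file, run each line through parse_line(), and use extend() (not append()) so the result is one flat list rather than a list of lists. So that the block runs on its own, parse_line() here is a regex-based stand-in, not the loop version the assignment asks for; in parse-mystery.py you would use your own. The demo at the bottom writes the three lines of 3lines.txt shown above to a temporary file and parses them:

```python
import os
import re
import tempfile


def parse_line(s):
    """Stand-in for your parse_line() so this block runs on its own.
    Uses a regular expression rather than the assignment's loops."""
    nums = []
    for match in re.finditer(r"(\d+)([$^]?)", s):
        digits, marker = match.groups()
        if marker == "^":
            continue
        if marker == "$":
            digits = digits[::-1]
        nums.append(int(digits))
    return nums


def parse_file(filename):
    """
    Read all the lines of filename and return one list of all
    the numbers in them, in file order.
    """
    all_nums = []
    with open(filename) as f:
        for line in f:
            # extend() adds each number; append() would add a nested list.
            all_nums.extend(parse_line(line))
    return all_nums


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "3lines.txt")
        with open(path, "w") as f:
            f.write("800!)176^b006$(46$*#63Z*16$*06$5^\n47$ 42^ 18$bj55\n166^!56\n")
        print(parse_file(path))
        # [800, 600, 64, 63, 61, 60, 74, 81, 55, 56]
```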

d. main() -nums Test

The main() code is provided for this project. If the command line is "-nums file.txt", main() calls your parse_file() function and prints the list returned. The files 3lines.txt and 10lines.txt contain some test lines, so this is another way to check that the numbers look reasonable.

$ python3 parse-mystery.py -nums 3lines.txt
[800, 600, 64, 63, 61, 60, 74, 81, 55, ...

Programming strategy aside: when your program is, say, 50% built, it's helpful that you can run that 50% to see some sort of output from it, confirming that what's built so far works. You don't want to write 100% of the code, and only then start running it.

Grayscale Pixels

The following fact is needed for this puzzle: For each pixel in a grayscale image, the red, green, and blue values in each pixel must be equal. So for example a pixel might have red=50 green=50 blue=50 to be dark gray, or red=212 green=212 blue=212 to be light gray.

Mystery Data

The mystery campus object lives among the trees below Lake Lagunita. It is somehow described in the file "480k.txt" which contains 480002 ints. The beginning of the list of ints looks like this (this is just the list your parse_file() returns):

[800, 600, 64, 63, 61, 60, 74, 81, 55,...

What could these 480002 numbers represent?

It's a grayscale image! The first number is the width. The second number is the height. The remaining 480000 (800 * 600 = 480000) numbers are the grayscale values, one number per pixel. The pixel values are laid out in the 1-dimensional list. After the width and height are all the values of the top y=0 row, then all the values for y=1, then all the values for y=2 and so on in 1 big list:

[width, height, y=0 x=0, y=0 x=1, y=0 x=2, ... y=0 x=799, y=1 x=0, y=1 x=1, y=1 x=2, ... y=599 x=799]

This is the same standard order in which the range/y/x loops visit the pixels of an image.
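Because every row holds exactly width values, where a pixel's value sits in the list is simple arithmetic: skip the 2 header numbers, then y complete rows of width values, then x more. The sketch below (split_header() and value_index() are hypothetical helper names, not part of the starter file) checks that the count of values matches width * height and computes that index:

```python
def split_header(nums):
    """
    Split the parsed list into (width, height, pixel_values).
    Raise ValueError if the number of pixel values is not
    width * height.
    >>> split_header([3, 2, 10, 20, 30, 40, 50, 60])
    (3, 2, [10, 20, 30, 40, 50, 60])
    """
    width = nums[0]
    height = nums[1]
    pixels = nums[2:]
    if len(pixels) != width * height:
        raise ValueError(f"expected {width * height} values, got {len(pixels)}")
    return width, height, pixels


def value_index(x, y, width):
    """
    Index in the full parsed list (header included) of the value
    for pixel (x, y): skip 2 header numbers, y full rows, then x.
    >>> value_index(0, 0, 800)
    2
    >>> value_index(0, 1, 800)
    802
    >>> value_index(799, 599, 800)
    480001
    """
    return 2 + y * width + x
```

For the 800 x 600 image, the last pixel (x=799, y=599) lands at index 2 + 599 * 800 + 799 = 480001, the final index in the 480002-number list.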

e. solve_mystery()

Parse all the numbers from the given filename. Figure out the width and height of the desired image. The code to create a blank image and loop over it is the same as in week 2 - it's included in the starter file.

Starter code in solve_mystery():

    width = ???  # determine proper width and height values
    height = ???
    image = SimpleImage.blank(width, height)
    for y in range(image.height):
        for x in range(image.width):
            pixel = image.get_pixel(x, y)
            # use pixel.red etc. in here

    # This displays image on screen
    image.show()

Edit this code to set pixel.red, pixel.green, and pixel.blue of every pixel in the image from the parsed values, keeping the provided range/y/x loops. This is a bit of a code puzzle. The range/y/x loops are going over the 2-d image. Each time through that loop, you want to grab the right parsed value. Think about which value you want the first time the loop runs, the second time, the third time, etc., to work out a pattern.
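The loop pattern can be tried out without SimpleImage. In this sketch, gray_pixels() is a hypothetical name and a dict keyed by (x, y) stands in for the image so the code runs anywhere. A running index i starts at 2, just past width and height, and moves forward by one each time the inner loop body runs; each value goes into red, green, and blue alike, as the grayscale fact above requires. In solve_mystery() the dict assignment would instead set pixel.red, pixel.green, and pixel.blue:

```python
def gray_pixels(nums):
    """
    Given the parsed list [width, height, v0, v1, ...], return a dict
    mapping each (x, y) to a (red, green, blue) grayscale tuple,
    visiting pixels in the same y/x loop order as solve_mystery().
    >>> gray_pixels([2, 2, 10, 20, 30, 40])
    {(0, 0): (10, 10, 10), (1, 0): (20, 20, 20), (0, 1): (30, 30, 30), (1, 1): (40, 40, 40)}
    """
    width = nums[0]
    height = nums[1]
    i = 2   # first pixel value comes right after width and height
    pixels = {}
    for y in range(height):
        for x in range(width):
            value = nums[i]
            # Grayscale: red, green, and blue all get the same value.
            pixels[(x, y)] = (value, value, value)
            i += 1   # the next time through wants the next value
    return pixels
```

Computing the index directly as 2 + y * width + x gives the same value for each pixel; the running counter just trades that arithmetic for one extra variable.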

The provided main() is set up to call your solve_mystery() function when there is just 1 command line argument, like this:

$ python3 parse-mystery.py 480k.txt

The 480k.txt thing lives a bit downhill from Lake Lagunita. The 100k.txt file shows a famous thing on campus (and is smaller, which helps if the large files take too long on your computer). The 600k.txt file is the biggest, showing a famous thing in California. For a smaller test, the 60k.txt shows a pair of friends. Your code should be able to solve all of these. The little 3lines.txt and 10lines.txt test files do not contain images.

It's a memorable moment when, after all your work to dig the data out of those files, a real image pops up on screen.

Protip: type the first few letters of a filename, then hit tab to auto-complete it. Hit tab a couple times, and it will list all the candidate filenames that match what you have typed so far.

When your code is cleaned up with good style and solves these puzzles correctly, please turn in your parse-mystery.py file on Paperless.