Homework 5b Stripey Data

This is a relatively small project, but with memorable output. For this project you take in a data series of floating point numbers and create a visualization of the data. You have seen many data visualizations where the height of a line or rectangle increases proportionately to represent a value. This project instead uses position and computed-color to reflect the underlying data.

Download the stripey-data.zip folder to get started.

Frac Data

We have several interesting data sets for this project, all in what we'll call "frac" format, geared for this computed-color approach. We'll call each number in the data set a "frac", and the fracs have all been scaled to be in the range 1.0 .. -1.0 inclusive.

draw_stripes() Function

For this project, all you need to do is write code for the draw_stripes() function. The code for main() is already done. The starter code creates a canvas of the requested size. The function takes in a fracs list [0.5, -0.7, 0.23, ...] and draws a series of colored rectangles on the canvas, one for each frac number, as a way of visualizing that data.

Int divide the canvas width by the number of fracs to figure out how wide each rectangle should be. You can then use this width to compute the x,y of the upper left of each rectangle.
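As a rough sketch of that arithmetic: the function name rect_lefts and the sample numbers below are made up for illustration and are not part of the starter code. It also assumes each rectangle starts at the top of the canvas, so only x changes from one rectangle to the next.

```python
def rect_lefts(canvas_width, num_fracs):
    """Return (rect_width, xs) where xs holds the left-edge x of each rectangle."""
    # Int division (//) drops any remainder, so rect_width is a whole number
    rect_width = canvas_width // num_fracs
    # Rectangle i starts i rectangle-widths from the left edge
    xs = [i * rect_width for i in range(num_fracs)]
    return rect_width, xs


# Illustrative numbers: a 1000-pixel-wide canvas and 21 fracs
rect_width, xs = rect_lefts(1000, 21)
print(rect_width)   # 47
print(xs[:3])       # [0, 47, 94]
```

The index i plays the role of "which rectangle is this", so inside a loop over the fracs the same multiplication gives the x for the current rectangle.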

Red Color

We have the constants BASE and DELTA to feed into the color math as follows.

BASE = 127
DELTA = 127

For each rectangle, we want to figure out a color based on the frac value. For this milestone, set blue and green to the value BASE, and compute red.

We want the red value to vary so that when frac is low, red is low, and when frac is high, red is high. Specifically:

frac is  1.0  -> red is 254
frac is  0.0  -> red is 127
frac is -1.0  -> red is 0

alt: frac high red high, frac low red low

We're using 254 as the maximum here, so the math is symmetric around 127. The constant DELTA = 127 represents the max possible change, so you can multiply frac times DELTA to figure out how much red differs from BASE.
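Here is a minimal sketch of that math, checked against the three cases listed above. The helper name red_for_frac is hypothetical, and the int() conversion is an assumption made here to get whole-number color values; it is not something the handout requires.

```python
BASE = 127
DELTA = 127


def red_for_frac(frac):
    """Map a frac in -1.0..1.0 to a red value in 0..254."""
    # frac * DELTA is how far red moves away from BASE (up or down).
    # int() is an assumption here, used to produce a whole-number color value.
    return int(BASE + frac * DELTA)


for frac in [1.0, 0.0, -1.0]:
    print(frac, red_for_frac(frac))
# 1.0 254
# 0.0 127
# -1.0 0
```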

Use the canvas.fill_rect() function, where the first 4 numbers give the location and size of the rectangle to draw.

canvas.fill_rect(x, y, rect_width, rect_height, color=(r, g, b))

The color= parameter has a novel feature where it can be assigned three numbers like this:

canvas.fill_rect(0, 0, 10, 200, color=(200, 127, 127))

The syntax (200, 127, 127) is a "tuple" in Python, which we'll cover in more detail later in CS106A. For now, the tuple simply contains three numbers within parentheses - a red number, a green number, and a blue number, each of which must be in the range 0..255. The fill_rect() function uses those RGB color numbers to set the color of the rectangle.
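A small plain-Python illustration of building such a tuple from variables, with no canvas involved. The variable names red and color are just for this example.

```python
BASE = 127

red = 200
# Parentheses with commas make a tuple: red, green, blue in that order
color = (red, BASE, BASE)

print(color)        # (200, 127, 127)
print(len(color))   # 3
print(color[0])     # 200
```

A tuple built this way can be passed along as color=color, the same as writing the three numbers out directly in the call.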

For this milestone, the blue and green numbers can stay at BASE, while you compute a red number based on the frac.

Milestone 1 - Red

For testing we have the following data-test.txt file of frac values from 1.0 to -1.0. The first line of the file is the data set title, and the rest is the floating point values. The provided function read_fracs() reads this text file format into a list.

Test Data Red to Blue
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
-0.1
-0.2
-0.3
-0.4
-0.5
-0.6
-0.7
-0.8
-0.9
-1.0
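
To make the file format concrete, here is a sketch of how text in this format could be split into a title and a list of floats. This is not the provided read_fracs() code, which may work differently; parse_frac_text is a hypothetical stand-in that works on a string rather than a file.

```python
def parse_frac_text(text):
    """Split frac-format text into (title, fracs)."""
    lines = text.splitlines()
    # The first line is the data set title
    title = lines[0]
    # Every remaining non-blank line is one floating point value
    fracs = [float(line) for line in lines[1:] if line.strip()]
    return title, fracs


sample = "Test Data Red to Blue\n1.0\n0.9\n-1.0\n"
title, fracs = parse_frac_text(sample)
print(title)   # Test Data Red to Blue
print(fracs)   # [1.0, 0.9, -1.0]
```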

Try running your program with this data. You should see 21 rectangles. The far left rectangle should be quite red. The middle rectangle should be grayish. Towards the right the rectangles are an indistinct dusky blue/green color. We'll fix them in the next milestone.

$ python3 stripey-data.py data-test.txt

alt:red high at the left

The main() code is provided for this project. It allows you to optionally supply width and height numbers on the command line to change the canvas size:

$ python3 stripey-data.py data-test.txt 1200 500

Programming aside: here we've hand-constructed this small data-test.txt file with a very clear pattern in it for testing purposes. If we jumped straight to real data with all its complexities, it would be hard to spot if the algorithm was correct or not. This is also an advantage of text files as a data format - you can just go to your editor and type the data up.

Milestone 2 - Blue

Now add color computation for blue as a function of each frac value. We want blue to move in the opposite direction as red: when frac is high, blue is low; when frac is low, blue is high:

frac is  1.0  -> blue is 0
frac is  0.0  -> blue is 127
frac is -1.0  -> blue is 254

alt: frac high = red high, blue low

The result is that when frac is high, we get reddish stripes since red is high and blue is low. When frac is low, we get bluish stripes, since blue is high and red is low. When frac is near 0.0, the color is grayish, since red, green, and blue will all be around 127, which makes gray.
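The red and blue rules can be sketched together as one color computation. The helper name color_for_frac is hypothetical, and the int() conversion is an assumption made here to get whole-number color values.

```python
BASE = 127
DELTA = 127


def color_for_frac(frac):
    """Return an (r, g, b) tuple: red rises with frac, blue falls, green stays at BASE."""
    red = int(BASE + frac * DELTA)    # frac high -> red high
    blue = int(BASE - frac * DELTA)   # frac high -> blue low
    return (red, BASE, blue)


for frac in [1.0, 0.0, -1.0]:
    print(frac, color_for_frac(frac))
# 1.0 (254, 127, 0)
# 0.0 (127, 127, 127)
# -1.0 (0, 127, 254)
```

The only difference between the two computations is the sign in front of frac * DELTA, which is what makes the colors move in opposite directions.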

After drawing the rectangles, call draw_string() like this to draw the title on top of the colored rectangles.

canvas.draw_string(5, 5, title, color='white')

With the blue color computation in, try the data-test.txt data file again.

alt:red high at the left, blue high at the right

At the left it's bright red, at the right it's bright blue. In the middle it should be grayish.

Real Data

You've got this computed-color visualization machinery. Let's take it for a spin.

Human Progress is the Trend

Here is an idea to keep in mind when looking at these data sets. The news is depressing. Literally depressing. We could say that the job of journalism is shining a light on the injustice and stupidity in the world to inform people of needed change. That said, it is a fact that most measures of human well-being are much improved over the decades. Pick a data set - child mortality, starvation, illiteracy ... they have all gotten much better over the last 50 years.

Data Sets

data-child-mortality.txt - global percentage of children dying before the age of five. The amount of misery in the left part of this graph is amazing. I trimmed the data to start at 1960 as that's when there is data for every year. Data from https://ourworldindata.org/child-mortality

We'll include this graph in the assignment handout, but the rest you should bring to life through your own code. Not many people can say they've built their own numeric-color visualization like this.

alt:red at the left trending to blue at the right

data-illiteracy.txt - global illiteracy rate. https://data.unicef.org/topic/education/literacy/

data-homicides.txt - US homicide data. This data has a dramatic bump in it. There was a significant increase in crime in the US from the late 1970's to the early 1990's, peaking in 1991. Since then there has been an equally dramatic decline in crime so now it is historically low. There are many theories about this! Data from https://www.kaggle.com/marshallproject/crime-rates/version/1 although in retrospect there were perhaps simpler sources for this data.

Side question: Art reflects life. There must be a movie or work of art that embodies the themes of decay and criminality of that historic 1991 crime peak. Ask around at your next party. Nominees: Escape from New York, Trainspotting, The Wire.

data-climate.txt - this data is quite dramatic. We are a clever and resilient species, and I'm sure we will figure this out eventually. This one looks best with a width of 1200 or more.

Climate scientists measure temperature against a "0" point which is a recent global average temperature, and then each year is measured as the "anomaly" from that average, so -1.5 C for a cold year or +1.5 C for a hot year. I scaled the climate data to fit in our +1.0..-1.0 format, but kept the 0 point intact, so when you see gray years, those were around the long-term average temperature, with red years above average and blue years below. This data set is from https://gmao.gsfc.nasa.gov/reanalysis/MERRA-2/
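One way this kind of scaling could be done is to divide every value by the largest absolute value in the series, which keeps 0 at 0. This is only an illustration of the idea, not necessarily how the data file was prepared; the function name scale_to_fracs and the sample values are made up.

```python
def scale_to_fracs(values):
    """Scale values into -1.0..1.0 by dividing by the largest absolute value."""
    biggest = max(abs(v) for v in values)
    if biggest == 0:
        # All values are zero, so there is nothing to scale
        return [0.0 for _ in values]
    # Dividing (rather than shifting) leaves 0 exactly where it was
    return [v / biggest for v in values]


# Made-up anomaly values in degrees C
print(scale_to_fracs([-1.5, 0.0, 0.75]))   # [-1.0, 0.0, 0.5]
```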

Terminal protip: The data file names all begin with data-, so in the terminal you can type data-, hit the tab key, and the auto-complete will show you the candidate file names. Then type in 1 more letter and hit tab again to complete the filename. That's how data professionals do it!

There will be some white space to the right of your graph, since the 140 or so rectangles don't divide evenly into the canvas width.
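To see where the leftover space comes from, here is the arithmetic with illustrative numbers: a canvas width of 1000 and exactly 140 rectangles are assumptions for this example, not values taken from the data files.

```python
canvas_width = 1000   # illustrative width
num_rects = 140       # illustrative count

rect_width = canvas_width // num_rects   # int division drops the remainder
used = rect_width * num_rects            # pixels actually covered by rectangles
leftover = canvas_width - used           # white space at the right

print(rect_width, used, leftover)   # 7 980 20
```

The leftover is always less than the number of rectangles, so it shrinks relative to the graph as the canvas gets wider.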

And You're Done

That's a neat bit of real world data visualized with applied math - one spatial dimension combined with one dimension in color. Your program should be able to draw any of these frac data sets at various sizes. When it's all cleaned up and correct, please turn in your stripey-data.py file on Paperless as usual.