Spreadsheets and Data
- Midterm in 1.5 weeks in STLC 115
- We don't maintain the course readings (specifically, today's readings do not match)
Plan for Today
- Now, we will apply many programming concepts to spreadsheets, using computers to manipulate text-based data.
- Goal for today: learn how to manipulate and process data in Google Sheets to extract conclusions.
- Using Google Sheets
- All our in-class spreadsheets can be viewed here in a Google Drive.
- Note: to edit files, you can make a copy through the file dropdown on the top menu
Spreadsheets: An Introduction
- Used to store textual information (as opposed to images)
- Links information together in rows and columns
- Generally, rows combine to give multiple pieces of information about one object (e.g. Marvel Characters).
- Columns give the same piece of information for every object.
Variables in Spreadsheets
- Each cell (intersection of a row/column) is a variable
- There are three "types" of variables
- Text (words): left-justified
- "Boolean" values (true and false): center-justified
- Numbers (percents, decimals, and monetary amounts): right-justified
- Experiment: are the numbers always perfectly accurate? What happens if you type a really big number and subtract it from a number one bigger than it?
Data Layout in Spreadsheets
- Rows = objects (samples in data)
- Each row has multiple pieces of information
- Column = fields
- "properties" of the row
- Labelled with a header (should freeze the headers)
- Way of manipulating data
- Almost always used on columns
- Exercise: How many appearances of comic book characters have there been total?
Functions behind the scenes
- Look at every row of data
- Perform the calculation on that row of data
- Just like a for loop!
IF(test, true case, false case)
- Automatically includes an if and an else
- Note: use
= instead of
- Exercise: Let's make a new column that is true if the character is female and introduced before 2000.
SUMIFS(range, condition_range, condition)
MINIFS(range, condition_range, condition)
MAXIFS(range, condition_range, condition)
- To search for non-empty cells, use the condition
- Note: for any of the "ifs", can add a condition (like an "and") with another pair:
COUNTIFS(condition1_range, condition1, condition2_range, condition2)
- Can Google any of these functions + "Google Sheets" to find examples
- A common pattern is to try to separate the data by category, then use an IFS function on every category.
- First, use the UNIQUE function to find the categories.
- Use the IFS function on every category in UNIQUE.
- Example: Which ID has the most number of appearances?
- How many characters have a non-empty GSM column?
- What is the average number of appearances for a male comic book character? What about a female character? What about any character whose GSM column is not empty?
- Are male characters more likely to be heroes or villains? What about female characters?
- What is the most popular ID (by number of characters with that ID)?
- Which alignment of character is most likely to die?
- What else can you find out?