# Bechdel and Other Tests

November 14th, 2021

Written by Erin McCoy, Katie Creel, and Juliette Woodrow. Inspired by, and using data from, “The Next Bechdel Test.”

## List Comprehensions

In this problem, we'll go over how list comprehensions can be used to solve hard problems in satisfyingly few lines of code.

Here are two problems to start with. Each should be solved with one line of code:

• Construct a list of all the numbers 1-1000 inclusive
• Construct a list of all multiples of 2 from 0 to 2000 inclusive
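One possible pair of one-line answers is sketched below. Both rely on range() excluding its stop value, so the stop is one past the last number you want; the variable names are just for illustration.

```python
# All numbers 1-1000 inclusive: range stops *before* 1001.
numbers = [i for i in range(1, 1001)]

# All multiples of 2 from 0 to 2000 inclusive: a step of 2 skips the odd numbers.
evens = [i for i in range(0, 2001, 2)]
```

An equivalent second answer is [2 * i for i in range(1001)], which doubles each index instead of stepping by 2.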

Now let's try out some more! Given a list of numbers lst, write one-line list comprehensions to do the following:

• Produce a list of the absolute differences between each number in lst and 10. Recall that the abs function returns the absolute value of a number.
• Produce a list of the absolute differences between 10 and each number in lst that is between 10 and 15 inclusive. (We haven't covered using if in list comprehensions in class yet, so this one is a challenge if you want to look up how to do it.)
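A sketch of both comprehensions follows, run on a made-up sample list (this particular lst is hypothetical). The second adds an if clause at the end of the comprehension, so only values from 10 to 15 are transformed and kept.

```python
lst = [3, 12, 10, 20, 15, -4]  # hypothetical sample input

# Absolute difference between every number and 10.
diffs = [abs(x - 10) for x in lst]  # [7, 2, 0, 10, 5, 14]

# Same transformation, but only for numbers in the range 10..15 inclusive.
in_range_diffs = [abs(x - 10) for x in lst if 10 <= x <= 15]  # [2, 0, 5]
```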

Now, suppose we have a list of pairs such as this one:

        pairs = [('zzz', 3), ('bbb', 10), ('ccc', 4), ('aaa', 6)]

Write one-line list comprehensions to do the following:
• Produce a list of the second element of each tuple in pairs.
• Produce a list of the first elements of each tuple in pairs, except that the first character of each of these elements is made uppercase. You can assume that each such element in the tuples has at least one character.
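One way to write both comprehensions is sketched below. Each tuple is indexed directly: p[1] is the number, and p[0] is the string, whose first character is uppercased and glued back onto the rest.

```python
pairs = [('zzz', 3), ('bbb', 10), ('ccc', 4), ('aaa', 6)]

# Second element of each tuple.
seconds = [p[1] for p in pairs]  # [3, 10, 4, 6]

# First element with only its first character uppercased.
capitalized = [p[0][0].upper() + p[0][1:] for p in pairs]  # ['Zzz', 'Bbb', 'Ccc', 'Aaa']
```

The slicing approach is used instead of str.capitalize() because capitalize() also lowercases the rest of the string ('hELLO'.capitalize() gives 'Hello'), which the problem does not ask for.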

## A Short History of the Bechdel Test

Named for Alison Bechdel, the Bechdel test seeks to analyze the representation of women in fictional media (books, movies, TV shows, etc.). To pass the Bechdel test, the piece of media must:

1. Feature two women
2. Who have a conversation
3. About something other than a man

As of 2021, 58% of movies pass the Bechdel test, and this share has steadily increased over the past century.

The popularity of the Bechdel test has inspired others to create their own named tests measuring other inequities in media. Here are a few examples:

• A movie passes the Uphold test if 50% of the on-set crew is female.
• A movie passes the Waithe test if it features a Black woman who is in a position of power and in a healthy relationship.
• A movie passes the Villalobos test if the film has a Latina lead, and the lead or another Latina character is shown as professional or college educated, speaks in unaccented English, and is not sexualized.

## Introduction

In this section, you will use your nested data structure and graphics skills to build your very own data visualization application. You will analyze a dataset of more than 50 movies that records which tests each movie passes and which it fails. Your goal is to build a piece of software that helps you investigate representation and bias in movies.

Identifying issues of bias and representation in datasets is a natural extension of many of the interesting ethical topics that we have talked about in CS106A so far this quarter. As we've mentioned before, our hope is that by introducing these sorts of topics early in computer science education, we can help the next generation of software developers and computer science researchers (who could include you!) be more mindful of the potential social implications of their work and use their powerful tools to help others.

## Section Overview

The rest of this handout is broken into several milestones.

• Milestone 1: Getting started with matplotlib. In this milestone you will make sure you have everything installed properly to complete the assignment. To get started, download the zip file here. Unzip it and open the folder in PyCharm as normal. Make sure you have the matplotlib package installed, which allows you to quickly and easily draw graphs. If you haven't done this already, open the Terminal in PyCharm and run the following command (on Windows, substitute py for python3):
      python3 -m pip install matplotlib

• Milestone 2: Load in the data. Open nextBechdel_allTests.txt and look at its structure. The first line tells us what each column means. Each subsequent line contains a movie name followed by a series of 0s and 1s indicating whether the movie passed (0) or failed (1) each test. In read_file(), open the file and store the data in an appropriate data structure.
• Milestone 3: Create Plots for Individual Movies. You will now write the function plot_tests_per_movie() that takes in a list of movies and outputs a chart showing how many tests each movie has passed and failed. The code for plotting is given to you, so your focus should be on outputting the right number of tests passed and tests failed for each movie.
• Milestone 4: Create Plots for All Tests. You will now write the function plot_all_tests(), which plots, for each test, the number of movies that passed it and the number that failed it. The code for plotting is given to you, so your focus should be on outputting the right number of movies passed and movies failed for each test.
• Milestone 5 (Optional): Alternative Data Visualizations. Take a look at other methods of data visualization offered by matplotlib. What other methods could we use to display the data? How would this change our interpretation of the data? Bonus: Try to implement another method of data visualization in your code and compare it with the starter code's chart.
• Milestone 6: Identifying bias in the dataset and other interesting data science ethics questions. In this milestone you are going to think critically about the Bechdel and Other Tests dataset and reflect on the relevant ethical issues that computer scientists should consider when working on data science problems. You will discuss the following questions as a group.
Many critical issues can arise when working with datasets that describe real people. Please take the time to think seriously about each of the questions below and discuss them.
1. Data analysis and visualization have long been powerful tools for social change. A data visualization can make an implicit argument about what ought to be changed, or inform others about a problem they had not previously been aware of in the hope that they will be moved to change it. What implicit arguments are individual tests making? What values does the general project of creating these tests express?
2. The creators of these datasets say in a footnote, "Over 50,000 people were credited on these movies. As such, it would have been a big lift to check each person’s gender individually, so instead we approached the problem algorithmically. First, we pulled all the crew members’ names from IMDb. Then, we used Genderize.io to calculate the probability that a given first name was male or female. We counted crew members as men only if they had names that belong to men at least 90 percent of the time, which means a whole lot of people with slightly ambiguous names got sorted into one of our “we’re not sure” categories. Nevertheless, even using this forgiving threshold that almost certainly undercounts the number of men involved in a film, a lot movies still failed our tests — counting up names like John, Frank and Jack showed that there were more men than our tests would allow even before we had to consider the Jaimes, Taylors and Caseys of the world, who were probably-but-not-certainly male." Assigning gender based on names alone is unfortunately common in data analysis. What are (at least) two possible problems with this, including one not mentioned in this quotation? What alternative approaches might you take?
3. As computer scientists, we sometimes take for granted that our data is a complete and accurate reflection of the world. But real datasets often have errors, missing information, and biases. In addition to the problems highlighted in question 2, what else might these datasets be missing? What additional data would be your highest priority to gather to improve them?
4. In CS106A, we create clearly defined assignments for you to work on. We tell you what to do at each step and what counts as success at the end. In other words, we are formulating problems for you to solve. However, choosing which problems to solve is one of the choices that express your values as a computer scientist. Formulate a different test related to bias in the media and a different modality of data visualization for this dataset (e.g., pie chart, line graph, scatter plot), ideally one you could implement with your current skills.

## Test Appendix

Definitions from FiveThirtyEight

Bechdel

• Movie passes if:
• There are two named female characters
• Said characters have at least one conversation that is not about a man

Peirce

• Movie passes if:
• There’s a female character who is a protagonist or antagonist with her own story
• The female lead has dimension and exists authentically with needs and desires that she pursues through dramatic action
• And the audience can empathize with or understand the female lead’s desires and actions

Landau

• Movie fails if:
• A primary female character ends up dead
• A primary female character ends up pregnant
• Or a primary female character causes a plot problem for a male protagonist

Feldman

• A movie passes with a score of five or higher:
• 2 points for a female writer or director
• 1 point for a female composer or director of photography
• 1 point for three female producers or three female department heads
• 1 point for a crew that’s 50 percent women
• 2 points if there’s a female protagonist who determines story outcomes
• 2 points if no female characters were victimized, stereotyped or sexualized
• And 1 point if a sex scene shows foreplay before consummation, or if the female characters initiate or reciprocate sexual advances
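Because the Feldman test is a point total, it translates directly into code. The sketch below assumes each criterion has already been judged True or False for a given movie; the dictionary keys are hypothetical shorthand labels for the criteria listed above.

```python
# Points per criterion, in the order listed above (keys are hypothetical labels).
FELDMAN_POINTS = {
    'female_writer_or_director': 2,
    'female_composer_or_dp': 1,
    'three_female_producers_or_dept_heads': 1,
    'crew_half_women': 1,
    'female_protagonist_determines_outcome': 2,
    'no_female_victimized_stereotyped_sexualized': 2,
    'sex_scene_criterion': 1,
}
PASSING_SCORE = 5


def feldman_score(criteria):
    """Sum the points for every criterion the movie meets (missing keys count as not met)."""
    return sum(points for key, points in FELDMAN_POINTS.items() if criteria.get(key, False))


def passes_feldman(criteria):
    """A movie passes with a score of five or higher."""
    return feldman_score(criteria) >= PASSING_SCORE
```

The maximum possible score is 10, so a movie can miss up to five points' worth of criteria and still pass.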

Villareal

Movie fails if:
• A lead female character is introduced as one of three common stereotypes in her first scene: as sexualized; as hardened, expressionless or soulless; or as a matriarch (tired, older or overworked)

But a failing movie can redeem itself and pass if the lead female character is later shown to be three or more of the following:
• Someone with a career where she is in a position of authority or power
• A mother
• Someone who’s reckless or makes bad decisions
• Someone who is sexual or chooses a sexual identity of her own
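The Villareal test's fail-then-redeem rule can be written as a small boolean function. This is a sketch that assumes the first-scene judgment and the lead's later traits have already been decided by a human viewer; the trait labels are hypothetical shorthand for the four items above.

```python
# Hypothetical labels for the four redeeming traits listed above.
REDEEMING_TRAITS = {'authority', 'mother', 'reckless', 'sexual_agency'}


def passes_villareal(stereotyped_intro, traits):
    """stereotyped_intro: True if the lead is introduced as one of the three stereotypes.
    traits: labels for the redeeming traits the lead is later shown to have.
    """
    if not stereotyped_intro:
        return True  # no stereotyped introduction, so the movie does not fail
    # A failing movie is redeemed by showing three or more of the traits.
    return len(REDEEMING_TRAITS & set(traits)) >= 3
```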

Hagen

• Movie passes if:
• Half of one-scene roles go to women
• And the first crowd scene features at least 50 percent women

Ko

• Movie passes if:
• There’s a non-white, female-identifying person in the film
• Who speaks in five or more scenes
• And speaks English

Villalobos

• Movie passes if:
• The film has a Latina lead
• And the lead or another Latina character is shown as professional or college educated, speaks in unaccented English, and is not sexualized

Waithe

• Movie passes if:
• There’s a Black woman in the work
• Who’s in a position of power
• And she’s in a healthy relationship

Koeze-Dottle

• Movie passes if:
• The supporting cast is 50 percent women

Uphold

• Movie passes if:
• The on-set crew is 50 percent women

White

• Movie passes if:
• Half of the department heads are women
• Half the members of each department are women
• And half the crew members are women

Reese-Davies

• Movie passes if:
• Every department has two or more women
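The White and Reese-Davies tests are both counting rules over a film's departments. The sketch below assumes per-department counts of women and total members are available, and it reads "half" as "at least half"; both the data layout and the function names are hypothetical.

```python
def passes_reese_davies(departments):
    """Every department has two or more women.

    departments maps a department name to (num_women, num_members).
    """
    return all(women >= 2 for women, _ in departments.values())


def passes_white(heads_are_women, departments, crew_women, crew_total):
    """Half the department heads, half of each department, and half the crew are women.

    heads_are_women is a list of booleans, one per department head.
    "Half" is read as "at least half" (an assumption).
    """
    heads_ok = 2 * sum(heads_are_women) >= len(heads_are_women)
    each_department_ok = all(2 * women >= members
                             for women, members in departments.values())
    crew_ok = 2 * crew_women >= crew_total
    return heads_ok and each_department_ok and crew_ok
```

Comparing 2 * women against the total avoids floating-point division and handles odd-sized groups consistently.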