From cs448b-fa16-wiki


Group Members

  • Do-Hyoung Park


At first glance, the traditional football boxscore appears to tell us a lot about what transpired in a game of American football, and how. It tells us how many points were scored in each quarter; it gives rushing and passing stats for each team; and it gives individual passing, rushing, receiving and defensive statistics that let us track individual as well as team performance. But even for the most casual football fan, the boxscore leaves something to be desired. It gives a rough idea of each performer's contribution to the final score and a very rough idea of when the points were scored, but it tells us precious little about the flow and style of the game.

Fundamental differences in style have grown deeper in the modern era of college football: whether a team predominantly runs the football (like Stanford) or operates a pass-first "Air Raid" offense (like Cal), how quickly a team likes to play on offense, how explosive a team is, and the strategy behind its play-calling. Matchups between those differing styles are often what captivate viewers, and yet they are poorly captured by traditional boxscores, which typically only aggregate countable stats and say little about the flow of the game or how each team's style dictated that flow.

Even as technology becomes more prevalent in sports journalism, visualization in sports other than baseball has advanced precious little over the last several years. The major media outlets still report on games primarily through game stories (which are time-consuming to read and hard to glean information from quickly), stat sheets and basic visualizations such as bar graphs. All of these make it difficult to quickly perceive the flow of the game, especially with respect to time.

Through this project, I hope to capture the spatial and temporal elements of football play-by-play data in a static visualization that focuses on the movement of the ball as time progresses. It pays attention to the types of plays called and the pace at which each team advances the ball, making it easier to glean play styles at a glance and to determine how each team's style affected the final outcome. A college football game typically has over 100 plays, so there is a lot of data to process, but the hope is that, in the end, that data can be visualized in a way that makes those conclusions easy to draw.

Project Progress Presentation

Project Progress Report

Literature Review

There doesn't appear to have been much work done on visualizing football play-by-play, nor does there appear to be a concerted effort to fill that void, as computing in the sports realm has increasingly focused on data analysis rather than data visualization. The one effort I could find is a 2013 project from Christopher G. Healey at North Carolina State University, which is described more as a side project than as a primary research focus. It appears to focus more on the NLP side of things: the NFL play-by-play data used in that project came in text format, and the creators had to write a parser to convert the play-by-play text into a usable dataset. Happily, college football play-by-play data already exists in .csv files on the Internet, ready to be parsed. With their parsed results, the creators built visualizations of NFL play-by-play that focus heavily on individual plays (including downs and distances and plenty of annotations), laid out in a long visualization in which the vertical axis represents time (flowing from top to bottom) and the horizontal axis represents field position.

My visualization hopes to improve on this concept in three ways: by making the distinctions between play types clearer (using hues, which viewers readily perceive); by depicting the pace of play more clearly, through the position in time of the bars that represent the plays; and by using a more favorable layout that makes it easier to draw bigger-picture conclusions from play-by-play data without compromising the detail of every play. It could potentially even track the usage of players throughout a game.
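As a rough illustration of the encoding described above, the sketch below maps one play to a bar whose horizontal extent is field position, whose vertical position is elapsed game time, and whose color is a hue chosen by play type. Everything in it is an assumption made for illustration: the `Play` fields, the `PLAY_HUES` palette, and the `play_to_bar` helper are hypothetical names, and the actual dataset columns and final design may well differ.

```python
from dataclasses import dataclass

# Hypothetical palette: one hue per play type (not specified in the proposal).
PLAY_HUES = {"rush": "#1f77b4", "pass": "#ff7f0e", "kick": "#2ca02c"}
FALLBACK_HUE = "#7f7f7f"    # grey for play types without an assigned hue
QUARTER_SECONDS = 15 * 60   # a college football quarter lasts 15 minutes


@dataclass
class Play:
    """One play-by-play record; the field names are assumptions."""
    quarter: int         # 1 through 4
    clock_seconds: int   # seconds remaining in the quarter at the snap
    start_yard: int      # 0-100, measured from one fixed goal line
    yards_gained: int    # signed, positive toward the 100 end of the field
    play_type: str


def elapsed_seconds(play):
    """Seconds of game clock elapsed since kickoff (the temporal position)."""
    return (play.quarter - 1) * QUARTER_SECONDS + (QUARTER_SECONDS - play.clock_seconds)


def play_to_bar(play):
    """Convert a play into bar geometry plus a hue for its play type."""
    # Clamp the end of the play to the field so bars never leave the 0-100 range.
    end_yard = max(0, min(100, play.start_yard + play.yards_gained))
    return {
        "x0": min(play.start_yard, end_yard),
        "x1": max(play.start_yard, end_yard),
        "y": elapsed_seconds(play),
        "color": PLAY_HUES.get(play.play_type, FALLBACK_HUE),
    }
```

With this mapping, the vertical gap between consecutive bars reflects how much game clock ran off between plays, so an up-tempo offense would show up as tightly packed bars and a slower one as widely spaced bars.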

Play-by-play visualizations for the other major American sports (baseball, hockey, basketball) are more difficult to find and to create because of the more fluid nature of those games. Football is relatively easy to visualize because every play has a clearly defined time stamp and yardage value. In baseball, and especially in hockey and basketball, movement of the ball and players isn't constrained to a single axis, and it is increasingly important to track player movement in multiple dimensions, which is difficult given the amount of data involved in each game (e.g. the high number of plays).

Project Plan

Thursday, November 24

(Officially switched topics)

Saturday, November 26

Locate primary college football play-by-play dataset and begin processing raw data

Wednesday, November 30

Have initial sketch-up of deliverable design made

Friday, December 2

Have base graphics for application complete

Sunday, December 4

Finish linking dataset to application

Monday, December 5

Have deliverable prototype coded and live

Wednesday, December 7

Complete and print poster

Saturday, December 10

Finalize application, complete paper

Final Deliverables