Your presentation to the class should include:

Problem Description and Motivation:
Many small-multiple and linked-view systems offer a multitude of knobs to turn and even more views of the same data. The question is whether this inundation of information is productive for users who come with specific questions in mind. The hypothesis is that while different users want different questions answered, there are sets of users with similar questions. Is there a way to understand which questions a user wants a visualization to answer, and to personalize an experience that keeps the same flexibility in looking at the data, minus the controls she is not interested in?

I look specifically at the Foodborne Outbreak Online Database, which has records of foodborne illness outbreaks from 1998 to 2014. From cursory user studies, I have identified a few sets of users: doctors who are interested in which diseases are most prevalent in the area where they practice; consumers who want to know which foods are safe; federal policymakers who want to understand which states are more at risk; and food producers who are interested in the profitability of particular goods. The Centers for Disease Control and Prevention (CDC) has developed a very flexible tool, the FOOD tool. The downside is that the tool has many controls, and on few personal computers can the whole visualization be appreciated at once. Is it possible to show groups of users exactly what they want, and nothing more?

Scope:
Milestone 1: Create a subset of the CDC tool that captures all the functionality necessary for doctors, consumers, federal policymakers, and food producers.
Milestone 2: Create a short survey, shown before the user can look at the visualization, about how the user self-identifies. The responses will be converted to a feature vector for each user.
Milestone 3: Create an activity recording system that attributes every click on the visualization to a particular user's feature vector. Data will be written asynchronously to a log file.
Milestone 4: Put the website on a public URL and have different users start to use it.
Milestone 5: Create a machine learning algorithm that, given a user's feature vector, determines the set of views to display to that user. Test it with simulated activity recordings.
Milestone 6: Create a second website (at a different public URL) that learns from the log file produced in Milestone 3. Run a satisfaction test if time permits.
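
To make Milestones 2, 3, and 5 concrete, here is a minimal sketch of how they could fit together. Everything named here is an assumption for illustration: the role list, the view names, and the functions `to_feature_vector`, `log_click`, and `recommend_views` are hypothetical, the log is an in-memory list standing in for the log file, and a simple click-frequency count stands in for the machine learning algorithm, which has not been chosen yet.

```python
import json
from collections import Counter

# Hypothetical survey options; the real survey questions are not yet defined.
ROLES = ["doctor", "consumer", "policymaker", "producer"]


def to_feature_vector(role):
    """One-hot encode a self-identified role (Milestone 2)."""
    return [1 if role == r else 0 for r in ROLES]


def log_click(log, features, view):
    """Append one click as a JSON line tagged with the user's features (Milestone 3)."""
    log.append(json.dumps({"features": features, "view": view}))


def recommend_views(log, features, k=2):
    """Return the k views clicked most often by users with the same features.

    This frequency count is a placeholder baseline for Milestone 5,
    not a trained model.
    """
    counts = Counter()
    for line in log:
        record = json.loads(line)
        if record["features"] == features:
            counts[record["view"]] += 1
    return [view for view, _ in counts.most_common(k)]


# Simulated activity, in the spirit of the test described in Milestone 5.
log = []
doctor = to_feature_vector("doctor")
consumer = to_feature_vector("consumer")
for view in ["state_map", "disease_trend", "state_map"]:
    log_click(log, doctor, view)
log_click(log, consumer, "food_safety")

print(recommend_views(log, doctor))  # ['state_map', 'disease_trend']
```

A real system would likely match similar, not only identical, feature vectors, so that a new kind of user still gets a sensible default set of views.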


1-2 pieces of most relevant prior work and discuss how your project is different.

Current Progress:
Your current progress. Use sketches, storyboards, and/or prototype images to communicate your ideas. It is a good idea to highlight issues of design or implementation for which you would like to get feedback from the class. End your talk with a single slide containing questions you'd like feedback on.

Progress Report:
Literature Review. A background survey of related work and a full list of references.

Project Plan. A list of milestones breaking the project into smaller chunks and a description of what each person in the group will work on.






====================================================================================
====================================================================================
====================================================================================

==Proposal 1==
===Group Members===
*Dan Guo
===Description===
At the upcoming 2016 Olympics in Rio, Michael Phelps will be swimming in his last (he claims) Olympic Games. Already the most decorated Olympian, with 22 medals of which 18 are gold, Phelps is not done yet. Phelps started his swimming career in 1992 at the age of 7. In the 23 years since, save for some trial retirement periods, Phelps has been swimming. I aim to celebrate such a colorful career through data visualization.

I want the media audience that follows Phelps' upcoming retirement to understand the ups and downs of his career. Which events did he dedicate most of his energy to? Where did he exhibit utter dominance and where was he simply great? How did his performance change before and after his 2012 retirement? Were there plateaus in his career?

I propose a visualization tool that allows viewers to see at a glance the arc of his swimming career. The website http://www.usaswimming.org/DesktopDefault.aspx?TabId=1470 has relevant swimmer data.

The data is unique for several reasons:
*Macro-Micro Perspective: The audience is very interested in the overall arc of Phelps' career as well as his individual performances (in the 2008 Beijing Games, for example).
*Quasi-Regularity: Phelps has been to every Olympics since 2000. Phelps has also participated in the World Championships held once every year.
*Partition in Twos: There are many different axes that partition the data into twos. There are international games (Olympic Games, World Championships, Pan American Games, etc.) and domestic games within the US. There are short course yard (SCY) events and long course meter (LCM) events, which are held in two different pools. There is Phelps before his 2012 retirement and Phelps after his return in 2014; some claim Phelps had different motivations and ambitions before and after retirement.
*Separate Data with Similarities: Individual event times should not be compared, because different events (whether they differ in distance, pool type, or stroke) require different amounts of time to complete. However, there are possible confounding factors that may affect Phelps' times in aggregate, over multiple events. I suspect this will be especially prominent in swim data because competition schedules usually have swimmers swim multiple events within a few days. For example, Phelps had to swim many heats for each of his 8 events at the 2008 Beijing Olympics.
*Swim Events with Qualitative Life Events: Phelps has had a string of out-of-pool incidents, and I use the word neutrally. Phelps has turned pro, signed numerous endorsement deals, run into trouble with the law, etc. within the time span of the dataset.

An interesting design problem is creating a visualization that puts Phelps' whole career on display without obscuring specific milestones such as world records and Olympic Games. Harmoniously integrating Phelps' competition data with his out-of-pool incidents is also a challenge. While the focus of the visualization will be on Phelps' career, what he does outside of swimming may inform us about trends in his swimming. Finally, I wish to present Phelps' swim data in a way that does not invite spurious comparisons (such as 100 Freestyle events in LCM vs. SCY) but still allows his swimming performance as a whole to be judged (how has Phelps done in the 100 Freestyle over the past year?).
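
One possible way to judge performance as a whole without comparing raw times across pools is to express each swim relative to the best time in its own (event, course) group. The sketch below is only an illustration of that idea: the sample times are made up rather than taken from the scraped data, and the function `percent_off_best` and the row layout are hypothetical.

```python
# Made-up sample rows: (year, event, course, time in seconds).
swims = [
    (2014, "100 Free", "LCM", 49.0),
    (2015, "100 Free", "LCM", 50.0),
    (2015, "100 Free", "SCY", 42.0),
    (2016, "100 Free", "SCY", 41.0),
]


def percent_off_best(rows):
    """Express each swim as a percentage slower than the best time
    in the same (event, course) group, so LCM and SCY never mix."""
    best = {}
    for _, event, course, seconds in rows:
        key = (event, course)
        best[key] = min(seconds, best.get(key, seconds))
    result = []
    for year, event, course, seconds in rows:
        fastest = best[(event, course)]
        percent = round(100.0 * (seconds - fastest) / fastest, 2)
        result.append((year, event, course, percent))
    return result


for row in percent_off_best(swims):
    print(row)
```

Because every value is relative to its own group, a chart of these percentages over time could show LCM and SCY swims side by side without implying that the raw times are comparable.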

===Preliminary Work===
*I have scraped all national swim data from http://www.usaswimming.org/DesktopDefault.aspx?TabId=1470. There are 1000 swim events from Michael Phelps, between the years of 1996 and 2016.
*I have emailed ASU athletics (where Coach Bob Bowman coaches), Michael Phelps' agent, and Coach Bob Bowman himself for possible access to all of Phelps' swim data. This would be a boon, as my current data covers only USA swim events, while many of Phelps' most famous races were at the 2008 Olympics.
*I have set up an IPython notebook for discovering the most interesting trends. I have already committed the rookie swim-data mistake of mixing LCM and SCY events. Users will be surprised and/or confused if this occurs in the final visualization.
*I have conducted a user study with an ex-swimmer to understand what they are most interested in about Phelps' career.



===Breakdown===
*Scrape national swim data from http://www.usaswimming.org/DesktopDefault.aspx?TabId=1470
*Scrape international swim data.
*Use an IPython notebook to explore and identify the most interesting trends in Phelps' career. Rapid prototyping of charts.
*User-test the tool with IPython-created charts.
*Code the interactive tool in D3.
*Interaction study / user test D3 tool.
*Iterate.

===Data===
http://www.usaswimming.org/DesktopDefault.aspx?TabId=1470



===Sources===
http://www.biography.com/people/michael-phelps-345192#related-video-gallery
http://www.teamusa.org/News/2012/May/07/Michael-Phelps-timeline-May-7-2012
https://www.timetoast.com/timelines/the-story-of-michael-phelps

==Proposal 2==

===Group Members===
*Dan Guo
===Description===

Foodborne outbreaks are disease outbreaks that occur from eating contaminated foods. The etiologies of these diseases are varied, ranging from bacteria to viruses to parasites. The methods of contamination are equally diverse, ranging from improperly washed hands to runoff into produce fields to undercooked meats; my Biology professor even recounted a story of his discovering an E. coli outbreak that occurred because of contaminated water sprayers in the fresh produce section of a particular grocery store. Foodborne illnesses affect millions of people in the US, with hospitalizations and deaths in the thousands ([http://www.cdc.gov/foodsafety/fdoss/faq/index.html#seven CDC]).

The Centers for Disease Control and Prevention (CDC) has collected data regarding foodborne outbreaks from 1998 to 2014. It has created a Foodborne Outbreak Online Database (FOOD tool) that can be used to analyze the foodborne outbreak data. After playing around with the tool for about an hour, I was not entirely satisfied with it. There are particular things that the tool optimizes for, but frankly I do not find those things most helpful or interesting. The FOOD tool is great at taking a very macro perspective on the data, grouping diseases to be analyzed in bulk. What I am more interested in is how different diseases show up in various states, throughout various time periods, etc. I feel a strong desire to draw strict categories between diseases and look at them comparatively rather than in aggregate. While the FOOD tool creates all charts in terms of counts (of hospitalizations, illnesses, deaths), I propose a visualization that clearly distinguishes between diseases. This is because different diseases place different demands and pressures on hospitals, and their treatments are distinct. Note that the FOOD tool does allow filtering to particular diseases, but the comparison between diseases feels like an afterthought (I actually did not discover it until much later). Furthermore, counts do not accurately reflect differences between states; states have differing populations, so comparing the raw number of illness events among states is misleading. Finally, the visualization is macro-perspective in another sense: the data revolves around the US as a whole, with state filtering as a side feature.
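
The point about raw counts across states with differing populations can be illustrated with a small per-capita calculation. This is a sketch only: the state names, counts, and populations below are made up, and `rate_per_100k` is a hypothetical helper, not part of the FOOD tool or its data.

```python
# Made-up illness counts and populations, for illustration only.
illnesses = {"State A": 500, "State B": 300}
population = {"State A": 10_000_000, "State B": 2_000_000}


def rate_per_100k(counts, pops):
    """Convert raw counts to rates per 100,000 residents."""
    return {state: 100_000 * counts[state] / pops[state] for state in counts}


rates = rate_per_100k(illnesses, population)
print(rates)  # {'State A': 5.0, 'State B': 15.0}
```

In this made-up example State A has more illnesses in absolute terms, yet State B has three times the rate once population is taken into account.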

I wish to create a visualization tool that answers the big question: what diseases might I contract when I engage with the US food system? The US public is entitled to know this, because the public has no real alternative, save for farmers who grow and prepare their own foods. This question is layered. What are the diseases, and how dangerous are they? How have they evolved over the years? Which foods are the most common vectors? And, for people with heightened sensitivity to specific diseases, how rampant are particular diseases in their home state? These are all important questions intimately related to the larger question of what diseases exist in the US food system. My visualization seeks to provide insight into each of them.

===Preliminary Work===
*I evaluated the existing FOOD tool visualization.
*I downloaded the foodborne outbreaks data set and loaded it into IPython for rapid prototyping.
*I ran one user study and sharpened the user study format. I will interview people who are familiar with spreadsheets and give them a small subset of the data with all column names. I will ask them what the most important things are that they would like to understand from the data. Next, I will walk them through the FOOD tool and again ask what they would like to see.
*I reached out to Dr. Milana Trounce, the Director of Stanford BioSecurity and Infectious Disease Disaster Response. I would like to interview her to understand which questions about foodborne diseases she most wants answered. I am also curious whether there are particular things in the FOOD dataset that she would like the public to be aware of.


===Breakdown===
*Evaluate FOOD tool visualization.
*Scrape FOOD data.
*Set up an IPython pipeline for data display, for rapid prototyping.
*Run user studies to figure out the most important questions surrounding foodborne diseases.
*Decide which comparisons are most useful.
**Time series of Hospitalizations, Illnesses, Deaths across states.
**Time series of Hospitalizations, Illnesses, Deaths across diseases.
**Within state, disease distribution.
**Within disease, state distribution.
*Prototype rapidly in IPython, in a tight loop with user studies.
*Implement in D3.
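
The comparisons listed above all reduce to totaling a measure over different pairs of columns, which could be prototyped along the following lines. This is a sketch under assumptions: the column names (`year`, `state`, `disease`, `illnesses`) and the sample rows are made up and may not match the actual FOOD export, and `total_by` is a hypothetical helper.

```python
from collections import defaultdict

# Made-up outbreak rows; the real column names may differ.
outbreaks = [
    {"year": 2010, "state": "CA", "disease": "Salmonella", "illnesses": 40},
    {"year": 2010, "state": "CA", "disease": "Norovirus", "illnesses": 25},
    {"year": 2011, "state": "CA", "disease": "Salmonella", "illnesses": 10},
    {"year": 2010, "state": "TX", "disease": "Salmonella", "illnesses": 30},
]


def total_by(rows, *keys, measure="illnesses"):
    """Sum a measure over every distinct combination of the given columns."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[measure]
    return dict(totals)


# Time series across states: one total per (year, state).
print(total_by(outbreaks, "year", "state"))
# Disease distribution within each state: one total per (state, disease).
print(total_by(outbreaks, "state", "disease"))
```

Passing `"year", "disease"` gives the time series across diseases, and changing `measure` would switch between hospitalizations, illnesses, and deaths once those columns are available.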

===Data===

===Sources===




