Assignment 2: Exploratory Data Analysis

Assignment Due: Oct 17, 2016

[Image: Gasprices.png]

A wide variety of digital tools have been designed to help users visually explore data sets and confirm or disconfirm hypotheses about the data. The task in this assignment is to use existing software tools to formulate and answer a series of specific questions about a data set of your choice. After answering the questions, you should create a final visualization designed to present your answers to others. You should maintain a web notebook that documents all the questions you asked and the steps you performed from start to finish. The goal of this assignment is not to develop a new visualization tool, but to better understand the process of exploring data using off-the-shelf visualization tools. Documenting the data analysis process you went through is the main pedagogical goal of the assignment and is more important than the design of the final visualization.

Here is one way to start.

  • Step 1. Pick a domain that you are interested in.
    Some good possibilities might be the physical properties of chemical elements, the types of stars, or the human genome. Feel free to use an example from your own research, but do not pick one for which you have already created visualizations.
  • Step 2. Pose an initial question that you would like to answer.
    For example: Is there a relationship between melting point and atomic number? Are the brightness and color of stars correlated? Are there different patterns of nucleotides in different regions in human DNA?
  • Step 3. Assess the fitness of the data for answering your question.
    Inspect the data--it is invariably helpful to first look at the raw values. Does the data seem appropriate for answering your question? If not, you may need to start the process over. If so, does the data need to be reformatted or cleaned prior to analysis? Perform any steps necessary to get the data into shape prior to visual analysis.
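
As a rough illustration of Step 3, the sketch below looks at the first raw lines of a small CSV file and then cleans it before visual analysis. It is only one possible approach, not a required workflow. The column names, the sample rows, and the helper load_clean_rows are all made up for this example; real data would be read from a file you downloaded.

```python
import csv
import io

# Hypothetical raw file contents (made-up values for illustration only).
# With a real dataset you would read the text from a file instead.
RAW = """name,atomic_number,melting_point_k
Alpha, 1 ,14.0
Beta,2,n/a
Gamma,3,453.7
Gamma,3,453.7
"""


def load_clean_rows(text):
    """Parse CSV text, strip whitespace, convert numbers, and set aside
    rows that cannot be converted. Exact duplicate rows are dropped."""
    seen = set()
    clean = []
    skipped = []
    for row in csv.DictReader(io.StringIO(text)):
        row = {key: value.strip() for key, value in row.items()}
        try:
            record = (
                row["name"],
                int(row["atomic_number"]),
                float(row["melting_point_k"]),
            )
        except ValueError:
            # Keep unparseable rows so they can be reported in the notebook.
            skipped.append(row)
            continue
        if record in seen:
            continue
        seen.add(record)
        clean.append(record)
    return clean, skipped


# Look at the raw values first, then at what survives cleaning.
print("first raw lines:", RAW.splitlines()[:3])
clean, skipped = load_clean_rows(RAW)
print("clean rows:", clean)
print("skipped rows:", skipped)
```

Keeping the skipped rows, rather than silently discarding them, makes it easier to record in your notebook what was removed and why.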

You will need to iterate through these steps a few times. It may be challenging to find interesting questions and a dataset that has the information that you need to answer those questions.

Exploratory Analysis Process

After you have an initial question and a dataset, construct a visualization that provides an answer to your question. As you construct the visualization you will find that your question evolves; often it will become more specific. Keep track of this evolution and the other questions that occur to you along the way. Once you have answered all the questions to your satisfaction, think of a way to present the data and the answers as clearly as possible. In this assignment, you should use existing visualization software tools. You may find it beneficial to use more than one tool.

Before starting, write down the initial question clearly. As you go, maintain a wiki notebook of what you had to do to construct the visualizations and how the questions evolved. Include in the notebook where you got the data and documentation of the dataset's format. Describe any transformations or rearrangements of the dataset that you needed to perform; in particular, describe how you got the data into the format needed by the visualization system. Keep copies of any intermediate visualizations that helped you refine your question. After you have constructed the final visualization for presenting your answer, write a caption and a paragraph describing the visualization and how it answers the question you posed. Think of the figure, the caption and the text as material you might include in a research paper.

Your assignment must be posted to the wiki before class on Oct 17, 2016.

Data Sets

You should look for data sets online in convenient formats such as Excel or a CSV file. The web contains a lot of raw data. In some cases you will need to convert the data to a format you can use. Format conversion is a big part of visualization research so it is worth learning techniques for doing such conversions. Although it is best to find a data set you are especially interested in, here are pointers to a few datasets: Online Datasets
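
As one example of such a conversion, the sketch below turns a list of JSON records into CSV text that a spreadsheet or visualization tool could open. It assumes the input is a flat JSON array of objects; the field names, the sample records, and the helper json_records_to_csv are invented for illustration, and real data will often need more handling (nested fields, units, encodings).

```python
import csv
import io
import json

# Hypothetical input: a flat JSON array of records (made-up values).
RAW_JSON = (
    '[{"star": "S1", "color_index": 0.65, "magnitude": 4.8},'
    ' {"star": "S2", "magnitude": 1.4}]'
)


def json_records_to_csv(json_text):
    """Convert a JSON array of flat objects to CSV text.

    Columns are the union of all keys, in first-seen order.
    Fields missing from a record are written as empty cells.
    """
    records = json.loads(json_text)
    fields = []
    for record in records:
        for key in record:
            if key not in fields:
                fields.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


csv_text = json_records_to_csv(RAW_JSON)
print(csv_text)
```

Writing missing fields as empty cells keeps the rows aligned, so the gaps stay visible when you inspect the converted file.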

Visualization Software

To create the visualizations, we will be using Tableau, a commercial visualization tool that supports many different ways to interact with the data. Tableau has given us licenses so that you can install the software on your own computer. One goal of this assignment is for you to learn to use and evaluate the effectiveness of Tableau. Please talk to me if you think it won't be possible for you to use the tool. In addition to Tableau, you are free to also use other visualization tools as you see fit.

How to create your wiki page

Begin by creating a new wiki page for this assignment. The title of the page should be of the form:

A2-FirstnameLastname.

The wiki syntax will look like this: *[[A2-FirstnameLastname|Firstname Lastname]]. Hit the edit button for the next section to see how I created the link for my name.

To upload images to the wiki, first create a link for the image of the form [[Image:image_name.jpg]] (replacing image_name.jpg with a unique image name for use by the server). This will create a link you can follow that will then allow you to upload the image. Alternatively, you can use the "Upload file" link in the toolbox to upload the image first, and then subsequently create a link to it on your wiki page.

Add a link to your finished reports here

Once you are finished editing the page, add a link to it here with your full name as the link text.