Assignment 1 - Spreadsheet Data Analysis and Visualization

Due Date: Sunday April 16 at 11:59 PM

Honor Code: Under the Honor Code at Stanford, you are expected to submit your own original work for assignments, projects, and exams. On many occasions when working on assignments or projects (but never exams!) it is useful to ask others -- the instructor, the TAs, or other students -- for hints, or to talk generally about aspects of the assignment. Such activity is both acceptable and encouraged, but you must indicate on all submitted work any assistance that you received. Any assistance received that is not given proper citation will be considered a violation of the Honor Code. In any event, you are responsible for understanding, writing up, and being able to explain all work that you submit. The course staff will pursue aggressively all suspected cases of Honor Code violations, and they will be handled through official University channels.

Datasets: This assignment makes use of the following two datasets; more information is provided on the Datasets page, but what's here is sufficient for you to complete the assignment.
  • Kickstarter Dataset Each row captures one Kickstarter campaign and its stats.
    • name - Kickstarter project name
    • goal - Goal amount
    • pledged - Amount of money pledged so far
    • status - Current status of the project (successful/live/incomplete)
    • country - Country name
    • staff_pick - TRUE or FALSE if the kickstarter staff picked it
    • backers_count - Number of backers
    • category - Category
  • Countries and Continents Each row has a country and its continent.
    • Country - Country Name
    • Continent - Continent

Setup Instructions: We recommend using Google Sheets for the Data Analysis problems. If you use Excel or another spreadsheet program we may not be able to help you with special features, and you should explain any nonstandard features that you use.
  1. Download the two datasets by clicking on the links above.
  2. Go to the Google Drive website. You may use your personal Gmail account to log in, or you can use your SUNet ID by entering your Stanford email address -- Google should automatically redirect you to WebLogin.
  3. To upload the CSV files, drag them from your desktop or file browser to the main Google Drive page.
  4. To open a newly uploaded CSV file, right-click the CSV, select "Open with" and choose "Google Sheets". After you open your CSV file in Google Sheets once, it will save a new version that opens automatically in Sheets from then on.
  5. You're now ready to solve the problems -- have fun!
Submission Instructions: You are to create one PDF file containing all of your answers for this assignment, then submit it to your Canvas account. (You may submit multiple times, however we will use your last submission only, both for grading and for the late policy.)
  • For our grading system, it's critical that the answer to each problem is on a separate page. Answers spanning multiple pages are allowed, as long as the answer to the next problem starts on a new page.
  • When you're ready to submit, save your final document as pdf, and please do check it over to make sure it looks how you expect and that each answer is indeed on its own page.


Part 1: Spreadsheet Data Analysis (3 points per problem)

For each of these problems you will submit an answer obtained by performing spreadsheet operations on the provided data, and a description of the how you manipulated the spreadsheet(s) to obtain your answer.

Problem 1. Which was the project with the smallest goal? State the name of the project and the goal amount, then explain briefly how you arrived at the answer.

Problem 2. What is the total sum of goal amounts for projects where the pledged amount was greater or equal than the goal amount? State the total sum and explain briefly how you manipulated the spreadsheet to get the answer.

Problem 3. Which was the most ambitious project (highest goal) where the pledged amount was greater or equal than the goal amount? State the project's name and goal amount, then explain briefly how you manipulated the spreadsheet to get the answer.

Problem 4. Which country has the highest percentage of successful projects (status = successful)? State the country's name and the percentage rate of success, then explain briefly how you manipulated the spreadsheet to get the answer.

Problem 5. Which are the top 5 most popular categories (From highest to lowest) in terms of number of backers? List out the 5 categories along with the number of backers for each category, then explain briefly how you manipulated the spreadsheet to get the answer.

Problem 6. What is the average pledge amount by continent? State the average pledge amount for Europe, North America and Oceania.

Part 2: Data Visualization (3 points per problem)

Problem 7. Build an alluvial diagram (http://app.rawgraphs.io/) to visualize status, country and staff_pick. Copy the diagram onto your submission.

Problem 8. Come up with an interesting visualization of your own, using Google Sheets, then label and describe it.

Problem 9. Come up with an interesting visualization of your own, using rawgraphs, then label and describe it.