In the final project for this course, you will apply the techniques learned in this class to analyze a data set of personal interest to you. Your goal should be to create an original project that you would be proud to show off to a potential employer. You are encouraged to upload your project to GitHub.
You are expected to work on this project with a partner (in a group of 2).
A project that meets our expectations will earn 8/10 in each category. To earn a higher score, you must exceed our expectations. We cannot tell you how to do a great data science project, just as there is no recipe for writing a great novel. We have taught you the tools; now your task is to do something surprising, creative, and human with them: this is the value of data science in an age of AI.
| Criterion | 10 points | 8 points | 6 points | 3 points | 0 points |
|---|---|---|---|---|---|
| Research Question | Interesting research question that could be the basis of a publication. | Clear, well-motivated research question. | Research question is fuzzy or not motivated. | Research question is not well defined. | No clear research question. |
| Data Collection | Data collection is extraordinarily complex. | Data collection meets the complexity requirement. | Data collection was simplistic but challenging in some way. | Superficial data collection (e.g., downloaded a CSV from Kaggle). | No data collection. |
| Data Visualization | Unusually appealing and/or insightful visualizations. | Data visualizations were clean, labeled, and insightful. | Visualizations were technically correct, but not insightful. | Poor data visualizations that were incorrect (e.g., bar plot for a quantitative variable). | No visualizations were provided. |
| Data Analysis | Correctly applied a broad range of techniques from this class and perhaps some beyond this class, in technically challenging situations. | Correctly applied a broad range of techniques from this class. | Applied techniques incorrectly, or applied only a limited set of techniques. | Data analysis was done, but the approach was fundamentally flawed. | No data analysis. |
| Storytelling | Wove visualizations and analysis into a compelling story. | Visualizations and analyses told a logical story. | Visualizations and analyses seemed scattered, with the main thread unclear. | Visualizations and analyses were not tied to a main thread. | No attempt to tell a story. |
| Real-World Application | Project generates insights with tangible real-world impact. | Project generates insights that clearly have the potential to be useful. | With some tweaking, project could have generated useful insights. | The insights generated are not clearly useful. | No insights were generated from this project. |
| Poster | Poster goes above and beyond in integrating design with content. | Poster communicates the content clearly, with a good balance of text and visuals. | Poster content is satisfactory, but a bit lacking in professionalism (e.g., too much text, blurry images). | Poster layout is sloppy. | No poster was made. |
| Presentation | Memorable presentation that was engaging. Fielded tough questions. | Clear, concise high-level summary of the poster (3 minutes or less) and answered questions well. | One of the following issues: high-level summary was unclear or too long, questions were not answered well. | Multiple issues: high-level summary was unclear or too long, questions were not answered well. | Did not attend presentation session. |
The best data set is one that you are passionate about. I recommend that you start by finding a question you want to answer and then finding data to answer that question, rather than starting with a data set. That said, here are some helpful websites with large collections of data.