In the final project for this course, you will apply the techniques learned in this class to analyze a data set of personal interest to you. Your goal should be to create an original project that you would be proud to show off to a potential employer. You are encouraged to upload your project to GitHub.
You must work on this project with a partner (in a group of 2).
Then, you will turn your work into a poster. Here is a template that you can (but are not required to) use. Your poster will need to be printed on 24" x 36" paper. You may select all the most basic printing options (e.g., matte paper, no lamination, not mounted). We will supply a board and an easel during the poster session; all you need to bring is the poster.
Here are some printing options:
You will present this poster at one of two sessions. (The second session is the registrar-scheduled final exam time for this course. The first session is provided as a convenience for students with conflicts.)
Please sign up here for a poster session.
You will also upload your poster and code to Canvas.
| Criterion | 10 points | 8 points | 6 points | 3 points | 0 points |
|---|---|---|---|---|---|
| Research Question | Interesting research question that could be the basis of a publication. | Clear, well-motivated research question. | Research question is fuzzy or not motivated. | Research question is not well defined. | No clear research question. |
| Data Collection | Data collection is extraordinarily complex. | Data collection meets the complexity requirement. | Data collection was simplistic but challenging in some way. | Superficial data collection (e.g., downloaded data set from Kaggle). | No data collection. |
| Data Visualization | Unusually appealing and/or insightful visualizations. | Data visualizations were clean, labeled, and insightful. | Visualizations were technically correct, but not insightful. | Visualizations were incorrect (e.g., bar plot for a quantitative variable). | No visualizations were provided. |
| Data Analysis | Correctly applied a broad range of techniques from this class and perhaps a few beyond this class, in technically challenging situations. | Correctly applied a broad range of techniques from this class. | Applied techniques incorrectly, or applied only a limited set of techniques. | Data analysis was done, but the approach was fundamentally flawed. | No data analysis. |
| Storytelling | Wove visualizations and analysis into a compelling story. | Visualizations and analyses told a coherent story. | Visualizations and analyses seemed scattered, with the main thread unclear. | Visualizations and analyses were not tied to a main thread. | No attempt to tell a story. |
| Real-World Application | Project generates insights with immediate real-world impact. | Project generates insights that clearly have the potential to be useful. | With some tweaking, project could have generated useful insights. | The insights generated are not clearly useful. | No insights were generated from this project. |
| Poster | Poster goes above and beyond. | Poster is clean, with a good balance of text and visuals. | Poster content is satisfactory, but a bit lacking in professionalism (e.g., too much text, blurry images). | Poster layout is sloppy. | No poster was made. |
| Presentation | Presentation was highly engaging and memorable. Fielded tough questions. | Gave a good summary of the poster and answered questions well. | Presentation was unclear, or speakers had difficulty answering questions. | Presentation was unclear, and speakers had difficulty answering questions. | Did not attend presentation session. |
| Peer Reviews | Completed required peer reviews and provided insightful feedback, including points that even the instructors missed. | Completed required peer reviews and provided good feedback about each poster. | Completed required peer reviews, but provided perfunctory feedback. | Completed some, but not all peer reviews. Feedback was perfunctory. | Did not complete peer reviews. |
| Submission | Poster and code submitted on time, well-organized. | | | | Did not submit poster or code. |
If you have a different idea for a data science project that does not fit neatly with the above requirements, please talk to Professor Sun.
The best data set is one that you are passionate about. I recommend that you start by finding a question you want to answer and then finding data to answer that question, rather than starting with a data set. That said, here are some helpful websites with large collections of data.