In the final project for this course, you will apply the techniques learned in this class to analyze a data set of personal interest to you. Your goal should be to create an original project that you would be proud to show off to a potential employer. You are encouraged to upload your project to GitHub.
You must work on this project with a partner (in a group of 2).
You will turn your work into a poster. Here is a template that you can (but are not required to) use. Your poster will need to be printed on 24" x 36" paper. You may select the most basic printing options (e.g., matte paper, no lamination, not mounted). We will supply a board and an easel during the poster session; all you need to bring is the poster. Here are some printing options:
You will present this poster at one of two sessions. (The second session is the registrar-scheduled final exam time for this course. The first session is provided as a convenience for students with conflicts.)
Please sign up here for a poster session.
Each poster session is divided into multiple "rounds" of approximately 1 hour each. You will be presenting during one round, and you will review other posters during the other rounds. The presentation is informal, as people will be coming and going. You may want to prepare a 3-minute summary of your project, but otherwise you will mostly be answering questions about your project.
You will also upload your poster and submit your code to Canvas.
| Criterion | 10 points | 8 points | 6 points | 3 points | 0 points |
|---|---|---|---|---|---|
| Research Question | Interesting research question that could be the basis of a publication. | Clear, well-motivated research question. | Research question is fuzzy or not motivated. | Research question is not well defined. | No clear research question. |
| Data Collection | Data collection is extraordinarily complex. | Data collection meets the complexity requirement. | Data collection was simplistic but challenging in some way. | Superficial data collection (e.g., downloaded data set from Kaggle). | No data collection. |
| Data Visualization | Unusually appealing and/or insightful visualizations. | Data visualizations were clean, labeled, and insightful. | Visualizations were technically correct, but not insightful. | Data visualizations were poor or incorrect (e.g., a bar plot for a quantitative variable). | No visualizations were provided. |
| Data Analysis | Correctly applied a broad range of techniques from this class and perhaps a few beyond this class, in technically challenging situations. | Correctly applied a broad range of techniques from this class. | Applied techniques incorrectly, or applied only a limited set of techniques. | Data analysis was done, but the approach was fundamentally flawed. | No data analysis. |
| Storytelling | Wove visualizations and analysis into a compelling story. | Visualizations and analyses told a coherent story. | Visualizations and analyses seemed scattered, with the main thread unclear. | Visualizations and analyses were not tied to a main thread. | No attempt to tell a story. |
| Real-World Application | Project generates insights with immediate real-world impact. | Project generates insights that clearly have the potential to be useful. | With some tweaking, project could have generated useful insights. | The insights generated are not clearly useful. | No insights were generated from this project. |
| Poster | Poster goes above and beyond. | Poster is clean, with a good balance of text and visuals. | Poster content is satisfactory, but a bit lacking in professionalism (e.g., too much text, blurry images). | Poster layout is sloppy. | No poster was made. |
| Presentation | Presentation was highly engaging and memorable. Fielded tough questions. | Gave a good summary of the poster and answered questions well. | Presentation was unclear, or speakers had difficulty answering questions. | Presentation was unclear, and speakers had difficulty answering questions. | Did not attend presentation session. |
You will be separately graded on the organization of your code submission and the quality of the peer reviews that you submit (for other posters).
The best data set is one that you are passionate about. I recommend that you start by finding a question you want to answer and then finding data to answer that question, rather than starting with a data set. That said, here are some helpful websites with large collections of data.