Semester Project
This project is designed to take you through the full process of a data science analysis. Working in a group (which is how research is truly done), you will find a data set, analyze it, and share the results. We will work on this project each day in class, but you will need to work on it outside class as well.
The project will include several elements that you will need as a data scientist:
Collection
Processing and Cleaning
Analysis
Visualization
Interpretation
Collectively, this is called the Data Life Cycle[1].
Project Structure
First, talk within your group about something you are all interested in. This could be something you have in common through school, a shared interest or hobby, or something else. The point is to narrow the space of possible subjects down to a focused list of topics.
You will then need to formulate a research question based on your interests. This could be something broad, like “how can we stop a pandemic?”, or something specific, like “is the book really better than the movie?”
Once you have your research question, go looking for a data set “in the wild”. This means the data you need might not exist in a convenient form, much less be clean and structured. Most data used in research starts out this way. Some starting points are listed below.
Each member of the group will then be responsible for taking their own distinct approach to answering the research question.
Graded Components
As part of the project, you’ll need to carry an analysis from your research question and raw data all the way through to a presentation to the class that demonstrates your group’s collective analysis.
Each group will be responsible for:
Their dataset
Their analysis code, hosted in GitHub (the entire project should be managed in GitHub)
A jointly written paper (min 5 pages) explaining their project, dataset, analyses, and conclusions
A 20-minute presentation explaining their project on the last day of classes
A 10-minute “checkpoint” presentation at the halfway point of the semester
Additional components assigned as the semester progresses
Each member will also provide:
A paper explaining each individual analysis (min 3 pages)
Peer feedback on both the midpoint and final presentations
The end result of this project should be an analysis you can use as a portfolio piece to show future employers.
Dataset Resources
Here are a few places to look for data sets. If you can’t find what you are looking for in one of these, you can always collect your own data.