The objective of the final project is for you to apply what we’ve seen in the class to real-world data. You get to choose the question you tackle!
Successful projects should pose a novel and interesting/relevant question, have appropriate data to answer that question, and use pertinent analysis.
Students have the option (and are encouraged!) to reach out at any time during the semester to get feedback about their project. Additionally, there are some milestones during the semester designed to provide feedback to students.
The final project is a group project (3 to 4 students). Students are free to choose their own group, but will be assigned to one by the instructor if they haven’t chosen one (or if their group doesn’t comply with the requirements) by Sept 9th.
The project should be tackled as a team, which means that students are expected to be in constant communication with their group. Failure to do so will result in a significant penalty on the student’s grade.
There will be 4 submissions throughout the semester:
I encourage groups to work with existing datasets (see lists below), but, depending on the project, you could also collect your own data (e.g. survey experiments or small lab experiments). If you want to go down this route, please reach out to the instruction team for further advice.
When looking for existing data, you can use any data you want (as long as it can be shared with the instruction team), but you need to describe how and where you obtained it. Make sure your data is not too simple (e.g. too few variables or a very small dataset), so that you can run some interesting analysis on it.
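One quick way to check whether a candidate dataset might be too simple is to count its rows and columns before committing to it. The sketch below is only an illustration using Python's standard library: the `describe_size` helper, the sample data, and the thresholds are all hypothetical, not course requirements.

```python
import csv
import io


def describe_size(csv_text):
    """Return (number of data rows, number of columns) for CSV text with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    n_rows = sum(1 for row in reader if row)  # skip blank lines
    return n_rows, len(header)


# Made-up sample standing in for the contents of a real file.
sample = "city,year,median_rent\nAustin,2021,1450\nDallas,2021,1380\n"
n_rows, n_cols = describe_size(sample)
print(f"{n_rows} rows, {n_cols} columns")

# Illustrative thresholds only; what counts as "too simple" depends on your question.
if n_rows < 100 or n_cols < 5:
    print("This dataset may be too small for an interesting analysis.")
```

For a real file, you would read its text first (for example with `open(path, newline="").read()`) and pass that to the helper.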
I’m also providing some resources here that can be useful if you don’t know where to start:
#TidyTuesday: A lot of datasets that can be useful. Some of them are pared down, but usually you also have a link to the main data source.
Data Is Plural: A long list of datasets that might be interesting.
Opportunity Insights: Comprehensive datasets put together by the OI team, such as social capital, college mobility, economic tracker, etc.
Current Population Survey: Survey created jointly by the Census Bureau and the Bureau of Labor Statistics.
Home Mortgage Disclosure Act: Publicly available data on the US mortgage market.
Zillow data: Many real estate platforms have some aggregated information available that could be interesting to analyze. You can merge their datasets with other datasets if you need more information (e.g. census data).
Open data portals (e.g. NYC restaurant inspection): Most cities and states have open data portals that allow you to download a lot of relevant data in different areas (e.g. education, health, crime, etc.)
Iowa State University - Business Analytics Research Guide (List of Datasets)
mosaicData: R package that contains multiple datasets (see list here)
YouTube API: This requires some API knowledge, and you need to submit an application for it, but really cool data if you want to dive into this.
Twitter API: Twitter API for academic purposes. You can follow some of the tutorials to see how you can use it (there’s also an R package).
Site-limited searching on Google (especially for GitHub): A good way to discover new data is to restrict a Google search to GitHub, for example: climate data .csv site:http://github.com
I’ll keep adding to this list during the first weeks of class.
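Some of the sources above are meant to be combined with others (e.g. real estate data merged with census data). A minimal sketch of such a merge is shown here, assuming both tables share a key column. The `merge_on_key` helper, the `zip_code` key, the column names, and all values are made up for illustration.

```python
def merge_on_key(left_rows, right_rows, key):
    """Inner-join two lists of dicts on `key`; assumes `key` is unique in right_rows."""
    right_by_key = {row[key]: row for row in right_rows}
    merged = []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is not None:
            # Combine the columns of both rows into a single record.
            merged.append({**row, **match})
    return merged


# Hypothetical records standing in for two downloaded datasets.
housing = [
    {"zip_code": "78701", "median_price": 550000},
    {"zip_code": "78702", "median_price": 480000},
]
census = [
    {"zip_code": "78701", "population": 9000},
]

for row in merge_on_key(housing, census, "zip_code"):
    print(row)
```

Rows without a match in the second table are dropped in this sketch, so it is worth checking how many observations survive the merge before running any analysis.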