Stat 370 Project Idea

From Sean_Carver

You are not required to follow this suggestion. Some students are already doing something else, and that is fine. Synergies with other projects, outside of this class, are encouraged.

  • Choose a data-rich field. A few ideas: weather, social media, baseball, climate (although access to government climate data seems uncertain at the moment).
  • Write code to access the data via the web automatically (so that your program updates the data as new data arrive).
  • Write code to analyze and display the data in interesting ways.
  • Tie everything together into a dynamic document.
  • Do something more (ideas below) ...

The reason to "do something more" is that I am planning to do most of the above in class. Specifically,

  • We will devote parts of a few class periods to brainstorming and discussing where to get data for this type of project.
  • In class, or at home, we will investigate what needs to be done to access the data.
  • I imagine that for some data sets, even accessing the data in a form useful for the dynamic document you envision will be a project in and of itself. This could be considered (part of) the "do something more."
  • Part of at least one class period will be devoted to learning how to access data via the web, and discussing what others have come up with. There is some flexibility here in terms of how much we cover.
  • We will go over analyzing and displaying data, and building dynamic documents.
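
As a rough illustration of what "accessing data via the web" involves, here is a minimal sketch. It is written in Python rather than R purely to show the logic, and it assumes the source publishes a plain CSV file. The function names (fetch_rows, parse_csv, mean_of) and the column name in the usage note are hypothetical, not part of any particular data provider's interface.

```python
import csv
import io
import urllib.request


def parse_csv(text):
    """Turn CSV text into a list of dicts, one per row, keyed by the header."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_rows(url):
    """Download CSV text from `url` and parse it.

    Calling this again later picks up whatever rows have been added
    since the last call, which is what keeps the document up to date.
    """
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return parse_csv(text)


def mean_of(rows, column):
    """Average a numeric column, skipping blank entries."""
    values = [float(row[column]) for row in rows if row[column] != ""]
    return sum(values) / len(values)
```

With a real source, calling fetch_rows on its URL and then mean_of(rows, "temp_max") (a made-up column name) would recompute the summary from the latest data each time the document is rebuilt. Keeping the download separate from the parsing makes it possible to check the parsing on a small hand-written string before touching the network.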

Some other ideas for "do something more":

  • Web scraping (getting data from websites with an R program; e.g., how many job postings on the Indeed job board contain the word "RStudio" or "R"?)
  • Displaying your dynamic document on the web, and updating it automatically every day. (This is probably easy.)
  • Interactive dynamic documents (on the website, the reader selects parameters such as the bin width of a histogram, etc.)
  • Displaying geographic data with maps, etc.
  • Text mining and sentiment analysis (there are R packages for this). E.g. what percentage of tweets with the word "Trump" display a fearful emotion?
  • More?
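
The job-posting example above hides a small subtlety: searching for the letter "R" as a plain substring would also match postings that mention "HR" or "React". A minimal sketch of one way around this, using word boundaries in a regular expression, follows. It is in Python for illustration only (R's regular expressions support the same \b construct), and the function names and sample postings are made up; it does not touch the Indeed site itself.

```python
import re

# \b marks a word boundary, so "R" matches only as a standalone word.
# The search is case-sensitive, so a lowercase "r" is never counted.
R_WORD = re.compile(r"\bR\b")
RSTUDIO_WORD = re.compile(r"\bRStudio\b")


def mentions_r(text):
    """True if the text contains "R" or "RStudio" as a whole word."""
    return bool(R_WORD.search(text) or RSTUDIO_WORD.search(text))


def count_mentions(postings):
    """Count how many posting texts mention R or RStudio."""
    return sum(1 for text in postings if mentions_r(text))


# Hypothetical posting texts, standing in for scraped data.
sample = [
    "Experience with R and SQL required",
    "HR manager for React team",
    "RStudio Server administrator",
    "Research assistant",
]
print(count_mentions(sample))
```

Word boundaries are not a complete answer: a posting that mentions "R&D" still counts as a match, so a real analysis would need to inspect a sample of hits by hand.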