Difference between revisions of "Course Materials Stat 370 Spring 2017"

From Sean_Carver

Revision as of 21:28, 28 January 2017

Day 1: January 17, 2017

We had introductions and we went over the syllabus. We talked about our goals and dreams and reasons for taking this course. We also talked about our experience with R and with programming in general.

Day 2: January 24, 2017

We talked about getting and installing R and RStudio. We discussed some of the panels in RStudio, including the editor and the console. I showed you dynamic documents with R Markdown. We wrote an R script and a dynamic document. We talked about Markov chains. I presented several project ideas concerning Markov chains, including applications to baseball and neuroscience.

Day 3: January 27, 2017

I started by discussing a default project idea. You are not required to follow this suggestion. Some students are already doing something else, and that is fine. Synergies with other projects, outside of this class, are encouraged.

  • Choose a data-rich field. A few ideas: weather, social media, baseball, climate (although access to government data on this subject seems uncertain at the moment).
  • Write code to access the data via the web automatically (thus, your program will update the data, as new data arrive).
  • Write code to analyze and display the data in interesting ways.
  • Tie everything together into a dynamic document.
  • Do something more (ideas below) ...

The reason to "do something more" is that I am planning to do most of the above in class. Specifically,

  • We will devote parts of a few class periods to brainstorming and discussing where to get data for this type of project.
  • In class or at home, we will investigate what needs to be done to access the data.
  • I imagine that for some data sets, even accessing the data in a form useful for the dynamic document you envision will be a project in and of itself. This could be considered (part of) the "do something more."
  • Part of at least one class period will be devoted to learning how to access data via the web, and discussing what others have come up with. There is some flexibility here in terms of how much we cover.
  • We will go over analyzing and displaying data, and building dynamic documents.

Some other ideas for "do something more":

  • Web scraping (getting data from websites with an R program), e.g. how many job postings on the Indeed job board have the word "RStudio" or "R" in them?
  • Displaying your dynamic document on the web, and updating it automatically every day. (This is probably easy.)
  • Interactive dynamic documents (on the website, select parameters like the bin width for a histogram, etc.)
  • Displaying geographic data with maps, etc.
  • Text mining and sentiment analysis (there are R packages for this). E.g. what percentage of tweets with the word "Trump" display a fearful emotion?
  • More?
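
Once the text has been downloaded, the job-posting question from the web scraping idea comes down to counting which strings contain a given word. The following is only a sketch of that counting step, using base R. The vector `postings` is made-up sample text rather than real Indeed data, the names `pattern` and `has_r` are hypothetical, and the downloading step itself is not shown.

```r
# Hypothetical sample text standing in for scraped job postings.
postings <- c(
  "Data analyst needed; experience with R and SQL required.",
  "Looking for an RStudio and Shiny developer.",
  "Restaurant manager, Rockville.",
  "Statistician: Python or R."
)

# Match "R" or "RStudio" only as whole words, so that "Restaurant"
# and "Rockville" do not count. (A posting mentioning "R&D" would
# still match; a real analysis would need to handle such cases.)
pattern <- "\\b(R|RStudio)\\b"
has_r <- grepl(pattern, postings, perl = TRUE)

# Number and percentage of postings that mention R or RStudio.
n_matches <- sum(has_r)
pct_matches <- 100 * mean(has_r)
print(n_matches)
print(pct_matches)
```

The same pattern of "logical vector, then sum() or mean()" would apply to the tweet question, with the sentiment classification replacing grepl().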

After I discussed this project idea, we talked about calling and writing functions, and passing arguments to functions. We discussed required/positional arguments and optional/named arguments. The function getwd() was an example of one with no arguments. We talked about getting help, both from Google and from the console with the ? and ?? commands. For example, type "?getwd" for help on getwd(), or "??topic" if you don't know the exact function name. We talked about operators such as + - * / and ^. We talked about variables, though we have more to discuss here. We talked about sampling from the vector c("HomeRun", "Single", "Double", "Triple", "Out").
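
These ideas can be tried at the console. The following is a minimal sketch, not a record of the exact commands used in class. The variable names `wd`, `result`, `outcomes`, and `at_bats`, the sample size, and the seed value are assumptions chosen for illustration.

```r
# getwd() takes no arguments and returns the current working directory.
wd <- getwd()

# Arithmetic operators: + - * / and ^ (exponentiation).
result <- (1 + 2) * 3 / 4 - 2^3   # 9/4 - 8 = -5.75

# Store the possible outcomes in a variable.
outcomes <- c("HomeRun", "Single", "Double", "Triple", "Out")

# sample() takes the vector as its first (positional) argument;
# size and replace are passed here as named arguments.
set.seed(370)  # arbitrary seed, only so the draw is reproducible
at_bats <- sample(outcomes, size = 10, replace = TRUE)
print(at_bats)

# Help from the console (uncomment to run interactively):
# ?sample     # help page for a known function name
# ??sampling  # search the help system when the exact name is unknown
```

By default sample() gives each outcome equal probability. Its optional prob argument accepts a vector of weights if some outcomes should be more likely than others.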