Discussion of Stat 370
John Nolan's course description for Stat 370 left a lot of room for personalization. To decide what direction to take the course, I instructed students to propose projects of interest to them and told them to let their passion motivate their learning. We brainstormed project ideas in class and during office hours, and I used student feedback and stated interests to guide decisions about where to take the course.
Early in the class, we covered reproducible research with R Markdown and knitr. As part of the discussion, I mentioned that RStudio can easily build an entire website, and I raised the possibility of students presenting their projects this way. I explained that, with a website, students could showcase their work for family, friends, classmates, and potential employers. A good number of students became passionately interested in this opportunity, and none objected to my taking the class in this direction. We decided that all graded work for this class would go into dynamic documents accessible via this website.
Here is what we have covered so far or will cover, time permitting:
- Using RStudio, including the editor and the console.
- Working with variables in R, including all the various types.
- Working with vectors, matrices, arrays, lists, factors, and data frames.
- Programming with functions, passing arguments, and assigning default values to arguments.
- Programming with loops and conditionals.
- Performing reproducible research with R Markdown and knitr.
- Loading data into R.
- Downloading data sets with R.
- Using Git and GitHub for version control through a shell, Git's app, and the RStudio IDE.
- Committing, pushing and pulling Git repositories.
- Branching with Git.
- Creating and downloading Git repositories from GitHub, including cloning repositories from other people.
- Using Git and GitHub for collaboration, including forking repositories, submitting pull requests, using the project's issue tracker, and using project wikis.
- Performing random number generation and sampling.
- Simulating Markov chains, especially with regard to sports applications: baseball, tennis, and volleyball.
- Creating websites in RStudio with R Markdown.
- Adding interactivity to websites with Shiny.
- Creating elegant plots with ggplot2.
- Text mining Twitter.
- Performing sentiment analysis.
- Using database software with R and using SQL.
- Performing model selection.
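To give a flavor of the Markov chain item in the list above, here is a minimal sketch in R of simulating a single tennis game, where the state is the current score and each point is an independent draw. This is an illustration only and not the actual course material: the function names `simulate_game` and `game_win_prob`, the point-win probability of 0.6, and the number of simulated games are all assumptions chosen for the example.

```r
# Simulate one tennis game as a Markov chain on the score state.
# p_server is a hypothetical probability that the server wins any given point.
simulate_game <- function(p_server = 0.6) {
  server <- 0L    # points won so far by the server
  returner <- 0L  # points won so far by the returner
  repeat {
    # Transition: the next point goes to the server with probability p_server.
    if (runif(1) < p_server) {
      server <- server + 1L
    } else {
      returner <- returner + 1L
    }
    # Absorbing states: a player has at least 4 points and leads by 2.
    if (max(server, returner) >= 4L && abs(server - returner) >= 2L) {
      return(server > returner)  # TRUE if the server won the game
    }
  }
}

# Closed-form probability that the server wins a game, for comparison.
# Terms: win before deuce (to love, 15, or 30), plus reach deuce and win from it.
game_win_prob <- function(p) {
  q <- 1 - p
  p^4 * (1 + 4 * q + 10 * q^2) + 20 * p^3 * q^3 * p^2 / (1 - 2 * p * q)
}

set.seed(370)     # arbitrary seed so the run is reproducible
n_games <- 10000  # assumed number of simulated games
wins <- replicate(n_games, simulate_game(p_server = 0.6))
estimate <- mean(wins)  # simulated share of games won by the server

cat("simulated:", estimate, " exact:", game_win_prob(0.6), "\n")
```

Comparing the simulated share with the closed-form value is one way to check a simulation like this; the same pattern could be extended to sets and matches, or adapted to the scoring rules of volleyball.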
Some of the material covered, or to be covered, preserves the original intent of the course, including R programming, random number generation, sampling, simulations, graphical analysis, text mining, and databases with SQL. But I am also taking the course in new directions, including version control and collaboration with Git and GitHub, reproducible research, website generation, and website interaction through Shiny. The driving force behind these new directions has mostly been student interest, although I admit that I am also interested. Informal discussions with colleagues, which suggested that this material is not taught elsewhere at American University, emboldened me to try these new avenues.
There are no plans to offer a course designated as Stat 370 again. John Nolan's course will become a 400/600 course, and will probably not be taught the way I am teaching this course. But there seems to be strong demand for the material I am currently teaching. I plan to propose a new course labeled "Data Presentation and Reproducible Research."
To avoid overlapping with Nolan's course, I would emphasize Git and GitHub, website generation, knitr, R Markdown, Shiny, and ggplot2. We would not cover Markov chains. But to teach this course well, and to do something non-trivial with reproducible research, we would need to cover other topics that may overlap, including R programming, variables, functions, loading data into R, downloading files, sampling and random numbers, text mining, web scraping, and SQL databases.
If I taught the 400/600 class in addition to the proposed class, I would emphasize Markov chains. I would also teach Euler's method using an integrate-and-fire neuron, then a network of such neurons. We would talk about dynamical systems. Additionally, I would cover Newton's method, various optimization methods, and model selection. I would prefer that Data Presentation and Reproducible Research be a prerequisite, because then students would already know how to program in R and present their work. In that case, the calculus prerequisite for the 400/600 course would make more sense.