Syllabus: Stat 370 Spring 2017

From Sean_Carver
Revision as of 05:34, 15 January 2017 by Carver (talk | contribs)

Introduction to Statistical Computing and Modeling (STAT 370) Spring 2017 Section 001 [UNDER CONSTRUCTION]

Instructor: Sean Carver, Ph.D., Professorial Lecturer, American University.

Contact:

  • office location: 107 Gray Hall
  • email: carver@american.edu
  • office phone: 202-885-6629

Course Description (from department website): The basics of programming using the open source statistical program R. Data analysis, both numerical and qualitative, including graphical and formal inference. Applications include numerical methods, text mining, modeling, and simulation. Usually offered every spring.

Prerequisite: MATH-221 and STAT-202 or STAT-203, or permission of instructor.

Text: The Book of R, by Tilman M. Davies, No Starch Press, 2016.

Optional Text: Analyzing Baseball Data with R, by Max Marchi and Jim Albert, CRC Press, 2013. This is a great book if you like baseball, data science, or especially both. We are going to use it to learn how to simulate a baseball game with a Markov chain. I may be able to legally provide the relevant chapter of the book, if you want to save money. Don't like baseball, or don't know the rules? Don't worry, neither do I, but this example is fantastic from a pedagogical perspective, and I think you will agree, regardless of your interest in or knowledge of the sport.

Software: Please install the following software on the machines you intend to use for this class: R, RStudio, Git, XQuartz (Mac), Git Bash (Windows). You are welcome to use a lab computer or your laptop during class.

Learning Outcomes: Students will be able to

  • Submit work as a dynamic document via GitHub.
  • Use R as a powerful calculator.
  • Write basic programs using control and data structures.
  • Import data from external sources.
  • Perform analyses using regression, text mining, and simulation.
  • Use SQL databases to retrieve data with specified features.

Office Hours: Students are strongly encouraged to come to office hours if they need or want help.

My office is Gray Hall, Room 107. Office hours are TENTATIVELY scheduled as follows (and may be adjusted during the semester):

  • Tuesday, Wednesday, Friday: 4:00 PM to 6:00 PM.

NOTE: If you would like to come to office hours on a regular or irregular basis and you have a compelling reason why you cannot make it during the hours listed above, please send me an email. I cannot guarantee that I will be able to find a time that works (this semester will be a very busy one for me), but I will try.

Class times and locations:

  • Tuesday, Friday 11:20 AM to 12:35 PM, ANDERSON B-13

Important Dates:

  • January 17 (Tuesday): First day of class.
  • January 20 (Friday): Inauguration Day, no class.
  • January 24 (Tuesday): Initial Project Brainstorm (come with ideas).
  • February 14 (Tuesday): Project Proposals due.
  • March 12 - 19: Spring Break, no class.
  • March 21 (Tuesday): Project Updates due.
  • March 31 (Friday): Midterm Exam.
  • April 28 (Friday): Last day of classes and final projects due.
  • May 12 (Friday): Grades due to registrar, no final exam.

Optional Extra-credit: I give you the opportunity to complete an optional extra-credit project. These projects can be a lot of work, but they can also be, for less extra credit, much less work. Topics will be different for each person. Your project must relate to statistics. Your project must involve effort that has an educational benefit to you. There must be a component of the project that communicates your results to me, such as a paper, a PowerPoint presentation, a statistical dashboard (Google this, if you do not know what this is), or a YouTube video. If doing a YouTube video, email me the link and include it in the written part you turn in. (Obviously, you won't print out a video, but, as explained below, there is more to turn in, and a printed link should be included.) For all other media, you must give me a hard copy. PowerPoint presentations should be turned in as a printout of the slides -- also, if there is time, you can present the PowerPoint to me during office hours, but it must be before the deadline. For PowerPoint printouts, black-and-white, reduced-size images are fine, as long as they are readable.

The suggested project involves obtaining data from the web, exploring the data, asking and answering questions with statistics, then communicating the results in a compelling way. In addition to working with data, there can also be independent study, library research, interviews of statisticians, etc. Part of your project could be learning a software tool useful for statistics or data science. If you want to collect your own data (I actually discourage this), you MUST do it in a scientifically acceptable way. This semester, there is a separate election day data collection project that is worth more points. These projects can be combined (collection and analysis) for credit on both.

If these projects sound like a lot of work, they can be, but remember that they are optional and extra credit. You will get some credit for anything you do along these lines, and anything you do will help you.

If you are thinking of doing a project, please work with me to decide on a project topic. We will also brainstorm ideas in class. Pick a topic and a project that excites you. Your project should relate to your passions, goals, dreams and/or interests. My idea is that you will really want to do this project which is why I am giving you a lot of freedom to design it.

Suggested topics (actually, whatever interests you): sports (of various kinds, there are lots of free good data on baseball), entertainment, movies (again good data), law, criminology, government, city planning, architecture, weather, climate, geology, seismology, medicine, epidemiology, health, fitness, biology, evolution, extinction, ecology, math, computer science, statistics, data science, anthropology, ethnic studies, gender studies, history, sociology, culture, tourism, archeology, art, literature, writing, journalism, census, linguistics, finance, economics, business, astronomy, physics, chemistry, library sciences, theology, anything else you can think of.

Curated data sets exist for many of these topics, although some cost money. For curated data sets, free or otherwise, you just download them, although sometimes you have to do more work to get the data into a usable format.

A more advanced technique is to use a "web scraper" which masquerades as a browser and pulls data directly from the web. One student was successful at doing this last Spring (she used a website dedicated to this effort). Some websites have their own Application Programming Interfaces (APIs) which facilitate this process (examples: Twitter, Facebook, LinkedIn). These more advanced techniques may be difficult, and often involve computer programming. I am a computer programmer, but I do not have a lot of experience with web scraping. That said, I have a lot of books on the subject and would love to learn how. If you are interested, let's try it together during office hours.

Last Spring, I gave some students extensive help on their projects. Help does not count against you, even extensive help. Many other students did not ask for help, and that is OK, too. Of course, some students chose not to do a project at all, which was also fine. (If you don't do a project, you won't get any extra credit, but it won't count against you, either.) Anything you choose is fine with me, but if you want help, ask early and come to office hours in the beginning, and all throughout the semester. Things can get busy toward the end, both for you and for me. Starting early will also give you more time, and you will need time to do these projects well. You can also get help from other sources (family, friends, other professors, etc.), but you must disclose the help you receive in writing in an "acknowledgements" section when you turn in your project. That said, I encourage you to get help, if you need or want it, as long as you do not take credit for others' work. Along these same lines, cite your sources. You must also cite the source(s) of your data.

There are several planned milestones with due dates for completing the project:

  • September 8 (Thursday): Getting Data Discussion and Extra Credit Preparation due.
  • September 15 (Thursday): Extra Credit Project Proposal due. Indicate participation in election day data collection and choose a partner.
  • October 17 (Monday): Extra Credit Project Updates due.
  • November 8 (Tuesday): Election Day - Extra Credit Data Collection happens on that day.
  • November 10 (Thursday): Extra Credit Election Data Collection Data and Write-Up due.
  • December 8 (Thursday): (Last day of class, and) Extra Credit Projects due.

Data discussion, September 8:

You can get extra credit (1 homework) for doing the data discussion preparation whether or not you choose to do a project. Due Thursday, September 8. You must be present in class on Thursday, September 8 to get the points, and you must both participate in the discussion in class and turn in (that same day) a short written piece (one or a few paragraphs) describing your experience with the assignment and answering the questions below. To complete the assignment, pick a topic (suggestions are listed above). See what data you can find on the web concerning this topic. Use Google, and start with the key words "data" and your topic. Are the data you find free or do they cost money? Can you download the data set or do you need a computer program (or hand copying) to pull the data off the web? If you can download the data, can you load them into StatCrunch or do the data require "munging" to be used by StatCrunch? Then answer the following questions (you should know how by September 8): What are the cases, and what are the variables? (If there are many variables, what are some of the ones that are of interest to you?) Spend at least 45 minutes on this assignment. If you finish with your first topic in less than 45 minutes, try another topic.

The project proposal, September 15:

Turn in one or a few paragraphs describing what you would like to do. If you intend to participate in the election day data collection (whether or not you are doing a project) please indicate your intention on a separate sheet of paper, submitted together with your name and the name of your chosen partner (each pair turns in one sheet of paper). If you haven't found a partner by class time on September 15, don't worry. Single participants will pair up during the September 15 class.

The project update: October 17 (Monday).

Turn in several paragraphs describing what you have done so far and what problems you have run into.

Election Day Data collection: November 8 (Tuesday).

This is a project run by Mary Gray, another professor in Statistics. Participation in the data collection is separate from the project. You can get credit for one project, or the other project, or both. Your main project can involve the data being collected on election day (yours and everyone else's) but it doesn't have to. See Professor Gray's description of the whole project here.

Data due: November 10 (Thursday).

Turn in the data you collected and a write up of your experience (one or a few paragraphs) to me (Carver) on November 10 (Thursday) or November 9 (Wednesday) and I'll pass the data along to Professor Gray.

Final projects due December 8 (Thursday): (Last day of class).

There are various allowed formats for the final project write-up (paper, PowerPoint presentation, data dashboard, YouTube video, etc., described above). Whatever format you choose, your final project must have an addendum titled "behind the scenes" which describes how you did the project and where you got your data, and must also include an acknowledgements section. Additionally, optional sections may include "dead-ends" and "dreams for the future", for which you will get credit for things that did not work, or good ideas you had which you did not have time to implement. Creativity will be rewarded!

Grades will be awarded as percentage points added to your final score. Typically, this will be up to 3 percentage points added to your final grade. A perfect "3" will generally be a project which is a good start of something that looks promising for publication. Fractional scores (e.g. 2.5) may also be awarded. Some credit will be given for partially completed projects, but you must complete the milestones by the deadlines.

Tentative grading scheme:

ITEM                                                      PERCENT
Attendance and Participation                              15%
Homework                                                  10%
Exam 1                                                    25%
Exam 2                                                    25%
Final                                                     25%
Extra Credit Data Discussion Preparation Write-up         + 1 Homework
Extra Credit Project                                      + 0-3%
Election Day Data Collection                              + 1.5%
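
As a rough illustration of how the percentages above might combine, here is a minimal sketch in Python. It is not the official calculation: the function name final_score, the dictionary keys, and the assumption that every item is scored on a 0-100 scale are all hypothetical. The "+ 1 Homework" extra credit is assumed to be counted inside the homework score, so it is not modeled separately.

```python
# Weights from the tentative grading scheme, in percent of the final score.
WEIGHTS = {
    "attendance_participation": 15,
    "homework": 10,
    "exam_1": 25,
    "exam_2": 25,
    "final": 25,
}


def final_score(scores, project_bonus=0.0, election_bonus=0.0):
    """Combine item scores (each assumed to be on a 0-100 scale) into a
    final percentage, then add extra-credit percentage points.

    project_bonus:  0 to 3 points for the extra credit project.
    election_bonus: 0 to 1.5 points for election day data collection.
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must contain exactly the graded items")
    if not 0.0 <= project_bonus <= 3.0:
        raise ValueError("project bonus must be between 0 and 3")
    if not 0.0 <= election_bonus <= 1.5:
        raise ValueError("election bonus must be between 0 and 1.5")

    # Weighted average of the graded items.
    weighted = sum(WEIGHTS[item] * scores[item] for item in WEIGHTS) / 100
    # Extra credit is added as percentage points on top of the average.
    return weighted + project_bonus + election_bonus


example = {
    "attendance_participation": 100,
    "homework": 90,
    "exam_1": 80,
    "exam_2": 80,
    "final": 80,
}
print(final_score(example))                                          # 84.0
print(final_score(example, project_bonus=2.5, election_bonus=1.5))   # 88.0
```

In this hypothetical example, the graded items alone give 84%, and a 2.5-point project plus the 1.5-point election day credit would raise that to 88%.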

Class Etiquette: Please give the class your full attention and refrain from talking during lectures, texting, surfing the web, and similar distractions. If you need to attend to something urgently, it is OK to excuse yourself from the classroom.

Please participate in class by asking questions when you do not understand something. Invariably other students benefit from these questions. Please engage in discussions, and please engage with the class, generally.

Academic Integrity: Cheating is not acceptable and will not be tolerated. Consider this: in subtle ways, cheating to get a better grade on an exam can result in lowering the grades of some of your classmates. Certainly this is true when a specific curve is used to assign grades. Even when I don't use curves explicitly, they can be implicit in decisions about writing and grading exams. As required by the policy of American University, I will report all suspected cases of cheating to the Dean's office, which will proceed to investigate and adjudicate the issues. Cheating is giving or receiving unauthorized assistance on exams, from other students or other people, from notes, from books, or from the web. When inappropriate copying between students is caught, both parties may be culpable.

Homework, Attendance and Participation Policy: Homework is worth 10% of your grade, with each assignment weighted equally (10 points maximum). The due date for homework is technically the end of class, one week after it is assigned; however, you should make every effort to complete each assignment as soon as possible. That said, you can turn in homework at any time, up to Wednesday, December 7. Homework turned in after the technical due date will receive a maximum of only half credit (5 points). I plan to post deadlines on this website.

I like to give the solutions to homework problems at the same time I assign the problems. Conscientious students, who wrestle with problems before looking at the answers, benefit from having instant feedback about their solutions, right, wrong, or incomplete. Less conscientious students who use the answers to easily complete the assignments often do poorly on exams. The responsibility for your education rests in your own hands. Don't be one of the outliers who use shortcuts to avoid preparing for the exams. Concerning homework, you are encouraged to work with your classmates, if you find that helpful. In fact, you are encouraged to do whatever you find most helpful with the homework, but by turning in a solution to a problem, you pledge that you understand the solution, or that you talked to me in office hours or during or after class and made a good faith effort to understand how to do the problem. If it looks like you got the full benefit from the assignment, and if you turn it in by the due date, I will award you a perfect 10 points. I may mark you down if it seems that you have copied the answers without including any of the required calculations. You must include your work.

Additionally, 15% of your grade is for class attendance and participation. I understand that there are times when you cannot make it to class for compelling reasons. To accommodate the unavoidable, I will forgive a fixed number of absences for everyone when I compute the final grades. If you need to miss more than several classes, please see the Dean of Students. I will only erase absences from my grade book with a request through the Dean of Students. That said, I do appreciate an email when you can't make it to class. Absences on exam days must be excused through the Dean of Students.

I plan to post grades to Blackboard, promptly.

Public Service Announcement: A representative of AU's Students Against Sexual Violence (SASV) approached me and asked me to include on my syllabi a list of resources available for survivors of sexual assault and their friends. While sexual violence is by no means the only challenge faced by students, I agree that this issue merits particular attention, so I am honoring her request by attaching the list she gave me:

Sexual Assault Resources

  • It’s never the survivor’s fault. There are many people you can talk to if you or someone you care about has been sexually assaulted:
  • AU's Sexual Assault Prevention Coordinator Daniel Rappaport (rappapor@american.edu)
  • AU's Coordinator for Victim Advocacy Sara Yzaguirre (sarayza@american.edu)
  • DC SANE Program (Sexual Assault Nurse Examiner) 1-800-641-4028
  • The only hospital in the DC area that gives Physical Evidence Recovery Kits (rape kits) is MedStar Washington Hospital Center
  • DC Rape Crisis Center: 202-333-7273
  • Students found responsible for sexual misconduct can be sanctioned with penalties that include suspension or expulsion from American University, and they may be subject to criminal charges.
  • If you want to submit a formal complaint against someone who has sexually assaulted you, harassed you, or discriminated against you based on your gender identity or sexual orientation, you can do so online at http://www.american.edu/ocl/dos/, or contact the Dean of Students at dos@american.edu or 202-885-3300. These are Title IX violations, and universities are legally required to prohibit these actions.
  • Resources on campus that are required to keep what you tell them confidential are Daniel Rappaport, Sara Yzaguirre, ordained chaplains in Kay, and counselors at the counseling center. (OASIS may also belong here but it didn't exist when this list was created.)