Difference between revisions of "Syllabus: Stat 370 Spring 2017"

From Sean_Carver
 
(45 intermediate revisions by the same user not shown)
Line 1: Line 1:
<big> '''Introduction to Statistical Computing and Modeling (STAT 370) Spring 2017 Section 001 [UNDER CONSTRUCTION]''' </big>
+
<big> '''Introduction to Statistical Computing and Modeling (STAT 370) Spring 2017 Section 001''' </big>
  
'''Materials:''' [[Lectures:_Stat_202_Fall_2016|''[Lectures Notes (Fall 2016)]'']][[Old_Lectures:_Stat_202|''[Old Lectures Notes (Previous Semesters)]'']][[Homework:_Stat_202_Summer_2016|''[Homework]'']][[Data:_Stat_202|''[Data]'']][[Links:_Stat_202_Summer_2016|''[Links and Other Materials]'']][[Practice:_Stat_20X|''[Practice Problems]'']]
+
[[Course_Materials_Stat_370_Spring_2017|Course Materials (click here)]]
  
 
'''Instructor:''' <big> Sean Carver, Ph.D., </big> Professorial Lecturer, American University.
 
'''Instructor:''' <big> Sean Carver, Ph.D., </big> Professorial Lecturer, American University.
Line 11: Line 11:
 
* office phone: 202-885-6629
 
* office phone: 202-885-6629
  
'''Course Description (from department website):''' Data presentation, display, and summary, averages, dispersion, simple linear regression, and correlation, probability, sampling distributions, confidence intervals, and tests of significance. Use of statistical software both to analyze real data and to demonstrate and explore concepts. Four credit hours.
+
'''Course Description (from department website):''' The basics of programming using the open source statistical program R. Data analysis, both numerical and qualitative, including graphical and formal inference. Applications include numerical methods, text mining, modeling, and simulation. Usually offered every spring.
  
'''A Word of Warning:''' The Math/Stat Department at AU teaches STAT 202 to prepare students to use statistics in advanced courses required for many majors.  Thus the STAT 202 instructor does not always have the luxury of setting the most comfortable and easy pace through the course material.  The pace will be determined by what we need to cover for your future classes.  There is a lot of material in the curriculum, so be prepared to work hard and spend a lot of time studying outside of class.
+
'''Prerequisite:''' MATH-221 and STAT-202 or STAT-203, or permission of instructor.
  
'''Another Word of Warning:''' The material at the beginning of the class is ''much'' easier than the material at the end. '''Do not''' assume that Stat 202 is an easy class based on your effort and performance on the first exam, or even on your effort and performance on the first two exams.  The second exam is harder than the first and the last exam is the hardest.
+
'''Text:''' ''The Book of R'', by Tilman M. Davies, No Starch Press, 2016.
  
'''Prerequisite:''' MATH-15x or higher, or permission of department.  No prior knowledge of statistics is assumed.
+
'''Optional Text:''' ''Analyzing Baseball Data with R'', by Max Marchi and Jim Albert, CRC Press, 2013.  This is a great book, if you like baseball, data science, or especially both.  We are going to use it to learn how to simulate a baseball game with a Markov Chain.  I may be able to legally provide the relevant chapter of the book, if you want to save money.  Don't like baseball, or don't know the rules?  Don't worry, neither do I, but this example is fantastic from a pedagogical perspective, and I think you will agree, regardless of your interest and knowledge in the sport.
 +
 +
'''Software:''' Please install the following software on the machines you intend to use for this class: R, RStudio, Git, XQuartz (Mac), GitBash (Windows).  You are welcome to use a lab computer or your laptop during class.
  
'''Text:''' Intro to Practice of Statistics, Edition: 8th. (About $200, new from Amazon.) Online version of textbook (12 month subscription for about $100):  http://www.macmillanhighered.com/Catalog/Product.aspx?isbn=1464133409 .  Every night, there will be reading assignments from the text involving the material we just covered.  We will discuss this reading next class, when I review.
+
'''Learning Outcomes:'''  Students will be able to
  
'''Software:''' StatCrunch (web-based software), accessed from a browser with this link:  http://statcrunch.american.edu/.  From this link, StatCrunch is free with AU credentials.  You can also access StatCrunch from StatCrunch.Com but you will need to pay for access through this site.
+
* Submit work as a dynamic document via GitHub, and learn some of its tools for collaboration.
 
+
* Use R as a powerful calculator.
'''Bring Your Laptops To Class!'''  I will be demonstrating software in class with the idea that you follow along with your own computer.  Additionally, I will be giving problems to solve in class that require a computer.  If you do not have a laptop of your own, you may be able to borrow one from the library.
+
* Write basic programs using control and data structures.
 
+
* Import data from external sources.
'''Learning Outcomes:''' These learning objectives may be tweaked and edited throughout the semester.
+
* Perform analyses using regression, text mining, and simulation.
 
+
* Use SQL databases to retrieve data with specified features.
By the end of the course, the student should be able to:
 
 
 
* Use and understand common statistical terminology.
 
* Understand data collection methods including designed experiments and sampling methods.
 
* Know when to use stemplot, histograms, pie charts, bar charts, and box plots to describe a given distribution.
 
* Calculate and interpret the measures of center and spread.
 
* Understand the concepts of correlation and linear regression.
 
* Understand the concepts of randomness and probability.
 
* Understand and interpret probability distributions such as the normal, student's t- and chi-square distributions.
 
* State the central limit theorem and understand the concept of a sampling distribution.
 
* Calculate confidence intervals for means and proportions--one sample.
 
* Use sampling techniques to test hypotheses for means and proportions--one and two samples, contingency table, and goodness-of-fit.
 
  
 
'''Office Hours:'''  Students are strongly encouraged to come to office hours if they need or want help.
 
'''Office Hours:'''  Students are strongly encouraged to come to office hours if they need or want help.
Line 44: Line 34:
 
My office is Gray Hall, Room 107.  Office hours are '''TENTATIVELY''' scheduled as follows:  (may be adjusted throughout the semester)
 
My office is Gray Hall, Room 107.  Office hours are '''TENTATIVELY''' scheduled as follows:  (may be adjusted throughout the semester)
  
* Monday, Wednesday, Thursday: 2:30 PM TO 4:30 PM.
+
* Tuesday, Wednesday, Friday: 4:00 PM TO 6:00 PM.
  
 
NOTE: If you would like to come to office hours on a regular or irregular basis and you have a compelling reason why you cannot make it during the hours listed above, please send me an email.  I cannot guarantee that I will be able to find a time that works (this semester will be a very busy one for me), but I will try.
 
NOTE: If you would like to come to office hours on a regular or irregular basis and you have a compelling reason why you cannot make it during the hours listed above, please send me an email.  I cannot guarantee that I will be able to find a time that works (this semester will be a very busy one for me), but I will try.
 
'''Tutoring through AU's Academic Support and Access Center.'''  By appointment.  See http://www.american.edu/ocl/asac/Tutor-Services.cfm
 
 
'''Tutoring through MATH/STAT tutoring center:'''  Gray Hall, Room 110, walk-ins welcome
 
* Monday - Thursday: 11:00 AM - 8:00 PM
 
* Friday: 11:00 AM - 3:00 PM
 
* Sunday: 3:00 PM - 8:00 PM
 
* Saturday: Closed
 
* The Tutoring Lab will open on September 6th
 
* Contact: Dr. Behzad Jalali
 
* Phone: 202-885-3154
 
* Alt Phone: 202-885-3120
 
* E-mail: bjalali@american.edu,
 
  
 
'''Class times and locations:'''
 
'''Class times and locations:'''
* Section 3: Monday, Wednesday, Thursday: 11:20 AM TO 12:35 PM, WARD 303 (Wednesday's class ends 15 minutes early).
+
* Tuesday, Friday 11:20 AM TO 12:35 PM, ANDERSON B-13
  
 
'''Important Dates:'''
 
'''Important Dates:'''
* August 29 (Monday): First day of class.
+
* January 17 (Tuesday): First day of class.
* September 5 (Monday): Labor Day, no class.
+
* January 20 (Friday): Inauguration Day, no class.
* September 8 (Thursday): Getting Data Discussion and Extra Credit Preparation due.
+
* January 24 (Tuesday): Initial Project Brainstorm (come with ideas).
* September 15 (Thursday): Extra Credit Project Proposal due.  Also, last day to indicate your intention to do election day data collection, and choose your partner.
+
* February 14 (Tuesday): Project Proposals due.
* September 22 (Thursday): Exam 1.  Location to be decided.
+
* March 12 - 19: Spring Break, no class.
* October 17 (Monday): Extra Credit Project Updates due.
+
* March 21 (Tuesday): Project Updates due.
* November 3 (Thursday): Exam 2.  Location to be decided.
+
* March 31 (Friday): Midterm Exam.
* November 8 (Tuesday): Election Day - Extra Credit Data Collection happens on that day.
+
* April 28 (Friday): Last day of classes and final projects due.
* November 10 (Thursday): Extra Credit Data Collection Write-Up due
+
* May 12 (Friday): Grades due to registrar, no final exam.
* November 22-27 (Tuesday-Sunday): Thanksgiving Break, no class.
 
* December 8 (Thursday): Last day of class, and Extra Credit Projects due.
 
* December 12 (Monday): 11:20-1:50 Final Exam.  Location to be decided.
 
* See also AU's academic calendar: http://www.american.edu/provost/registrar/upload/Academic-Calendar-2016-2017-Inc-WCL-20160720.pdf
 
 
 
'''Optional Extra-credit:'''  I give you the opportunity to complete an '''optional extra-credit''' project.  These projects can be a lot of work, but they can also be, for less extra credit, much less work.  Topics will be different for each person.  Your project must relate to statistics. Your project must involve effort that has an educational benefit to you.  There must be a component of the project that communicates your results to me, as either a paper, a PowerPoint presentation, a statistical dashboard (Google this, if you do not know what this is), a YouTube video, etc.  If doing a YouTube video, email me the link and include it in the written part you turn in.  (Obviously, you won't print out a video, but, as explained below, there is more to turn in, and a printed link should be included.)  For all other media, you must give me a hard copy.  PowerPoint presentations should be turned in as a printout of the slides -- also, if there is time, you can present the PowerPoint to me during office hours, but it must be before the deadline.  For PowerPoint printouts, black and white, reduced sized, images are fine, as long as they are readable.
 
 
 
The suggested project involves obtaining data from the web, exploring the data, asking and answering questions with statistics, then communicating the results in a compelling way.  In addition to working with data, there can also be independent study, library research, interviews of statisticians, etc.  Part of your project could be learning a software tool useful for statistics or data science.  If you want to collect your own data, (I actually discourage this), you MUST do it in a scientifically acceptable way.  This semester, there is a separate election day data collection project that is worth more points.  These projects can be combined (collection and analysis) for credit on both.
 
 
 
If these projects sound like a lot of work, they can be, but remember that they are optional and extra credit.  You will get some credit for anything you do along these lines, and anything you do will help you.
 
 
 
If you are thinking of doing a project, please work with me to decide on a project topic.  We will also brainstorm ideas in class.  Pick a topic and a project that excites you.  Your project should relate to your passions, goals, dreams and/or interests.  My idea is that you will really want to do this project which is why I am giving you a lot of freedom to design it.
 
 
 
Suggested topics (actually, whatever interests you): sports (of various kinds, there are lots of free good data on baseball), entertainment, movies (again good data), law, criminology, government, city planning, architecture, weather, climate, geology, seismology, medicine, epidemiology, health, fitness, biology, evolution, extinction, ecology, math, computer science, statistics, data science, anthropology, ethnic studies, gender studies, history, sociology, culture, tourism, archeology, art, literature, writing, journalism, census, linguistics, finance, economics, business, astronomy, physics, chemistry, library sciences, theology, anything else you can think of.
 
 
 
Curated data sets exist for many of these topics, although some cost money.  For curated data sets, free or otherwise, you just download them, although sometimes you have to do more work to get the data into a usable format.
 
 
 
A more advanced technique is to use a "web scraper" which masquerades as a browser and pulls data directly from the web.  One student was successful at doing this last Spring (she used a website dedicated to this effort).  Some websites have their own Application Programming Interfaces (API) which facilitate this process (examples: twitter, facebook, linked in).  These more advanced techniques may be difficult, and often involve computer programming.  I am a computer programmer, but I do not have a lot of experience with web scraping.  That said, I have a lot of books on the subject and would love to learn how.  If you are interested, let's try it together during office hours. 
 
 
 
Last Spring, I gave some students extensive help on their projects.  Help does not count against you, even extensive help.  Many other students did not ask for help, and that is OK, too.  Of course, some students chose to not even do a project, which was also fine.  (If you don't do a project, you won't get any extra credit, but it won't count against you, either).  Anything you choose is fine with me, but if you want help, ask early and come to office hours in the beginning, and all throughout the semester.  Things can get busy toward the end, both for you and for me.  Starting early will also give you more time, and you will need time to do these projects well.  You can also get help from other sources (family, friends, other professors, etc), but you must disclose the help you receive in writing in an "acknowledgements" section, when you turn it in.  That said, I encourage you to get help, if you need or want it, as long as you do not take credit for others' work.  Along these same lines, cite your sources.  You must also cite the source(s) of your data.
 
 
 
There are several planned milestones with due dates for completing the project:
 
* September 8 (Thursday): Getting Data Discussion and Extra Credit Preparation due.
 
* September 15 (Thursday): Extra Credit Project Proposal due.  Indicate participation in election day data collection and choose a partner.
 
* October 17 (Monday): Extra Credit Project Updates due.
 
* November 8 (Tuesday): Election Day - Extra Credit Data Collection happens on that day.
 
* November 10 (Thursday): Extra Credit Election Data Collection Data and Write-Up due
 
* December 8 (Thursday): (Last day of class, and) Extra Credit Projects due.
 
 
 
Data discussion, September 8:
 
: You can get extra credit (1 homework) for doing the data discussion preparation whether or not you choose to do a project.  Due Thursday, September 8. You must be present in class on Thursday, September 8 to get the points, and you must both participate in the discussion in class and turn in (that same day) a short written piece (one or a few paragraphs) describing your experience with the assignment and answering the questions below.  To complete the assignment, pick a topic, suggestions are listed above.  See what data you can find on the web concerning this topic.  Use Google, and start with the key words "data" and your topic.  Are the data you find free or do they cost money?  Can you download the data set or do you need a computer program (or hand copy) to pull them off the web?  If you can download the data, can you load it into StatCrunch or do the data require "munging" to be used by StatCrunch?  Then answer the following questions (you should know how by September 8):  What are the cases, and what are variables?  (If there are many variables, what are some of the ones that are of interest to you?)  Spend at least 45 minutes on this assignment.  If you finish with your first topic in less than 45 minutes, try another topic.  
 
  
The project proposal, September 15:
+
'''Projects:'''  In my experience, there is no better way to learn to code than to engage with a project that you feel passionate about.  For the first month of class, we are going to spend some class time finding and defining projects that meet the course objectives, and that inspire this kind of passion in us.  I anticipate that some of these projects may involve a lot of work for one person, which is why I am going to teach some of the tools that the open-source software community uses to facilitate collaboration.  Collaboration will be accomplished through the cloud-based service GitHub, and a local program called Git.  As a pedagogical exercise, you will use these tools to collaborate on writing a children's story in R-Markdown (available within R).  After the exercise you will not be compelled to collaborate, but the option will be there.  Do group projects give you nightmares?  The open source community has figured out paradigms for successful collaboration, although these paradigms are not widely used outside of the coding community because they are not especially simple to learn.  These tools make attribution for work very transparent.  This transparency will make it possible for students to get credit for contributions to several projects, not just one.  You will be graded on the body of your work, if you choose to divide your effort among more than one project.  If you start a project, it will be up to you whether you want to allow and invite others to contribute.  Project milestones will be a proposal (February 14), a project update (March 21), and a final submission (April 28, last day of class).  Each project will have a cloud-based repository.  GitHub is best for collaboration, but they charge for private repositories (needed if you don't want the world to see your work) -- GitLab doesn't charge for private repositories.  Either way, you will make it possible for me to pull the most recent version of the project onto my computer for grading.  
I will do this at a specified date and time -- when it is due.  For collaborative projects you will also be turning in a portfolio of your submissions which should be easy to generate.
: Turn in one or a few paragraphs describing what you would like to do.  If you intend to participate in the election day data collection (whether or not you are doing a project) please indicate your intention on a separate sheet of paper, submitted together with your name and the name of your chosen partner (each pair turns in one sheet of paper).  If you haven't found a partner by class time on September 15, don't worry.  Single participants will pair up during the September 15 class.
 
  
The project update: October 17 (Monday).
+
'''Reading Material:''' Class time will be used most effectively if you have read the relevant section of the book ahead of time.  Please be conscientious about the reading.  Usually it will only be one chapter, however for the second class (January 24), please read both chapters 1 and 2. Reading assignments will be announced during the previous class.
: Turn in several paragraphs describing what you have done so far and what problems you have run into.
 
  
Election Day Data collection: November 8 (Tuesday).
+
'''Homework:''' For homework we will use private repositories on GitLab, except for the children's story assignment. For the children's story, each student will start their own GitHub repository for their own story, and invite others to collaborate.  There will be work to do most classes.  You will commit this work to the cloud whenever you want.  You can update the work as you progress.  At specified times and dates, I will pull your work from the cloud onto my computer to grade it.  The specified time for the pull will be when it is due.  I won't see any changes you commit after the pull time.  I expect most homework sets will come from the required textbook, although there is some flexibility, based on class interest, especially vis-a-vis the projects students are engaged in.
: This is a project run by Mary Gray, another professor in Statistics.  Participation in the data collection is separate from the project.  You can get credit for one project, or the other project, or both.  Your main project can involve the data being collected on election day (yours and everyone else's) but it doesn't have to.  See Professor Gray's description of the whole project [[Media:Election_Day_2016_Data_Collection.pdf|here]].
 
  
Data due: November 10 (Thursday).
+
'''Attendance:'''  You are expected to attend class unless there is a compelling reason why you cannot make it.  Attendance and participation are worth 10% of your grade.  Beyond the 10%, I believe that excellent attendance will be necessary to meet the objectives of this class.  However, I understand that there are times when you cannot make it to class for compelling reasons.  To accommodate the unavoidable, I will forgive occasional absences for everyone when I compute the final grades.  If you need to miss more than a few classes, please see the Dean of Students.   Exam day absences must be excused through the Dean of Students.  On other days please send an email to me when you can't make it to class, explaining why.  If your attendance and/or participation is not acceptable, you will receive an early warning from me through the registrar.  This is how you will know you are in danger of losing the credit in this category.  If your attendance and/or participation continues to be poor, you might miss all 10% (i.e. get a zero in this category).
: Turn in the data you collected and a write up of your experience (one or a few paragraphs) to me (Carver) on November 10 (Thursday) or November 9 (Wednesday) and I'll pass the data along to Professor Gray.
 
  
Final projects due December 8 (Thursday): (Last day of class).
+
'''Midterm:''' The midterm project will be a coding exercise, completed in class on March 31.  You will have access to your books, notes, and Google, but you will not be able to interact with another live person.  Pull time will be the end of class on that day. Midterm examinations will be pulled from the private homework repository.
:There are various allowed formats for the final project write up (paper, PowerPoint Presentation, Data Dashboard, YouTube video, etc, described above).  Whatever format you choose, your final projects must have an addendum titled "behind the scenes" which describes how you did the project and where you got your data, and must also include an acknowledgements section.  Additionally, optional sections may include "dead-ends" and "dreams for the future" for which you will get credit for things that did not work, or good ideas you had which you did not have time to implement. Creativity will be rewarded!
 
  
Grades will be awarded as percentage points added to your final score.  Typically, this will be up to 3 percentage points added to your final grade.  A perfect "3" will generally be a project which is a good start of something that looks promising for publication.  Fractional scores (e.g. 2.5) may also be awarded.  Some credit will be given for partially completed projects, but you must complete the milestones by the deadlines.
 
  
 
'''Tentative grading scheme:'''
 
'''Tentative grading scheme:'''
Line 129: Line 69:
 
|-
 
|-
 
| Attendance and Participation
 
| Attendance and Participation
| 15%
+
| 10%
 
|-
 
|-
 
| Homework
 
| Homework
| 10%
+
| 35%
 
|-
 
|-
| Exam 1
+
| Midterm
| 25%
+
| 20%
 
|-
 
|-
| Exam 2
+
| Project  
| 25%
+
| 35%
|-
 
| Final
 
| 25%
 
|-
 
| Extra Credit Data Discussion Preparation Write-up
 
| + 1 Homework
 
|-
 
| Extra Credit Project
 
| + 0-3%
 
|-
 
| Election Day Data Collection
 
| + 1.5%
 
 
|}
 
|}
  
'''Class Etiquette:''' Please give the class your full attention and refrain from talking during lectures, texting, surfing the web, and similar distractions.  If you need to attend to something urgently, it is OK to excuse yourself from the classroom.
+
'''Class Etiquette:''' Please give the class your full attention and refrain from talking, texting, surfing the web, and similar distractions.  If it is clear to other students that you are not paying attention, it will be harder for them to pay attention to me.  This statement is true in general, but it is especially true if you are talking. It can also be harder for me to give good lectures when it is clear that not everyone is paying attention.  Like you, your classmates are paying a lot of money to be here.  Have some respect for your fellow students!  Otherwise you are negatively impacting their educational experience, which isn't fair to them.  If you need to attend to something urgently, it is OK to excuse yourself from the classroom. Please be warned that if people are not following this request, I may reread this statement to the class.
 
 
Please participate in class by asking questions when you do not understand something.  Invariably other students benefit from these questions.  Please engage in discussions, and please engage with the class, generally.
 
 
 
'''Academic Integrity:'''  ''Cheating is not acceptable and will not be tolerated.''  Consider this:  in subtle ways, cheating to get a better grade on an exam can result in ''lowering'' the grades of some of your classmates.  Certainly this is true when a specific curve is used to assign grades.  Even when I don't use curves explicitly, they can be implicit in decisions about writing and grading exams.  As required by the policy of American University, I will report '''all''' suspected cases of cheating to the Dean's office who will proceed to investigate and adjudicate the issues.  Cheating is giving or receiving unauthorized assistance on exams, from other students or other people, from notes, from books, or from the web. When inappropriate copying between students is caught, both parties may be culpable.
 
 
 
'''Homework, Attendance and Participation Policy:'''  Homework is worth 10% of your grade, with each assignment weighted equally (10 points maximum).  The due date for homework is technically the end of class, one week after it is assigned, however you should make every effort to complete each assignment as soon as possible.  That said, you can turn in homework anytime, up to Wednesday, December 7.  Homework turned in after the technical due date will receive a maximum of only half credit (5 points).  I plan to post deadlines on this website.
 
 
 
I like to give the solutions to homework problems at the same time I assign the problems.  Conscientious students, who wrestle with problems before looking at the answers, benefit from having instant feedback about their solutions, right, wrong, or incomplete.  Less conscientious students who use the answers to easily complete the assignments often do poorly on exams.  The responsibility for your education rests in your own hands.  Don't be one of the outliers who use shortcuts to avoid preparing for the exams.    Concerning homework, you are encouraged to work with your classmates, if you find that helpful.  In fact, you are encouraged to do whatever you find most helpful with the homework, but by turning in a solution to a problem, you pledge that you understand the solution, or that you talked to me in office hours or during or after class and made a good faith effort to understand how to do the problem.   If it looks like you got the full benefit from the assignment, and if you turn it in by the due date, I will award you a perfect 10 points.  I may mark you down if it seems that you have copied the answers without including any of the required calculations. You must include your work. 
 
  
Additionally, 15% of your grade is for class attendance and participation.  I understand that there are times when you cannot make it to class for compelling reasons.  To accommodate the unavoidable, I will forgive a fixed number of absences for everyone when I compute the final grades.   If you need to miss more than several classes, please see the Dean of Students.  I will only erase absences from my grade book with a request through the Dean of Students.  That said, I do appreciate an email when you can't make it to class.  Absences on exam days must be excused through the Dean of Students.
+
Please participate in class by asking questions when you do not understand something.  Invariably other students benefit from these questions. Please engage in discussions, and please engage with the class, generally.  I find it easier to give good lectures when students are asking questions, and engaging with the material.
  
I plan to post grades to Blackboard, promptly.
+
'''Academic Integrity:'''  ''Cheating is not acceptable and will not be tolerated.''  Consider this:  in subtle ways, cheating to get a better grade on an exam can result in ''lowering'' the grades of some of your classmates.  Certainly this is true when a specific curve is used to assign grades.  Even when I don't use curves explicitly, they can be implicit in decisions about writing and grading exams.  As required by the policy of American University, I will report '''all''' suspected cases of cheating to the Dean's office who will proceed to investigate and adjudicate the issues.  Cheating is giving or receiving unauthorized assistance on exams, from other students or other people.  When inappropriate copying between students is caught, both parties may be culpable.  You can get help on homework from other students, but you must write up the work yourself, and the work must reflect your own understanding of the material.
  
 
'''Public Service Announcement:'''  A representative of AU's Students Against Sexual Violence (SASV) approached me and asked me to include on my syllabi a list of resources available for survivors of sexual assault and their friends.  While sexual violence is by no means the only challenge faced by students, I agree that this issue merits particular attention, so I am honoring her request by attaching the list she gave me:
 
'''Public Service Announcement:'''  A representative of AU's Students Against Sexual Violence (SASV) approached me and asked me to include on my syllabi a list of resources available for survivors of sexual assault and their friends.  While sexual violence is by no means the only challenge faced by students, I agree that this issue merits particular attention, so I am honoring her request by attaching the list she gave me:

Latest revision as of 01:47, 17 January 2017

Introduction to Statistical Computing and Modeling (STAT 370) Spring 2017 Section 001

Course Materials (click here)

Instructor: Sean Carver, Ph.D., Professorial Lecturer, American University.

Contact:

  • office location: 107 Gray Hall
  • email: carver@american.edu
  • office phone: 202-885-6629

Course Description (from department website): The basics of programming using the open source statistical program R. Data analysis, both numerical and qualitative, including graphical and formal inference. Applications include numerical methods, text mining, modeling, and simulation. Usually offered every spring.

Prerequisite: MATH-221 and STAT-202 or STAT-203, or permission of instructor.

Text: The Book of R, by Tilman M. Davies, No Starch Press, 2016.

Optional Text: Analyzing Baseball Data with R, by Max Marchi and Jim Albert, CRC Press, 2013. This is a great book if you like baseball, data science, or especially both. We are going to use it to learn how to simulate a baseball game with a Markov chain. I may be able to legally provide the relevant chapter of the book, if you want to save money. Don't like baseball, or don't know the rules? Don't worry, neither do I, but this example is fantastic from a pedagogical perspective, and I think you will agree, regardless of your interest in or knowledge of the sport.

Software: Please install the following software on the machines you intend to use for this class: R, RStudio, Git, XQuartz (Mac), GitBash (Windows). You are welcome to use a lab computer or your laptop during class.

Learning Outcomes: Students will be able to

  • Submit work as a dynamic document via GitHub, and use some of its tools for collaboration.
  • Use R as a powerful calculator.
  • Write basic programs using control and data structures.
  • Import data from external sources.
  • Perform analyses using regression, text mining, and simulation.
  • Use SQL databases to retrieve data with specified features.

Office Hours: Students are strongly encouraged to come to office hours if they need or want help.

My office is Gray Hall, Room 107. Office hours are TENTATIVELY scheduled as follows (they may be adjusted during the semester):

  • Tuesday, Wednesday, Friday: 4:00 PM TO 6:00 PM.

NOTE: If you would like to come to office hours on a regular or irregular basis and you have a compelling reason why you cannot make it during the hours listed above, please send me an email. I cannot guarantee that I will be able to find a time that works (this semester will be a very busy one for me), but I will try.

Class times and locations:

  • Tuesday, Friday 11:20 AM TO 12:35 PM, ANDERSON B-13

Important Dates:

  • January 17 (Tuesday): First day of class.
  • January 20 (Friday): Inauguration Day, no class.
  • January 24 (Tuesday): Initial Project Brainstorm (come with ideas).
  • February 14 (Tuesday): Project Proposals due.
  • March 12 - 19: Spring Break, no class.
  • March 21 (Tuesday): Project Updates due.
  • March 31 (Friday): Midterm Exam.
  • April 28 (Friday): Last day of classes and final projects due.
  • May 12 (Friday): Grades due to registrar, no final exam.

Projects: In my experience, there is no better way to learn to code than to engage with a project that you feel passionate about. For the first month of class, we are going to spend some class time finding and defining projects that meet the course objectives and that inspire this kind of passion in us. I anticipate that some of these projects may involve a lot of work for one person, which is why I am going to teach some of the tools that the open-source software community uses to facilitate collaboration. Collaboration will be accomplished through the cloud-based service GitHub and a local program called Git. As a pedagogical exercise, you will use these tools to collaborate on writing a children's story in R Markdown (available within R). After the exercise you will not be compelled to collaborate, but the option will be there.

Do group projects give you nightmares? The open-source community has figured out paradigms for successful collaboration, although these paradigms are not widely used outside of the coding community because they are not especially simple to learn. These tools make attribution for work very transparent. This transparency will make it possible for students to get credit for contributions to several projects, not just one. If you choose to divide your effort among more than one project, you will be graded on the body of your work. If you start a project, it will be up to you whether you want to allow and invite others to contribute.

Project milestones will be a proposal (February 14), a project update (March 21), and a final submission (April 28, the last day of class). Each project will have a cloud-based repository. GitHub is best for collaboration, but it charges for private repositories (needed if you don't want the world to see your work); GitLab doesn't charge for private repositories. Either way, you will make it possible for me to pull the most recent version of the project onto my computer for grading. I will do this at a specified date and time: when the project is due. For collaborative projects you will also turn in a portfolio of your submissions, which should be easy to generate.

Reading Material: Class time will be used most effectively if you have read the relevant section of the book ahead of time. Please be conscientious about the reading. Usually it will only be one chapter; however, for the second class (January 24), please read both chapters 1 and 2. Reading assignments will be announced during the previous class.

Homework: For homework we will use private repositories on GitLab, except for the children's story assignment. For the children's story, each student will start their own GitHub repository for their own story and invite others to collaborate. There will be work to do for most classes. You can commit this work to the cloud whenever you want and update it as you progress. At specified times and dates, I will pull your work from the cloud onto my computer to grade it. The specified time for the pull is when the work is due; I won't see any changes you commit after the pull time. I expect most homework sets will come from the required textbook, although there is some flexibility, based on class interest, especially vis-a-vis the projects students are engaged in.
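
To make the pull-time rule concrete, here is a minimal sketch of the workflow as it might look in a terminal (Git Bash on Windows). It is an illustration under stated assumptions, not the official course setup: a local bare repository named cloud.git stands in for a private GitLab repository, and the file name hw01.R, the commit messages, and the student identity are all hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so nothing touches a real account.
work=$(mktemp -d)
cd "$work"

# Hypothetical identity, so commits succeed even if Git is not yet configured.
export GIT_AUTHOR_NAME="Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Student" GIT_COMMITTER_EMAIL="student@example.com"

# Assumption: a local bare repository plays the role of the cloud repository.
git init -q --bare cloud.git

# Student side: clone, do the work, commit, and push to the cloud.
git clone -q cloud.git student
cd student
echo 'x <- 1:10' > hw01.R
git add hw01.R
git commit -q -m "Homework 1: first draft"
git push -q origin HEAD
cd ..

# Instructor side: at the pull time, copy whatever is in the cloud.
git clone -q cloud.git grader

# Student side, after the pull time: the commit reaches the cloud,
# but the instructor's copy was already taken.
cd student
echo 'mean(x)' >> hw01.R
git commit -q -am "Homework 1: late revision"
git push -q origin HEAD
cd ..

# Count the commits each side can see.
graded=$(git -C grader rev-list --count HEAD)
latest=$(git -C student rev-list --count HEAD)
echo "commits graded: $graded, commits in cloud: $latest"
```

In this sketch the grader's copy holds only the first commit, while the cloud holds both. The practical point is to push before the pull time, since a later push changes nothing in the copy that gets graded.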

Attendance: You are expected to attend class unless there is a compelling reason why you cannot make it. Attendance and participation are worth 10% of your grade. Beyond the 10%, I believe that excellent attendance will be necessary to meet the objectives of this class. However, I understand that there are times when you cannot make it to class for compelling reasons. To accommodate the unavoidable, I will forgive occasional absences for everyone when I compute the final grades. If you need to miss more than a few classes, please see the Dean of Students. Exam day absences must be excused through the Dean of Students. On other days, please send me an email when you can't make it to class, explaining why. If your attendance and/or participation is not acceptable, you will receive an early warning from me through the registrar. This is how you will know you are in danger of losing the credit in this category. If your attendance and/or participation continues to be poor, you might lose all 10% (i.e., get a zero in this category).

Midterm: The midterm project will be a coding exercise, completed in class on March 31. You will have access to your books, notes, and Google, but you will not be able to interact with another live person. Pull time will be the end of class on that day. Midterm examinations will be pulled from the private homework repository.


Tentative grading scheme:

ITEM                           PERCENT
Attendance and Participation   10%
Homework                       35%
Midterm                        20%
Project                        35%
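
As a worked example of how these percentages would combine, the sketch below computes a plain weighted average. The four category scores (out of 100) are hypothetical, and the calculation assumes no adjustments such as forgiven absences or changes to this tentative scheme.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical category scores out of 100 (not real data).
attendance=90
homework=85
midterm=80
project=95

# Weights from the tentative grading scheme, in percent; they sum to 100.
final=$(awk -v a="$attendance" -v h="$homework" -v m="$midterm" -v p="$project" \
  'BEGIN { printf "%.2f", (10*a + 35*h + 20*m + 35*p) / 100 }')

echo "final grade: $final"
```

With these made-up scores the weighted sum is (900 + 2975 + 1600 + 3325) / 100 = 88.00.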

Class Etiquette: Please give the class your full attention and refrain from talking, texting, surfing the web, and similar distractions. If it is clear to other students that you are not paying attention, it will be harder for them to pay attention to me. This statement is true in general, but it is especially true if you are talking. It can also be harder for me to give good lectures when it is clear that not everyone is paying attention. Like you, your classmates are paying a lot of money to be here. Have some respect for your fellow students! Otherwise you are negatively impacting their educational experience, which isn't fair to them. If you need to attend to something urgently, it is OK to excuse yourself from the classroom. Please be warned that if people are not following this request, I may reread this statement to the class.

Please participate in class by asking questions when you do not understand something. Invariably other students benefit from these questions. Please engage in discussions, and please engage with the class, generally. I find it easier to give good lectures when students are asking questions, and engaging with the material.

Academic Integrity: Cheating is not acceptable and will not be tolerated. Consider this: in subtle ways, cheating to get a better grade on an exam can result in lowering the grades of some of your classmates. Certainly this is true when a specific curve is used to assign grades. Even when I don't use curves explicitly, they can be implicit in decisions about writing and grading exams. As required by the policy of American University, I will report all suspected cases of cheating to the Dean's office, which will investigate and adjudicate them. Cheating is giving or receiving unauthorized assistance on exams, from other students or other people. When inappropriate copying between students is caught, both parties may be culpable. You can get help on homework from other students, but you must write up the work yourself, and the work must reflect your own understanding of the material.

Public Service Announcement: A representative of AU's Students Against Sexual Violence (SASV) approached me and asked me to include on my syllabi a list of resources available for survivors of sexual assault and their friends. While sexual violence is by no means the only challenge faced by students, I agree that this issue merits particular attention, so I am honoring her request by attaching the list she gave me:

Sexual Assault Resources

  • It’s never the survivor’s fault. There are many people you can talk to if you or someone you care about has been sexually assaulted:
  • AU's Sexual Assault Prevention Coordinator Daniel Rappaport (rappapor@american.edu)
  • AU's Coordinator for Victim Advocacy Sara Yzaguirre (sarayza@american.edu)
  • DC SANE Program (Sexual Assault Nurse Examiner) 1-800-641-4028
  • The only hospital in the DC area that gives Physical Evidence Recovery Kits (rape kits) is MedStar Washington Hospital Center
  • DC Rape Crisis Center: 202-333-7273
  • Students found responsible for sexual misconduct can be sanctioned with penalties that include suspension or expulsion from American University, and they may be subject to criminal charges
  • If you want to submit a formal complaint against someone who has sexually assaulted you, harassed you, or discriminated against you based on your gender identity or sexual orientation, you can do so online at http://www.american.edu/ocl/dos/, or contact the Dean of Students at dos@american.edu or 202-885-3300. These are Title IX violations, and universities are legally required to prohibit these actions.
  • Resources on campus that are required to keep what you tell them confidential are Daniel Rappaport, Sara Yzaguirre, ordained chaplains in Kay, and counselors at the counseling center. (OASIS may also belong here but it didn't exist when this list was created.)