# Stat 202 Discussion

From Sean_Carver

## Broad Objectives

### Single Variable Descriptive Statistics

- Understand the traditional way of structuring data (datasets, tables, cases, variables, values), and be aware that unstructured data also exist.
- Be able to recognize the type of each variable in a data set: quantitative, identifier, or categorical (ordinal, nominal, binary).
- Understand the distinction between a quantitative variable and a more general numeric variable.
- Understand that different analyses and displays are appropriate for different types of variables, and that the judgment of which analyses and displays are appropriate is what determines a variable's type; sometimes more than one type can be justified.
- Understand the concept of the distribution of a variable, which conveys the possible values of the variable together with (equivalently) their frequencies, relative frequencies, or, for quantitative variables, density.
- Describe the distribution of a single quantitative variable (histogram, box plot, QQ plot, shape, outliers, center, spread, modes, symmetry, skewness, normal/bell shaped, mean, median, standard deviation, Q1, Q3, IQR, percentiles).
- Describe outliers in a single quantitative variable (tails, 1.5 IQR rule); know which measures are resistant to outliers and which are sensitive to them (not resistant), and what that means.
- Describe the distribution of a single categorical variable (bar plot, pie chart, frequency table).
- Understand the concept of and apply transformations of a variable (e.g. z-score, change of units, log) and know the special properties of a linear transformation.
- Understand what it means for data (a quantitative variable) to fit a Normal model with given parameters (as revealed by a histogram or QQ plot, allowing for typical noise), and know how to make predictions based on that assumption.
- Understand descriptions of Normal and other models with density curves.
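The numerical summaries above can be sketched in Python's standard library (the course itself uses StatCrunch). This is a minimal illustration with hypothetical data; the helper names `describe` and `z_scores` are made up for this example. Quartile conventions differ between software, so Q1 and Q3 from `statistics.quantiles` may not match StatCrunch exactly. Note how the single large value pulls the mean (about 7.89) well above the median (5), while the median and IQR stay put.

```python
import statistics

def describe(data):
    """Numerical summary of one quantitative variable (hypothetical helper)."""
    # Quartiles via statistics.quantiles (default "exclusive" method); other
    # software, including StatCrunch, may use a slightly different convention.
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # 1.5 IQR rule
    return {
        "mean": statistics.mean(data),      # not resistant to outliers
        "median": median,                   # resistant
        "sd": statistics.stdev(data),       # sample SD, divisor n - 1; not resistant
        "Q1": q1,
        "Q3": q3,
        "IQR": iqr,                         # resistant
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

def z_scores(data):
    # Linear transformation z = (x - mean) / sd: the result has mean 0 and SD 1
    m, s = statistics.mean(data), statistics.stdev(data)
    return [(x - m) / s for x in data]

data = [2, 4, 4, 5, 5, 6, 7, 8, 30]   # hypothetical values
summary = describe(data)
print(summary["median"], summary["IQR"], summary["outliers"])   # 5.0 3.5 [30]

# Prediction under a Normal model with hypothetical parameters mu=100, sigma=15:
# the fraction of values expected above 130 (two SDs above the mean)
model = statistics.NormalDist(mu=100, sigma=15)
print(round(1 - model.cdf(130), 4))   # 0.0228
```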

### Paired Variables Descriptive Statistics

- Describe the relationship between a categorical variable and a quantitative variable (with a series of histograms, or side-by-side box plots).
- Describe the relationship between pairs of quantitative variables (with scatter plots and the numerical measures described below).
- Know what it means for a pair of quantitative variables to fit a linear model with scatter.
- Know that correlation and regression are only appropriate for pairs of quantitative variables that fit a linear model with scatter.
- Describe the linear relationship between quantitative variables with correlation and regression, and know the properties of these analyses.
- Know the difference between outliers in a relationship between variables and outliers in individual variables, and understand the effect of influential data points.
- Understand the concept of predicting the value of the response variable from the value of the explanatory variable using the regression line.
- Understand the concept of, and how to compute, residuals of a regression analysis; know how to analyze residuals for the appropriateness of the linear model.
- Know the significance of R^2 in assessing the fraction of variance explained by linear regression between two quantitative variables.
- Know that neither correlation nor association implies causation.
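As a sketch of how correlation, the least-squares line, residuals, and R^2 fit together, the hypothetical helper `least_squares` below computes each from the usual textbook formulas, using only Python's `statistics` module and made-up data. It assumes the scatter plot has already been checked for a linear pattern; the formulas do not check that for you.

```python
import statistics

def least_squares(x, y):
    """Correlation, regression line, residuals, and R^2 (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)
    # Correlation: sum of products of z-scores, divided by n - 1
    r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)) / (n - 1)
    slope = r * s_y / s_x                 # b1 = r * s_y / s_x
    intercept = y_bar - slope * x_bar     # the line passes through (x_bar, y_bar)
    predicted = [intercept + slope * xi for xi in x]
    residuals = [yi - yh for yi, yh in zip(y, predicted)]   # observed - predicted
    sse = sum(e ** 2 for e in residuals)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - sse / sst             # fraction of variance in y explained
    return r, slope, intercept, residuals, r_squared

x = [1, 2, 3, 4, 5]   # hypothetical explanatory values
y = [2, 4, 5, 4, 5]   # hypothetical response values
r, b1, b0, resid, r2 = least_squares(x, y)
print(f"r = {r:.4f}, y-hat = {b0:.1f} + {b1:.1f} x, R^2 = {r2:.2f}")
# r = 0.7746, y-hat = 2.2 + 0.6 x, R^2 = 0.60
print("prediction at x = 3.5:", b0 + b1 * 3.5)
```

For a least-squares line the residuals sum to zero and R^2 equals r squared, which makes a quick sanity check on hand calculations.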

### Design of Experiments

- Understand the distinctions between a sample, a census, and a population (and the related concepts of sample size, statistics, and parameters).
- Understand why it is important to sample and to assign groups randomly, and how randomization relates to obtaining a representative sample.
- Understand common sample designs (simple random sample, stratified sampling, cluster and multistage sampling, systematic sampling).
- Understand bias, and the common sources of bias in sampling (voluntary response sampling, convenience sampling, undercoverage, nonresponse bias, response bias).
- Understand the concept of a pilot study, and why it is useful.
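The sampling designs listed above can be sketched with Python's `random` module. The population, strata, clusters, and function names here are all hypothetical, chosen only to show how each design picks units; a seeded generator makes the draws reproducible.

```python
import random

# Hypothetical population of 100 units: (id, stratum), 60 in stratum "A", 40 in "B"
population = [(i, "A" if i < 60 else "B") for i in range(100)]
# Hypothetical clusters: 10 consecutive blocks of 10 units each
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
rng = random.Random(202)   # fixed seed so the draws are reproducible

def simple_random_sample(units, n, rng):
    # Every subset of n units is equally likely to be chosen
    return rng.sample(units, n)

def stratified_sample(units, n_per_stratum, rng):
    # Take a separate simple random sample within each stratum, then combine
    strata = {}
    for unit in units:
        strata.setdefault(unit[1], []).append(unit)
    sample = []
    for label, members in strata.items():
        sample.extend(rng.sample(members, n_per_stratum[label]))
    return sample

def systematic_sample(units, k, rng):
    # Random start among the first k units, then every k-th unit after it
    start = rng.randrange(k)
    return units[start::k]

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters and keep every unit in the chosen clusters
    chosen = rng.sample(clusters, n_clusters)
    return [unit for cluster in chosen for unit in cluster]

print(simple_random_sample(population, 5, rng))
print(stratified_sample(population, {"A": 3, "B": 2}, rng))   # proportional to 60/40
print(systematic_sample(population, 20, rng))
print(len(cluster_sample(clusters, 2, rng)))   # 20 units from 2 clusters
```

A multistage design would combine these steps, for example choosing clusters at random and then taking a simple random sample within each chosen cluster.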

### Probability

- Understand and use set notation (element of, subset of, union, intersection, complement of, null set, disjoint sets).
- Understand and apply the terminology of probability (random phenomenon, sample space, outcomes, events, independent events) and the rules/"axioms" of probability.
- Understand the mathematical concept of a function well enough to apply it to the definition of a random variable.
- Know and understand the definition of a random variable as a function mapping a sample space of a random phenomenon to real numbers, and be able to give important examples of random variables (coin toss, die toss, sampling, binomial).
- Know how to compute the mean and standard deviation of a discrete random variable from its probability table.
- Know how to apply formulas to compute means, variances, and standard deviations of sums, differences, and linear transformations of independent and correlated random variables.
- Know the assumptions behind the use of a Binomial model, and recognize situations when this model is applicable.
- Use the Binomial calculator to make predictions in StatCrunch; know how to use other StatCrunch calculators (geometric, Normal, Uniform, Discrete Uniform) and when they apply.
- Apply and recognize formulas for mean and standard deviation of binomial random variables.
- Understand the concept of a random number table or a pseudorandom number generator and how it applies to simulation; understand the concept of a pseudorandom seed.
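Several of the probability objectives can be checked numerically. The sketch below (Python standard library; `mean_sd` and `binomial_pmf` are hypothetical helper names) computes a mean and SD from a probability table, confirms that variances (not SDs) add for independent random variables, compares a Binomial table with the formulas np and sqrt(np(1 - p)), and runs a seeded simulation.

```python
import math
import random

def mean_sd(table):
    """Mean and SD of a discrete random variable from a {value: probability} table."""
    assert abs(sum(table.values()) - 1) < 1e-9, "probabilities must sum to 1"
    mu = sum(x * p for x, p in table.items())
    variance = sum((x - mu) ** 2 * p for x, p in table.items())
    return mu, math.sqrt(variance)

def binomial_pmf(k, n, p):
    # P(X = k) for X ~ Binomial(n, p): n independent trials, success probability p
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# One roll of a fair die
die = {face: 1 / 6 for face in range(1, 7)}
print(mean_sd(die))   # mean about 3.5, SD about 1.708

# Sum of two independent dice: the means add and the variances add (the SDs do not)
two_dice = {}
for a in range(1, 7):
    for b in range(1, 7):
        two_dice[a + b] = two_dice.get(a + b, 0) + 1 / 36
print(mean_sd(two_dice))   # mean about 7, SD about sqrt(2 * 35/12) = 2.415

# Binomial(n=10, p=0.5): the table built from the pmf agrees with np and sqrt(np(1-p))
n, p = 10, 0.5
binom_table = {k: binomial_pmf(k, n, p) for k in range(n + 1)}
print(mean_sd(binom_table), (n * p, math.sqrt(n * p * (1 - p))))

# Simulation with a seeded pseudorandom generator: the same seed reproduces the same draws
rng = random.Random(42)
heads = [sum(rng.random() < p for _ in range(n)) for _ in range(10_000)]
print(sum(heads) / len(heads))   # close to n * p = 5
```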

### Sampling Distributions

- Understand that, in the context of sampling, the values of parameters do not depend on the sample, whereas the values of statistics do.
- Understand that, in the context of sampling, the sample space is the set of all possible samples of a certain size (n).
- Understand the definition of sampling distribution for a statistic: what values it takes, over the whole sample space, and how often it takes those values.
- Use the binomial distribution for making predictions about the sampling distribution of counts and proportions; know the randomization condition and 10% condition for the appropriateness of this endeavor.
- Know the success/failure condition for approximating a binomial distribution with a Normal one.
- Use the Normal distribution for making predictions about the sampling distribution for proportions; know the conditions for the appropriateness of this endeavor (randomization condition, 10% condition, success/failure condition).
- Use the Normal distribution for making predictions about the sampling distribution for means; know the randomization condition and the sample size condition for the appropriateness of this endeavor.
- Apply and recognize formulas for the theoretical means and standard deviations of the sampling distributions of means and proportions.
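One way to see a sampling distribution is to simulate it. This sketch (Python standard library, hypothetical parameters and helper names) draws many samples, records the statistic from each, and compares the simulated mean and SD with the theoretical values p and sqrt(p(1 - p)/n) for proportions, and sigma/sqrt(n) for means. The draws are independent, which corresponds to sampling with replacement or from a very large population; the randomization condition cannot be checked by code.

```python
import math
import random
import statistics

def simulate_p_hats(p, n, reps, rng):
    """Sample proportion from each of `reps` simulated samples of size n."""
    # Each draw is an independent success with probability p (sampling with
    # replacement, or from a very large population)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

def normal_model_ok(p, n, population_size=None):
    # Success/failure condition: n*p and n*(1-p) both at least 10
    ok = n * p >= 10 and n * (1 - p) >= 10
    # 10% condition, relevant only when sampling without replacement
    if population_size is not None:
        ok = ok and n <= 0.10 * population_size
    return ok

rng = random.Random(1)
p, n = 0.3, 100   # hypothetical population proportion and sample size
p_hats = simulate_p_hats(p, n, 5000, rng)
print(statistics.mean(p_hats), "vs p =", p)
print(statistics.stdev(p_hats), "vs sqrt(p(1-p)/n) =", math.sqrt(p * (1 - p) / n))
print(normal_model_ok(p, n), normal_model_ok(p, n, population_size=500))   # True False

# Sample means from a skewed (exponential) population with mean 1 and SD 1:
# the SD of x-bar should be close to sigma / sqrt(n) = 0.1
x_bars = [statistics.mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(2000)]
print(statistics.mean(x_bars), statistics.stdev(x_bars))
```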

### Point Estimation

- Understand the concepts of estimating a parameter from a sample, and of a biased and unbiased estimator.
- Know that the sample proportion is an unbiased estimator of the population proportion and that the sample mean is an unbiased estimator of the population mean.
- Know that the sample standard deviation is an example of a biased estimator (of the population standard deviation).
- Understand the difference between the *accuracy* of an estimator (related to the mean of its sampling distribution) and its *precision* (related to the standard deviation of its sampling distribution).
- Resolve the confusion created by the fact that the same statistics (mean and standard deviation) are used both to define estimators and to study their sampling distributions: the mean of the sample mean, the standard deviation of the sample mean, etc.
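Bias and precision can be illustrated by simulation. This is a minimal sketch assuming a hypothetical Normal population with mean 0 and SD 2, and small samples (n = 5) so the bias of s is visible. The mean of each estimator's sampling distribution speaks to accuracy; the SD of that distribution speaks to precision. The sample variance s^2 (divisor n - 1) averages close to sigma^2, while s tends to come out below sigma.

```python
import random
import statistics

rng = random.Random(7)
sigma, n, reps = 2.0, 5, 10_000   # hypothetical Normal(0, sigma) population, small samples

x_bars, variances, sds = [], [], []
for _ in range(reps):
    sample = [rng.gauss(0.0, sigma) for _ in range(n)]
    x_bars.append(statistics.mean(sample))
    variances.append(statistics.variance(sample))   # s^2, divisor n - 1
    sds.append(statistics.stdev(sample))            # s = sqrt(s^2)

# Accuracy (bias): compare the mean of each sampling distribution with the parameter
print("mean of x-bar:", statistics.mean(x_bars), "parameter mu = 0")
print("mean of s^2:  ", statistics.mean(variances), "parameter sigma^2 =", sigma ** 2)
print("mean of s:    ", statistics.mean(sds), "parameter sigma =", sigma)   # comes out low
# Precision: the SD of the sampling distribution of x-bar, theory sigma / sqrt(n)
print("SD of x-bar:  ", statistics.stdev(x_bars), "theory:", sigma / n ** 0.5)
```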

### One-Proportion z-Confidence Intervals

- Understand the difference between the standard deviation of a sampling distribution and the standard error, which estimates that standard deviation from a single sample.
- Apply and recognize formulas for the standard error of the sample proportion.
- Be able to compute a one-proportion z-interval.
- Understand the conditions for using a one-proportion z-interval for sampling from a finite population: randomization condition, 10% condition, success/failure condition.
- Understand the conditions for using a one-proportion z-interval for sampling with replacement (e.g. simulation) or for drawing from an infinite population: independence condition, and success/failure condition.
- Appropriately interpret a confidence interval for a parameter as a plausible range of values for the parameter: if many samples are taken and a level-C confidence interval is computed from each, the assertion is that approximately a fraction C of those intervals will contain the parameter.
- Understand why the confidence level is not the probability that a given confidence interval contains the parameter.
- Understand the concepts of margin of error, critical value, and standard error and how they interrelate; compute any one of these quantities when the others are known.
- Compute the critical values from the confidence level of a one-proportion z-interval, and vice-versa.
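The relationships among critical value, standard error, and margin of error can be sketched with `statistics.NormalDist` (Python 3.8+). The helper names and data are hypothetical. The coverage simulation at the end draws independent successes (the sampling-with-replacement setting), so the independence and success/failure conditions hold; about a fraction C of the intervals should contain the true p, even though any single interval either contains it or does not.

```python
import math
import random
from statistics import NormalDist

def critical_value(confidence):
    # z* leaves (1 - C)/2 in each tail of the standard Normal
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def confidence_level(z_star):
    # Inverse of critical_value: the area between -z* and z*
    return 2 * NormalDist().cdf(z_star) - 1

def one_prop_z_interval(successes, n, confidence=0.95):
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error uses p-hat, not p
    me = critical_value(confidence) * se       # margin of error = z* x SE
    return p_hat - me, p_hat + me

print(one_prop_z_interval(520, 1000))   # hypothetical: 520 successes in 1000 trials

# Coverage check: with the true p known in a simulation, about C of the intervals contain it
rng = random.Random(3)
p, n, reps = 0.4, 200, 2000
hits = 0
for _ in range(reps):
    x = sum(rng.random() < p for _ in range(n))
    lo, hi = one_prop_z_interval(x, n)
    hits += lo <= p <= hi
print(hits / reps)   # close to 0.95
```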

### Testing Hypotheses About One-Sample Proportions

- Understand the concept of a null hypothesis as a statement of no effect, no difference, no change, etc.
- Understand the framework of a hypothesis test as a decision to reject or "fail to reject" a null hypothesis; if the null hypothesis is rejected, the alternative hypothesis is accepted.
- Understand the concept of a P-value as the probability, assuming the null hypothesis is true, of seeing data as extreme as or more extreme than the data collected.
- Understand that P-values can range from 0 to 1.
- Interpret a low P-value as evidence against the null hypothesis, but understand why "the probability that the null hypothesis is correct" is an incorrect interpretation of the P-value (either the null hypothesis is true or it isn't), even though this incorrect interpretation gives the right intuition.
- Understand the difference between a one-sided alternative and a two-sided alternative; know when each is appropriate.
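A one-proportion z-test can be sketched with `statistics.NormalDist`; the helper name `one_prop_z_test` and the coin data are hypothetical. Unlike the confidence interval, the SD of p-hat here is computed from the null value p0, because the P-value is calculated assuming the null hypothesis is true. When z is positive, the one-sided ("greater") P-value is half of the two-sided one.

```python
import math
from statistics import NormalDist

def one_prop_z_test(successes, n, p0, alternative="two-sided"):
    """z statistic and P-value for H0: p = p0 (hypothetical helper)."""
    p_hat = successes / n
    sd0 = math.sqrt(p0 * (1 - p0) / n)   # SD of p-hat computed assuming H0 is true
    z = (p_hat - p0) / sd0
    phi = NormalDist().cdf
    if alternative == "greater":          # HA: p > p0
        p_value = 1 - phi(z)
    elif alternative == "less":           # HA: p < p0
        p_value = phi(z)
    elif alternative == "two-sided":      # HA: p != p0
        p_value = 2 * (1 - phi(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value

# Hypothetical data: 60 heads in 100 tosses; H0: the coin is fair (p0 = 0.5)
print(one_prop_z_test(60, 100, 0.5, "two-sided"))   # z about 2.0, P-value about 0.0455
print(one_prop_z_test(60, 100, 0.5, "greater"))     # same z, P-value about 0.0228
```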

[More coming ... ]