# De Veaux Map

From Sean_Carver

## Contents

- 1 Part I: Exploring and Understanding Data
- 2 Part II: Exploring Relationships Between Variables
- 3 Part III: Gathering Data
- 4 Part IV: Randomness and Probability
- 5 Part V: From the Data at Hand to the World at Large
- 6 Part VI: Accessing Associations Between Variables
- 7 Part VII: Inference When Variables are Related

## Part I: Exploring and Understanding Data

### Chapter 1: Stats Starts Here

- 1.1: What is Statistics?
- 1.2: Data
- 1.3: Variables

- Types of variables: Quantitative, identifier, ordinal, categorical (categorical & nominal considered synonyms)

### Chapter 2: Displaying and Describing Categorical Data

- 2.1: Summarizing and Displaying a Single Categorical Variable

- The area principle
- Frequency tables
- Bar charts
- Pie charts

- 2.2: Exploring the Relationship Between Two Categorical Variables

- Contingency tables
- Conditional distributions
- Independence
- Plotting conditional distributions (with pie charts, bar charts and segmented bar charts)

### Chapter 3: Displaying and Summarizing Quantitative Data

- 3.1: Displaying Quantitative Variables

- Histograms
- Stem and leaf displays
- Dotplots

- 3.2: Shape

- Unimodal, bimodal or multimodal
- Symmetric or skewed
- Outliers

- 3.3: Center

- Median

- 3.4: Spread

- Range, min, max
- Interquartile range, Q1, Q3

- 3.5: Boxplots and 5-Number Summaries
- 3.6: The Center of a Symmetric Distribution: The Mean

- Mean or Median?

- 3.7: The Spread of a Symmetric Distribution: The Standard Deviation

- Formulas for variance and standard deviation
- Thinking about variation
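
As a rough illustration of the variance and standard deviation formulas listed for 3.7, here is a minimal Python sketch. The data values are hypothetical and chosen only to keep the arithmetic easy to follow; the sketch assumes the usual sample formulas that divide by n - 1.

```python
import math

# Hypothetical data, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1
variance = sum((y - mean) ** 2 for y in data) / (n - 1)

# Standard deviation: square root of the variance, in the units of the data
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)
```

Dividing by n - 1 rather than n is the convention for sample data; `statistics.variance` and `statistics.stdev` in the standard library follow the same convention.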

- 3.8: Summary---What to *Tell* About a Quantitative Variable

### Chapter 4: Understanding and Comparing Distributions

- 4.1: Comparing Groups with Histograms
- 4.2: Comparing Groups with Boxplots
- 4.3: Outliers
- 4.4: Timeplots
- 4.5: Re-Expressing Data: A First Look

- ...To improve symmetry
- ...To equalize spread across groups

### Chapter 5: The Standard Deviation as a Ruler and the Normal Model

- 5.1: Standardizing with z-Scores
- 5.2: Shifting and Scaling

- Shifting to adjust the center
- Rescaling to adjust the scale
- Shifting, scaling and z-Scores
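
The effect of shifting and rescaling on center, spread, and z-scores can be checked with a short sketch. The heights below are hypothetical, and the helper name `z_scores` is an assumption made for this example.

```python
import statistics

# Hypothetical heights in centimeters
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]

def z_scores(values):
    """Standardize: subtract the mean, then divide by the sample SD."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]

# Shifting adds a constant: the center moves, the spread does not
shifted = [h + 10 for h in heights_cm]

# Rescaling multiplies by a constant: center and spread are both multiplied
rescaled = [h / 2.54 for h in heights_cm]  # centimeters to inches

z = z_scores(heights_cm)
print(z)
print(statistics.stdev(heights_cm), statistics.stdev(shifted), statistics.stdev(rescaled))
```

Standardized values always have mean 0 and standard deviation 1, and shifting or rescaling the original data by a positive constant leaves the z-scores unchanged.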

- 5.3: Normal Models

- The "nearly normal condition"
- The 68-95-99.7 Rule
- Working with pictures of the Normal curve
- Inflection points at mean +/- one standard deviation
- Interpretation of area under Normal curve as proportion of observations in interval (implied by pictures and exposition)

- 5.4: Finding Normal Percentiles

- Normal percentiles
- Other models
- From percentiles to scores: z in reverse
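
The 68-95-99.7 Rule and the two directions of a Normal percentile lookup can be reproduced with Python's standard library instead of a printed table. This is only a sketch: the model N(500, 100) is a hypothetical example, not a value taken from the text.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard Normal model: mean 0, SD 1

# Proportion of values within 1, 2, and 3 standard deviations of the mean
within_1 = std_normal.cdf(1) - std_normal.cdf(-1)
within_2 = std_normal.cdf(2) - std_normal.cdf(-2)
within_3 = std_normal.cdf(3) - std_normal.cdf(-3)

# From a z-score to a percentile
percentile_of_z = std_normal.cdf(1.5)

# From a percentile back to a z-score ("z in reverse")
z_90 = std_normal.inv_cdf(0.90)

# The same reverse lookup on a hypothetical model N(500, 100)
scores = NormalDist(mu=500, sigma=100)
score_90 = scores.inv_cdf(0.90)

print(within_1, within_2, within_3)
print(percentile_of_z, z_90, score_90)
```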

- 5.5: Normal Probability Plots

## Part II: Exploring Relationships Between Variables

### Chapter 6: Scatterplots, Association, and Correlation

- 6.1: Scatterplots

- Direction (negative or positive)
- Form
- Strength
- Outliers
- Explanatory and response variables

- 6.2: Correlation

- Formula
- Assumptions and conditions for correlation, including...
- "Quantitative variables condition,"
- "Straight enough condition,"
- "No outliers condition"

- Correlation Properties
- How strong is strong?
- Measuring trend: Kendall's tau
- Nonparametric association: Spearman's Rho
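
For the correlation formula, a minimal sketch follows. It assumes the common definition of r as the sum of products of z-scores divided by n - 1. The data and the function name `correlation` are hypothetical.

```python
import statistics

def correlation(x, y):
    """Correlation as the sum of z-score products divided by n - 1."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)
    return sum((xi - x_bar) / s_x * (yi - y_bar) / s_y
               for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical data, chosen only for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
print(r)

# Changing the units of either variable does not change the correlation
x_rescaled = [10 * xi + 3 for xi in x]
print(correlation(x_rescaled, y))
```

Because r is built from z-scores, it has no units and always lies between -1 and 1. This sketch covers only the ordinary correlation coefficient, not Kendall's tau or Spearman's rho.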

- 6.3: Warning: Correlation Does Not Equal Causation
- 6.4: Straightening Scatterplots

### Chapter 7: Linear Regression

- 7.1 Least Squares: The Line of "Best Fit"

- The linear model
- Predicted values and residuals
- The least squares line and the sense in which it is the best fit

- 7.2 The Linear Model

- Using the linear model to make predictions

- 7.3 Finding the Least Squares Line

- Formulas for slope and intercept
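
The slope and intercept formulas can be sketched in a few lines, assuming the standard least squares results: slope = r * s_y / s_x and intercept = y-bar minus slope times x-bar. The data are hypothetical.

```python
import statistics

# Hypothetical data, chosen only for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)

# Correlation: sum of z-score products divided by n - 1
r = sum((xi - x_bar) / s_x * (yi - y_bar) / s_y
        for xi, yi in zip(x, y)) / (n - 1)

# Least squares slope and intercept
slope = r * s_y / s_x
intercept = y_bar - slope * x_bar

print(slope, intercept)
```

The intercept formula guarantees that the fitted line passes through the point (x-bar, y-bar).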

- 7.4 Regression to the Mean

- Etymology of the word "Regression"
- Math Box: Derivation of regression formula

- 7.5 Examining the Residuals

- Formula for residuals
- Appropriate (lack of) form of Residuals versus x-Values plot
- The residual standard deviation
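
Residuals and the residual standard deviation can be computed directly once a line has been fit. The sketch below uses hypothetical data and assumes the usual definitions: residual = observed minus predicted, and s_e divides the sum of squared residuals by n - 2.

```python
import math

# Hypothetical data, chosen only for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

# Residual = observed value minus predicted value
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]

# Residual standard deviation, with n - 2 in the denominator
s_e = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print(residuals)
print(s_e)
```

Least squares residuals always sum to zero, so a plot of them against x should show no pattern, only scatter around zero.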

- 7.6 R^2---The Variation Accounted For by the Model

- How big should R^2 be?
- Predicting in the other direction---A tale of two regressions

- 7.7 Regression Assumptions and Conditions

- "Quantitative variable" condition
- "Straight enough" condition
- "Outlier" condition
- "Does the plot thicken?" condition
- Judging the conditions with the residuals-versus-predicted-values plot

### Chapter 8: Regression Wisdom

- 8.1: Examining Residuals

- Getting the "bends": When the residuals aren't straight
- Sifting residuals for groups
- Subsetting with a categorical variable

- 8.2: Extrapolation: Reaching Beyond the Data

- Warning about extrapolation
- Warning about predicting what would happen to cases in the regression if they were changed

- 8.3: Outliers, Leverage, and Influence
- 8.4: Lurking Variables and Causation
- 8.5: Working with Summary Values

### Chapter 9: Re-expressing Data: Get It Straight!

- 9.1: Straightening Scatterplots -- The Four Goals

- Goal 1: Make the distribution of a variable more symmetric.
- Goal 2: Make the spread of several groups more alike, even if their centers differ
- Goal 3: Make the form of a scatterplot more nearly linear
- Goal 4: Make the scatter in a scatterplot spread out evenly rather than thickening at one end
- Recognizing when a re-expression can help

- 9.2: Finding a Good Re-Expression

- Plan A: The ladder of powers
- Re-expressing to straighten a scatterplot
- Comparing re-expressions
- Plan B: Attack of the logarithms
- Multiple benefits to re-expressions
- Why not just fit a curve?

## Part III: Gathering Data

### Chapter 10: Understanding Randomness

- 10.1: What Is Randomness?

- Meaning of the word "random"
- Discussion of the process of generating random numbers

- 10.2: Simulating by Hand

- Basic terminology: Simulations, trials, components, response variable
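
The simulation vocabulary maps naturally onto code. The sketch below is a hypothetical example, not one taken from the text: each die roll is a component, rolling until a 6 appears is one trial, and the number of rolls needed is the response variable.

```python
import random

def run_trial(rng):
    """One trial: roll a fair die (the component) until a 6 appears."""
    rolls = 0
    while True:
        rolls += 1
        if rng.randint(1, 6) == 6:
            return rolls  # response variable: number of rolls needed

rng = random.Random(1)  # fixed seed so that reruns give the same results

# The simulation: many independent trials
results = [run_trial(rng) for _ in range(10_000)]
average_rolls = sum(results) / len(results)

print(average_rolls)
```

With many trials the average settles near 6, the theoretical expected number of rolls, although any single run differs slightly from it.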

### Chapter 11: Sample Surveys

- 11.1: The Three Big Ideas of Sampling

- Idea 1: Examine a part of the whole
- Population versus sample
- Bias

- Idea 2: Randomize
- Idea 3: It's the sample size
- Sample size
- Does a census make sense?

- 11.2: Populations and Parameters
- 11.3: Simple Random Samples

- Sampling frame
- Sampling variability

- 11.4: Other Sampling Designs

- Stratified sampling
- Cluster sampling
- Multistage sampling
- Systematic sampling

- 11.5: From the Population to the Sample: You Can't Always Get What You Want
- 11.6: The Valid Survey

- Know what you want to know
- Tune your instrument
- Ask specific rather than general questions
- Ask for quantitative results when possible
- Be careful in phrasing questions
- Pilot studies

- 11.7: Common Sampling Mistakes or How to Sample Badly

- Mistake 1: Sample volunteers
- Mistake 2: Sample conveniently
- Mistake 3: Use a bad sampling frame
- Mistake 4: Undercoverage
- Nonresponse bias
- Response bias
- How to think about biases
- Look for biases in any survey you encounter
- Spend your time and resources reducing biases
- Think about the members of the population who could have been excluded from your study
- Always report your sampling methods in detail

### Chapter 12: Experiments and Observational Studies

- 12.1: Observational Studies

- Observational studies
- Retrospective studies
- Prospective studies

- 12.2: Randomized, Comparative Experiments

- Random assignment of subjects to treatments
- Explanatory variables, factors and levels
- Response variables

- 12.3: The Four Principles of Experimental Design

- Principle 1: Control
- Principle 2: Randomize
- Principle 3: Replicate
- Principle 4: Block
- Diagramming experiments
- Statistically significant differences between groups
- Contrasting experiments and samples

- 12.4: Control Treatments

- Blinding (single and double)
- Placebos

- 12.5: Blocking

- Matched participants

- 12.6: Confounding

- Lurking or confounding

## Part IV: Randomness and Probability

### Chapter 13: From Randomness to Probability

- 13.1: Random Phenomena

- "A **random phenomenon** is a situation in which we know what outcomes can possibly occur, but we don't know which particular outcome will happen"
- Trials
- Outcomes
- Sample space
- Events
- The law of large numbers
- Empirical probability
- The nonexistent law of averages

- 13.2: Modeling Probability

- Theoretical probability
- Personal probability

- 13.3: Formal Probability

- The five rules of probability
- Rule 1: A probability must be a number between 0 and 1
- Rule 2: Probability assignment rule: The probability of the sample space must be 1
- Rule 3: The complement rule
- Rule 4: The addition rule
- Rule 5: The multiplication rule

### Chapter 14: Probability Rules!

- 14.1: The General Addition Rule
- 14.2: Conditional Probability and the General Multiplication Rule
- 14.3: Independence
- 14.4: Picturing Probability: Tables, Venn Diagrams, and Trees
- 14.5: Reversing the Conditioning and Bayes' Rule
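
Reversing the conditioning with Bayes' Rule can be made concrete with a small sketch. All numbers here are hypothetical (a 1% prior, a 95% true positive rate, a 5% false positive rate), and the function and parameter names are assumptions made for this example.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test), computed with Bayes' Rule."""
    true_positive = prior * sensitivity                 # P(condition and positive)
    false_positive = (1 - prior) * false_positive_rate  # P(no condition and positive)
    return true_positive / (true_positive + false_positive)

# Hypothetical inputs, chosen only for illustration
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(p)
```

The denominator is the total probability of a positive test, found by adding the two branches of a tree diagram that end in a positive result.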

### Chapter 15: Random Variables

- 15.1: Center: The Expected Value

- Definition of a random variable
- Discrete random variables (can "list" all the outcomes)
- Continuous random variables (not discrete)
- Probability models for discrete random variables
- Computation of expected value for discrete random variables

- 15.2: Spread: The Standard Deviation

- Computation of variance and standard deviation for discrete random variables
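
For a discrete random variable, the expected value and variance are probability-weighted sums. The sketch below uses a hypothetical probability model, the number of heads in two fair coin flips.

```python
import math

# Hypothetical probability model: value -> probability
model = {0: 0.25, 1: 0.50, 2: 0.25}

# A valid probability model must have probabilities that sum to 1
assert abs(sum(model.values()) - 1.0) < 1e-12

# Expected value: each value weighted by its probability
expected_value = sum(x * p for x, p in model.items())

# Variance: expected squared deviation from the expected value
variance = sum((x - expected_value) ** 2 * p for x, p in model.items())
std_dev = math.sqrt(variance)

print(expected_value, variance, std_dev)
```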

- 15.3: Shifting and Combining Random Variables

- E(X +/- c)
- Var(X +/- c)
- E(aX)
- Var(aX)
- E(X +/- Y)
- Var(X +/- Y), when X and Y are independent
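
Written out, the standard results behind the six items above are as follows, where a and c are constants and the last variance rule assumes X and Y are independent:

```latex
\begin{aligned}
E(X \pm c) &= E(X) \pm c, & \operatorname{Var}(X \pm c) &= \operatorname{Var}(X), \\
E(aX) &= a\,E(X), & \operatorname{Var}(aX) &= a^{2}\,\operatorname{Var}(X), \\
E(X \pm Y) &= E(X) \pm E(Y), & \operatorname{Var}(X \pm Y) &= \operatorname{Var}(X) + \operatorname{Var}(Y).
\end{aligned}
```

Note that variances add even when the random variables are subtracted; standard deviations themselves do not add.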

- [Unnumbered section, labeled optional]: Correlation and Covariance

- Covariance of two random variables
- Var(X +/- Y), when X and Y covary
- Correlation of two random variables

- 15.4: Continuous Random Variables

- The Normal random variable as an example of a continuous random variable
- Caption to Figure 15.1: Interpretation of area under Normal curve as probability of finding an observation in the interval.
- How can every value have a probability 0?
- Sums of independent Normal random variables are Normal.

### Chapter 16: Probability Models

- 16.1: Bernoulli Trials
- 16.2: The Geometric Model

- Independence
- The 10% condition

- 16.3: The Binomial Model

- Binomial probabilities and the binomial model
- Binomial coefficients
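
Binomial probabilities combine a binomial coefficient with powers of p and q = 1 - p. The following is a minimal sketch with hypothetical values n = 5 and p = 0.5; the function name `binomial_pmf` is an assumption made for this example.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a Binomial model with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical model: 5 Bernoulli trials with p = 0.5
n, p = 5, 0.5
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and standard deviation of the Binomial model
mean = n * p
std_dev = math.sqrt(n * p * (1 - p))

print(probs)
print(mean, std_dev)
```

`math.comb(n, k)` counts the number of ways to arrange k successes among n trials, which is the binomial coefficient.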

- 16.4: Approximating the Binomial Model with a Normal Model

- The success/failure condition

- 16.5: The Continuity Correction
- 16.6: The Poisson Model
- 16.7: Other Continuous Random Variables: The Uniform and the Exponential

- The uniform distribution
- The exponential model

## Part V: From the Data at Hand to the World at Large

### Chapter 17: Sampling Distribution Models

- 17.1: Sampling Distribution of a Proportion

- The Normal model often fits the sampling distribution of a proportion well
- Which Normal? Mean/standard deviation for Normal approximation to the sampling distribution for proportions
- Sampling variability
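
The claim about which Normal model applies can be checked by simulation. The sketch below assumes the standard results that the sampling distribution of a proportion has mean p and standard deviation sqrt(pq/n). The values p = 0.3 and n = 100, and the helper `sample_proportion`, are hypothetical.

```python
import math
import random
import statistics

# Hypothetical population proportion and sample size
p, n = 0.3, 100

# Theoretical center and spread of the sampling distribution
theoretical_mean = p
theoretical_sd = math.sqrt(p * (1 - p) / n)

def sample_proportion(rng, p, n):
    """Draw one sample of size n and return the proportion of successes."""
    return sum(rng.random() < p for _ in range(n)) / n

rng = random.Random(42)  # fixed seed for reproducibility
p_hats = [sample_proportion(rng, p, n) for _ in range(2000)]

simulated_mean = statistics.mean(p_hats)
simulated_sd = statistics.stdev(p_hats)

print(theoretical_mean, simulated_mean)
print(theoretical_sd, simulated_sd)
```

The simulated values vary from run to run if the seed changes, which is itself an instance of sampling variability.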

- 17.2: When Does the Normal Model Work Well? Assumptions and Conditions (for proportions)

- The independence assumption
- The randomization condition
- The 10% condition
- The success/failure condition

- 17.3: The Sampling Distributions of Other Statistics

- Simulating the sampling distributions of other statistics
- Medians
- Variances
- Minimums

- Simulating the sampling distribution of a mean

- 17.4: The Central Limit Theorem: The Fundamental Theorem of Statistics

- Statement of theorem
- Assumptions and conditions
- But which Normal: Mean and standard deviation for sampling distributions for means

- 17.5: Sampling Distributions: A Summary

### Chapter 18: Confidence Intervals for Proportions

- 18.1: A Confidence Interval

- The standard error
- What a confidence interval says about a parameter

- 18.2: Interpreting Confidence Intervals: What Does 95% Confidence Really Mean?
- 18.3: Margin of Error: Certainty vs. Precision

- Margin of error
- How the margin of error depends upon the confidence level
- Critical values
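
The standard error, critical value, and margin of error fit together as in the sketch below. It assumes the usual one-proportion z-interval, p-hat plus or minus z* times SE, and uses hypothetical survey counts.

```python
import math
from statistics import NormalDist

# Hypothetical sample: 520 successes out of 1000
successes, n = 520, 1000
confidence = 0.95

p_hat = successes / n

# Standard error: the estimated SD of the sampling distribution
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)

# Critical value z* leaves (1 - confidence) / 2 in each tail
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin_of_error = z_star * standard_error
interval = (p_hat - margin_of_error, p_hat + margin_of_error)

print(z_star, margin_of_error)
print(interval)
```

Raising the confidence level increases z* and therefore widens the interval, which is the trade-off between certainty and precision.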

- 18.4: Assumptions and Conditions

- Independence assumption
- Independence condition
- Randomization condition
- 10% condition

- Sample size assumption
- Success/failure condition

### Chapter 19: Testing Hypotheses About Proportions

- 19.1: Hypotheses

- The null hypothesis
- The alternative hypothesis
- A trial (criminal justice) as a hypothesis test

- 19.2: P-Values

- Definition of P-value
- What to do with an "innocent" defendant (verdict: *not guilty*)

- 19.3: The Reasoning of Hypothesis Testing

- 1. Hypotheses (pose hypotheses)
- 2. Model (verify problem satisfies conditions)
- 3. Mechanics (perform calculations)
- 4. Conclusion (interpret results)
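
The four steps can be traced in a short sketch of a one-proportion z-test. The counts and the null value are hypothetical, and only the success/failure condition is checked in code; independence and randomization have to be judged from how the data were collected.

```python
import math
from statistics import NormalDist

# Hypothetical data: 60 successes in 100 trials
successes, n = 60, 100

# 1. Hypotheses: H0: p = 0.5 versus HA: p != 0.5 (two-sided)
p0 = 0.5

# 2. Model: success/failure condition, checked using the null value
conditions_met = n * p0 >= 10 and n * (1 - p0) >= 10

# 3. Mechanics: z-statistic and two-sided P-value
p_hat = successes / n
sd_under_null = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / sd_under_null
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# 4. Conclusion: report the P-value and interpret it in context
print(conditions_met, z, p_value)
```

The standard deviation uses the null value p0 rather than p-hat, because the test calculations assume the null hypothesis is true.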

- 19.4: Alternative Alternatives

- Two-sided alternative
- One-sided alternative

- 19.5: P-Values and Decisions: What to Tell About a Hypothesis Test

- Discussion of when a p-value is small enough (no threshold yet)

### Chapter 20: Inference About Means

- 20.1: Getting Started: The Central Limit Theorem (Again)

- For means, the population standard deviation is required, but the sample standard deviation is all we have

- 20.2: Gosset's t

- t-Distribution versus Normal distribution
- Degrees of freedom
- What did Gosset see?
- A confidence interval for means
- A practical sampling distribution model for means
- One-sample t-interval for the mean
- Assumptions and conditions
- Independence assumption (randomization condition)
- Normal population assumption (nearly normal condition)
- Relationship to sample size

- Using Table T to find t-Values

- 20.3: Interpreting Confidence Intervals
- 20.4: A Hypothesis Test for the Mean

- One-sample t-test for the mean
- Intervals and tests (relationship)
- The special case of proportions (relationship above differs)

- 20.5: Choosing the Sample Size

### Chapter 21: More About Tests and Intervals

- 21.1: Choosing Hypotheses
- 21.2: How to Think About P-Values

- The P-value is *not* the probability that the null hypothesis is true
- What to do with a small P-value
- A small p-value does not imply a large effect

- What to do with a high P-value
- A big p-value does not prove the null hypothesis

- 21.3: Alpha Levels

- Alpha levels and statistical significance
- Where did the value 0.05 come from?
- Practical vs. statistical significance

- 21.4: Critical Values for Hypothesis Tests

- Table T
- A confidence interval for small samples
- Confidence intervals and hypothesis tests

- 21.5: Errors

- Type I errors
- Type II errors
- Probabilities defined as alpha and beta
- Power
- Effect size
- Pictures of errors
- Reducing both type I and type II errors

## Part VI: Accessing Associations Between Variables

### Chapter 22: Comparing Groups

- 22.1: The Standard Deviation of a Difference

- The standard deviation of the difference between two proportions

- 22.2: Assumptions and Conditions for Comparing Proportions

- Independence
- Independence assumption
- Randomization condition
- The 10% condition
- Independent groups assumption

- Sample Size
- Success/failure condition for both groups

- 22.3: A Confidence Interval for the Difference Between Two Proportions

- The sampling distribution model for a difference between two independent proportions
- A two-proportion z-interval
- Two-proportion z-test

- 22.4: The Two Sample z-Test: Testing for the Difference Between Proportions

- Pooling for tests of equal proportions
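
Pooling for a test of equal proportions can be sketched as follows. The counts are hypothetical, and the sketch assumes the usual pooled standard error, in which both groups share one estimated proportion under the null hypothesis.

```python
import math
from statistics import NormalDist

# Hypothetical counts for two independent groups
successes_1, n_1 = 45, 100
successes_2, n_2 = 30, 100

p_hat_1 = successes_1 / n_1
p_hat_2 = successes_2 / n_2

# Under H0: p1 = p2, combine the groups to estimate the common proportion
p_pooled = (successes_1 + successes_2) / (n_1 + n_2)

# Pooled standard error of the difference in proportions
se_pooled = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_1 + 1 / n_2))

z = (p_hat_1 - p_hat_2) / se_pooled
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided alternative

print(p_pooled, se_pooled)
print(z, p_value)
```

Pooling is appropriate only for the test, where the null hypothesis says the proportions are equal. A confidence interval for the difference makes no such assumption, so it uses the unpooled standard error.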

- 22.5: A Confidence Interval for the Difference Between Two Means

- The standard error for the difference between two means
- Two-sample t-interval
- Degrees of freedom and the two sample t-distribution
- Assumptions and conditions
- Independence
- Normal population (nearly normal condition, sample size)

- A note about independent groups

- 22.6: The Two-Sample t-Test: Testing for the Difference Between Two Means
- [Unnumbered section, labeled optional]: Tukey's Quick Test
- [Unnumbered section, labeled optional]: A Rank Sum Test
- 22.7: The Pooled t-Test: Everyone into the Pool?

- Details of the pooled t-test
- Equal variance assumption (similar spreads condition)
- Pooled t-test and confidence interval for means
- Is the pool all wet (when to use a pooled t-test)
- Pooling (discussion and in more general contexts)

### Chapter 23: Paired Samples and Blocks

- 23.1: Paired Data
- 23.2: Assumptions and Conditions

- Paired data condition
- Independence assumption (differences independent)
- Normal population assumption
- Nearly normal condition
- Sample size

- 23.3: Confidence Intervals for Matched Pairs

- Paired t-interval
- Effect size

- 23.4: Blocking

### Chapter 24: Comparing Counts

- 24.1: Goodness-of-Fit Tests
- 24.2: Chi-Square Test of Homogeneity
- 24.3: Examining the Residuals
- 24.4: Chi-Square Tests of Independence
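
The tests listed for this chapter share one statistic: the sum of (observed - expected)^2 / expected over all cells. Here is a minimal sketch for a goodness-of-fit setting, with hypothetical counts and a null model of equal expected counts. It computes the statistic and degrees of freedom only, since the Python standard library has no chi-square distribution for the P-value.

```python
# Hypothetical observed counts in five categories
observed = [18, 22, 20, 25, 15]

# Null model: all categories equally likely
total = sum(observed)
expected = [total / len(observed)] * len(observed)

# Each cell's contribution to the chi-square statistic
components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(components)

# Goodness-of-fit degrees of freedom: number of categories minus 1
degrees_of_freedom = len(observed) - 1

print(chi_square, degrees_of_freedom)
```

The individual components show which cells contribute most to the statistic, which is the idea behind examining the residuals.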