Difference between revisions of "De Veaux Map"
From Sean_Carver
(→Chapter 4: Understanding and Comparing Distributions) |
(→Chapter 6: Scatterplots, Association, and Correlation) |
||
(77 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
− | == Chapter 1: Exploring and Understanding Data == | + | == Part I: Exploring and Understanding Data == |
+ | |||
+ | === Chapter 1: Exploring and Understanding Data === | ||
* 1.1: What is Statistics? | * 1.1: What is Statistics? | ||
* 1.2: Data | * 1.2: Data | ||
* 1.3: Variables | * 1.3: Variables | ||
− | :::Types of | + | :::Types of variables: Quantitative, identifier, ordinal, categorical (categorical & nominal considered synonyms) |
− | == Chapter 2: Displaying and Describing Categorical Data == | + | === Chapter 2: Displaying and Describing Categorical Data === |
* 2.1: Summarizing and Displaying a Single Categorical Variable | * 2.1: Summarizing and Displaying a Single Categorical Variable | ||
Line 19: | Line 21: | ||
:::Plotting conditional distributions (with pie charts, bar charts and segmented bar charts) | :::Plotting conditional distributions (with pie charts, bar charts and segmented bar charts) | ||
− | == Chapter 3: Displaying and Displaying Quantitative Data == | + | === Chapter 3: Displaying and Summarizing Quantitative Data === |
* 3.1: Displaying Quantitative Variables | * 3.1: Displaying Quantitative Variables | ||
Line 38: | Line 40: | ||
:::Mean or Median? | :::Mean or Median? | ||
* 3.7: The Spread of a Symmetric Distribution: The Standard Deviation | * 3.7: The Spread of a Symmetric Distribution: The Standard Deviation | ||
+ | :::Formulas for variance and standard deviation | ||
+ | :::Thinking about variation | ||
* 3.8: Summary---What to ''Tell'' About a Quantitative Variable | * 3.8: Summary---What to ''Tell'' About a Quantitative Variable | ||
− | == Chapter 4: Understanding and Comparing Distributions == | + | === Chapter 4: Understanding and Comparing Distributions === |
* 4.1: Comparing Groups with Histograms | * 4.1: Comparing Groups with Histograms | ||
Line 49: | Line 53: | ||
:::...To improve symmetry | :::...To improve symmetry | ||
:::...To equalize spread across groups | :::...To equalize spread across groups | ||
+ | |||
+ | === Chapter 5: The Standard Deviation as a Ruler and the Normal Model === | ||
+ | |||
+ | * 5.1: Standardizing with z-Scores | ||
+ | * 5.2: Shifting and Scaling | ||
+ | :::Shifting to adjust the center | ||
+ | :::Rescaling to adjust the scale | ||
+ | :::Shifting, scaling and z-Scores | ||
+ | * 5.3: Normal Models | ||
+ | :::The "nearly normal condition" | ||
+ | :::The 68-95-99.7 Rule | ||
+ | :::Working with pictures of the Normal curve | ||
+ | :::Inflection points at mean +/- one standard deviation | ||
+ | :::Interpretation of area under Normal curve as proportion of observations in interval (implied by pictures and exposition) | ||
+ | * 5.4: Finding Normal Percentiles | ||
+ | :::Normal percentiles | ||
+ | :::Other models | ||
+ | :::From percentiles to scores: z in reverse | ||
+ | * 5.5: Normal Probability Plots | ||
+ | |||
+ | == Part II: Exploring Relationships Between Variables == | ||
+ | |||
+ | === Chapter 6: Scatterplots, Association, and Correlation === | ||
+ | |||
+ | * 6.1: Scatterplots | ||
+ | ::: Direction (negative or positive) | ||
+ | ::: Form | ||
+ | ::: Strength | ||
+ | ::: Outliers | ||
+ | ::: Explanatory and response variables | ||
+ | * 6.2: Correlation | ||
+ | ::: Formula | ||
+ | ::: Assumptions and conditions for correlation, including... | ||
+ | ::::"Quantitative variables condition," | ||
+ | ::::"Straight enough condition," | ||
+ | ::::"No outliers condition" | ||
+ | :::Correlation Properties | ||
+ | :::How strong is strong? | ||
+ | :::Measuring trend: Kendall's tau | ||
+ | :::Nonparametric association: Spearman's Rho | ||
+ | * 6.3: Warning: Correlation Does Not Equal Causation | ||
+ | * 6.4: Straightening Scatterplots | ||
+ | |||
+ | === Chapter 7: Linear Regression === | ||
+ | |||
+ | * 7.1: Least Squares: The Line of "Best Fit" | ||
+ | :::The linear model | ||
+ | :::Predicted values and residuals | ||
+ | :::The least squares line and the sense in which it is the best fit | ||
+ | * 7.2: The Linear Model | ||
+ | :::Using the linear model to make predictions | ||
+ | * 7.3: Finding the Least Squares Line | ||
+ | :::Formulas for slope and intercept | ||
+ | * 7.4: Regression to the Mean | ||
+ | :::Etymology of the word "Regression" | ||
+ | :::Math Box: Derivation of regression formula | ||
+ | * 7.5: Examining the Residuals | ||
+ | :::Formula for residuals | ||
+ | :::Appropriate (lack of) form of Residuals versus x-Values plot | ||
+ | :::The residual standard deviation | ||
+ | * 7.6: R^2---The Variation Accounted For by the Model | ||
+ | :::How big should R^2 be? | ||
+ | :::Predicting in the other direction---A tale of two regressions | ||
+ | * 7.7: Regression Assumptions and Conditions | ||
+ | :::"Quantitative variable" condition | ||
+ | :::"Straight enough" condition | ||
+ | :::"Outlier" condition | ||
+ | :::"Does the plot thicken?" condition | ||
+ | :::Judging the conditions with the residuals-versus-predicted-values plot | ||
+ | |||
+ | === Chapter 8: Regression Wisdom === | ||
+ | |||
+ | * 8.1: Examining Residuals | ||
+ | :::Getting the "bends": When the residuals aren't straight | ||
+ | :::Sifting residuals for groups | ||
+ | :::Subsetting with a categorical variable | ||
+ | * 8.2: Extrapolation: Reaching Beyond the Data | ||
+ | :::Warning about extrapolation | ||
+ | :::Warning about predicting what would happen to cases in the regression if they were changed | ||
+ | * 8.3: Outliers, Leverage, and Influence | ||
+ | * 8.4: Lurking Variables and Causation | ||
+ | * 8.5: Working with Summary Values | ||
+ | |||
+ | === Chapter 9: Re-expressing Data: Get It Straight! === | ||
+ | |||
+ | * 9.1: Straightening Scatterplots -- The Four Goals | ||
+ | :::Goal 1: Make the distribution of a variable more symmetric. | ||
+ | :::Goal 2: Make the spread of several groups more alike, even if their centers differ | ||
+ | :::Goal 3: Make the form of a scatterplot more nearly linear | ||
+ | :::Goal 4: Make the scatter in a scatterplot spread out evenly rather than thickening at one end | ||
+ | :::Recognizing when a re-expression can help | ||
+ | * 9.2: Finding a Good Re-Expression | ||
+ | :::Plan A: The ladder of powers | ||
+ | :::Re-expressing to straighten a scatterplot | ||
+ | :::Comparing re-expressions | ||
+ | :::Plan B: Attack of the logarithms | ||
+ | :::Multiple benefits to re-expressions | ||
+ | :::Why not just fit a curve? | ||
+ | |||
+ | == Part III: Gathering Data == | ||
+ | |||
+ | === Chapter 10: Understanding Randomness === | ||
+ | |||
+ | * 10.1: What Is Randomness? | ||
+ | :::Meaning of the word "random" | ||
+ | :::Discussion of the process of generating random numbers | ||
+ | * 10.2: Simulating by Hand | ||
+ | :::Basic terminology: Simulations, trials, components, response variable | ||
+ | |||
+ | === Chapter 11: Sample Surveys === | ||
+ | |||
+ | * 11.1: The Three Big Ideas of Sampling | ||
+ | :::Idea 1: Examine a part of the whole | ||
+ | ::::Population versus sample | ||
+ | ::::Bias | ||
+ | :::Idea 2: Randomize | ||
+ | :::Idea 3: It's the sample size | ||
+ | ::::Sample size | ||
+ | ::::Does a census make sense? | ||
+ | * 11.2: Populations and Parameters | ||
+ | * 11.3: Simple Random Samples | ||
+ | :::Sampling frame | ||
+ | :::Sampling variability | ||
+ | * 11.4: Other Sampling Designs | ||
+ | :::Stratified sampling | ||
+ | :::Cluster sampling | ||
+ | :::Multistage sampling | ||
+ | :::Systematic sampling | ||
+ | * 11.5: From the Population to the Sample: You Can't Always Get What You Want | ||
+ | * 11.6: The Valid Survey | ||
+ | :::Know what you want to know | ||
+ | :::Tune your instrument | ||
+ | :::Ask specific rather than general questions | ||
+ | :::Ask for quantitative results when possible | ||
+ | :::Be careful in phrasing questions | ||
+ | :::Pilot studies | ||
+ | * 11.7: Common Sampling Mistakes or How to Sample Badly | ||
+ | :::Mistake 1: Sample volunteers | ||
+ | :::Mistake 2: Sample conveniently | ||
+ | :::Mistake 3: Use a bad sampling frame | ||
+ | :::Mistake 4: Undercoverage | ||
+ | :::Nonresponse bias | ||
+ | :::Response bias | ||
+ | :::How to think about biases | ||
+ | ::::Look for biases in any survey you encounter | ||
+ | ::::Spend your time and resources reducing biases | ||
+ | ::::Think about the members of the population who could have been excluded from your study | ||
+ | ::::Always report your sampling methods in detail | ||
+ | |||
+ | === Chapter 12: Experiments and Observational Studies === | ||
+ | |||
+ | * 12.1: Observational Studies | ||
+ | :::Observational studies | ||
+ | :::Retrospective studies | ||
+ | :::Prospective studies | ||
+ | * 12.2: Randomized, Comparative Experiments | ||
+ | :::Random assignment of subjects to treatments | ||
+ | :::Explanatory variables, factors and levels | ||
+ | :::Response variables | ||
+ | * 12.3: The Four Principles of Experimental Design | ||
+ | :::Principle 1: Control | ||
+ | :::Principle 2: Randomize | ||
+ | :::Principle 3: Replicate | ||
+ | :::Principle 4: Block | ||
+ | :::Diagramming experiments | ||
+ | :::Statistically significant differences between groups | ||
+ | :::Contrasting experiments and samples | ||
+ | * 12.4: Control Treatments | ||
+ | :::Blinding (single and double) | ||
+ | :::Placebos | ||
+ | * 12.5: Blocking | ||
+ | :::Matched participants | ||
+ | * 12.6: Confounding | ||
+ | :::Lurking or confounding | ||
+ | |||
+ | == Part IV: Randomness and Probability == | ||
+ | |||
+ | === Chapter 13: From Randomness to Probability === | ||
+ | |||
+ | * 13.1: Random Phenomena | ||
+ | :::"A '''random phenomenon''' is a situation in which we know what outcomes can possibly occur, but we don't know which particular outcome will happen" | ||
+ | :::Trials | ||
+ | :::Outcomes | ||
+ | :::Sample space | ||
+ | :::Events | ||
+ | :::The law of large numbers | ||
+ | :::Empirical probability | ||
+ | :::The nonexistent law of averages | ||
+ | * 13.2: Modeling Probability | ||
+ | :::Theoretical probability | ||
+ | :::Personal probability | ||
+ | * 13.3: Formal Probability | ||
+ | :::The five rules of probability | ||
+ | ::::Rule 1: A probability must be a number between 0 and 1 | ||
+ | ::::Rule 2: Probability assignment rule: The probability of the sample space must be 1 | ||
+ | ::::Rule 3: The complement rule | ||
+ | ::::Rule 4: The addition rule | ||
+ | ::::Rule 5: The multiplication rule | ||
+ | |||
+ | === Chapter 14: Probability Rules! === | ||
+ | |||
+ | * 14.1: The General Addition Rule | ||
+ | * 14.2: Conditional Probability and the General Multiplication Rule | ||
+ | * 14.3: Independence | ||
+ | * 14.4: Picturing Probability: Tables, Venn Diagrams, and Trees | ||
+ | * 14.5: Reversing the Conditioning and Bayes' Rule | ||
+ | |||
+ | === Chapter 15: Random Variables === | ||
+ | |||
+ | * 15.1: Center: The Expected Value | ||
+ | :::Definition of a random variable | ||
+ | :::Discrete random variables (can "list" all the outcomes) | ||
+ | :::Continuous random variables (not discrete) | ||
+ | :::Probability models for discrete random variables | ||
+ | :::Computation of expected value for discrete random variables | ||
+ | * 15.2: Spread: The Standard Deviation | ||
+ | :::Computation of variance and standard deviation for discrete random variables | ||
+ | * 15.3: Shifting and Combining Random Variables | ||
+ | :::E(X +/- c) | ||
+ | :::Var(X +/- c) | ||
+ | :::E(aX) | ||
+ | :::Var(aX) | ||
+ | :::E(X +/- Y) | ||
+ | :::Var(X +/- Y), when X and Y are independent | ||
+ | * [Unnumbered section, labeled optional]: Correlation and Covariance | ||
+ | :::Covariance of two random variables | ||
+ | :::Var(X +/- Y), when X and Y covary | ||
+ | :::Correlation of two random variables | ||
+ | * 15.4: Continuous Random Variables | ||
+ | :::The Normal random variable as an example of a continuous random variable | ||
+ | :::Caption to Figure 15.1: Interpretation of area under Normal curve as probability of finding an observation in the interval. | ||
+ | :::How can every value have a probability 0? | ||
+ | :::Sums of independent Normal random variables are Normal. | ||
+ | |||
+ | === Chapter 16: Probability Models === | ||
+ | |||
+ | * 16.1: Bernoulli Trials | ||
+ | * 16.2: The Geometric Model | ||
+ | :::Independence | ||
+ | :::The 10% condition | ||
+ | * 16.3: The Binomial Model | ||
+ | :::Binomial probabilities and the binomial model | ||
+ | :::Binomial coefficients | ||
+ | * 16.4: Approximating the Binomial Model with a Normal Model | ||
+ | :::The success/failure condition | ||
+ | * 16.5: The Continuity Correction | ||
+ | * 16.6: The Poisson Model | ||
+ | * 16.7: Other Continuous Random Variables: The Uniform and the Exponential | ||
+ | :::The uniform distribution | ||
+ | :::The exponential model | ||
+ | |||
+ | == Part V: From the Data at Hand to the World at Large == | ||
+ | |||
+ | === Chapter 17: Sampling Distribution Models === | ||
+ | |||
+ | * 17.1: Sampling Distribution of a Proportion | ||
+ | :::Often, the Normal model fits the sampling distribution of a proportion well | ||
+ | :::Which Normal? Mean/standard deviation for Normal approximation to the sampling distribution for proportions | ||
+ | :::Sampling variability | ||
+ | * 17.2: When Does the Normal Model Work Well? Assumptions and Conditions (for proportions) | ||
+ | :::The independence assumption | ||
+ | :::The randomization condition | ||
+ | :::The 10% condition | ||
+ | :::The success/failure condition | ||
+ | * 17.3: The Sampling Distributions of Other Statistics | ||
+ | :::Simulating the sampling distributions of other statistics | ||
+ | ::::Medians | ||
+ | ::::Variances | ||
+ | ::::Minimums | ||
+ | :::Simulating the sampling distribution of a mean | ||
+ | * 17.4: The Central Limit Theorem: The Fundamental Theorem of Statistics | ||
+ | :::Statement of theorem | ||
+ | :::Assumptions and conditions | ||
+ | :::But which Normal: Mean and standard deviation for sampling distributions for means | ||
+ | * 17.5: Sampling Distributions: A Summary | ||
+ | |||
+ | === Chapter 18: Confidence Intervals for Proportions === | ||
+ | |||
+ | * 18.1: A Confidence Interval | ||
+ | :::The standard error | ||
+ | :::What a confidence interval says about a parameter | ||
+ | * 18.2: Interpreting Confidence Intervals: What Does 95% Confidence Really Mean? | ||
+ | * 18.3: Margin of Error: Certainty vs. Precision | ||
+ | :::Margin of error | ||
+ | :::How the margin of error depends upon the confidence level | ||
+ | :::Critical values | ||
+ | * 18.4: Assumptions and Conditions | ||
+ | :::Independence assumption | ||
+ | ::::Independence condition | ||
+ | ::::Randomization condition | ||
+ | ::::10% condition | ||
+ | :::Sample size assumption | ||
+ | ::::Success/failure condition | ||
+ | |||
+ | === Chapter 19: Testing Hypotheses About Proportions === | ||
+ | |||
+ | * 19.1: Hypotheses | ||
+ | :::The null hypothesis | ||
+ | :::The alternative hypothesis | ||
+ | :::A trial (criminal justice) as a hypothesis test | ||
+ | * 19.2: P-Values | ||
+ | :::Definition of P-value | ||
+ | :::What to do with an "innocent" defendant (verdict: ''not guilty'') | ||
+ | * 19.3: The Reasoning of Hypothesis Testing | ||
+ | :::1. Hypotheses (pose hypotheses) | ||
+ | :::2. Model (verify problem satisfies conditions) | ||
+ | :::3. Mechanics (perform calculations) | ||
+ | :::4. Conclusion (interpret results) | ||
+ | * 19.4: Alternative Alternatives | ||
+ | :::Two-sided alternative | ||
+ | :::One-sided alternative | ||
+ | * 19.5: P-Values and Decisions: What to Tell About a Hypothesis Test | ||
+ | :::Discussion of when a p-value is small enough (no threshold yet) | ||
+ | |||
+ | === Chapter 20: Inference About Means === | ||
+ | |||
+ | * 20.1: Getting Started: The Central Limit Theorem (Again) | ||
+ | :::For means, the population standard deviation is required, but the sample standard deviation is all we have | ||
+ | * 20.2: Gosset's t | ||
+ | :::t-Distribution versus Normal distribution | ||
+ | :::Degrees of freedom | ||
+ | :::What did Gosset see? | ||
+ | :::A confidence interval for means | ||
+ | :::A practical sampling distribution model for means | ||
+ | :::One-sample t-interval for the mean | ||
+ | :::Assumptions and conditions | ||
+ | ::::Independence assumption (randomization condition) | ||
+ | ::::Normal population assumption (nearly normal condition) | ||
+ | ::::Relationship to sample size | ||
+ | :::Using Table T to find t-Values | ||
+ | * 20.3: Interpreting Confidence Intervals | ||
+ | * 20.4: A Hypothesis Test for the Mean | ||
+ | :::One-sample t-test for the mean | ||
+ | :::Intervals and tests (relationship) | ||
+ | :::The special case of proportions (relationship above differs) | ||
+ | * 20.5: Choosing the Sample Size | ||
+ | |||
+ | === Chapter 21: More About Tests and Intervals === | ||
+ | |||
+ | * 21.1: Choosing Hypotheses | ||
+ | * 21.2: How to Think About P-Values | ||
+ | :::The P-value is ''not'' the probability that the null hypothesis is true | ||
+ | :::What to do with a small P-value | ||
+ | ::::A small p-value does not imply a large effect | ||
+ | :::What to do with a high P-value | ||
+ | ::::A big p-value does not prove the null hypothesis | ||
+ | * 21.3: Alpha Levels | ||
+ | :::Alpha levels and statistical significance | ||
+ | :::Where did the value 0.05 come from? | ||
+ | :::Practical vs. statistical significance | ||
+ | * 21.4: Critical Values for Hypothesis Tests | ||
+ | :::Table T | ||
+ | :::A confidence interval for small samples | ||
+ | :::Confidence intervals and hypothesis tests | ||
+ | * 21.5: Errors | ||
+ | :::Type I errors | ||
+ | :::Type II errors | ||
+ | :::Probabilities defined as alpha and beta | ||
+ | :::Power | ||
+ | :::Effect size | ||
+ | :::Pictures of errors | ||
+ | :::Reducing both type I and type II errors | ||
+ | |||
+ | == Part VI: Accessing Associations Between Variables == | ||
+ | |||
+ | === Chapter 22: Comparing Groups === | ||
+ | |||
+ | * 22.1: The Standard Deviation of a Difference | ||
+ | :::The standard deviation of the difference between two proportions | ||
+ | * 22.2: Assumptions and Conditions for Comparing Proportions | ||
+ | :::Independence | ||
+ | ::::Independence assumption | ||
+ | ::::Randomization condition | ||
+ | ::::The 10% condition | ||
+ | ::::Independent groups assumption | ||
+ | :::Sample Size | ||
+ | :::Success/failure condition for both groups | ||
+ | * 22.3: A Confidence Interval for the Difference Between Two Proportions | ||
+ | :::The sampling distribution model for a difference between two independent proportions | ||
+ | :::A two-proportion z-interval | ||
+ | :::Two-proportion z-test | ||
+ | * 22.4: The Two Sample z-Test: Testing for the Difference Between Proportions | ||
+ | :::Pooling for tests of equal proportions | ||
+ | * 22.5: A Confidence Interval for the Difference Between Two Means | ||
+ | :::The standard error for the difference between two means | ||
+ | :::Two-sample t-interval | ||
+ | :::Degrees of freedom and the two sample t-distribution | ||
+ | :::Assumptions and conditions | ||
+ | ::::Independence | ||
+ | ::::Normal population (nearly normal condition, sample size) | ||
+ | :::A note about independent groups | ||
+ | * 22.6: The Two-Sample t-Test: Testing for the Difference Between Two Means | ||
+ | * [Unnumbered section, labeled optional]: Tukey's Quick Test | ||
+ | * [Unnumbered section, labeled optional]: A Rank Sum Test | ||
+ | * 22.7: The Pooled t-Test: Everyone into the Pool? | ||
+ | :::Details of the pooled t-test | ||
+ | :::Equal variance assumption (similar spreads condition) | ||
+ | :::Pooled t-test and confidence interval for means | ||
+ | :::Is the pool all wet? (when to use a pooled t-test) | ||
+ | :::Pooling (discussion and in more general contexts) | ||
+ | |||
+ | === Chapter 23: Paired Samples and Blocks === | ||
+ | |||
+ | * 23.1: Paired Data | ||
+ | * 23.2: Assumptions and Conditions | ||
+ | :::Paired data condition | ||
+ | :::Independence assumption (differences independent) | ||
+ | :::Normal population assumption | ||
+ | ::::Nearly normal condition | ||
+ | ::::Sample size | ||
+ | * 23.3: Confidence Intervals for Matched Pairs | ||
+ | :::Paired t-interval | ||
+ | :::Effect size | ||
+ | * 23.4: Blocking | ||
+ | |||
+ | === Chapter 24: Comparing Counts === | ||
+ | |||
+ | * 24.1: Goodness-of-Fit Tests | ||
+ | * 24.2: Chi-Square Test of Homogeneity | ||
+ | * 24.3: Examining the Residuals | ||
+ | * 24.4: Chi-Square Tests of Independence | ||
+ | |||
+ | === Chapter 25: Inferences for Regression === | ||
+ | |||
+ | == Part VII: Inference When Variables are Related == | ||
+ | |||
+ | === Chapter 26: Analysis of Variance === | ||
+ | |||
+ | === Chapter 27: Multifactor Analysis of Variance === | ||
+ | |||
+ | === Chapter 28: Multiple Regression === | ||
+ | |||
+ | === Chapter 29: Multiple Regression Wisdom === |
Latest revision as of 01:30, 21 November 2018
Contents
- 1 Part I: Exploring and Understanding Data
- 2 Part II: Exploring Relationships Between Variables
- 3 Part III: Gathering Data
- 4 Part IV: Randomness and Probability
- 5 Part V: From the Data at Hand to the World at Large
- 6 Part VI: Accessing Associations Between Variables
- 7 Part VII: Inference When Variables are Related
Part I: Exploring and Understanding Data
Chapter 1: Exploring and Understanding Data
- 1.1: What is Statistics?
- 1.2: Data
- 1.3: Variables
- Types of variables: Quantitative, identifier, ordinal, categorical (categorical & nominal considered synonyms)
Chapter 2: Displaying and Describing Categorical Data
- 2.1: Summarizing and Displaying a Single Categorical Variable
- The area principle
- Frequency tables
- Bar charts
- Pie charts
- 2.2: Exploring the Relationship Between Two Categorical Variables
- Contingency tables
- Conditional distributions
- Independence
- Plotting conditional distributions (with pie charts, bar charts and segmented bar charts)
Chapter 3: Displaying and Summarizing Quantitative Data
- 3.1: Displaying Quantitative Variables
- Histograms
- Stem and leaf displays
- Dotplots
- 3.2: Shape
- Unimodal, bimodal or multimodal
- Symmetric or skewed
- Outliers
- 3.3: Center
- Median
- 3.4: Spread
- Range, min, max
- Interquartile range, Q1, Q3
- 3.5: Boxplots and 5-Number Summaries
- 3.6: The Center of a Symmetric Distribution: The Mean
- Mean or Median?
- 3.7: The Spread of a Symmetric Distribution: The Standard Deviation
- Formulas for variance and standard deviation
- Thinking about variation
- 3.8: Summary---What to Tell About a Quantitative Variable
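The formulas for variance and standard deviation listed under 3.7 can be sketched in Python. This is a minimal illustration, assuming the usual sample definitions with an n - 1 divisor; the data values and the function names `sample_variance` and `sample_sd` are hypothetical and not taken from the text.

```python
import math

def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** 2 for y in values) / (n - 1)

def sample_sd(values):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data for illustration
print(sample_variance(data), sample_sd(data))
```

The standard deviation is reported in the same units as the data, which is why it is usually the preferred summary of spread for a symmetric distribution.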
Chapter 4: Understanding and Comparing Distributions
- 4.1: Comparing Groups with Histograms
- 4.2: Comparing Groups with Boxplots
- 4.3: Outliers
- 4.4: Timeplots
- 4.5: Re-Expressing Data: A First Look
- ...To improve symmetry
- ...To equalize spread across groups
Chapter 5: The Standard Deviation as a Ruler and the Normal Model
- 5.1: Standardizing with z-Scores
- 5.2: Shifting and Scaling
- Shifting to adjust the center
- Rescaling to adjust the scale
- Shifting, scaling and z-Scores
- 5.3: Normal Models
- The "nearly normal condition"
- The 68-95-99.7 Rule
- Working with pictures of the Normal curve
- Inflection points at mean +/- one standard deviation
- Interpretation of area under Normal curve as proportion of observations in interval (implied by pictures and exposition)
- 5.4: Finding Normal Percentiles
- Normal percentiles
- Other models
- From percentiles to scores: z in reverse
- 5.5: Normal Probability Plots
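The ideas in this chapter (z-scores, the 68-95-99.7 Rule, Normal percentiles and "z in reverse") can be illustrated with Python's standard library. The sketch below assumes a standard Normal model; the helper names `z_score` and `proportion_within` and the example numbers are hypothetical.

```python
from statistics import NormalDist

def z_score(y, mean, sd):
    """Standardize a value: how many standard deviations it lies from the mean."""
    return (y - mean) / sd

standard = NormalDist(mu=0, sigma=1)  # the standard Normal model

def proportion_within(k):
    """Area under the standard Normal curve between -k and +k."""
    return standard.cdf(k) - standard.cdf(-k)

# Areas within 1, 2 and 3 standard deviations (the 68-95-99.7 Rule)
rule = [proportion_within(k) for k in (1, 2, 3)]

# From percentiles to scores: the z-score that cuts off the lowest 90%
z_for_90th = standard.inv_cdf(0.90)

print(z_score(70, mean=60, sd=5), rule, z_for_90th)
```

Shifting and rescaling data changes the mean and standard deviation but leaves z-scores unchanged, which is what allows a single standard Normal model to serve for every Normal model.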
Part II: Exploring Relationships Between Variables
Chapter 6: Scatterplots, Association, and Correlation
- 6.1: Scatterplots
- Direction (negative or positive)
- Form
- Strength
- Outliers
- Explanatory and response variables
- 6.2: Correlation
- Formula
- Assumptions and conditions for correlation, including...
- "Quantitative variables condition,"
- "Straight enough condition,"
- "No outliers condition"
- Correlation Properties
- How strong is strong?
- Measuring trend: Kendall's tau
- Nonparametric association: Spearman's Rho
- 6.3: Warning: Correlation Does Not Equal Causation
- 6.4: Straightening Scatterplots
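The correlation formula listed under 6.2 can be sketched as the sum of products of paired z-scores divided by n - 1. This is an illustrative sketch under that assumption; the function names and data are hypothetical, and it covers only the ordinary correlation coefficient, not Kendall's tau or Spearman's rho.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """Correlation r: sum of products of paired z-scores, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    z_products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return sum(z_products) / (len(xs) - 1)

xs = [1, 2, 3, 4, 5]          # made-up data
ys_up = [2, 4, 6, 8, 10]      # perfectly linear, positive direction
ys_down = [10, 8, 6, 4, 2]    # perfectly linear, negative direction
print(correlation(xs, ys_up), correlation(xs, ys_down))
```

Because r is built from z-scores, it has no units and does not change when either variable is shifted or rescaled by a positive constant.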
Chapter 7: Linear Regression
- 7.1: Least Squares: The Line of "Best Fit"
- The linear model
- Predicted values and residuals
- The least squares line and the sense in which it is the best fit
- 7.2: The Linear Model
- Using the linear model to make predictions
- 7.3: Finding the Least Squares Line
- Formulas for slope and intercept
- 7.4: Regression to the Mean
- Etymology of the word "Regression"
- Math Box: Derivation of regression formula
- 7.5: Examining the Residuals
- Formula for residuals
- Appropriate (lack of) form of Residuals versus x-Values plot
- The residual standard deviation
- 7.6: R^2---The Variation Accounted For by the Model
- How big should R^2 be?
- Predicting in the other direction---A tale of two regressions
- 7.7: Regression Assumptions and Conditions
- "Quantitative variable" condition
- "Straight enough" condition
- "Outlier" condition
- "Does the plot thicken?" condition
- Judging the conditions with the residuals-versus-predicted-values plot
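The slope, intercept, residual and R^2 computations named in this chapter can be sketched as follows. This is a minimal illustration, assuming the standard least squares formulas (slope equal to r times s_y / s_x, written here in the equivalent sums-of-deviations form); the function name `least_squares_line` and the data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (intercept b0, slope b1) of the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # algebraically equal to r * s_y / s_x
    b0 = y_bar - b1 * x_bar   # the line passes through (x_bar, y_bar)
    return b0, b1

xs = [1, 2, 3, 4]   # made-up data
ys = [2, 3, 5, 6]
b0, b1 = least_squares_line(xs, ys)

predicted = [b0 + b1 * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # observed minus predicted

# R^2: fraction of the variation in y accounted for by the model
y_bar = sum(ys) / len(ys)
r_squared = 1 - sum(e ** 2 for e in residuals) / sum((y - y_bar) ** 2 for y in ys)
print(b0, b1, r_squared)
```

Plotting `residuals` against `predicted` gives the residuals-versus-predicted-values plot used to judge the conditions in 7.7; the residuals of a least squares line always sum to zero.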
Chapter 8: Regression Wisdom
- 8.1: Examining Residuals
- Getting the "bends": When the residuals aren't straight
- Sifting residuals for groups
- Subsetting with a categorical variable
- 8.2: Extrapolation: Reaching Beyond the Data
- Warning about extrapolation
- Warning about predicting what would happen to cases in the regression if they were changed
- 8.3: Outliers, Leverage, and Influence
- 8.4: Lurking Variables and Causation
- 8.5: Working with Summary Values
Chapter 9: Re-expressing Data: Get It Straight!
- 9.1: Straightening Scatterplots -- The Four Goals
- Goal 1: Make the distribution of a variable more symmetric.
- Goal 2: Make the spread of several groups more alike, even if their centers differ
- Goal 3: Make the form of a scatterplot more nearly linear
- Goal 4: Make the scatter in a scatterplot spread out evenly rather than thickening at one end
- Recognizing when a re-expression can help
- 9.2: Finding a Good Re-Expression
- Plan A: The ladder of powers
- Re-expressing to straighten a scatterplot
- Comparing re-expressions
- Plan B: Attack of the logarithms
- Multiple benefits to re-expressions
- Why not just fit a curve?
Part III: Gathering Data
Chapter 10: Understanding Randomness
- 10.1: What Is Randomness?
- Meaning of the word "random"
- Discussion of the process of generating random numbers
- 10.2: Simulating by Hand
- Basic terminology: Simulations, trials, components, response variable
Chapter 11: Sample Surveys
- 11.1: The Three Big Ideas of Sampling
- Idea 1: Examine a part of the whole
- Population versus sample
- Bias
- Idea 2: Randomize
- Idea 3: It's the sample size
- Sample size
- Does a census make sense?
- 11.2: Populations and Parameters
- 11.3: Simple Random Samples
- Sampling frame
- Sampling variability
- 11.4: Other Sampling Designs
- Stratified sampling
- Cluster sampling
- Multistage sampling
- Systematic sampling
- 11.5: From the Population to the Sample: You Can't Always Get What You Want
- 11.6: The Valid Survey
- Know what you want to know
- Tune your instrument
- Ask specific rather than general questions
- Ask for quantitative results when possible
- Be careful in phrasing questions
- Pilot studies
- 11.7: Common Sampling Mistakes or How to Sample Badly
- Mistake 1: Sample volunteers
- Mistake 2: Sample conveniently
- Mistake 3: Use a bad sampling frame
- Mistake 4: Undercoverage
- Nonresponse bias
- Response bias
- How to think about biases
- Look for biases in any survey you encounter
- Spend your time and resources reducing biases
- Think about the members of the population who could have been excluded from your study
- Always report your sampling methods in detail
Chapter 12: Experiments and Observational Studies
- 12.1: Observational Studies
- Observational studies
- Retrospective studies
- Prospective studies
- 12.2: Randomized, Comparative Experiments
- Random assignment of subjects to treatments
- Explanatory variables, factors and levels
- Response variables
- 12.3: The Four Principles of Experimental Design
- Principle 1: Control
- Principle 2: Randomize
- Principle 3: Replicate
- Principle 4: Block
- Diagramming experiments
- Statistically significant differences between groups
- Contrasting experiments and samples
- 12.4: Control Treatments
- Blinding (single and double)
- Placebos
- 12.5: Blocking
- Matched participants
- 12.6: Confounding
- Lurking or confounding
Part IV: Randomness and Probability
Chapter 13: From Randomness to Probability
- 13.1: Random Phenomena
- "A random phenomenon is a situation in which we know what outcomes can possibly occur, but we don't know which particular outcome will happen"
- Trials
- Outcomes
- Sample space
- Events
- The law of large numbers
- Empirical probability
- The nonexistent law of averages
- 13.2: Modeling Probability
- Theoretical probability
- Personal probability
- 13.3: Formal Probability
- The five rules of probability
- Rule 1: A probability must be a number between 0 and 1
- Rule 2: Probability assignment rule: The probability of the sample space must be 1
- Rule 3: The complement rule
- Rule 4: The addition rule
- Rule 5: The multiplication rule
Chapter 14: Probability Rules!
- 14.1: The General Addition Rule
- 14.2: Conditional Probability and the General Multiplication Rule
- 14.3: Independence
- 14.4: Picturing Probability: Tables, Venn Diagrams, and Trees
- 14.5: Reversing the Conditioning and Bayes' Rule
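The General Addition Rule and Bayes' Rule from this chapter can be written out in a few lines. The sketch below assumes the standard statements of both rules; the function names and the screening-test numbers are hypothetical and chosen only for illustration.

```python
def general_addition(p_a, p_b, p_a_and_b):
    """P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

def bayes_posterior(prior, p_b_given_a, p_b_given_not_a):
    """Reverse the conditioning: P(A | B) from P(B | A), P(B | not A) and P(A)."""
    numerator = p_b_given_a * prior
    total_b = numerator + p_b_given_not_a * (1 - prior)  # P(B) via a two-branch tree
    return numerator / total_b

# Hypothetical screening test: 1% prevalence, 90% sensitivity, 5% false positives
posterior = bayes_posterior(prior=0.01, p_b_given_a=0.90, p_b_given_not_a=0.05)
print(general_addition(0.5, 0.4, 0.2), posterior)
```

The denominator in `bayes_posterior` is the sum over the two branches of a tree diagram, which is the picture suggested in 14.4.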
Chapter 15: Random Variables
- 15.1: Center: The Expected Value
- Definition of a random variable
- Discrete random variables (can "list" all the outcomes)
- Continuous random variables (not discrete)
- Probability models for discrete random variables
- Computation of expected value for discrete random variables
- 15.2: Spread: The Standard Deviation
- Computation of variance and standard deviation for discrete random variables
- 15.3: Shifting and Combining Random Variables
- E(X +/- c)
- Var(X +/- c)
- E(aX)
- Var(aX)
- E(X +/- Y)
- Var(X +/- Y), when X and Y are independent
- [Unnumbered section, labeled optional]: Correlation and Covariance
- Covariance of two random variables
- Var(X +/- Y), when X and Y covary
- Correlation of two random variables
- 15.4: Continuous Random Variables
- The Normal random variable as an example of a continuous random variable
- Caption to Figure 15.1: Interpretation of area under Normal curve as probability of finding an observation in the interval.
- How can every value have a probability 0?
- Sums of independent Normal random variables are Normal.
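The expected value and variance computations for a discrete random variable, and the shifting and scaling rules in 15.3, can be checked numerically. This is a sketch that assumes a probability model stored as a dictionary from outcomes to probabilities; the model and function names are hypothetical.

```python
def expected_value(model):
    """E(X): sum of x * P(X = x) over all listed outcomes."""
    return sum(x * p for x, p in model.items())

def variance(model):
    """Var(X): sum of (x - E(X))^2 * P(X = x) over all listed outcomes."""
    mu = expected_value(model)
    return sum(p * (x - mu) ** 2 for x, p in model.items())

def shift_and_scale(model, a, c):
    """Probability model of aX + c (assumes a != 0 so outcomes stay distinct)."""
    return {a * x + c: p for x, p in model.items()}

model = {0: 0.5, 1: 0.3, 2: 0.2}          # made-up probability model
transformed = shift_and_scale(model, a=3, c=10)

# E(aX + c) = a E(X) + c, while Var(aX + c) = a^2 Var(X): the shift c drops out
print(expected_value(model), variance(model))
print(expected_value(transformed), variance(transformed))
```

The standard deviation is the square root of the variance, so it is multiplied by |a| and, like the variance, is unaffected by the shift c.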
Chapter 16: Probability Models
- 16.1: Bernoulli Trials
- 16.2: The Geometric Model
- Independence
- The 10% condition
- 16.3: The Binomial Model
- Binomial probabilities and the binomial model
- Binomial coefficients
- 16.4: Approximating the Binomial Model with a Normal Model
- The success/failure condition
- 16.5: The Continuity Correction
- 16.6: The Poisson Model
- 16.7: Other Continuous Random Variables: The Uniform and the Exponential
- The uniform distribution
- The exponential model
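The geometric and binomial models, and the success/failure condition for the Normal approximation, can be sketched as below. The code assumes the standard formulas for these models and a threshold of 10 expected successes and failures; the function names are hypothetical.

```python
import math

def geometric_pmf(k, p):
    """P(first success occurs on trial k) for independent Bernoulli trials."""
    return (1 - p) ** (k - 1) * p

def binomial_pmf(k, n, p):
    """P(exactly k successes in n trials), using the binomial coefficient."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def success_failure_ok(n, p, threshold=10):
    """Success/failure condition: expect at least `threshold` of each outcome."""
    return n * p >= threshold and n * (1 - p) >= threshold

def normal_approx_params(n, p):
    """Mean and standard deviation of the approximating Normal model."""
    return n * p, math.sqrt(n * p * (1 - p))

print(binomial_pmf(2, 5, 0.5), geometric_pmf(3, 0.5))
print(success_failure_ok(100, 0.5), normal_approx_params(100, 0.5))
```

When `success_failure_ok` returns False, the binomial distribution is typically too skewed for the Normal model to be a good approximation, and exact binomial probabilities are the safer choice.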
Part V: From the Data at Hand to the World at Large
Chapter 17: Sampling Distribution Models
- 17.1: Sampling Distribution of a Proportion
- Often, the Normal model fits the sampling distribution of a proportion well
- Which Normal? Mean/standard deviation for Normal approximation to the sampling distribution for proportions
- Sampling variability
- 17.2: When Does the Normal Model Work Well? Assumptions and Conditions (for proportions)
- The independence assumption
- The randomization condition
- The 10% condition
- The success/failure condition
- 17.3: The Sampling Distributions of Other Statistics
- Simulating the sampling distributions of other statistics
- Medians
- Variances
- Minimums
- Simulating the sampling distribution of a mean
- 17.4: The Central Limit Theorem: The Fundamental Theorem of Statistics
- Statement of theorem
- Assumptions and conditions
- But which Normal: Mean and standard deviation for sampling distributions for means
- 17.5: Sampling Distributions: A Summary
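A minimal simulation sketch for the Chapter 17 items above: it draws many samples, records the sample proportion (or mean) of each, and compares the spread of those statistics with the theoretical standard deviations sqrt(pq/n) and sigma/sqrt(n). The function names, the exponential population, and all parameter values are assumptions chosen for illustration.

```python
import math
import random
import statistics

def simulate_proportions(p, n, reps, rng):
    """Sample proportions from `reps` samples of n Bernoulli(p) trials."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

def simulate_means(n, reps, rng):
    """Sample means of n draws from a skewed population
    (exponential with mean 1 and standard deviation 1)."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

rng = random.Random(0)  # seeded so that the run is repeatable

props = simulate_proportions(p=0.3, n=100, reps=2000, rng=rng)
print(statistics.stdev(props), math.sqrt(0.3 * 0.7 / 100))   # simulated vs. sqrt(pq/n)

means = simulate_means(n=50, reps=2000, rng=rng)
print(statistics.stdev(means), 1 / math.sqrt(50))            # simulated vs. sigma/sqrt(n)
```

A histogram of `means` would look roughly Normal even though the population is skewed, which is what the Central Limit Theorem item describes.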
Chapter 18: Confidence Intervals for Proportions
- 18.1: A Confidence Interval
- The standard error
- What a confidence interval says about a parameter
- 18.2: Interpreting Confidence Intervals: What Does 95% Confidence Really Mean?
- 18.3: Margin of Error: Certainty vs. Precision
- Margin of error
- How the margin of error depends upon the confidence level
- Critical values
- 18.4: Assumptions and Conditions
- Independence assumption
- Independence condition
- Randomization condition
- 10% condition
- Sample size assumption
- Success/failure condition
- Independence assumption
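A minimal sketch of the one-proportion z-interval outlined in Chapter 18: the standard error, the critical value z*, and the margin of error. The function names and the counts in the example are hypothetical; the code assumes the conditions in 18.4 have already been checked.

```python
import math
from statistics import NormalDist

def critical_z(confidence):
    """Critical value z* that leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def one_proportion_z_interval(successes, n, confidence=0.95):
    """Return (lower, upper) for p-hat +/- z* * SE(p-hat)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)      # standard error of p-hat
    margin = critical_z(confidence) * se         # margin of error
    return p_hat - margin, p_hat + margin

# Hypothetical example: 60 successes in a sample of 100.
print(one_proportion_z_interval(60, 100, confidence=0.95))
print(one_proportion_z_interval(60, 100, confidence=0.99))  # more confidence, wider interval
```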
Chapter 19: Testing Hypotheses About Proportions
- 19.1: Hypotheses
- The null hypothesis
- The alternative hypothesis
- A trial (criminal justice) as a hypothesis test
- 19.2: P-Values
- Definition of P-value
- What to do with an "innocent" defendant (verdict: not guilty)
- 19.3: The Reasoning of Hypothesis Testing
- 1. Hypotheses (pose hypotheses)
- 2. Model (verify problem satisfies conditions)
- 3. Mechanics (perform calculations)
- 4. Conclusion (interpret results)
- 19.4: Alternative Alternatives
- Two-sided alternative
- One-sided alternative
- 19.5: P-Values and Decisions: What to Tell About a Hypothesis Test
- Discussion of when a P-value is small enough (no threshold yet)
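A minimal sketch of the mechanics step of a one-proportion z-test, covering the two-sided and one-sided alternatives from 19.4. The function name, the `alternative` labels, and the example counts are assumptions for illustration; note that the standard deviation uses the null value p0, not p-hat.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, P-value) for H0: p = p0."""
    p_hat = successes / n
    sd = math.sqrt(p0 * (1 - p0) / n)        # SD of p-hat under the null hypothesis
    z = (p_hat - p0) / sd
    upper_tail = 1 - NormalDist().cdf(z)
    if alternative == "greater":
        p_value = upper_tail
    elif alternative == "less":
        p_value = 1 - upper_tail
    elif alternative == "two-sided":
        p_value = 2 * min(upper_tail, 1 - upper_tail)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return z, p_value

# Hypothetical example: 60 successes in 100 trials, H0: p = 0.5.
print(one_proportion_z_test(60, 100, 0.5))
print(one_proportion_z_test(60, 100, 0.5, alternative="greater"))
```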
Chapter 20: Inference About Means
- 20.1: Getting Started: The Central Limit Theorem (Again)
- For means, the population standard deviation is required, but the sample standard deviation is all we have
- 20.2: Gosset's t
- t-Distribution versus Normal distribution
- Degrees of freedom
- What did Gosset see?
- A confidence interval for means
- A practical sampling distribution model for means
- One-sample t-interval for the mean
- Assumptions and Conditions
- Independence assumption (randomization condition)
- Normal population assumption (nearly normal condition)
- Relationship to sample size
- Using Table T to find t-Values
- 20.3: Interpreting Confidence Intervals
- 20.4: A Hypothesis Test for the Mean
- One-sample t-test for the mean
- Intervals and tests (relationship)
- The special case of proportions (relationship above differs)
- 20.5: Choosing the Sample Size
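A minimal sketch of the Chapter 20 calculations: the one-sample t statistic, the one-sample t-interval, and a first-pass sample size. Python's standard library has no t-distribution, so the critical value `t_star` is passed in by hand, as if read from Table T. The function names and data are hypothetical, and the sample-size helper uses z* as a starting approximation, which is an assumption about how one might begin rather than the book's full procedure.

```python
import math
import statistics
from statistics import NormalDist

def one_sample_t(data, mu0):
    """Return (t, df) for H0: mu = mu0."""
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)   # standard error of the mean
    return (statistics.fmean(data) - mu0) / se, n - 1

def one_sample_t_interval(data, t_star):
    """y-bar +/- t* * SE(y-bar); t_star comes from a t table for df = n - 1."""
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)
    center = statistics.fmean(data)
    return center - t_star * se, center + t_star * se

def sample_size_for_mean(sd_guess, margin, confidence=0.95):
    """Rough n for a desired margin of error, using z* in place of t*."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z_star * sd_guess / margin) ** 2)

data = list(range(1, 11))                        # hypothetical sample, n = 10
print(one_sample_t(data, mu0=4))
print(one_sample_t_interval(data, t_star=2.262)) # 2.262: 95% critical value for df = 9
print(sample_size_for_mean(sd_guess=3, margin=1))
```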
Chapter 21: More About Tests and Intervals
- 21.1: Choosing Hypotheses
- 21.2: How to Think About P-Values
- The P-value is not the probability that the null hypothesis is true
- What to do with a small P-value
- A small P-value does not imply a large effect
- What to do with a high P-value
- A big P-value does not prove the null hypothesis
- 21.3: Alpha Levels
- Alpha levels and statistical significance
- Where did the value 0.05 come from?
- Practical vs. statistical significance
- 21.4: Critical Values for Hypothesis Tests
- Table T
- A confidence interval for small samples
- Confidence intervals and hypothesis tests
- 21.5: Errors
- Type I errors
- Type II errors
- Probabilities defined as alpha and beta
- Power
- Effect size
- Pictures of errors
- Reducing both type I and type II errors
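A minimal sketch relating alpha, beta, power, effect size, and sample size from 21.5. To keep the arithmetic simple it assumes a one-sided z-test for a mean with known sigma, which is a simplification chosen for illustration and not necessarily the setting the chapter uses. The function name and parameter values are hypothetical.

```python
import math
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the test H0: mu = mu0 vs. HA: mu > mu0 when the true mean is mu1."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)       # critical value for the test
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))    # effect size in standard-error units
    return 1 - NormalDist().cdf(z_alpha - shift)

for n in (25, 100):
    power = power_one_sided_z(mu0=0, mu1=0.5, sigma=1, n=n)
    beta = 1 - power                                # probability of a Type II error
    print(n, power, beta)
```

With alpha held fixed, a larger sample raises the power and so lowers beta, which is the sense in which both error types can be reduced together.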
Part VI: Accessing Associations Between Variables
Chapter 22: Comparing Groups
- 22.1: The Standard Deviation of a Difference
- The standard deviation of the difference between two proportions
- 22.2: Assumptions and Conditions for Comparing Proportions
- Independence
- Independence assumption
- Randomization condition
- The 10% condition
- Independent groups assumption
- Sample Size
- Success/failure condition for both groups
- Independence
- 22.3: A Confidence Interval for the Difference Between Two Proportions
- The sampling distribution model for a difference between two independent proportions
- A two-proportion z-interval
- Two-proportion z-test
- 22.4: The Two Sample z-Test: Testing for the Difference Between Proportions
- Pooling for tests of equal proportions
- 22.5: A Confidence Interval for the Difference Between Two Means
- The standard error for the difference between two means
- Two-sample t-interval
- Degrees of freedom and the two sample t-distribution
- Assumptions and conditions
- Independence
- Normal population (nearly normal condition, sample size)
- A note about independent groups
- 22.6: The Two-Sample t-Test: Testing for the Difference Between Two Means
- [Unnumbered section, labeled optional]: Tukey's Quick Test
- [Unnumbered section, labeled optional]: A Rank Sum Test
- 22.7: The Pooled t-Test: Everyone into the Pool?
- Details of the pooled t-test
- Equal variance assumption (similar spreads condition)
- Pooled t-test and confidence interval for means
- Is the pool all wet? (when to use a pooled t-test)
- Pooling (discussion and in more general contexts)
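A minimal sketch of three Chapter 22 calculations: the pooled two-proportion z-test, the standard error of a difference between two means, and the degrees of freedom for the two-sample t. The degrees-of-freedom function uses the usual Welch-Satterthwaite approximation, which is assumed here to be the formula the chapter has in mind. All function names and example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided test of H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def se_difference_of_means(s1, n1, s2, n2):
    """Variances of independent groups add: SE = sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for the two-sample t (not pooled)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print(two_proportion_z_test(60, 100, 50, 100))
print(se_difference_of_means(3, 9, 4, 16))
print(welch_df(2, 10, 2, 10))   # equal spreads and sizes give n1 + n2 - 2
```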
Chapter 23: Paired Samples and Blocks
- 23.1: Paired Data
- 23.2: Assumptions and Conditions
- Paired data condition
- Independence assumption (differences independent)
- Normal population assumption
- Nearly normal condition
- Sample size
- 23.3: Confidence Intervals for Matched Pairs
- Paired t-interval
- Effect size
- 23.4: Blocking
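A minimal sketch of the paired calculations in Chapter 23: the analysis reduces to a one-sample t procedure on the pairwise differences. The function name and the before/after data are hypothetical, and the critical value 3.182 is the 95% value for 3 degrees of freedom as read from a t table.

```python
import math
import statistics

def paired_t(before, after, mu0=0.0):
    """Return (mean difference, SE, t, df) for paired data."""
    if len(before) != len(after):
        raise ValueError("paired data need equal-length samples")
    diffs = [a - b for b, a in zip(before, after)]   # one difference per pair
    n = len(diffs)
    d_bar = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return d_bar, se, (d_bar - mu0) / se, n - 1

before = [10, 12, 9, 11]    # hypothetical measurements on four subjects
after = [12, 14, 10, 13]

d_bar, se, t, df = paired_t(before, after)
print(d_bar, se, t, df)

t_star = 3.182              # 95% critical value for df = 3, from a t table
print(d_bar - t_star * se, d_bar + t_star * se)   # paired t-interval
```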
Chapter 24: Comparing Counts
- 24.1: Goodness-of-Fit Tests
- 24.2: Chi-Square Test of Homogeneity
- 24.3: Examining the Residuals
- 24.4: Chi-Square Tests of Independence
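A minimal sketch of the Chapter 24 calculations: the chi-square statistic for a goodness-of-fit test, expected counts and the statistic for a two-way table (the same arithmetic serves homogeneity and independence), and standardized residuals. The function names and counts are hypothetical. The standard library has no chi-square distribution, so the P-value is left to a table or to statistical software.

```python
import math

def chi_square_gof(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square_two_way(table):
    """Return (statistic, df) with df = (rows - 1) * (columns - 1)."""
    exp = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, exp)
        for o, e in zip(obs_row, exp_row)
    )
    return stat, (len(table) - 1) * (len(table[0]) - 1)

def standardized_residuals(table):
    """(observed - expected) / sqrt(expected) for each cell."""
    exp = expected_counts(table)
    return [
        [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, exp)
    ]

print(chi_square_gof([10, 20, 30], [20, 20, 20]))   # goodness-of-fit, df = 3 - 1
table = [[10, 20], [30, 40]]                         # hypothetical 2 x 2 counts
print(chi_square_two_way(table))
print(standardized_residuals(table))
```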