De Veaux Map
From Sean_Carver
Contents
- 1 Part I: Exploring and Understanding Data
- 2 Part II: Exploring Relationships Between Variables
- 3 Part III: Gathering Data
- 4 Part IV: Randomness and Probability
- 5 Part V: From the Data at Hand to the World at Large
- 6 Part VI: Accessing Associations Between Variables
- 7 Part VII: Inference When Variables are Related
Part I: Exploring and Understanding Data
Chapter 1: Stats Starts Here
- 1.1: What is Statistics?
- 1.2: Data
- 1.3: Variables
- Types of variables: Quantitative, identifier, ordinal, categorical (categorical & nominal considered synonyms)
Chapter 2: Displaying and Describing Categorical Data
- 2.1: Summarizing and Displaying a Single Categorical Variable
- The area principle
- Frequency tables
- Bar charts
- Pie charts
- 2.2: Exploring the Relationship Between Two Categorical Variables
- Contingency tables
- Conditional distributions
- Independence
- Plotting conditional distributions (with pie charts, bar charts and segmented bar charts)
Chapter 3: Displaying and Summarizing Quantitative Data
- 3.1: Displaying Quantitative Variables
- Histograms
- Stem and leaf displays
- Dotplots
- 3.2: Shape
- Unimodal, bimodal or multimodal
- Symmetric or skewed
- Outliers
- 3.3: Center
- Median
- 3.4: Spread
- Range, min, max
- Interquartile range, Q1, Q3
- 3.5: Boxplots and 5-Number Summaries
- 3.6: The Center of a Symmetric Distribution: The Mean
- Mean or Median?
- 3.7: The Spread of a Symmetric Distribution: The Standard Deviation
- Formulas for variance and standard deviation
- Thinking about variation
- 3.8: Summary---What to Tell About a Quantitative Variable
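The Chapter 3 summaries (5-number summary, IQR, mean, standard deviation) can be sketched with Python's standard library. The data values are made up for illustration, and the quartile method is an assumption: textbooks and software use slightly different quartile conventions, so results may not match the book's hand method exactly.

```python
import statistics

# Hypothetical data, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 10, 12, 15]

# Quartile conventions vary; method="inclusive" is one common choice
# and is an assumption here, not necessarily the textbook's rule.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_number = (min(data), q1, median, q3, max(data))
iqr = q3 - q1                      # interquartile range
data_range = max(data) - min(data)

mean = statistics.mean(data)
sd = statistics.stdev(data)        # sample standard deviation (divides by n - 1)

print("5-number summary:", five_number)
print("IQR:", iqr, "range:", data_range)
print("mean:", round(mean, 3), "sd:", round(sd, 3))
```

For skewed data or data with outliers, the median and IQR are the usual summaries; for roughly symmetric data, the mean and standard deviation.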
Chapter 4: Understanding and Comparing Distributions
- 4.1: Comparing Groups with Histograms
- 4.2: Comparing Groups with Boxplots
- 4.3: Outliers
- 4.4: Timeplots
- 4.5: Re-Expressing Data: A First Look
- ...To improve symmetry
- ...To equalize spread across groups
Chapter 5: The Standard Deviation as a Ruler and the Normal Model
- 5.1: Standardizing with z-Scores
- 5.2: Shifting and Scaling
- Shifting to adjust the center
- Rescaling to adjust the scale
- Shifting, scaling and z-Scores
- 5.3: Normal Models
- The "nearly normal condition"
- The 68-95-99.7 Rule
- Working with pictures of the Normal curve
- Inflection points at mean +/- one standard deviation
- Interpretation of area under Normal curve as proportion of observations in interval (implied by pictures and exposition)
- 5.4: Finding Normal Percentiles
- Normal percentiles
- Other models
- From percentiles to scores: z in reverse
- 5.5: Normal Probability Plots
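A minimal sketch of the Chapter 5 ideas: standardizing with z-scores, checking the 68-95-99.7 Rule against the standard Normal model, and going from a percentile back to a z-score. The function name `z_score` and the example numbers are illustrative assumptions.

```python
from statistics import NormalDist

def z_score(y, mean, sd):
    """Standardize a value: how many standard deviations y lies from the mean."""
    return (y - mean) / sd

standard = NormalDist(mu=0, sigma=1)

# Proportion of a Normal model within 1, 2, and 3 standard deviations of the mean.
within = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}

# "z in reverse": the z-score that cuts off the lowest 90% of the model.
z_90 = standard.inv_cdf(0.90)

print(z_score(130, mean=100, sd=15))           # hypothetical score, mean, and sd
print({k: round(v, 4) for k, v in within.items()})
print(round(z_90, 3))
```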
Part II: Exploring Relationships Between Variables
Chapter 6: Scatterplots, Association, and Correlation
- 6.1: Scatterplots
- Direction (negative or positive)
- Form
- Strength
- Outliers
- Explanatory and response variables
- 6.2: Correlation
- Formula
- Assumptions and conditions for correlation, including...
- "Quantitative variables condition,"
- "Straight enough condition,"
- "No outliers condition"
- 6.3: Warning: Correlation Does Not Equal Causation
- 6.4: Straightening Scatterplots
Chapter 7: Linear Regression
- 7.1 Least Squares: The Line of "Best Fit"
- The linear model
- Predicted values and residuals
- The least squares line and the sense in which it is the best fit
- 7.2 The Linear Model
- Using the linear model to make predictions
- 7.3 Finding the Least Squares Line
- Formulas for slope and intercept
- 7.4 Regression to the Mean
- Etymology of the word "Regression"
- Math Box: Derivation of regression formula
- 7.5 Examining the Residuals
- Formula for residuals
- Appropriate (lack of) form of Residuals versus x-Values plot
- The residual standard deviation
- 7.6 R^2---The Variation Accounted For by the Model
- How big should R^2 be?
- Predicting in the other direction---A tale of two regressions
- 7.7 Regression Assumptions and Conditions
- "Quantitative variable" condition
- "Straight enough" condition
- "Outlier" condition
- "Does the plot thicken?" condition
- Judging the conditions with the residuals-versus-predicted-values plot
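The least squares formulas from Chapter 7 can be illustrated as follows: the slope is the correlation times the ratio of standard deviations, and the line passes through the point of means. The data are hypothetical, and the variable names are chosen only for this sketch.

```python
import statistics

# Hypothetical paired data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)

# Correlation: sum of products of z-scores, divided by n - 1.
r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
        for xi, yi in zip(x, y)) / (n - 1)

b1 = r * s_y / s_x          # slope
b0 = y_bar - b1 * x_bar     # intercept: the line passes through (x_bar, y_bar)

predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]   # observed - predicted
r_squared = r ** 2          # fraction of the variation in y accounted for by the model

print("slope:", round(b1, 3), "intercept:", round(b0, 3))
print("R^2:", round(r_squared, 3))
```

Plotting `residuals` against `predicted` is the usual way to judge the "straight enough" and "does the plot thicken?" conditions.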
Chapter 8: Regression Wisdom
- 8.1: Examining Residuals
- Getting the "bends": When the residuals aren't straight
- Sifting residuals for groups
- Subsetting with a categorical variable
- 8.2: Extrapolation: Reaching Beyond the Data
- Warning about extrapolation
- Warning about predicting what would happen to cases in the regression if they were changed
- 8.3: Outliers, Leverage, and Influence
- 8.4: Lurking Variables and Causation
- 8.5: Working with Summary Values
Chapter 9: Re-expressing Data: Get It Straight!
- 9.1: Straightening Scatterplots -- The Four Goals
- Goal 1: Make the distribution of a variable more symmetric.
- Goal 2: Make the spread of several groups more alike, even if their centers differ
- Goal 3: Make the form of a scatterplot more nearly linear
- Goal 4: Make the scatter in a scatterplot spread out evenly rather than thickening at one end
- Recognizing when a re-expression can help
- 9.2: Finding a Good Re-Expression
- Plan A: The ladder of powers
- Re-expressing to straighten a scatterplot
- Comparing re-expressions
- Plan B: Attack of the logarithms
- Multiple benefits to re-expressions
- Why not just fit a curve?
Part III: Gathering Data
Chapter 10: Understanding Randomness
- 10.1: What Is Randomness?
- Meaning of the word "random"
- Discussion of the process of generating random numbers
- 10.2: Simulating by Hand
- Basic terminology: Simulations, trials, components, response variable
Chapter 11: Sample Surveys
- 11.1: The Three Big Ideas of Sampling
- Idea 1: Examine a part of the whole
- Population versus sample
- Bias
- Idea 2: Randomize
- Idea 3: It's the sample size
- Sample size
- Does a census make sense?
- 11.2: Populations and Parameters
- 11.3: Simple Random Samples
- Sampling frame
- Sampling variability
- 11.4: Other Sampling Designs
- Stratified sampling
- Cluster sampling
- Multistage sampling
- Systematic sampling
- 11.5: From the Population to the Sample: You Can't Always Get What You Want
- 11.6: The Valid Survey
- Know what you want to know
- Tune your instrument
- Ask specific rather than general questions
- Ask for quantitative results when possible
- Be careful in phrasing questions
- Pilot studies
- 11.7: Common Sampling Mistakes or How to Sample Badly
- Mistake 1: Sample volunteers
- Mistake 2: Sample conveniently
- Mistake 3: Use a bad sampling frame
- Mistake 4: Undercoverage
- Nonresponse bias
- Response bias
- How to think about biases
- Look for biases in any survey you encounter
- Spend your time and resources reducing biases
- Think about the members of the population who could have been excluded from your study
- Always report your sampling methods in detail
Chapter 12: Experiments and Observational Studies
- 12.1: Observational Studies
- Observational studies
- Retrospective studies
- Prospective studies
- 12.2: Randomized, Comparative Experiments
- Random assignment of subjects to treatments
- Explanatory variables, factors and levels
- Response variables
- 12.3: The Four Principles of Experimental Design
- Principle 1: Control
- Principle 2: Randomize
- Principle 3: Replicate
- Principle 4: Block
- Diagramming experiments
- Statistically significant differences between groups
- Contrasting experiments and samples
- 12.4: Control Treatments
- Blinding (single and double)
- Placebos
- 12.5: Blocking
- Matched participants
- 12.6: Confounding
- Lurking or confounding
Part IV: Randomness and Probability
Chapter 13: From Randomness to Probability
- 13.1: Random Phenomena
- "A random phenomenon is a situation in which we know what outcomes can possibly occur, but we don't know which particular outcome will happen"
- Trials
- Outcomes
- Sample space
- Events
- The law of large numbers
- Empirical probability
- The nonexistent law of averages
- 13.2: Modeling Probability
- Theoretical probability
- Personal probability
- 13.3: Formal Probability
- The five rules of probability
- Rule 1: A probability must be a number between 0 and 1
- Rule 2: Probability assignment rule: The probability of the sample space must be 1
- Rule 3: The complement rule
- Rule 4: The addition rule
- Rule 5: The multiplication rule
Chapter 14: Probability Rules!
- 14.1: The General Addition Rule
- 14.2: Conditional Probability and the General Multiplication Rule
- 14.3: Independence
- 14.4: Picturing Probability: Tables, Venn Diagrams, and Trees
- 14.5: Reversing the Conditioning and Bayes' Rule
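Reversing the conditioning with Bayes' Rule can be sketched for a two-outcome case. The function name, parameter names, and probabilities below are hypothetical, chosen only to show the arithmetic.

```python
def bayes_posterior(prior, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) from P(A), P(B | A), and P(B | not A)."""
    # Total probability of B, summing over the two branches of a tree diagram.
    p_b = p_b_given_a * prior + p_b_given_not_a * (1 - prior)
    return p_b_given_a * prior / p_b

# Hypothetical numbers: A = "has the condition", B = "tests positive".
posterior = bayes_posterior(prior=0.01, p_b_given_a=0.95, p_b_given_not_a=0.05)
print(round(posterior, 4))
```

Even with an accurate test, the posterior stays small when the prior is small, which is the kind of result a tree diagram makes visible.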
Chapter 15: Random Variables
- 15.1: Center: The Expected Value
- Definition of a random variable
- Discrete random variables (can "list" all the outcomes)
- Continuous random variables (not discrete)
- Probability models for discrete random variables
- Computation of expected value for discrete random variables
- 15.2: Spread: The Standard Deviation
- Computation of variance and standard deviation for discrete random variables
- 15.3: Shifting and Combining Random Variables
- E(X +/- c)
- Var(X +/- c)
- E(aX)
- Var(aX)
- E(X +/- Y)
- Var(X +/- Y), when X and Y are independent
- [Unnumbered section, labeled optional]: Correlation and Covariance
- Covariance of two random variables
- Var(X +/- Y), when X and Y covary
- Correlation of two random variables
- 15.4: Continuous Random Variables
- The Normal random variable as an example of a continuous random variable
- Caption to Figure 15.1: Interpretation of area under Normal curve as probability of finding an observation in the interval.
- How can every value have a probability 0?
- Sums of independent Normal random variables are Normal.
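The expected value and variance of a discrete random variable, together with the shifting and scaling rules, can be checked numerically. The probability model below is made up for illustration, and the helper names are assumptions of this sketch.

```python
import math

# Hypothetical probability model: outcome value -> probability.
model = {0: 0.5, 10: 0.3, 100: 0.2}

def expected_value(m):
    """E(X) = sum of x * P(x)."""
    return sum(x * p for x, p in m.items())

def variance(m):
    """Var(X) = sum of (x - mu)^2 * P(x)."""
    mu = expected_value(m)
    return sum((x - mu) ** 2 * p for x, p in m.items())

mu = expected_value(model)
var = variance(model)
sd = math.sqrt(var)

# Shifting and scaling: E(aX + c) = a E(X) + c and Var(aX + c) = a^2 Var(X).
a, c = 2, 5
transformed = {a * x + c: p for x, p in model.items()}

print(mu, var, round(sd, 3))
print(expected_value(transformed), variance(transformed))
```

For independent X and Y, variances add for both sums and differences: Var(X +/- Y) = Var(X) + Var(Y).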
Chapter 16: Probability Models
- 16.1: Bernoulli Trials
- 16.2: The Geometric Model
- Independence
- The 10% condition
- 16.3: The Binomial Model
- Binomial probabilities and the binomial model
- Binomial coefficients
- 16.4: Approximating the Binomial Model with a Normal Model
- The success/failure condition
- 16.5: The Continuity Correction
- 16.6: The Poisson Model
- 16.7: Other Continuous Random Variables: The Uniform and the Exponential
- The uniform distribution
- The exponential model
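A sketch of the Binomial model and its Normal approximation with the continuity correction. The values of `n` and `p` are hypothetical, and the threshold of 10 in the success/failure check is the commonly stated version of that condition.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(X = k) for a Binomial model: C(n, k) * p^k * q^(n - k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 100, 0.3   # hypothetical number of trials and success probability

# Exact probability of at most 25 successes.
exact = sum(binomial_pmf(k, n, p) for k in range(26))

# Success/failure condition: expect at least 10 successes and 10 failures.
condition_ok = n * p >= 10 and n * (1 - p) >= 10

# Normal approximation with mean np and standard deviation sqrt(npq),
# evaluated at 25.5 rather than 25 (the continuity correction).
approx_model = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
approx = approx_model.cdf(25.5)

print(round(exact, 4), round(approx, 4), condition_ok)
```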
Part V: From the Data at Hand to the World at Large
Chapter 17: Sampling Distribution Models
- 17.1: Sampling Distribution of a Proportion
- Often, the Normal model fits the sampling distribution of a proportion well
- Which Normal? Mean/standard deviation for Normal approximation to the sampling distribution for proportions
- Sampling variability
- 17.2: When Does the Normal Model Work Well? Assumptions and Conditions (for proportions)
- The independence assumption
- The randomization condition
- The 10% condition
- The success/failure condition
- 17.3: The Sampling Distributions of Other Statistics
- Simulating the sampling distributions of other statistics
- Medians
- Variances
- Minimums
- Simulating the sampling distribution of a mean
- 17.4: The Central Limit Theorem: The Fundamental Theorem of Statistics
- Statement of theorem
- Assumptions and conditions
- But which Normal: Mean and standard deviation for sampling distributions for means
- 17.5: Sampling Distributions: A Summary
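The sampling distribution of a proportion can be simulated and compared with the Normal model's mean p and standard deviation sqrt(pq/n). The population proportion, sample size, number of repetitions, and seed are all arbitrary choices for this sketch.

```python
import math
import random
import statistics

random.seed(1)              # fixed seed so the run is repeatable
p, n, reps = 0.4, 200, 2000 # hypothetical true proportion, sample size, repetitions

def sample_proportion(n, p):
    """Draw n independent Bernoulli trials and return the proportion of successes."""
    return sum(random.random() < p for _ in range(n)) / n

p_hats = [sample_proportion(n, p) for _ in range(reps)]

sim_mean = statistics.mean(p_hats)
sim_sd = statistics.stdev(p_hats)
model_sd = math.sqrt(p * (1 - p) / n)   # SD(p-hat) = sqrt(pq / n)

print("simulated mean:", round(sim_mean, 4), "model mean:", p)
print("simulated sd:  ", round(sim_sd, 4), "model sd:  ", round(model_sd, 4))
```

A histogram of `p_hats` should look roughly Normal here, since the success/failure condition is comfortably met for these values.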
Chapter 18: Confidence Intervals for Proportions
- 18.1: A Confidence Interval
- The standard error
- What a confidence interval says about a parameter
- 18.2: Interpreting Confidence Intervals: What Does 95% Confidence Really Mean?
- 18.3: Margin of Error: Certainty vs. Precision
- Margin of error
- How the margin of error depends upon the confidence level
- Critical values
- 18.4: Assumptions and Conditions
- Independence assumption
- Independence condition
- Randomization condition
- 10% condition
- Sample size assumption
- Success/failure condition
Chapter 19: Testing Hypotheses About Proportions
- 19.1: Hypotheses
- The null hypothesis
- The alternative hypothesis
- A trial (criminal justice) as a hypothesis test
- 19.2: P-Values
- Definition of P-value
- What to do with an "innocent" defendant (verdict: not guilty)
- 19.3: The Reasoning of Hypothesis Testing
- 1. Hypotheses (pose hypotheses)
- 2. Model (verify problem satisfies conditions)
- 3. Mechanics (perform calculations)
- 4. Conclusion (interpret results)
- 19.4: Alternative Alternatives
- Two-sided alternative
- One-sided alternative
- 19.5: P-Values and Decisions: What to Tell About a Hypothesis Test
- Discussion of when a P-value is small enough (no threshold yet)
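The mechanics step of a one-proportion z-test can be sketched as below. The function name, its `alternative` argument, and the counts are assumptions made for illustration; the standard deviation uses the hypothesized proportion because the calculation is carried out assuming the null hypothesis is true.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, P-value) for H0: p = p0 under a Normal model."""
    p_hat = successes / n
    sd = math.sqrt(p0 * (1 - p0) / n)   # computed from p0, not from p_hat
    z = (p_hat - p0) / sd
    standard = NormalDist()
    if alternative == "greater":        # one-sided: HA: p > p0
        p_value = 1 - standard.cdf(z)
    elif alternative == "less":         # one-sided: HA: p < p0
        p_value = standard.cdf(z)
    else:                               # two-sided: HA: p != p0
        p_value = 2 * (1 - standard.cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 successes in 100 trials, testing H0: p = 0.5.
z, p_value = one_proportion_z_test(successes=60, n=100, p0=0.5)
print(round(z, 3), round(p_value, 4))
```

The conditions from step 2 (independence, randomization, 10%, success/failure) still need to be checked before the P-value means anything.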
Chapter 20: Inference About Means
- 20.1: Getting Started: The Central Limit Theorem (Again)
- For means, the population standard deviation is required, but the sample standard deviation is all we have
- 20.2: Gosset's t
- t-Distribution versus Normal distribution
- Degrees of freedom
- What did Gosset see?
- A confidence interval for means
- A practical sampling distribution model for means
- One-sample t-interval for the mean
- Assumptions and conditions
- Independence assumption (randomization condition)
- Normal population assumption (nearly normal condition)
- Relationship to sample size
- Using Table T to find t-Values
- 20.3: Interpreting Confidence Intervals
- 20.4: A Hypothesis Test for the Mean
- One-sample t-test for the mean
- Intervals and tests (relationship)
- The special case of proportions (relationship above differs)
- 20.5: Choosing the Sample Size
Chapter 21: More About Tests and Intervals
- 21.1: Choosing Hypotheses
- 21.2: How to Think About P-Values
- The P-value is not the probability that the null hypothesis is true
- What to do with a small P-value
- A small P-value does not imply a large effect
- What to do with a high P-value
- A big P-value does not prove the null hypothesis
- 21.3: Alpha Levels
- Alpha levels and statistical significance
- Where did the value 0.05 come from?
- Practical vs. statistical significance
- 21.4: Critical Values for Hypothesis Tests
- Table T
- A confidence interval for small samples
- Confidence intervals and hypothesis tests
- 21.5: Errors
- Type I errors
- Type II errors
- Probabilities defined as alpha and beta
- Power
- Effect size
- Pictures of errors
- Reducing both type I and type II errors
Part VI: Accessing Associations Between Variables
Chapter 22: Comparing Groups
- 22.1: The Standard Deviation of a Difference
- The standard deviation of the difference between two proportions
- 22.2: Assumptions and Conditions for Comparing Proportions
- Independence
- Independence assumption
- Randomization condition
- The 10% condition
- Independent groups assumption
- Sample Size
- Success/failure condition for both groups
- 22.3: A Confidence Interval for the Difference Between Two Proportions
- The sampling distribution model for a difference between two independent proportions
- A two-proportion z-interval
- Two-proportion z-test
- 22.4: The Two Sample z-Test: Testing for the Difference Between Proportions
- Pooling for tests of equal proportions
- 22.5: A Confidence Interval for the Difference Between Two Means
- The standard error for the difference between two means
- Two-sample t-interval
- Degrees of freedom and the two sample t-distribution
- Assumptions and conditions
- Independence
- Normal population (nearly normal condition, sample size)
- A note about independent groups
- 22.6: The Two-Sample t-Test: Testing for the Difference Between Two Means
- [Unnumbered section, labeled optional]: Tukey's Quick Test
- [Unnumbered section, labeled optional]: A Rank Sum Test
- 22.7: The Pooled t-Test: Everyone into the Pool?
- Details of the pooled t-test
- Equal variance assumption (similar spreads condition)
- Pooled t-test and confidence interval for means
- Is the pool all wet? (when to use a pooled t-test)
- Pooling (discussion and in more general contexts)
Chapter 23: Paired Samples and Blocks
- 23.1: Paired Data
- 23.2: Assumptions and Conditions
- Paired data condition
- Independence assumption (differences independent)
- Normal population assumption
- Nearly normal condition
- Sample size
- 23.3: Confidence Intervals for Matched Pairs
- Paired t-interval
- Effect size
- 23.4: Blocking
Chapter 24: Comparing Counts
- 24.1: Goodness-of-Fit Tests
- 24.2: Chi-Square Test of Homogeneity
- 24.3: Examining the Residuals
- 24.4: Chi-Square Tests of Independence
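The chi-square calculations for a two-way table (expected counts, the statistic, degrees of freedom, and standardized residuals) can be sketched as follows. The observed counts are hypothetical. The P-value step is left out because the Python standard library has no chi-square distribution, so the statistic would be compared against a table or computed with other software.

```python
import math

# Hypothetical 2x2 table of observed counts (rows = groups, columns = categories).
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count for each cell: (row total * column total) / table total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Standardized residuals show which cells contribute most to the statistic.
std_residuals = [[(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
                 for obs_row, exp_row in zip(observed, expected)]

print(expected)
print(round(chi_square, 3), df)
```

The same arithmetic serves the tests of homogeneity and independence; the two differ in how the data were collected and in how the hypotheses are worded.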