Difference between revisions of "Regression Lab In StatCrunch"
From Sean_Carver
Latest revision as of 18:06, 17 February 2019
Lab 3: Regression, Correlation and Outliers
- We are going to work with the small diamonds data set again, accessible here: diamonds3K. It was sampled from a larger data set, which is documented in the Codebook (http://ggplot2.tidyverse.org/reference/diamonds.html).
- We are going to look at the three dimensions of diamonds and how they covary: length (x), width (y), and height (z).
- Make a scatter plot for (x versus y) and separately for (x versus z).
- One of the conditions for correlation and regression is the "No outliers condition."
- Do you see outliers in the plots?
- Click on an outlier to turn it pink.
- Is the outlier also an outlier in the other relationship, (x versus y) compared with (x versus z)? The corresponding dot on the other scatter plot also turns pink.
- Look at the row in the data set for each outlier. To find the row, press one of the arrows on the pink box that appears at the lower left.
- Can you tell whether the data were recorded incorrectly or whether the diamond really had those dimensions? Consider what x, y, and z mean, and remember that you also have a measure of the diamond's weight (carat). Make scatter plots of carat against x, y, or z, and discuss.
- Repeat for the other most extreme outliers.
- Plot the Residuals versus "X-Values" (explanatory variable), a histogram of the residuals, and a QQ-Plot of the residuals. Where do the outliers show up? With the outliers removed, is a simple linear regression analysis appropriate?
- Report the correlation coefficient and regression line with the outliers and without the outliers (see below).
- To do the analyses without the outliers, use Stat, Regression, Simple Linear and choose to save the residuals, then use a "where" function to restrict the data to points whose residuals are small enough in absolute value.
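
The last two steps (report the correlation coefficient and regression line with and without the outliers, using the saved residuals to decide which points to keep) can also be sketched outside StatCrunch. The following is a minimal Python illustration, not the StatCrunch procedure itself. The data are synthetic and made up for the example (they are not taken from diamonds3K), the function names are hypothetical, and the residual cutoff of 10 is an arbitrary assumption that you would choose by looking at the residual plot.

```python
import math


def fit_line(xs, ys):
    """Return least-squares slope, intercept, and Pearson correlation r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def residuals(xs, ys, slope, intercept):
    """Observed y minus the y predicted by the fitted line."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def drop_large_residuals(xs, ys, cutoff):
    """Keep only points whose residual from the full-data fit is below cutoff."""
    slope, intercept, _ = fit_line(xs, ys)
    res = residuals(xs, ys, slope, intercept)
    kept = [(x, y) for x, y, e in zip(xs, ys, res) if abs(e) < cutoff]
    return [x for x, _ in kept], [y for _, y in kept]


# Synthetic example data (hypothetical, NOT from diamonds3K):
# y tracks x closely except for the last point, which mimics a recording error.
x = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 5.2]
y = [4.02, 4.48, 5.03, 5.51, 5.97, 6.52, 7.01, 31.8]

# Fit with every point included.
slope_all, intercept_all, r_all = fit_line(x, y)

# Refit after restricting to points with small residuals.
# The cutoff of 10 is an assumption chosen for this synthetic data.
x_clean, y_clean = drop_large_residuals(x, y, cutoff=10)
slope_clean, intercept_clean, r_clean = fit_line(x_clean, y_clean)

print(f"with outliers:    y = {intercept_all:.3f} + {slope_all:.3f} x, r = {r_all:.3f}")
print(f"without outliers: y = {intercept_clean:.3f} + {slope_clean:.3f} x, r = {r_clean:.3f}")
```

Note that the residuals used for filtering come from the fit that still includes the outlier, which is also the case when you save residuals from the first regression. A single extreme point can distort that line enough to inflate the residuals of the ordinary points, so check the residual plot before settling on a cutoff.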