Difference between revisions of "Diamonds Regression Lab"
Revision as of 20:29, 26 August 2019
This Lab is Under Construction
The size of a diamond can be described by its dimensions (x, y, and z) and by its weight (usually expressed in carats). We are going to investigate the relationship between these two descriptions. Download the diamonds3K data, and view the codebook for the larger data set from which our data were randomly sampled (so that they would fit reliably into StatCrunch).
The weight of a diamond is its volume times its density. As you might imagine, the shape of a diamond matters. If diamonds were cut as cylinders, with x and y the diameters of the base and z the height, the volume would be pi*(x/2)*(y/2)*z, so the relationship between weight and dimensions would be:
weight = density*pi/4 * x*y*z.
If diamonds were cut as right circular cones (one third of the volume of the matching cylinder), this relationship would be:
weight = density*pi/12 * x*y*z.
In both cases, the weight is a number (coefficient) times the product of the dimensions. The coefficient on x*y*z is the same for all diamonds of the same shape.
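As a rough check on the two formulas above, the coefficients can be computed numerically. This is only a sketch under assumptions that do not come from the lab text: a diamond density of about 3.51 g/cm^3, dimensions measured in millimetres, and 1 carat = 0.2 g. The function names are hypothetical.

```python
import math

# Assumptions (not stated in the lab): density of about 3.51 g/cm^3,
# x, y, z in millimetres, and 1 carat = 0.2 g.
DENSITY_G_PER_MM3 = 3.51 / 1000.0  # convert g/cm^3 to g/mm^3
GRAMS_PER_CARAT = 0.2


def carat_coefficient(shape_factor):
    """Carats per unit of x*y*z for a shape whose volume is shape_factor * x*y*z."""
    return DENSITY_G_PER_MM3 * shape_factor / GRAMS_PER_CARAT


def predicted_carat(coefficient, x, y, z):
    """Weight in carats predicted by weight = coefficient * x*y*z."""
    return coefficient * x * y * z


cylinder_coef = carat_coefficient(math.pi / 4)   # cylinder: volume = pi/4 * x*y*z
cone_coef = carat_coefficient(math.pi / 12)      # cone: volume = pi/12 * x*y*z

print("cylinder coefficient:", cylinder_coef)
print("cone coefficient:", cone_coef)
```

Under these assumptions the cylinder coefficient is exactly three times the cone coefficient, since the only thing that changes between the shapes is the factor in front of x*y*z.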
You are going to find this coefficient for "round cut" diamonds. Note that the shape varies slightly from one diamond to the next, so you should expect scatter in the data.
You want to find a formula that predicts weight in terms of x, y, and z (listed in a data set). Your formula would be very useful for a jeweler! Measuring the dimensions of a diamond is sometimes prone to error. If a round cut diamond doesn't closely follow the formula, that would suggest to the jeweler that they should check that the diamond really has a proper round cut and that the measurements have been performed correctly.
Step 1: Feature Engineering
We have three predictor variables x, y, and z. But for simple linear regression, we can have only one explanatory variable. Therefore, we want to create a new feature that combines the information from all these variables. As suggested above, a good feature might be
x*y*z
For comparison, we are also going to use the following feature in a separate model:
x+y+z
Go ahead and create these columns in your data set. Use Data --> Compute --> Expression in StatCrunch.
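For readers working outside StatCrunch, the same two columns can be sketched in plain Python. The rows below are made-up values used only for illustration; they are not taken from diamonds3K, and the dictionary layout is an assumption.

```python
# Hypothetical measurements, invented for illustration only
# (not rows from the diamonds3K data set).
rows = [
    {"x": 4.0, "y": 4.0, "z": 2.5},
    {"x": 5.0, "y": 5.0, "z": 3.0},
    {"x": 6.5, "y": 6.5, "z": 4.0},
]

# Add the two engineered features, using the same labels as the lab.
for row in rows:
    row["x*y*z"] = row["x"] * row["y"] * row["z"]
    row["x+y+z"] = row["x"] + row["y"] + row["z"]

for row in rows:
    print(row)
```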
Step 2: Perform the Regression
Use simple linear regression (Stat --> Regression --> Simple Linear) to compute b0 and b1 for the following models:
Carat = b0 + b1 "x*y*z"
Carat = b0 + b1 "x+y+z"
Here "x*y*z" and "x+y+z" should be whatever you called the columns in the Feature Engineering Step. If you didn't specify labels, they will be labeled as shown here by default.
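To see what the regression computes, here is a minimal least-squares sketch in plain Python. It is not StatCrunch's implementation. The data are synthetic: the dimensions are made up, and the weights are generated from an arbitrary coefficient (0.006) with no scatter, so that the fit to "x*y*z" should recover that coefficient with an intercept near zero.

```python
def simple_linear_regression(xs, ys):
    """Return (b0, b1) minimizing the squared error of y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Synthetic data for illustration only (not from diamonds3K).
dims = [(4.0, 4.0, 2.5), (5.0, 5.0, 3.0), (6.0, 6.0, 3.7), (6.5, 6.5, 4.0)]
TRUE_COEF = 0.006  # arbitrary value chosen for this sketch
carat = [TRUE_COEF * x * y * z for x, y, z in dims]

product_feature = [x * y * z for x, y, z in dims]  # the "x*y*z" column
sum_feature = [x + y + z for x, y, z in dims]      # the "x+y+z" column

b0_prod, b1_prod = simple_linear_regression(product_feature, carat)
b0_sum, b1_sum = simple_linear_regression(sum_feature, carat)

print("Carat = b0 + b1 * (x*y*z):", b0_prod, b1_prod)
print("Carat = b0 + b1 * (x+y+z):", b0_sum, b1_sum)
```

With real data the points will scatter around the line, so neither fit will be exact; comparing the two models is the point of the lab.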