Diamonds Exploration 1
Today we considered data from a sample of 3000 round-cut diamonds sold by a large retailer. We pulled three variables (cut, carat, and price) from a larger data set. Two of them, carat and cut, are among the famous four C's of diamonds: carat, cut, clarity, and color. We looked at each variable individually and did not consider relationships among the variables.
The "cut" variable is categorical with 5 possible values: Fair, Good, Very Good, Premium, and Ideal. These are listed from least desirable to most desirable, which makes the variable ordinal. The frequency table shows that the choicest diamonds are the most frequent in the data set, and that frequency drops steadily as cut quality decreases: Ideal (1216), Premium (765), Very Good (662), Good (277), Fair (80).
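The counts above can be turned into relative and cumulative frequencies with a few lines of Python. This is only a minimal sketch built from the reported counts; the names `cut_counts` and `rows` are illustrative and do not come from the original analysis.

```python
# Frequency counts for "cut" as reported above, ordered least to most desirable.
cut_counts = {
    "Fair": 80,
    "Good": 277,
    "Very Good": 662,
    "Premium": 765,
    "Ideal": 1216,
}

total = sum(cut_counts.values())  # should match the sample size of 3000

# Relative frequency of each level plus the running (cumulative) share.
# The cumulative share is meaningful because the variable is ordinal.
cumulative = 0
rows = []
for level, count in cut_counts.items():
    cumulative += count
    rows.append((level, count, count / total, cumulative / total))

for level, count, rel, cum in rows:
    print(f"{level:<10} {count:>5} {rel:>7.1%} {cum:>7.1%}")
```

Under these counts, Ideal diamonds make up about 40.5% of the sample, and the two top grades together account for about 66%.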
The "carat" variable is quantitative, ranging from 0.21 to 3.01, with median 0.71, mean 0.808, standard deviation 0.482, and quartiles 0.4 and 1.05. Since the third quartile is 1.05, about three quarters of the diamonds in this sample weigh 1.05 carats or less. The distribution is skewed to the right and multimodal, with prominent peaks just above "round values" such as 0.5, 1, and 2 carats. This feature of the distribution presumably reflects choices and trade-offs made when cutting the diamonds. The heaviest 76 diamonds in the upper tail are identified as outliers by the 1.5 IQR rule, which places the cutoff at 2.02 carats. There are no identified outliers in the lower tail.
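The 1.5 IQR rule used above can be sketched as a short calculation from the two reported quartiles. The helper name `iqr_fences` is hypothetical, and the inputs are the rounded quartiles quoted in the text, so the result may differ slightly from what statistical software reports when it uses unrounded values or a different quartile convention.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles for carat as reported above (rounded values).
lower, upper = iqr_fences(0.40, 1.05)
print(f"lower fence: {lower:.3f}, upper fence: {upper:.3f}")

# The smallest observed carat value is 0.21; if it sits above the
# lower fence, no diamonds are flagged in the lower tail.
smallest = 0.21
print("low outliers possible:", smallest < lower)
```

With these inputs the upper fence comes out at about 2.025 carats, in line with the cutoff of roughly 2.02 quoted above. The lower fence is negative, which is why no diamonds are flagged in the lower tail.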
The "price" variable is quantitative, ranging from $326 to $18,700, with median $2,455.50, mean $4,019.66, and standard deviation $4,087.06. The distribution is strongly skewed to the right, so resistant measures such as the median and quartiles are more appropriate summaries than the mean and standard deviation. The quartiles of the distribution are $954 and $5,340.50, implying that almost 75% of diamonds are over $1,000 each and over 25% exceed $5,000 each. The distribution is largely unimodal, with a peak between $700 and $800; however, there may be a much smaller mode above $4,000. The most expensive 212 diamonds in the upper tail are identified as outliers by the 1.5 IQR rule, which places the cutoff at $11,917. There are no identified outliers in the lower tail.