(This post is part of my 30 day Data Visualization Challenge – you can follow along using the ‘challenge’ tag!)
Yesterday, I plotted the distribution of price in the diamonds dataset. One of the cons was that it showed the price distribution but failed to indicate any reasons or correlations that might help us understand _why_ the prices were the way they were.
To add some more depth, I’ve plotted price on the y-axis and carat on the x-axis:
If you’ve spent any time with this dataset (or R tutorials) you’ve likely seen this visualization before.
Pros:
– Gives us more context about what might be driving price
– Has some interesting vertical separations
– Looks like it may indicate a trend
Cons:
– The density of points makes it hard to tell whether a dot is one data point deep or 300 data points deep
– It bothers me that “price” is vertical
– What are those vertical separations about?
Code:
library(ggplot2)
qplot(carat, price, data = diamonds)
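Two of the cons above can be poked at with a few more lines. This is only a rough sketch, not the code behind the plot in this post: it assumes that making the points mostly transparent is an acceptable way to show density (the alpha value of 0.05 is an arbitrary starting point), and that counting how often each exact carat value occurs is a fair first look at the vertical separations. The names p and carat_counts are made up for illustration.

```r
library(ggplot2)

# Same axes as the scatter plot above, but with mostly transparent points,
# so darker regions correspond to more overlapping diamonds.
# alpha = 0.05 is an arbitrary starting value, not a tuned one.
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.05)
print(p)

# Count how many diamonds share each exact carat value, most common first,
# to see which weights the vertical bands might line up with.
carat_counts <- sort(table(diamonds$carat), decreasing = TRUE)
head(carat_counts, 10)
```

If the bands really do come from diamonds clustering at particular weights, those weights should show up near the top of the count table.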
2 for 2! Vertical separations are interesting. Seems like they occur at regular 1/2 carat intervals, with 1/2 and 2 1/2 being a little less defined and 3 simply lined up dots. Guessing it is a psychological/marketing thing. Like a 1.899 carat does not sound as impressive as a 2.001 carat, now does it? Probably can’t tell the difference by looking at it, but it would be more desirable to buy and sell because it sounds better. Just guessing.
I really like your thinking, Chris! There are so many things at play with diamonds, but you’re right, I’m not even sure what a carat is but I prefer 2 to 1.98