DVC Day 3

(This post is part of my 30-day Data Visualization Challenge – you can follow along using the ‘challenge’ tag!)

After plotting the full dataset on a price vs. carat graph, one problem that came up was this idea of dot density (proper data scientists probably have a more technical term for it). That is, with so many data points, it’s hard to tell how “deep” a dot is, since a single visible dot may be covering a large number of data points.
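A rough way to put a number on that "depth", assuming the `diamonds` dataset that ships with ggplot2 (the same one the code further down uses), is to count how many rows share an exact carat and price pair with an earlier row. The names `hidden` and `n_hidden` are made up for this sketch, and it only catches exact overlaps, so it understates the problem: dots that merely sit close together also blur into one blot.

```r
library(ggplot2)  # provides the diamonds dataset

# TRUE for rows whose exact (carat, price) pair already appeared earlier.
# On a scatter plot these land exactly on top of an earlier dot.
hidden <- duplicated(diamonds[, c("carat", "price")])
n_hidden <- sum(hidden)

cat(n_hidden, "of", nrow(diamonds),
    "points sit exactly on top of another point\n")
```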

It seems like one possible solution would be to reduce the size of each dot, since larger dots visually encroach on nearby data points and make the graph harder to read. So, I tried that:

Pros:
– Offers a bit more nuance to the visual distribution of price vs. carat.
– Maintains the interesting vertical separations

Cons:
– This isn’t really a solution – with this number of data points, we still experience these big ink blots of imprecise “Well, there’s lots.” areas.
– It bothers me that “price” is vertical still. I forgot about that.
– The small size of the dots makes it tough to quickly distinguish outliers from a dirty laptop screen.

Code:

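library(ggplot2)
# Price vs. carat for the full diamonds dataset;
# size = I(1/3) sets every dot to a fixed, smaller size
qplot(carat, price, data = diamonds, size = I(1/3))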
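
For anyone on a newer ggplot2, where `qplot()` is marked as deprecated, roughly the same plot can be written with `ggplot()` and `geom_point()`. This is a sketch of an equivalent, not the code used for the graph in this post; the variable name `p` is just for illustration.

```r
library(ggplot2)

# Same idea as the qplot() call: carat on x, price on y,
# with every dot drawn at a fixed size of 1/3.
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(size = 1/3)

print(p)
```

Putting `size` outside `aes()` plays the same role as wrapping it in `I()` in the `qplot()` version: it is a constant, not something mapped from the data.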


5 thoughts on “DVC Day 3”

One thing this does illustrate is the weird concentration of diamond sizes on “whole” divisions: 0.5 / 0.75 / 0.8 / 1.0 / 1.25 / 1.5 / 1.75 (a bit) / 2.0

After 2.0, we lose the distinction.

Ben Lowery says:

April 20, 2015 at 8:56 pm

Ah! This was covered on day 2! This is what I get for reading backwards. 😀


The word I’ve used for “dot density” is “occlusion”. 😀

Having a hard time finding that word in use in the wild though! I’m pretty sure it was that, anyway.

Simon says:

May 11, 2015 at 11:56 am

Occlusion – I like it!


Simon Ouderkirk
