Tag: R

DVC Day 14: Request #2

(This Post is part of my 30 day Data Visualization Challenge – you can follow along using the ‘challenge’ tag!)

Responding to a comment (again from Ben!) on Day 11, I’ve used the more atomic ggplot() function (rather than qplot) to split the data into a facet_grid with one panel per clarity, which helps (a little) to illustrate the relationship between clarity and price on J-colored diamonds:

Screen Shot 2015-04-25 at 8.30.09 PM

Thoughts:
– This may be less colorful, but it is a much clearer representation of the data at hand – we can see that as we progress from left to right, the apparent upward slope of price becomes steeper.
– It’s interesting that there is a real density change in the 2000-5000 price range as we move from left to right – that’s where the lion’s share of the points in columns 2, 3 and 4 sit, but in columns 5, 6 and 7 they thin out. Maybe this is a cultural thing, re: pricing expectations for certain clarities?
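
The density shift in the second bullet can be checked with numbers as well as by eye. The sketch below is one possible check, not part of the original analysis: it computes, for each clarity, the share of J diamonds (1.5 carats or less) priced between 2000 and 5000. The names in_band and band_share are made up for illustration, and the band edges are simply the range mentioned above.

```r
library(ggplot2)  # provides the diamonds dataset

# Same subset as the plot: J-color diamonds of 1.5 carats or less
jsmall <- subset(diamonds, color == "J" & carat <= 1.5)

# TRUE for diamonds priced inside the 2000-5000 band (assumed band edges)
in_band <- jsmall$price >= 2000 & jsmall$price <= 5000

# Share of each clarity's diamonds that falls inside the band
band_share <- tapply(in_band, jsmall$clarity, mean)
round(band_share, 2)
```

If the shares drop off in the later clarity columns, that would back up what the facets seem to show.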

Code:

library(ggplot2)
jsmall <- subset(diamonds, color=="J" & carat <= 1.5)
j.facet <- ggplot(jsmall, aes(carat, price)) + geom_point()
j.facet + facet_grid(. ~ clarity)

DVC Day 13: Dropping the “Quick”

(This Post is part of my 30 day Data Visualization Challenge – you can follow along using the ‘challenge’ tag!)

As I get more comfortable poking around in ggplot2, I’ve come to a place where I want to step back a bit – qplot is 100% awesome and is a great way to get started with this library, but I want to try my hand at putting together plots fully manually, if only to better understand the way that ggplot2 works more generally. So, I did! Here’s a plot where we’re trying to confirm that across diamond colors, as carat size increases, price tends to increase as well. You’ll recall that, especially with J, this seemed to be in question:

Screen Shot 2015-04-25 at 10.13.07 AM

Thoughts:
– Again I pulled a sample from the (huge) diamonds dataset. It’s the same smaller sample that I used in earlier days.
– You’ll note that instead of using a single qplot, we put together a few lines of code.
– Here we have three things going on: points showing carat v. price, a facet_grid giving each color its own facet (rather than representing them via different colors or symbols), and a smoother line showing the general path of the data (as opposed to a more jagged point-to-point line).
– I prefer this to the all-dots-on-board approach that I was using before – while it was colorful, and interesting, I don’t think it illustrated the relationships as well as splitting them up into facets, like this.

Code:

library(ggplot2)
set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds),100),]
dvc <- ggplot(dsmall, aes(carat,price)) + geom_point()
dvc + facet_grid(. ~ color) + geom_smooth(fill=NA)
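
As a variation on the code above, geom_smooth() has an se argument that switches the confidence band off directly, which may read more clearly than fill=NA. A minimal sketch with the layers chained in one expression (dvc_all is an illustrative name, not from the original post):

```r
library(ggplot2)

set.seed(1410)  # same seed, so the same 100-row sample is drawn
dsmall <- diamonds[sample(nrow(diamonds), 100), ]

# Points, a smoother without its confidence band, and one facet per color
dvc_all <- ggplot(dsmall, aes(carat, price)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  facet_grid(. ~ color)

dvc_all
```

With only 100 points spread over seven facets, the smoother may print warnings for sparsely populated colors.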

DVC Day 12

(This Post is part of my 30 day Data Visualization Challenge – you can follow along using the ‘challenge’ tag!)

I, too, was disappointed in yesterday. I spent some time looking at the data, and figured maybe a different approach would yield more information – part of what made yesterday’s chart a bit unwieldy was the amount of information. I reduced the subset to only carat weights of one or less, and plotted a density graph of carat alone, with the fill color set by each cohort’s clarity:

Screen Shot 2015-04-21 at 8.47.58 PM

Thoughts:
– Now we’re cooking! Look at that – it looks like under half a carat, the “IF” clarity completely dominates the other categories. This was not at all evident from yesterday’s visualization.
– Again, using the Brewer Palette – these are really sharp, and have palette options for lots of different use cases. Is it weird that I’m excited about this?
– Remember, these are not all of the diamonds, but only those with the color “J”

Code:

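library(ggplot2)
# Keep only J-color diamonds of one carat or less (both conditions joined with &)
jsmaller <- subset(diamonds, color == "J" & carat <= 1)
# Density of carat, one filled curve per clarity
plot.jsmaller <- qplot(carat, data=jsmaller, fill=clarity, geom="density")
# clarity is mapped to fill, so the Brewer palette goes on the fill scale
plot.jsmaller + scale_fill_brewer(palette="Set1")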
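
One caveat when reading a density plot: each clarity’s curve is scaled to its own total, so a tall curve shows where that clarity’s diamonds concentrate rather than how many of them there are. A rough complement, sketched below on the assumption that the subset is J diamonds of one carat or less (as the thoughts above describe), is to count diamonds per clarity under half a carat. under_half and clarity_counts are illustrative names.

```r
library(ggplot2)  # provides the diamonds dataset

# J-color diamonds of one carat or less
jsmaller <- subset(diamonds, color == "J" & carat <= 1)

# Narrow to the region discussed above: under half a carat
under_half <- subset(jsmaller, carat < 0.5)

# Raw number of diamonds in each clarity grade
clarity_counts <- table(under_half$clarity)
clarity_counts
```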

DVC Day 11: Yes I Take Requests

(This Post is part of my 30 day Data Visualization Challenge – you can follow along using the ‘challenge’ tag!)

Back on Day 6, my buddy Ben asked:

There’s a lot more data down in the sub 1.0 carat range than above. What happens when you restrict the set to less than a carat?

Let’s take a look! You’ll recall that on Day 6 we were looking at only the diamonds of the J color – my favorite, I always root for the underdog – viewing carat v. price with each point’s color indicating its clarity.

Screen Shot 2015-04-21 at 8.28.47 PM

Thoughts:
– Also at Ben’s suggestion, I’ve started reading about Brewer Color Scales, which are really interesting, useful, and all around awesome.
– We’ve reduced our area to only “J” color diamonds below 1.5 carats – a bit more than requested but I was curious!
– I’ve also added an alpha value, which helps us to deal with the dot density a bit. It essentially sets the opacity of a single point, so areas of lower density can be more easily identified, since they’re a bit faded.
– This visualization is not great. It’s sort of hard to see what’s happening here. There must be a better way to display this in a way that can provide some insights.

Code:

library(ggplot2)
jsmall <- subset(diamonds, color=="J" & carat <= 1.5)
plot.j.small <- qplot(carat, price, data=jsmall, color=clarity, size=I(1.5), alpha=I(.5))
plot.j.small + scale_color_brewer(palette="Set1")
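
For comparison, here is a sketch of roughly the same plot written with ggplot() layers instead of qplot. In qplot, I() marks size and alpha as fixed values; in the layered form the same effect comes from setting them outside aes(). The name plot_j_layers is illustrative.

```r
library(ggplot2)

jsmall <- subset(diamonds, color == "J" & carat <= 1.5)

# color is mapped to clarity inside aes(); size and alpha are constants,
# so they are set on the geom rather than mapped
plot_j_layers <- ggplot(jsmall, aes(carat, price, color = clarity)) +
  geom_point(size = 1.5, alpha = 0.5) +
  scale_color_brewer(palette = "Set1")

plot_j_layers
```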

DVC Day 10: Stacks on Stacks

(This Post is part of my 30 day Data Visualization Challenge – you can follow along using the ‘challenge’ tag!)

One third of the way through! I’ve played a bit with yesterday’s histogram and done two things – added ‘cut’ as a fill, and widened the bin a bit, just for looks:

Screen Shot 2015-04-20 at 8.29.07 PM

Thoughts:
– Here’s the same visualization at the narrower bin width.
– It’s interesting: here we can see that although we have about as many .25 carat diamonds as we have 1 carat diamonds, the .25 carat cohort includes quite a lot more Ideal cut gems. I wonder if this is a natural consequence of being a bit smaller, or if “lesser” cuts of smaller diamonds are discarded or used in other ways more frequently, which would throw off the ratio, since the 1 carat bar doesn’t look so far off from the other spikes.

Code:

library(ggplot2)
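# geom="histogram" bins the continuous carat values (geom="bar" counts distinct values in ggplot2 2.0 and later)
qplot(carat, data=diamonds, geom="histogram", fill=cut, binwidth=.04, xlim=c(0,3))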
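
The Ideal-cut observation above can be roughed out numerically. This sketch is an illustration rather than part of the original post: it compares the share of Ideal cuts among diamonds within 0.02 carats of .25 and of 1 carat. The 0.02 half-width is an assumption chosen to mirror the .04 bin width, and near_carat() and ideal_share are made-up names.

```r
library(ggplot2)  # provides the diamonds dataset

# TRUE for diamonds whose carat lies within half_width of target
near_carat <- function(target, half_width = 0.02) {
  abs(diamonds$carat - target) <= half_width
}

# Proportion of Ideal cuts in each carat neighborhood
ideal_share <- c(
  quarter_carat = mean(diamonds$cut[near_carat(0.25)] == "Ideal"),
  one_carat     = mean(diamonds$cut[near_carat(1.00)] == "Ideal")
)
round(ideal_share, 2)
```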