Author: Simon

DVC Day 27: Practical Applications

(This Post is part of my 30 day Data Visualization Challenge – you can follow along using the ‘challenge’ tag!)

As we finish out the 30 days, I’ll actually be using an example of work that I did to test a hypothesis at Automattic. We currently provide live chat support to two cohorts of our customers: the folks who purchase WordPress.com Business, and our customers who have purchased any upgrade at all (mostly domains and WordPress.com Premium). There has been a longstanding assumption that our live chats with Business customers are longer in duration, since they have access to Ecommerce options, as well as no-cost access to our entire library of Premium Themes.

So, I ported our live chat data out of Olark and into R, and threw together a box plot:

Screen Shot 2015-05-11 at 4.02.00 PM

Thoughts:
– If this looks wrong somehow, that’s because it is: our box is so small as to be flattened. All we really see are the massive upward outliers.
– This clearly does not do anything to help us decide which style of chat tends to be longer in duration – our Business folks are on the left here, and our Paid customers are on the right.
– Clearly the next step is figuring out how to change this display so we can see what those boxes look like in a zoomed-in view.

Code:

> library(ggplot2)
> mydata = read.csv("~/olark_april_2015.csv")
> p = ggplot(mydata, aes(group_title, chat_duration))
> p + geom_boxplot()
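
One possible way to get the zoomed-in view mentioned in the thoughts above is coord_cartesian(), which narrows the visible y range without dropping any observations, so the boxes are still computed from the full data. The sketch below uses simulated durations rather than the Olark export; the group labels, the log-normal shape, and the 95th-percentile cutoff are all assumptions for illustration:

```r
library(ggplot2)

# Simulated stand-in for the chat export: right-skewed durations with a few
# extreme values. Group labels and distribution are illustrative only.
set.seed(27)
chats <- data.frame(
  group_title   = rep(c("Business", "Paid"), each = 500),
  chat_duration = c(rlnorm(500, meanlog = 6.2, sdlog = 1),
                    rlnorm(500, meanlog = 6.0, sdlog = 1))
)

# Cap the visible range at the 95th percentile of all durations.
upper <- unname(quantile(chats$chat_duration, 0.95))

# coord_cartesian() zooms the view only; the box statistics still use every row.
p_zoom <- ggplot(chats, aes(group_title, chat_duration)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, upper))
p_zoom
```

Setting the limits through scale_y_continuous(limits = ...) instead would discard the out-of-range chats before the boxes are computed, which shifts the medians and quartiles, so coord_cartesian() is probably the safer choice here.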

DVC Day 26: Final Stretch!

(This Post is part of my 30 day Data Visualization Challenge – you can follow along using the ‘challenge’ tag!)

I’ve been noodling a bit more with manipulating variables before visualization – that is, I’m using X + Y + Z as a rough stand-in for a gem’s volume, but X, Y, and Z exist as separate columns in the actual data set. Here is what we see when we look at the relationship between a gem’s volume and its measured clarity:

Screen Shot 2015-05-10 at 1.11.52 PM

Thoughts:
– It’s surprising to see so many regular bumps across clarities – this may be related to the similar structure around carat weight: folks prefer a cleaner, easier-to-understand number, which results in a little fudging around volumes.
– I wonder what the significance of this is? Is there some other relationship between volume and clarity?

Code:

> library(ggplot2)
> qplot(x+y+z, data=diamonds, binwidth=.07, color=clarity, geom="density", alpha=I(.25)) + scale_x_continuous(limits=c(10,25)) + theme_bw()
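
The derived variable can also be made explicit as a column before plotting. This is a minimal sketch, assuming the sum of the three dimensions is the intended size measure; the column names size_sum and box_volume are hypothetical, and the product is included only as a bounding-box alternative for comparison:

```r
library(ggplot2)

# Copy the data set and add the derived measures as explicit columns.
d <- diamonds
d$size_sum   <- d$x + d$y + d$z   # the measure plotted above
d$box_volume <- d$x * d$y * d$z   # bounding-box volume in cubic millimetres

# A similar density view, built with ggplot() rather than qplot().
p_size <- ggplot(d, aes(size_sum, color = clarity)) +
  geom_density() +
  scale_x_continuous(limits = c(10, 25)) +
  theme_bw()
p_size
```

Swapping size_sum for box_volume in aes() (and adjusting the x limits to suit) would show whether the regular bumps survive a different definition of size.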

DVC Day 25: Depth vs. Price/Carat

(This Post is part of my 30 day Data Visualization Challenge – you can follow along using the ‘challenge’ tag!)

When we take a closer look at this data, adding a bit more complexity – here, price per carat rather than simply carat weight – helps us to see that even as the lower clarities become larger, their price per carat stays consistently low. It’s also interesting to see that there appears to be no relationship at all between price/carat and a stone’s depth.

Screen Shot 2015-05-09 at 5.16.36 PM

Thoughts:
– The violin plot is added on top to help alleviate some of the dot density issues with such a large data set.
– Five days to go! Really this time 🙂

Code:

> library(ggplot2)
> p = ggplot(diamonds, aes(depth, price/carat))
> p + geom_point(color="gray", alpha=1/2) + facet_grid(.~clarity) + theme_bw() + geom_violin(color="blue", alpha=1/2)
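
The two observations above (price per carat by clarity, and the apparent lack of a depth relationship) can also be checked numerically. This is a rough sketch rather than a formal test; the summary table and its column names are my own choices:

```r
library(ggplot2)  # provides the diamonds data set

# Price per carat for every stone.
ppc <- diamonds$price / diamonds$carat

# Row indices grouped by clarity grade.
groups <- split(seq_along(ppc), diamonds$clarity)

# Per-clarity medians, plus the Pearson correlation between depth and
# price per carat within each grade.
summary_tbl <- data.frame(
  median_carat           = sapply(groups, function(i) median(diamonds$carat[i])),
  median_price_per_carat = sapply(groups, function(i) median(ppc[i])),
  depth_cor              = sapply(groups, function(i) cor(diamonds$depth[i], ppc[i]))
)
summary_tbl
```

Correlations close to zero in the depth_cor column would be consistent with what the plot suggests, though a linear correlation can miss non-linear patterns.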

DVC Day 24: Depth Charging

(This Post is part of my 30 day Data Visualization Challenge – you can follow along using the ‘challenge’ tag!)

When we charge a bit more into the question of depth and cut, an interesting thing happens: it looks as though the range of depths for a particular cut narrows up to Premium – but then loosens up a bit for Ideal:

Screen Shot 2015-05-06 at 7.22.42 PM

Thoughts:
– Six days to go!

Code:

> library(ggplot2)
> ggplot(diamonds, aes(depth, carat)) + geom_point() + facet_grid(. ~ cut) + theme_minimal()
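
To put a number on how tight the depth range is for each cut, one option is to summarise the spread directly. A small sketch, using standard deviation and interquartile range as the (assumed) measures of narrowness:

```r
library(ggplot2)  # provides the diamonds data set

# Depth values grouped by cut, in the factor's own order (Fair to Ideal).
by_cut <- split(diamonds$depth, diamonds$cut)

# Two measures of spread for each cut.
depth_spread <- data.frame(
  sd  = sapply(by_cut, sd),
  iqr = sapply(by_cut, IQR)
)
depth_spread
```

Reading down either column shows whether the spread shrinks steadily across cuts or reverses at some point, as the plot seems to suggest.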

DVC Day 23: Finally Some Depth

(This Post is part of my 30 day Data Visualization Challenge – you can follow along using the ‘challenge’ tag!)

In getting back to my roots a little (by which I mean playing with qplot), I found kind of an interesting relationship between depth and cut. I was noodling around with some of the measures to see if they had any obvious impact on price – but instead we see this fairly strict ordering between cut and depth.

Screen Shot 2015-05-06 at 7.04.47 PM

Thoughts:
– This makes it appear that depth and cut have some sort of relationship, though exactly what sort is not totally clear.
– It’s interesting to see that there is not an apparent correlation between depth and price, even while the cuts correlate to certain depths. Practically all of the outliers sit at relatively low price points.

Code:

> library(ggplot2)
> qplot(depth, price, data=diamonds, alpha=I(.5), color=cut)
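
Both impressions from this plot can be looked at more directly. The sketch below is one way to do it, not the only one: a box plot of depth by cut for the first, and a plain Pearson correlation between depth and price for the second.

```r
library(ggplot2)

# Depth distribution for each cut, one box per cut.
p_depth <- ggplot(diamonds, aes(cut, depth)) +
  geom_boxplot() +
  theme_bw()
p_depth

# Overall linear correlation between depth and price.
depth_price_cor <- cor(diamonds$depth, diamonds$price)
depth_price_cor
```

A value of depth_price_cor near zero would back up the reading that depth tells us little about price on its own.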