DVC Day 28: Enhance!

(This post is part of my 30-day Data Visualization Challenge – you can follow along using the 'challenge' tag!)



Following yesterday's highly smushed boxes, my next step was to limit the y-axis to the lower end, where the vast majority of our data points were:

– Now we can see that our Business folks (on the left) and our larger cohort of all Paid users (on the right) have roughly the same median chat duration.
– In the interest of curiosity, though, this seems to deserve a closer look, especially given the monster number of outliers. Box-and-whisker plots are also not widely understood, so putting this one in front of a broad audience wouldn't work well if the goal is to communicate a difference (or lack of one) effectively.


> library(ggplot2)
> mydata = read.csv("~/olark_april_2015.csv")
> p = ggplot(mydata, aes(group_title, chat_duration))
> p + geom_boxplot() + scale_y_continuous(limits=c(0,5000))
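
One caveat worth checking here: limits set through scale_y_continuous() drop the out-of-range rows before the box statistics are computed, so the medians and quartiles drawn can differ from those of the full data. coord_cartesian() only crops the view. Below is a minimal sketch of the difference, assuming a made-up `toy` data frame (hypothetical numbers, not the Olark export):

```r
library(ggplot2)

# Made-up durations for illustration only; this is NOT the Olark data.
toy <- data.frame(
  group_title   = rep(c("Business", "Paid"), each = 5),
  chat_duration = c(100, 200, 300, 6000, 9000,
                    150, 250, 350, 7000, 8000)
)

p <- ggplot(toy, aes(group_title, chat_duration)) + geom_boxplot()

# Scale limits: values above 5000 are dropped (with a warning)
# before the boxplot statistics are computed.
clipped <- p + scale_y_continuous(limits = c(0, 5000))

# Coordinate limits: every row feeds the statistics; only the view is cropped.
zoomed <- p + coord_cartesian(ylim = c(0, 5000))
```

Printing `clipped` and `zoomed` side by side shows the median lines landing in different places for the toy data, so it may be worth trying the coord_cartesian() version on the real file before reading too much into the medians.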

