Following the highly smushed boxes of yesterday, my next step was to limit our y-axis only to the lower end, where the vast majority of our data points were:
– Now we can see that our Business folks (on the left) and our larger cohort of all Paid users (on the right) have roughly the same median chat duration.
– In the interest of curiosity, though, this seems to deserve more consideration, especially given the monster number of outliers. Box-and-whisker plots are also not widely understood, so putting this in front of a broad audience wouldn’t work well if the goal is to communicate a difference (or lack of one) effectively.
> library(ggplot2)
> mydata = read.csv("~/olark_april_2015.csv")
> p = ggplot(mydata, aes(group_title, chat_duration))
> p + geom_boxplot() + scale_y_continuous(limits=c(0,5000))
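One caveat about limiting the axis this way: in ggplot2, `scale_y_continuous(limits = ...)` drops the points outside the limits before the box statistics are computed, so the medians drawn may not match the medians of the full data. `coord_cartesian(ylim = ...)` only zooms the view instead. The sketch below compares the two on made-up, skewed data; the `toy` data frame and its values are assumptions for illustration, not the real chat export.

```r
library(ggplot2)

# Made-up, right-skewed durations standing in for the real CSV (illustrative only)
set.seed(1)
toy <- data.frame(
  group_title   = rep(c("Business", "Paid"), each = 500),
  chat_duration = rexp(1000, rate = 1 / 2000)
)

p <- ggplot(toy, aes(group_title, chat_duration))

# Scale limits: rows above 5000 are removed before the boxes are computed
p_scale <- p + geom_boxplot() + scale_y_continuous(limits = c(0, 5000))

# Coordinate limits: the view is zoomed, but every row feeds the statistics
p_zoom <- p + geom_boxplot() + coord_cartesian(ylim = c(0, 5000))

# Pull out the median each plot actually draws, one value per group
median_scale <- suppressWarnings(layer_data(p_scale))$middle
median_zoom  <- layer_data(p_zoom)$middle
```

If the two sets of medians differ noticeably on the real data, the zoomed version is probably the safer one to share, since it describes all chats rather than only those under the cutoff.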