DVC Day 28: Enhance!

(This post is part of my 30-day Data Visualization Challenge – you can follow along using the ‘challenge’ tag!)



After yesterday’s highly smushed boxes, my next step was to limit the y-axis to the lower end, where the vast majority of our data points sit:

[Screenshot: box plots of chat_duration by group_title, y-axis limited to 0–5000]


– Now we can see that our Business folks (on the left) and our larger cohort of all Paid users (on the right) have roughly the same median chat duration.
– Out of curiosity, though, this seems to deserve more consideration, especially given the monster number of outliers. Box-and-whisker plots are also not widely understood, so they wouldn’t work well in front of a broad audience if the goal is to communicate a difference (or lack of one) effectively.
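One way to back up the "roughly the same median" reading is to compute the medians directly rather than eyeballing the boxes. Here is a minimal sketch, assuming the same group_title and chat_duration columns used in this post; the toy data frame and its values are made up purely so the snippet runs on its own.

```r
# Toy stand-in for the real export: the column names match the post,
# but every value here is invented for illustration.
toy <- data.frame(
  group_title   = c("Business", "Business", "Business",
                    "Paid", "Paid", "Paid", "Paid"),
  chat_duration = c(120, 300, 9000, 90, 310, 450, 20000)
)

# Median chat duration per group, computed on all rows (outliers included).
medians <- aggregate(chat_duration ~ group_title, data = toy, FUN = median)
print(medians)
```

With the real data, swapping toy for mydata should give the numbers behind the middle line of each box.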


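> library(ggplot2)
> # Load the chat data
> mydata = read.csv("~/olark_april_2015.csv")
> # Chat duration on the y-axis, one box per group
> p = ggplot(mydata, aes(group_title, chat_duration))
> # Limit the y-axis to 0-5000; ggplot2 drops values outside these limits before drawing the boxes
> p + geom_boxplot() + scale_y_continuous(limits=c(0,5000))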
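
One thing worth knowing about the scale_y_continuous(limits=...) call above: ggplot2 removes values outside the limits before it computes the box statistics, so the medians and quartiles on screen describe only the chats at or under 5000. If the goal is just to zoom in, coord_cartesian(ylim=...) keeps every row in the calculation. Here is a rough sketch comparing the two, using a made-up toy data frame (not the real export) so it can run on its own:

```r
library(ggplot2)

# Toy data with the post's column names; the values are invented.
toy <- data.frame(
  group_title   = rep(c("Business", "Paid"), each = 5),
  chat_duration = c(100, 200, 300, 6000, 7000,
                    150, 250, 350, 450, 8000)
)

p <- ggplot(toy, aes(group_title, chat_duration))

# Approach from the post: rows above 5000 are dropped (with a warning)
# before the box statistics are computed.
clipped <- p + geom_boxplot() + scale_y_continuous(limits = c(0, 5000))

# Alternative: only the visible window changes; the statistics use all rows.
zoomed <- p + geom_boxplot() + coord_cartesian(ylim = c(0, 5000))

# Pull the medians ggplot2 actually computed for each version.
median_clipped <- suppressWarnings(layer_data(clipped, 1)$middle)
median_zoomed  <- layer_data(zoomed, 1)$middle

print(median_clipped)
print(median_zoomed)
```

On the real data the two may land close together if only a small share of chats run past 5000, but with this many outliers it seems worth checking.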
