Here is the final graph I presented to my colleagues when discussing the difference in chat duration between our Paid customers and our Business customers.
– This is more effective than the box-and-whisker graph because it shows that, while Paid and Business chats may have roughly the same median duration, the distribution of chat durations is not the same: note how the Business chats bump out on the longer end. Very interesting.
– Note also that the duration axis has been changed to a log scale (the natural log, via log(chat_duration)). This is to handle some of those huge outliers, which would otherwise compress the rest of the curve.
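> library(ggplot2)
> # Load the chat export
> mydata = read.csv("~/olark_april_2015.csv")
> # Plot the natural log of chat duration on the x axis
> q = ggplot(mydata, aes(log(chat_duration)))
> # One filled density curve per group; alpha sits outside aes() so it is a fixed transparency
> q + geom_density(aes(fill=factor(group_title, labels=c("Business","Paid"))), alpha=1/4) + ylab("% of Total Chats")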
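As a rough sanity check of the first note, here is a minimal sketch in base R using made-up data, not the April export. It assumes the same column names as the code above (group_title and chat_duration). The 300 median, the spread, and the 1.5 stretch factor are all invented purely for illustration. It builds two groups with identical medians, then shows that the upper quantiles and the density of log(duration) still tell them apart.

```r
# Synthetic stand-in for the chat export; every number here is made up.
n <- 2001
paid <- qlnorm(ppoints(n), meanlog = log(300), sdlog = 0.6)

# Hypothetical Business group: identical to Paid up to the median,
# stretched (in log space) above it to mimic a longer tail.
med <- median(paid)
business <- ifelse(paid > med,
                   exp(log(med) + 1.5 * (log(paid) - log(med))),
                   paid)

mydata <- data.frame(
  group_title   = rep(c("Paid", "Business"), each = n),
  chat_duration = c(paid, business)
)

# The medians agree, so the centre line of a box plot would match...
medians <- tapply(mydata$chat_duration, mydata$group_title, median)

# ...but the 95th percentiles expose the longer Business tail.
p95 <- tapply(mydata$chat_duration, mydata$group_title, quantile, probs = 0.95)

print(round(rbind(median = medians, p95 = p95)))

# Kernel density of log(duration) per group, roughly what geom_density() estimates.
dens <- lapply(split(log(mydata$chat_duration), mydata$group_title), density)
plot(dens$Paid, main = "log(chat_duration) by group",
     xlab = "log(chat_duration)", xlim = range(log(mydata$chat_duration)))
lines(dens$Business, lty = 2)
legend("topright", legend = c("Paid", "Business"), lty = c(1, 2))
```

The median alone cannot separate these two groups, which is why the density view is the more informative picture here. With real data the medians would only be roughly equal rather than identical.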