Category: Work

DVC Day 7: Messin’ with Geoms

(This Post is part of my 30 day Data Visualization Challenge – you can follow along using the ‘challenge’ tag!)

In noodling around with the different options of the qplot function (and there are plenty), I found myself going back and forth on the geom option – here’s one of the possible inputs, smooth, which takes us from yesterday’s graph to one of just very smooth lines, with shading indicating the confidence interval around each smoothed line (the uncertainty of the fit, rather than the standard deviation of the data):

nZnlIFK9Qn-3000x3000

Thoughts:
– This is a really interesting example of another case where we trade some visual precision for more visual utility – for example, that same graph using an arguably more precise plotting of lines looks like a total, and useless, mess.
– The green line is particularly interesting, since it appears to plateau at a certain point – about the same place where it becomes the only remaining clarity.

Code:


library(ggplot2)
only.j <- subset(diamonds, color=="J")
j <- qplot(carat, price, data=only.j, color=clarity, geom=c("smooth"))
j
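To make the comparison in the first thought concrete, here is a minimal sketch of both versions side by side. It uses ggplot() rather than qplot() purely for explicitness; the object names j.line and j.smooth are made up for this example, and se = TRUE and level = 0.95 are simply geom_smooth()'s defaults written out.

```r
library(ggplot2)

only.j <- subset(diamonds, color == "J")

# Raw lines: every observation is connected, one line per clarity grade.
j.line <- ggplot(only.j, aes(carat, price, colour = clarity)) +
  geom_line()

# Smoothed lines: one fitted curve per clarity grade, with a shaded
# confidence band around each fit (se = TRUE, 95% level by default).
j.smooth <- ggplot(only.j, aes(carat, price, colour = clarity)) +
  geom_smooth(se = TRUE, level = 0.95)

# Print either object (for example, type j.smooth) to draw it.
```

Setting se = FALSE drops the shading if the bands get in the way.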

DVC Day 6

(This Post is part of my 30 day Data Visualization Challenge – you can follow along using the ‘challenge’ tag!)

After taking a look at the different colors of diamonds in a sample, I noticed that the diamonds colored “J” appeared to be unusual outliers. Following this tack, I created a subset of the larger diamond data set that contained _only_ the J colored diamonds – then, plotted that on the same carat/price graph we’ve seen, but with color now indicating the diamond’s clarity:

ziakw_LTPh-3000x3000

Thoughts:
– I also added a title, and started using variable names as I build around a data frame, which makes it much easier to add to a plot piece by piece.
– We can see at least one of those vertical striations that we saw in the original data set.
– It looks like the outliers at the low-price, high-carat end of the J-colored diamonds are larger but less valuable than their peers.
– This graph is a bit muddy, but we can for sure see what look like trends in clarity correlating with price as we go from orange to green to pink/purple.

Code:


library(ggplot2)
only.j <- subset(diamonds, color=="J")
j <- qplot(carat, price, data=only.j, color=clarity, size=I(1.5))
j <- j + ggtitle("J-Color Diamond Clarity & Pricing")
j
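For anyone on a recent ggplot2 release, where qplot() is marked as deprecated, the same plot can be sketched with ggplot() instead. This is one possible translation rather than the only way to write it, and the table() call at the end is an extra added here to show how many J diamonds sit in each clarity grade.

```r
library(ggplot2)

only.j <- subset(diamonds, color == "J")

# Mapped variables go inside aes(); fixed settings such as point size
# go directly in the geom, which replaces the size=I(1.5) idiom.
j <- ggplot(only.j, aes(x = carat, y = price, colour = clarity)) +
  geom_point(size = 1.5) +
  ggtitle("J-Color Diamond Clarity & Pricing")

# Count of J-colored diamonds per clarity grade.
clarity.counts <- table(only.j$clarity)
print(clarity.counts)
```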

Live Chat and Lean Manufacturing

IMG_1883

In the Toyota Production System (TPS) and its ongoing adoption by Western companies (usually called Lean, often mixed in with Six Sigma processes), one of the ways that we are able to reduce waste is by moving from batch production to single piece flow, or continuous flow.

The opposing styles here are characterized like so: if Process Zoidberg requires you to perform actions A, B, and C, and you have to perform 100 Zoidbergs, batch production would suggest you do _all_ of your As, then _all_ of your Bs, then _all_ of your Cs. Continuous flow would suggest that you do A, then B, then C, one hundred times.
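A small worked example may help show where the two styles differ. The sketch below assumes a single worker, 100 Zoidbergs, and exactly one time unit for each of A, B, and C, with no changeover cost between steps. All of those numbers are hypothetical and chosen only to keep the arithmetic easy to follow.

```r
# Hypothetical setup: one worker, three steps, one time unit per step.
n.units <- 100
step.time <- c(A = 1, B = 1, C = 1)

# Batch: all As, then all Bs, then all Cs. Unit i is only finished once
# its C is done, which comes after every A and every B.
batch.done <- n.units * step.time[["A"]] +
  n.units * step.time[["B"]] +
  seq_len(n.units) * step.time[["C"]]

# Continuous flow: A, B, C on unit 1, then A, B, C on unit 2, and so on.
flow.done <- seq_len(n.units) * sum(step.time)

comparison <- data.frame(
  model = c("batch", "continuous flow"),
  first.unit.done = c(min(batch.done), min(flow.done)),
  mean.completion = c(mean(batch.done), mean(flow.done)),
  last.unit.done = c(max(batch.done), max(flow.done))
)
print(comparison)
```

Under these assumptions both styles finish the last unit at time 300, but the first finished unit shows up at time 3 instead of 201, and the average unit is done at 151.5 instead of 250.5. The total work is the same; what changes is how long each unit waits before it is complete.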

When we think about supporting a customer base, we can visualize each customer experience as a finished product, with each of their questions or friction points as a discrete component. We could extend this metaphor to the entire product development life cycle, but for the scope of this article, let’s focus on the post-launch product support, by (mostly) dedicated support staff.

Thinking of customer support using the well-trod ground of manufacturing, we can start to use insights that have already provided serious gains for other industries – it can also help us to explain data that we already have, or better understand or phrase our support for new experiments and learning opportunities.

When we consider traditional email support from the side of the customer, a customer sends in a request and waits, the support staff replies, the customer replies again with a new question or concern and waits again, and so on. If you asked the customer, it would look a lot like an (especially slow) continuous flow model.

From the side of the support staff, we see a different picture: they reply to customer requests as they come in, working with many customers at many different points in each customer’s process. Rather than working with one customer from the beginning, through all of their questions, to the conclusion, they move from question to question.

When we consider live chat support, it looks to be much more in line with the continuous flow model – as a customer arrives, they are picked up by a support team member, and they are moved through each of their questions in turn, to the point of completion.

It would be interesting to see some data on how these two processes look side-by-side, especially in terms of efficiency of production – which here would mean customer-questions-answered. I acknowledge that it might be tricky to suss out exactly when a question is answered, especially in an automated way. Tricky, but interesting.

My hunch would be that providing support in the continuous-flow model would yield efficiency gains similar to those seen when other industries adopted that model, but, that’s just a hunch.

DVC Day 5

(This Post is part of my 30 day Data Visualization Challenge – you can follow along using the ‘challenge’ tag!)

One of the neat things that ggplot2 can do is take a plot like yesterday’s, and automatically add a third dimension of data – for today I added the column “color,” which indicates the color of the actual diamond measured, as a “color” aesthetic:

k0WNm7WHTW-3000x3000

Thoughts:
– Now we’re getting somewhere: we can see that yes, more carats tend to be associated with a higher price (with some outliers) but it also looks like certain colors (D, E, F) tend to be lower-priced and/or lower-carat. An interesting question might be to ask which of those factors is more strongly correlated with particular colors.
– We can also see that two of our outliers in this sample are both Js, so there may be interesting things to look into w/r/t that particular color.
– It might be nice to have some sort of drawn line or curve indicating a general trend, as well as maybe a trend per color?

Code:

library(ggplot2)
set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds),100),]
qplot(carat, price, data=dsmall, size=I(2), color=color)
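Two of the thoughts above lend themselves to a quick sketch: averaging carat and price per color, and drawing a general trend plus a trend per color. The version below is one way to try it, not a definitive answer. The names by.color and p are made up for this example, and the straight-line fits (method = "lm") are an assumption chosen because some colors have only a handful of points in a 100-diamond sample.

```r
library(ggplot2)
set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 100), ]

# Average carat and price for each color present in the sample.
by.color <- aggregate(cbind(carat, price) ~ color, data = dsmall, FUN = mean)
print(by.color)

# Points colored by diamond color, one overall straight-line trend in
# black, plus one straight-line trend per color.
p <- ggplot(dsmall, aes(carat, price)) +
  geom_point(aes(colour = color), size = 2) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "black") +
  geom_smooth(aes(colour = color), method = "lm", formula = y ~ x, se = FALSE)

# Print p to draw the plot.
```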

DVC Day 4

(This Post is part of my 30 day Data Visualization Challenge – you can follow along using the ‘challenge’ tag!)

Yesterday’s visualization tried to handle this issue of data point density by reducing the actual size of the dots on the graph. The ggplot2 book (there’s a book!) recommends giving sampling a try. I’ve written a little about sampling and abstraction in the past – so let’s give it a try!

9wQlBfMiCS.thumb

Now we’re cooking! I did have to bump the size of the points back up, as only 100 of those specks were not visually very helpful – but now we have a bit more of a visually intuitive sense of what we’re looking at, without guessing at the larger, imprecise ink blots.

It’s interesting, to me as a lapsed philosopher, that sampling (as an abstraction) necessarily means that we’re giving up precision (in the form of data points), but we’re doing so to gain another type of precision, that is, quick and accurate visual meaning. I’m dropping the Pros list – I’ll try to cover positive thoughts in the body of each Post.

Thoughts:
– What does it mean, though? We can see that there appears to be some sort of relationship between price and carat – how do the other factors come into play here? Are there other patterns at work?
– How can I make it prettier?
– It bothers me that “price” is vertical still. Dangit.

Code:

library(ggplot2)
set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds),100),]
qplot(carat, price, data=dsmall, size=I(2))
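On the last thought, it is not clear whether the complaint is that the y-axis title is drawn sideways or that price sits on the vertical axis at all, so the sketch below covers both readings as assumptions. The names p.upright and p.swapped are invented for the example.

```r
library(ggplot2)
set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 100), ]

# Reading 1: keep price on the y axis, but draw its title horizontally.
p.upright <- ggplot(dsmall, aes(carat, price)) +
  geom_point(size = 2) +
  theme(axis.title.y = element_text(angle = 0, vjust = 0.5))

# Reading 2: put price on the horizontal axis instead.
p.swapped <- ggplot(dsmall, aes(price, carat)) +
  geom_point(size = 2)

# Print either object to draw it.
```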