DVC Day 18: Heat Maps for Everybody!

(This post is part of my 30-day Data Visualization Challenge; you can follow along using the ‘challenge’ tag!)

It turns out heat-map-style correlation plots are pretty easy in ggplot2. Here’s one built from the roughly numerical columns (carat, clarity, depth, table, and price) of a 1,000-row sample of the diamonds data set:

[Figure: correlation heat map of carat, clarity, depth, table, and price]

Thoughts:
– This is the most pre-processing we’ve had to do so far in the challenge. First grab a sample, then keep only the number-like columns, convert them all to numeric values R recognizes, compute the correlation matrix, melt it into a long, heatmap-friendly format, and finally plot it. Phew.
– The visualization itself is sort of neat, but it doesn’t really give us any new insights. It’s somewhat interesting that table and depth are not strongly correlated. That makes some sense after reading this, though to be honest I’m not sure I fully understand why.
– I can see how a heat-map-style correlation matrix like this would be very handy for more numerically oriented data sets. I wonder if there’s any way to include non-numerical values in this type of visualization.
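On that last question, one option (my own assumption, not something used for the plot above) is to treat ordered factors the same way clarity is already treated: as rank codes. In diamonds, cut, color, and clarity are all ordered factors, so as.numeric() gives each level its position in the ordering. Since those codes are really just ranks, Spearman correlation (cor(..., method = "spearman")) seems like a better fit than the default Pearson. This only works for categories with a natural order; the column list and the names dord and dcor_s are just mine for illustration.

```r
library(ggplot2)   # provides the diamonds data set
library(reshape2)  # melt()

set.seed(1117)
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]

# cut, color and clarity are ordered factors; as.numeric() maps each level
# to its position in the ordering (e.g. cut: Fair = 1 ... Ideal = 5;
# for color, D = 1 is the best grade and J = 7 the worst)
cols <- c("carat", "cut", "color", "clarity", "depth", "table", "price")
dord <- sapply(dsmall[cols], as.numeric)

# Spearman correlation works on ranks, so it only relies on the order of
# the level codes, not on the (arbitrary) spacing between them
dcor_s <- round(cor(dord, method = "spearman"), 2)

p <- ggplot(melt(dcor_s), aes(x = Var1, y = Var2, fill = value)) +
  geom_tile(color = "white")
p
```

Unordered categories (say, a list of store names) have no meaningful codes, so they would need a different association measure entirely.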

Code:

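library(ggplot2)   # plotting; also provides the diamonds data set
library(reshape2)  # melt() reshapes the correlation matrix into long format

set.seed(1117)  # make the random sample reproducible
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]  # 1,000 random rows

# Keep the roughly numerical columns (clarity is an ordered factor)
dnum <- dsmall[c("carat", "clarity", "depth", "table", "price")]
# as.numeric() replaces clarity with its level codes (I1 = 1 ... IF = 8);
# sapply() returns a plain numeric matrix
dnum <- sapply(dnum, as.numeric)

dcor <- round(cor(dnum), 2)  # 5 x 5 correlation matrix, 2 decimals
melted_dcor <- melt(dcor)    # long format: columns Var1, Var2, value

ggplot(data = melted_dcor, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile(color = "white")  # white borders between tiles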
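A possible tweak to make a plot like this easier to read (a sketch, not the code behind the plot above): ggplot2’s default continuous fill runs from dark to light blue, so negative and positive correlations are hard to tell apart. A diverging scale centered on 0 and fixed to [-1, 1] fixes that, printing each coefficient in its tile saves squinting at the legend, and dropping the upper triangle removes the mirrored duplicates. The colors, text size, and legend title are my own choices.

```r
library(ggplot2)
library(reshape2)

set.seed(1117)
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]
dnum <- sapply(dsmall[c("carat", "clarity", "depth", "table", "price")], as.numeric)
dcor <- round(cor(dnum), 2)

# Keep only the lower triangle (plus the diagonal) so each pair appears once
dcor[upper.tri(dcor)] <- NA
melted_dcor <- melt(dcor, na.rm = TRUE)  # drop the blanked-out cells

p <- ggplot(melted_dcor, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile(color = "white") +
  # Diverging scale: negative = blue, 0 = white, positive = red, fixed to [-1, 1]
  scale_fill_gradient2(low = "steelblue", mid = "white", high = "firebrick",
                       midpoint = 0, limits = c(-1, 1)) +
  geom_text(aes(label = value), size = 3) +  # print each coefficient in its tile
  coord_fixed() +                              # square tiles
  labs(x = NULL, y = NULL, fill = "Pearson r")
p
```

Fixing the limits to [-1, 1] means the same color always means the same correlation, which matters if you compare heat maps across data sets.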
