It turns out heatmap-style representations of correlations are pretty easy in ggplot2. Here’s one that includes several of the roughly numerical columns in the diamonds data set (carat, clarity, depth, table, and price):
– This is the most pre-processing we’ve had to do so far in the challenge. First, grab a sample. Then keep only the number-like columns and convert them all into values R recognizes as numeric (clarity is an ordered factor, so it becomes its level position). Then compute the correlation matrix, melt it into a long, heatmap-friendly format, and finally plot that data. Phew.
– The visualization itself is sort of neat, but it doesn’t really bring us any new insights. It’s kind of interesting to see that table and depth are not all that correlated. It makes some sense after reading this, but I’m not totally sure I understand, to be honest.
– I can see how a heatmap-style correlation matrix like this would be very handy for more numerically oriented data sets. I wonder if there’s any way to include non-numerical values in this type of visualization.
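The pre-processing steps listed above can be sketched in base R alone. This is a minimal sketch on a small, hypothetical toy data frame with made-up values, not the real diamonds sample. It also bears on the last question: in diamonds, cut and color are ordered factors just like clarity, so the same `as.numeric()` conversion turns them into level positions that can go into the matrix. Because those codes are ranks rather than measurements, a rank-based Spearman correlation (`cor(..., method = "spearman")`) is arguably a better fit than the default Pearson; that swap is my suggestion, not what the post’s code does. Unordered categories have no natural order, so their codes would be arbitrary. Here `as.data.frame(as.table(...))` stands in for `reshape2::melt()` and gives the same `Var1`/`Var2` layout, with the value column renamed from `Freq`.

```r
# Hypothetical toy data (made-up values) with two ordered factors, like diamonds
toy <- data.frame(
  cut     = factor(c("Fair", "Good", "Ideal", "Premium", "Very Good", "Ideal"),
                   levels = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
                   ordered = TRUE),
  clarity = factor(c("SI2", "VS1", "IF", "VVS2", "SI1", "VS2"),
                   levels = c("I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"),
                   ordered = TRUE),
  price   = c(450, 1500, 7800, 5200, 2100, 3300)
)

# as.numeric() on a factor returns each value's level position, not its label
num <- sapply(toy, as.numeric)   # 6 x 3 numeric matrix with column names
print(num[, "clarity"])          # 2 5 8 6 3 4

# Rank-based correlation, since the factor codes are ordinal
corr <- round(cor(num, method = "spearman"), 2)

# Base-R alternative to reshape2::melt(): one row per (Var1, Var2) pair
long <- as.data.frame(as.table(corr))
names(long)[names(long) == "Freq"] <- "value"
print(head(long))
```

The long table can be handed to the same `ggplot(..., aes(x = Var1, y = Var2, fill = value)) + geom_tile()` call used in the post.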
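```r
library(ggplot2)   # plotting, and the source of the diamonds data
library(reshape2)  # melt() for reshaping the correlation matrix

set.seed(1117)                                       # reproducible sample
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]  # 1,000 random rows

# Keep the roughly numerical columns (clarity is an ordered factor)
dnum <- dsmall[c("carat", "clarity", "depth", "table", "price")]
dnum <- sapply(dnum, as.numeric)  # factors become level codes; result is a matrix

dcor <- round(cor(dnum), 2)   # 5 x 5 Pearson correlation matrix
melted_dcor <- melt(dcor)     # long format: Var1, Var2, value

ggplot(data = melted_dcor, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile(color = "white")
```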
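One tweak that may make the heatmap easier to read: correlations run from -1 to 1, so a diverging fill scale centered on zero, plus the rounded coefficient printed in each tile, lets you read off the exact value for pairs like table and depth instead of judging shades. This is a sketch using standard ggplot2 functions (`scale_fill_gradient2()`, `geom_text()`, `coord_fixed()`); the colours and label size are arbitrary choices of mine, not part of the original plot.

```r
library(ggplot2)
library(reshape2)

# Same pre-processing as above
set.seed(1117)
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]
dnum <- sapply(dsmall[c("carat", "clarity", "depth", "table", "price")], as.numeric)
dcor <- round(cor(dnum), 2)
melted_dcor <- melt(dcor)

p <- ggplot(melted_dcor, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile(color = "white") +
  # Diverging palette (arbitrary colours): blue negative, white at zero, red positive
  scale_fill_gradient2(low = "steelblue", mid = "white", high = "firebrick",
                       midpoint = 0, limits = c(-1, 1)) +
  geom_text(aes(label = value), size = 3) +   # print each rounded coefficient
  coord_fixed() +                             # keep the tiles square
  labs(x = NULL, y = NULL, fill = "r")
print(p)
```

Fixing `limits = c(-1, 1)` keeps the colour meaning stable across samples, so a pale tile always means a weak correlation rather than just the weakest one in this particular matrix.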