Category: R

Pies and Waffles: Delicious Charts

I’m trying to catch up on my massive Pocket backlog, and in surveying its diverse contents I noticed a few pieces all from the same site, eagereyes, and all on the same topic: pie charts.

So, naturally, I read them. 

(As a sidebar, am I the only person who struggles with this with Pocket, or other content saving services? Am I coining the term “Pocket Zero” right now? Am I the next Merlin Mann?)

Here are the pieces – they’re all quite short, less than ten minutes reading, even if you do take in the discussion with Hadley Wickham in the comments section:

A Pair of Pie Chart Papers
Ye Olde Pie Chart Debate
Pie Charts

One thing I was surprised to learn was just how long the Great Pie Chart Debate has been going on – over a hundred years! And yet, the pie chart lives on.

It’s also interesting to me that, despite their ubiquity in popular media, we don’t have a great sense of how or why we perceive pie charts the way we do – it makes me consider firing up the Doc’s eye tracker, just to see how eye patterns map onto different visualizations.

In this series of posts I was also introduced to the waffle package for R, which makes it easy to put together a pie chart alternative that I quite like: the waffle chart, a grid of small squares colored by category.

It strikes me as easier than a pie chart to compare each of the pieces to one another, and indicates that each point is part of a continuous whole in the same sort of way that a pie chart does. 
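
To make the idea concrete, here is a rough Python sketch of what a waffle chart boils down to: category counts get converted into a fixed grid of cells, one symbol per category. To be clear, this is not the waffle package's code or API (that lives in R). The `waffle_cells` and `render_waffle` helpers, the row-by-row fill order, and the dessert data are all made up for illustration.

```python
def waffle_cells(counts, total_cells=100):
    """Allocate grid cells to categories in proportion to their counts.

    Uses largest-remainder rounding so the cells always sum to total_cells.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    exact = {k: v * total_cells / total for k, v in counts.items()}
    cells = {k: int(x) for k, x in exact.items()}
    leftover = total_cells - sum(cells.values())
    # Hand the remaining cells to the categories with the largest fractions.
    by_remainder = sorted(exact, key=lambda k: exact[k] - cells[k], reverse=True)
    for k in by_remainder[:leftover]:
        cells[k] += 1
    return cells


def render_waffle(counts, rows=10, cols=10):
    """Return the waffle as a list of strings, one per row, filled row by row."""
    cells = waffle_cells(counts, rows * cols)
    # Use the first letter of each category as its symbol (an arbitrary choice).
    flat = "".join(name[0].upper() * n for name, n in cells.items())
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


# Hypothetical data, invented for this example.
example = {"apple": 45, "blueberry": 30, "cherry": 25}
for row in render_waffle(example):
    print(row)
```

Because every cell is the same size, comparing two categories comes down to comparing counts of squares rather than judging angles or areas.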

I’m excited to play around with this package some in the coming days. I’ll have to dig a bit and see if it’s supported in Shiny yet!
 

STILL Visualizing the Support Driven Survey

I have been away from the blog for a bit – the time I normally spend blogging and thinking about blogging has gone to getting to know a new tool for my R toolbox, a web app platform for R called Shiny.

Those of you who have been around for a while are familiar with my bizarre love of the intersection of information and design that data visualization represents, especially given that I am neither a statistician nor an artist.

(The heart wants what the heart wants!)

As demonstrated previously on this blog, I learn best by doing (hence my 30-day visualization sprint, wherein I took a dive into the R library ggplot2) – so after going through the Shiny tutorial, I gave it a try and pushed live my first ever web app, a super rudimentary user-adjustable visualization of the recent Support Driven compensation survey.

So, here’s a link. Check it out. I would genuinely and with a full heart appreciate your feedback. 

Munging NASA’s Open Meteor Data

In snooping around the US Government’s open data sets a few months back, I found out that NASA has an entire web site dedicated to their publicly available data: https://data.nasa.gov/

Surely, you understand why that would excite me!

I dug around a bit and pulled out a data set on meteor landings in the United States, with tons of detail: mass, date, and lots of other stuff.

To simplify the data set and make things tidy for R, I wrote a quick Python script to strip out some columns and clean up the dates. Here’s the gist if you want to have a go at the data as well.
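
For a sense of what that kind of cleanup involves, here is a minimal sketch using only the Python standard library. It is not the gist mentioned above. The column names (`name`, `mass`, `year`), the raw timestamp format, and the tiny sample data are all assumptions made for illustration.

```python
import csv
import io
from datetime import datetime

KEEP = ["name", "mass", "year"]        # assumed column names
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"   # assumed raw timestamp format


def tidy_rows(reader):
    """Yield dicts with only the KEEP columns and the date in ISO form.

    Rows with a missing or unparseable mass or date are skipped.
    """
    for row in reader:
        try:
            name = row["name"]
            mass = float(row["mass"])
            when = datetime.strptime(row["year"].strip(), DATE_FORMAT)
        except (AttributeError, KeyError, TypeError, ValueError):
            continue
        yield {"name": name, "mass": mass, "year": when.date().isoformat()}


def tidy_csv(src, dst):
    """Read a CSV from file-like src and write the tidied CSV to dst."""
    writer = csv.DictWriter(dst, fieldnames=KEEP)
    writer.writeheader()
    for row in tidy_rows(csv.DictReader(src)):
        writer.writerow(row)


# Tiny made-up sample standing in for the real export.
sample = io.StringIO(
    "name,id,mass,year,notes\n"
    "Alpha,1,21.0,01/01/1880 12:00:00 AM,x\n"
    "Beta,2,,01/01/1951 12:00:00 AM,y\n"
    "Gamma,3,720,not a date,z\n"
)
out = io.StringIO()
tidy_csv(sample, out)
print(out.getvalue())
```

Skipping bad rows keeps the sketch short. A real script might log them instead, so you know how much data was dropped.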

I ended up looking to see if there was a relationship between date and meteor mass, to see if maybe there were obvious cycles or other interesting stuff, but a few super-massive meteors ended up shoving the rest of the data into pretty uninteresting visualizations, which is too bad.

We can do some simpler stuff, even with some super-massive meteors. For instance, here’s a log(mass) histogram of all of the meteors:

[Image: histogram of log(mass) for all of the meteors]

Check it out! It results in a somewhat normal, slightly right-skewed distribution. That means parametric inferential statistics are a reasonable option on the log-transformed values, although I am not sure why you would want to use them! The R code is a super quick ggplot2 script.
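
The plot itself came from ggplot2 in R. As a language-neutral illustration of why the log transform helps, here is a small Python sketch that bins masses by order of magnitude. The base-10 log, the bin width, the `log_histogram` helper, and the masses are all assumptions made up for this example, not values from the NASA data.

```python
import math
from collections import Counter


def log_histogram(masses, bin_width=1.0):
    """Count positive masses in bins of log10(mass), each bin_width wide.

    Returns a dict mapping each bin's lower edge (in log10 units) to its count.
    """
    bins = Counter()
    for m in masses:
        if m <= 0:
            continue  # log is undefined for zero or negative values
        edge = math.floor(math.log10(m) / bin_width) * bin_width
        bins[edge] += 1
    return dict(sorted(bins.items()))


# Made-up masses in grams, including one super-massive outlier.
masses = [3, 12, 45, 80, 150, 300, 900, 2500, 7000, 60000000]
for edge, count in log_histogram(masses).items():
    print(f"log10(mass) in [{edge:.0f}, {edge + 1:.0f}): {'#' * count}")
```

On a linear axis the one huge value would flatten everything else into a single bar. On the log scale it is just one more bin off to the right.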

It’s pretty amazing how easily we can access so, so much information. The trouble is figuring out how to use it in an actionable and simply explained way. The above histogram is accurate, and looks pretty (steelblue, the preferred default color of data folks everywhere), but it isn’t actually helpful in any way.

Just because we can transform a dense .csv into a readable chart doesn’t mean it’s going to be useful.