Tag: Data Visualization

Pies and Waffles: Delicious Charts

I’m trying to catch up on my massive Pocket backlog, and in surveying its diverse contents I noticed a few pieces, all from the same site, eagereyes, and all on the same topic: pie charts.

So, naturally, I read them. 

(As a sidebar, am I the only person who struggles with this with Pocket or other content-saving services? Am I coining the term “Pocket Zero” right now? Am I the next Merlin Mann?)

Here are the pieces. They’re all quite short, less than ten minutes of reading each, even if you do take in the discussion with Hadley Wickham in the comments section:

A Pair of Pie Chart Papers
Ye Olde Pie Chart Debate
Pie Charts

One thing I was surprised to learn was just how long the Great Pie Chart Debate has been going on – over a hundred years! And yet, the pie chart lives on.

It’s also interesting to me that, despite their ubiquity in popular media, we don’t have a great sense of how or why we perceive pie charts the way we do. It makes me consider firing up the Doc’s eye tracker, just to see how eye patterns map onto different visualizations.

In this series of posts I was also introduced to the waffle package for R, which makes it easy to put together a pie chart alternative that I quite like: the waffle chart.

It strikes me as easier than a pie chart to compare each of the pieces to one another, and indicates that each point is part of a continuous whole in the same sort of way that a pie chart does. 
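
To make the idea concrete, here is a rough plain-Python sketch of what a waffle chart does: it splits a fixed grid of cells among the categories in proportion to their size. This is not the waffle package for R and not how that package works internally; the function names, the category names, and the counts are all made up for illustration.

```python
# Text-only waffle chart sketch. All names and numbers here are hypothetical.

def waffle_cells(counts, total_cells=100):
    """Split total_cells among categories in proportion to their counts.

    Uses the largest-remainder method so the cells always sum to total_cells.
    """
    grand_total = sum(counts.values())
    exact = {k: v * total_cells / grand_total for k, v in counts.items()}
    cells = {k: int(share) for k, share in exact.items()}
    leftover = total_cells - sum(cells.values())
    # Hand the remaining cells to the categories with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - cells[k], reverse=True)
    for k in by_remainder[:leftover]:
        cells[k] += 1
    return cells


def render_waffle(cells, symbols, width=10):
    """Return the waffle as lines of text, filled row by row."""
    flat = "".join(symbols[k] * n for k, n in cells.items())
    return [flat[i:i + width] for i in range(0, len(flat), width)]


counts = {"email": 45, "chat": 30, "phone": 25}  # made-up data
symbols = {"email": "#", "chat": "o", "phone": "."}
cells = waffle_cells(counts)
for row in render_waffle(cells, symbols):
    print(row)
```

Because every cell is the same size, comparing two categories comes down to comparing counts of squares rather than judging angles, which is the property I like about the format.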

I’m excited to play around with this package some in the coming days. I’ll have to dig a bit and see if it’s supported in Shiny yet!

STILL Visualizing the Support Driven Survey

I have been away from the blog for a bit. The time I normally spend blogging and thinking about blogging has gone into getting to know a new tool for my R toolbox, a web app platform for R called Shiny.

Those of you who have been around for a while are familiar with my bizarre love of the intersection of information and design that data visualization represents, especially given that I am neither a statistician nor an artist.

(The heart wants what the heart wants!)

As demonstrated previously on this blog, I learn best by doing (hence my 30 day visualization sprint wherein I took a dive into the R library ggplot2) – so after going through the Shiny tutorial, I gave it a try, and pushed live my first ever web app, a super rudimentary user-adjustable visualization of the recent Support Driven compensation survey.

So, here’s a link. Check it out. I would genuinely and with a full heart appreciate your feedback. 

Munging NASA’s Open Meteor Data

In snooping around the US Government’s open data sets a few months back, I found out that NASA has an entire web site dedicated to their publicly available data: https://data.nasa.gov/

Surely, you understand why that would excite me!

I dug around a bit and pulled out a data set on meteor landings in the United States, with tons of information: mass, date, lots of stuff.

To simplify the data set and make things tidy for R, I wrote a quick Python script to strip out some columns and clean up the dates. Here’s the gist if you want to have a go at the data as well.
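
For a sense of what that kind of cleanup involves, here is a minimal sketch using only the Python standard library. It is not the script from the gist. The column names (`name`, `mass`, `year`), the date format, and the sample rows are assumptions for illustration, and the real export may differ.

```python
import csv
import io
from datetime import datetime

# Assumed column names and date format; check them against the real file.
KEEP = ["name", "mass", "year"]
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def clean_rows(rows):
    """Yield trimmed rows, skipping any with a missing mass or unparseable date."""
    for row in rows:
        try:
            name = row["name"].strip()
            mass = float(row["mass"])
            year = datetime.strptime(row["year"], DATE_FORMAT).year
        except (KeyError, ValueError, TypeError, AttributeError):
            continue
        yield {"name": name, "mass": mass, "year": year}


# Tiny made-up sample standing in for the downloaded CSV.
raw = io.StringIO(
    "name,id,mass,year,reclat\n"
    "Alpha,1,21.0,01/01/1880 12:00:00 AM,50.7\n"
    "Beta,2,,01/01/1951 12:00:00 AM,56.1\n"
    "Gamma,3,720.5,not a date,54.2\n"
    "Delta,4,107000,01/01/1952 12:00:00 AM,16.8\n"
)
cleaned = list(clean_rows(csv.DictReader(raw)))

# Write only the kept columns, ready to load into R.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=KEEP)
writer.writeheader()
writer.writerows(cleaned)
print(out.getvalue())
```

Dropping rows that fail to parse keeps the sketch short; for real analysis you might prefer to log them instead, so you know how much data was lost.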

I ended up looking for a relationship between date and meteor mass, to see if maybe there were obvious cycles or other interesting stuff. Unfortunately, a few super-massive meteors stretched the mass axis so far that everything else was squashed into pretty uninteresting visualizations, which is too bad.

We can do some simpler stuff, even with some super-massive meteors. For instance, here’s a log(mass) histogram of all of the meteors:

(Image: histogram of log(mass) for all of the meteors)

Check it out! The result is a roughly normal, slightly right-skewed distribution. That means log(mass) is a reasonable candidate for inferential statistics that assume normality, although I am not sure why you would want to! The R code is a super quick ggplot2 script.
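
The reason the log transform helps is easy to see in a small sketch. This is plain Python, not the ggplot2 script mentioned above, and the masses are made-up numbers chosen to include one super-massive outlier. Each bin covers one power of ten, so the outlier lands a few bins to the right instead of flattening everything else.

```python
import math
from collections import Counter


def log_mass_histogram(masses, bin_width=1.0):
    """Count masses per log10 bin; zero or negative masses are skipped."""
    bins = Counter()
    for m in masses:
        if m > 0:
            left_edge = math.floor(math.log10(m) / bin_width) * bin_width
            bins[left_edge] += 1
    return dict(sorted(bins.items()))


# Hypothetical masses in grams, including one huge outlier.
masses = [0.5, 3, 12, 45, 80, 150, 900, 2500, 40000, 60000000]
hist = log_mass_histogram(masses)
for left_edge, count in hist.items():
    print(f"10^{left_edge:>4.1f}  {'#' * count}")
```

On a linear axis the same ten values would put nine of them in the first bin and the outlier alone at the far end.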

It’s pretty amazing how easily we can access so, so much information. The trouble is figuring out how to use it in an actionable and simply explained way. The above histogram is accurate, and looks pretty (steelblue, the preferred default color of data folks everywhere), but it isn’t actually helpful in any way.

Just because we can transform a dense .csv into a readable chart doesn’t mean it’s going to be useful.

30 Day Challenge Post Mortem


I’ll admit up front that working with R every day for thirty days, producing a new visualization every day, was both harder and easier than I thought it was going to be.

There were days when I felt like I was on fire, found an interesting thread and produced four or five days of visualizations all at once. There were also days where it felt like a real drag, just trying to find something that even looked a little interesting.

There is some debate on the internet about whether a thirty day time period is sufficient to make something a habit – I can’t really speak to that, as creating a habit wasn’t the goal. The goal was to become familiar with a particular R library (ggplot2), and I think that goal has certainly been accomplished.

I really liked this format. Thirty days is short enough to feel possible, with the finish line always in sight, but it still requires discipline and buy-in. As for whether it works as a way to jump-start a new skill, we’ll have to see a bit further down the line, but I certainly feel about a hundred times more comfortable with ggplot2 than I did when I started the whole thing.

I’d recommend this format to folks who are looking to mix up their personal development. The hardest part is choosing an activity that will be interesting and challenging to do every day for thirty days, without picking something so large that it becomes onerous or negatively stressful.

I had considered, for instance, using a new statistical analysis every day for thirty days. That would probably have been a bit too large a bite for me, and I would have really struggled to accomplish it.

Now, the only question remaining is: what should my next challenge be?