Tag: data

30 Days of Data Visualization Challenge

As I work my way through Discovering Statistics Using R and discover other R-related gems across the internet, I realize that I’m only going to get better at this software if I spend time using it.

So I’m challenging myself to create a new visualization of a single data set every day for the next 30 days – starting today, April 15, and ending May 15. The goal is to become more familiar with the R language, and specifically the ggplot2 library, and to think about visualizing data more generally.

The data set I’ll be using is the “Diamonds” data set packaged with ggplot2.

Big Data, Maps, and the Precision Tradeoff

In reading Discovering Statistics Using R, written by the peerless Andy Field, I came across his quick explanation of sampling, and why it’s done in science – and it brought together a broader circle of thinking that I’ve been chewing on recently. Let’s talk about abstraction.

We’ve all heard this one before, right? “The map is not the territory.” When we break this down a bit, it can tell us some interesting things. A map is useful because it communicates geography in an abstract way – it trades a certain amount of precision for a larger view, a more abstract vision of the lay of the land. A map that is perfectly precise would need to be the same size as the area it is representing – which would not help you find your way to the nearest gas station!

Consider the globe: the amount of information a globe has left out is monumental, and yet it is still highly useful – but the utility is abstracted from the precision in an interesting way. In this illustration we can see that information does not have to be perfectly precise to be useful – and in fact sometimes we can have _too much precision_, in the case of the lost travelers seeking a gas station.

Note, too, that if you’re lost somewhere between Wausau and Sheboygan, a perfectly precise map and a perfectly abstract globe are equally, perfectly, useless.

When conducting a scientific experiment, using a sample of the population operates in the same way – you conduct an experiment using a subset of the population, a sample, and then aim to apply your findings to the population at large. Interestingly, in science, as we increase precision by growing our sample size, we do not necessarily impact the final utility of our findings, but we do in fact make those findings more costly – and surely at some point there are declining returns for each additional research subject.

Again, we see in science a need to balance precision with utility: you could, possibly, survey every English-speaking human, but it’s not obvious that your results would be much more useful than if you surveyed only 10%, or even fewer.
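
To put a rough number on those declining returns, here is a minimal Python sketch. It assumes a simple random sample and the textbook normal-approximation margin of error for a proportion; the function name and the sample sizes are made up for illustration and do not come from Field's book.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion estimated
    from a simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical sample sizes: each step surveys four times as many people.
for n in (100, 400, 1600, 6400):
    print(f"n={n:>5}  margin of error = {margin_of_error(n):.4f}")
```

Because the margin shrinks with the square root of the sample size, quadrupling the number of subjects only halves it. Each extra respondent buys a little less precision than the one before.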

As statistics and semi-scientific testing (I’m looking at you, ad-hoc hypothesizers!) become more popular among Big Data enthusiasts and Growth Engineers, it’s important that we keep in mind the need to balance precision, abstraction, and utility.

There comes a point where we can become so precise that we are no longer creating any good (and certainly no increase in revenue) – imagine discovering precisely the manner in which 19-year-old Scottish males named Chris use your product. While precise, how helpful is this? How actionable is this information? Where’s the utility?

In the same way, there comes a point at which abstraction is a barrier to action – anyone who has faced down the pure, unadulterated data barf of an untouched Google Analytics account can attest to that!

Let’s try to emulate Aristotle in finding the golden mean between these poles – considering the final utility of a study or experiment first, then adjusting the abstraction accordingly.

Google Analytics for Science from Scientists

In January, I had an opportunity (through Catchafire) to work with the science education nonprofit Science from Scientists. They had recently set up a Google Analytics property on their web site, and were looking for a volunteer to get things running properly.

Working with their Director of Web Services, I developed:

  • Custom Dashboards to track engagement, donations, lesson plan usage, and geographic interest.
  • Automated email reporting to various staff members and departments.
  • A Campaign Tracking URL builder.
  • Educational screencasts for all of the above.
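
For a sense of what a campaign tracking URL builder does, here is a minimal Python sketch. The `utm_source`, `utm_medium`, and `utm_campaign` query parameters are the standard Google Analytics campaign tags. The function name, the example URL, and the values are hypothetical, and this is not the tool that was built for Science from Scientists.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def build_campaign_url(base_url, source, medium, campaign):
    """Append Google Analytics campaign (UTM) parameters to a URL,
    keeping any query parameters the URL already has."""
    parts = urlsplit(base_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source),      # where the traffic comes from
        ("utm_medium", medium),      # the channel, e.g. email or social
        ("utm_campaign", campaign),  # the name of the campaign
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))

# Hypothetical example values.
url = build_campaign_url(
    "https://example.org/donate", "newsletter", "email", "spring_drive"
)
print(url)
```

Building the link through a function instead of by hand keeps the tags spelled consistently, which matters because Google Analytics treats differently spelled values as separate campaigns.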

Today, I’m happy to click the “Project Complete” button at Catchafire, setting Science from Scientists to sail, equipped with a batch of customized data delivery utilities and the educational resources to make use of them in the future.

You can see my Catchafire profile here.

Finding Hospitality in the Numbers

It’s always a funny thing when you find a problem you weren’t expecting. Especially when spending time with usage data, taking a moment to blink once or twice and consider why something looks odd can really pay dividends.

When doing a fairly standard rundown of the support statistics for our in-app support, I noticed that, despite making up about 40% of our userbase, our Android app users were submitting as many support requests as our iOS users. This meant that an Android user was almost twice as likely to contact support as an iOS user.
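
The per-user comparison can be sketched in a few lines of Python. This assumes the user base splits only between Android and iOS and that both groups sent the same number of requests; the function name and the request counts are placeholders, not real figures.

```python
def contact_rate_ratio(android_share, android_requests, ios_requests):
    """How many times more likely an Android user is to contact support
    than an iOS user, assuming iOS makes up the rest of the user base."""
    ios_share = 1 - android_share
    android_rate = android_requests / android_share  # requests per unit of user share
    ios_rate = ios_requests / ios_share
    return android_rate / ios_rate

# Placeholder counts: equal request volume from both platforms.
for share in (0.40, 0.35):
    ratio = contact_rate_ratio(share, 100, 100)
    print(f"Android share {share:.0%}: ratio = {ratio:.2f}")
```

Under these assumptions, a 40% share gives a ratio of 1.5. A share nearer 35% gives roughly 1.86, so the exact multiple is quite sensitive to how the “about 40%” rounds.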

This seemed strange – I did some digging. Was the Android app more difficult to use? The app store rating for the Android app was actually higher than that of the iOS app. It was also noteworthy that the Android users accessed the in-app FAQ about half as much as iOS users – perhaps for some reason Android users tended to speed past the FAQ and go directly to support? Perhaps the FAQ wasn’t displaying properly?

Like anyone feeling stumped, I brought the question to the team, hoping someone would find some insight where I didn’t. It turned out that our Android application in fact offered more points of access to support than the iOS app: the Android app offered folks a chance to reach support at points of failure and error messages, whereas the iOS app did not. None of these additional access points required a customer to go through the usual flow of FAQ before reaching out to support.

Mystery solved. We’re increasing the number of access points to support in the iOS app.

Working on the mobile apps has revealed to me again and again that the lower the barrier to entry is, the better you’ll be able to hear from your customers. They have a lot of valuable things to say – given the opportunity, they’ll help you to make better things.

If you’re keeping track, yes, this is the second story about working with the mobile team where I end up increasing the number of incoming support requests. Yes, I am the worst.

Getting Started with SQL

If you’re interested in data, you’re going to have to learn to interact with data one way or another – there are an awful lot of tools out there, and many are optimized for certain professions or fields. The Doctor uses SPSS for her analysis, but she’s an academic psychologist – not a super useful tool for folks interested in growth. For me, working at WordPress.com, SQL seemed like a great place to start, since many of the WordPress foundations are built on plain old SQL tables.

There are lots of places where folks will be very glad to take your money to teach you SQL (or anything else, for that matter) – in matters of education, I would encourage you to examine your options, and to at least get a taste of the no-cost options. With the Internet as it is, there is such a wealth of information and generosity of spirit that a dedicated and motivated learner can often find themselves with more than enough educational resources at their fingertips.

I started with Khan Academy’s Hour of Code on SQL, available here. KA really does a great job, and the subjects they cover are growing every day. Once you’ve spent an hour with them, if you’re following in my footsteps anyway, you’ll want to move on to SQL Zoo, a wiki-style educational series of problems with a number of different databases to play with.
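
If you’d like to try a query or two before opening either of those, here is a minimal sketch using Python’s built-in `sqlite3` module and an in-memory database. The `posts` table, its columns, and its rows are invented for illustration; this is not a real WordPress schema.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table for practice.
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, views INTEGER)"
)
conn.executemany(
    "INSERT INTO posts (author, views) VALUES (?, ?)",
    [("ana", 120), ("ben", 45), ("ana", 80)],
)

# Count posts and total views per author, most-viewed author first.
rows = conn.execute(
    """
    SELECT author, COUNT(*) AS n_posts, SUM(views) AS total_views
    FROM posts
    GROUP BY author
    ORDER BY total_views DESC
    """
).fetchall()

for row in rows:
    print(row)

conn.close()
```

`SELECT`, `GROUP BY`, and `ORDER BY` are among the first things beginner SQL tutorials tend to cover, so a toy table like this is an easy place to poke at them.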

After SQL Zoo, I’m not sure! Are there other resources that you would recommend for a data-driven autodidact?