Author: Simon

Big Data, Maps, and the Precision Tradeoff

Preaching-From-the-Rooftops

In reading Discovering Statistics Using R, written by the peerless Andy Field, I came across his quick explanation of sampling, and why it’s done in science – and it brought together a broader circle of thinking that I’ve been chewing on recently. Let’s talk about abstraction.

We’ve all heard this one before, right? “The map is not the territory.” When we break this down a bit, it can tell us some interesting things. A map is useful because it communicates geography in an abstract way – it trades a certain amount of precision for a larger view, a more abstract vision of the lay of the land. A map that is perfectly precise would need to be the same size as the area it is representing – which would not help you find your way to the nearest gas station!

Consider the globe: the amount of information a globe has left out is monumental, and yet it is still highly useful – but the utility is abstracted from the precision in an interesting way. In this illustration we can see that information does not have to be perfectly precise to be useful – and in fact sometimes we can have _too much precision_, in the case of the lost travelers seeking a gas station.

Note, too, that if you’re lost somewhere between Wausau and Sheboygan, a perfectly precise map and a perfectly abstract globe are equally, perfectly, useless.

When conducting a scientific experiment, using a sample of the population operates in the same way – you conduct an experiment using a subset of the population, a sample, and then aim to apply your findings to the population at large. Interestingly, in science, as we increase precision by growing our sample size, we do not necessarily impact the final utility of our findings, but we do in fact make those findings more costly – and surely at some point there are declining returns for each additional research subject.

Again we see, in science, a need to balance precision with utility – you could, possibly, survey every English-speaking human, but it’s not obviously true that your results would be much more useful than if you surveyed only 10%, or less.
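To put rough numbers on those declining returns, here is a minimal sketch rather than anything from Field's book. It assumes a simple random sample from a very large population, a yes/no survey question, and the worst-case proportion of 0.5. The function name `margin_of_error` and the sample sizes are my own illustrative choices.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumptions (illustrative only): simple random sampling, a population
    much larger than n, and a normal approximation. z=1.96 corresponds to
    a 95% confidence level; p=0.5 is the worst case for a yes/no question.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical sample sizes, each ten times larger (and costlier) than the last.
for n in [100, 1_000, 10_000, 100_000, 1_000_000]:
    moe = margin_of_error(n) * 100
    print(f"n = {n:>9,}: +/- {moe:.2f} percentage points")
```

Because the margin of error shrinks with the square root of the sample size, quadrupling the number of subjects only halves the uncertainty. Under these assumptions, each additional respondent buys less precision than the one before, while costing about the same.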

As statistics and semi-scientific testing (I’m looking at you, ad-hoc hypothesizers!) become more popular among Big Data enthusiasts and Growth Engineers, it’s important that we keep in mind the need to balance precision, abstraction, and utility.

There comes a point where we can become so precise that we are no longer creating any good (and certainly no increase in revenue) – imagine discovering precisely the manner in which 19-year-old Scottish males named Chris use your product. While precise, how helpful is this? How actionable is this information? Where’s the utility?

In the same way, there comes a point at which abstraction is a barrier to action – anyone who has faced down the pure, unadulterated data barf of an untouched Google Analytics account can attest to that!

Let’s try to emulate Aristotle in finding the golden mean between these poles – considering the final utility of a study or experiment first, then adjusting the abstraction accordingly.

Kevin Kruse Interview on Fresh Air

The New Deal, they argue, violates this natural order. In fact, they argue that the New Deal and the regulatory state violate the Ten Commandments. It makes a false idol of the federal government and encourages Americans to worship it rather than the Almighty.

A really interesting and thought-provoking interview with Kevin Kruse about his new book on Fresh Air. It’s available here – worth a listen for students of history or political wonks, or, like me, the dangerous combination of the two.

The Omnipresence of Optionality

“Optionality is the property of asymmetric upside (preferably unlimited) with correspondingly limited downside (preferably tiny).”

Reading Antifragile has changed my outlook on many, many things – but the idea of optionality especially has been cropping up in unexpected places. Here, watch this highlight reel:

What do almost all of these outstanding plays have in common? Besides athleticism and an egg-shaped ball, these scoring opportunities were gained through optionality: with the exception of the very first play, every one of these tries was scored by players who not only positioned themselves to have a number of options, but also executed those options wisely.

In Taleb’s terms, they were able to create a situation with a huge opportunity for upside (gaining points) and very little exposure to a downside, since they were well-supported by multiple teammates at any given time.

When he said “Optionality can be found everywhere if you know how to look,” I wasn’t expecting to find it on the rugby pitch!

Google Analytics for Science from Scientists

In January, I had an opportunity (through Catchafire) to work with the science education nonprofit Science from Scientists. They had recently set up a Google Analytics property on their web site, and were looking for a volunteer to get things running properly.

Working with their Director of Web Services, I developed:

  • Custom Dashboards to track engagement, donations, lesson plan usage, and geographic interest.
  • Automated email reporting to various staff members and departments.
  • A Campaign Tracking URL builder.
  • Educational screencasts for all of the above.
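
For readers curious what a campaign tracking URL builder does, here is a minimal sketch. It is not the tool built for Science from Scientists. The function name `build_campaign_url` and the example.org address are hypothetical. The `utm_source`, `utm_medium`, `utm_campaign`, `utm_term`, and `utm_content` query parameters are the standard campaign tags that Google Analytics reads.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_campaign_url(base_url, source, medium, campaign, term=None, content=None):
    """Append Google Analytics campaign (UTM) parameters to a URL.

    Hypothetical helper for illustration. Any query parameters already
    present on base_url are kept, and the UTM tags are added after them.
    """
    parts = urlsplit(base_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source),      # where the traffic comes from, e.g. a newsletter
        ("utm_medium", medium),      # the channel, e.g. email or social
        ("utm_campaign", campaign),  # the name of the specific campaign
    ]
    if term:
        query.append(("utm_term", term))
    if content:
        query.append(("utm_content", content))
    # urlencode escapes spaces and other unsafe characters in the values.
    return urlunsplit(parts._replace(query=urlencode(query)))


print(build_campaign_url("https://example.org/donate", "newsletter", "email", "spring drive"))
# prints https://example.org/donate?utm_source=newsletter&utm_medium=email&utm_campaign=spring+drive
```

Tagging every link in a given email or post the same way is what lets the dashboards attribute visits and donations to the campaign that produced them.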

Today, I’m happy to click the “Project Complete” button at Catchafire, setting Science from Scientists to sail, equipped with a batch of customized data delivery utilities and the educational resources to make use of them in the future.

You can see my Catchafire profile here.