Big Data, Maps, and the Precision Tradeoff


In reading Discovering Statistics Using R, written by the peerless Andy Field, I came across his quick explanation of sampling, and why it’s done in science – and it brought together a broader circle of thinking that I’ve been chewing on recently. Let’s talk about abstraction.

We’ve all heard this one before, right? “The map is not the territory.” When we break this down a bit, it can tell us some interesting things. A map is useful because it communicates geography in an abstract way – it trades a certain amount of precision for a larger view, a more abstract vision of the lay of the land. A map that is perfectly precise would need to be the same size as the area it is representing – which would not help you find your way to the nearest gas station!

Consider the globe: the amount of information a globe leaves out is monumental, and yet it is still highly useful. Its usefulness comes not from precision but from what it chooses to omit. Information does not have to be perfectly precise to be useful – and in fact sometimes we can have _too much precision_, as our lost travelers seeking a gas station would discover with their life-size map.

Note, too, that if you’re lost somewhere between Wausau and Sheboygan, a perfectly precise map and a perfectly abstract globe are equally, perfectly, useless.

When conducting a scientific experiment, sampling works the same way – you run the experiment on a subset of the population, a sample, and then aim to apply your findings to the population at large. Interestingly, as we increase precision by growing our sample size, we do not necessarily make our findings any more useful, but we do make them more costly – and at some point there are surely diminishing returns for each additional research subject.

Again, we see in science a need to balance precision with utility – you could, possibly, survey every English-speaking human, but it’s not obvious that your results would be much more useful than if you had surveyed only 10% – or less.
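To put rough numbers on those declining returns, here is a minimal sketch in Python. It is my own illustration, not something from Field’s book, and it leans on several assumptions: a simple random sample, a yes/no survey question, a population much larger than the sample, and the usual normal approximation for a 95% margin of error. The function name `margin_of_error` and the sample sizes are made up for the example.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a surveyed proportion.

    Assumes a simple random sample of size n from a much larger
    population; p=0.5 is the worst case, z=1.96 is the 95% critical value.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Each step quadruples the number of (costly) research subjects.
for n in (100, 400, 1_600, 6_400, 25_600):
    moe = margin_of_error(n)
    print(f"n = {n:>6,}: +/- {moe * 100:.2f} percentage points")
```

Under these assumptions, quadrupling the sample only halves the margin of error: going from 100 to 400 respondents takes you from roughly ±10 to roughly ±5 points, and you need 1,600 to get near ±2.5. The cost grows linearly with each subject while the precision creeps up with the square root.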

As statistics and semi-scientific testing (I’m looking at you, ad-hoc hypothesizers!) become more popular among Big Data enthusiasts and Growth Engineers, it’s important that we keep in mind the need to balance precision, abstraction, and utility.

There comes a point where we can become so precise that we are no longer doing any good (and certainly not increasing revenue) – imagine discovering precisely the manner in which 19-year-old Scottish males named Chris use your product. While precise, how helpful is this? How actionable is this information? Where’s the utility?

In the same way, there comes a point at which abstraction is a barrier to action – anyone who has faced down the pure, unadulterated data barf of an untouched Google Analytics account can attest to that!

Let’s try to emulate Aristotle in finding the golden mean between these poles – considering the final utility of a study or experiment first, then adjusting the abstraction accordingly.
