OK, so let’s get something out of the way up front – yes, I wanted to be a data scientist.
But you know what? Once, I also wanted to be a professional coffee roaster.
These jobs (and aspirations) are similar primarily in that my desire to have them took a nosedive once I got a real glimpse of what doing them was like.
If you like the idea of working with data, if you see yourself as someone with ambitions of working in the data space, you should read this article from Dan Friedman – Data Science: Reality Doesn’t Meet Expectations.
I work closely with data scientists – in some ways I genuinely envy their approach to work, and the way that they can find impact within organizations. I am super glad they are out there and I am so grateful for the insights and thoughtfulness they bring to the table – but that job’s not for me!
The job that I’ve found suits my nature, allows me to have a lot of impact, and lets me work on important and interesting problems is a new one – the Analytics Engineer!
Job titles in data and in tech are hard – do we really need a new one? “Analytics Engineer” is an emergent term: it describes an area of work that folks have been operating in for a while now, but one that has seen rising demand as modern tooling and third-party solutions have matured.
NB, not everyone knows that they need an Analytics Engineer – often you’ll see job descriptions for titles like Data Analysts, Business Analysts, Data Engineers, even Data Scientists – but the work that will be expected is Analytics Engineering work.
That work is more technical than what a strictly Excel-based analyst does. No disrespect to Excel (sufficiently advanced Excel is indistinguishable from software engineering, in my opinion), but you will need some SQL chops to be effective as an Analytics Engineer. It’s less statistically heavy than a data science role. It requires literacy in data engineering but, in most cases, not necessarily the chops to originate an Airflow DAG. Strong opinions about data architecture are helpful, but often you can learn that on the job!
As I talk more with folks about this kind of work, and as we struggle to find qualified candidates for our own teams, I’ve realized that I’ve repeated the same advice probably a half dozen times: sometimes to friends, at least once to an Uber driver, over Slack and in person. When this happens, I take it as a strong signal that I ought to put up a blog post!
So here it is: this is my guide to how you can become a competitive candidate for Analytics Engineering roles (even if they’re hiring for the wrong job title!)
One of the challenges to gaining the kind of experience you need in order to become a competitive candidate is that much of the best-in-class tooling for this kind of work is either hard to use alone or prohibitively expensive. Something like Airflow is a great solution and very broadly used, but it’s going to be a challenge to set up locally to use with toy data. Looker is a very common tool for this kind of work, but it is terribly expensive for an individual to use as an educational tool.
So, this set of suggestions is meant to be used in reality by anyone – you should be able to follow this advice at low or no cost.
Yes, if a job description is looking for Airflow ETL experience or Looker modeling experience, you won’t have exactly that – BUT as someone hiring into a role with exactly that wording in our job description, I also recognize that the skills you build with the free tooling below are eminently transferable to the tooling that we use in-house. In your cover letter, you can mention that you accomplished the same tasks with a different tool and that the skills carry over – a cover letter with that kind of attention to detail is already ahead of the pack.
Here’s your stack:
FIRST you have to find some free data that you’re interested in. That second part should not be neglected – if you want to see this project through to its completion (and gain your Competitive Candidate merit badge!), it is absolutely imperative that you make choices that make it as easy as possible for you to stay motivated!
Are you interested in food? See if you can get historical data from your local agricultural co-ops or agencies. I’m interested in local politics, so I FOIA requested the voter registration data for the entire State of New York – it came on a CD!
Being interested in the data you’re using is going to make a big difference when it comes to understanding it, modeling it, and then building some reporting – especially if the only end consumer is you! Bonus points if it is a streaming source of regularly-updated data, like web traffic or an ecommerce application.
SECOND I recommend using BigQuery as your data storage solution – they have good docs, they have a free plan, and they integrate really easily with the other parts of the data stack. If you have another solution you prefer, that’s fine too!
THIRD You must learn the excellent and open source dbt from your friends and mine at Fishtown. Here’s the tutorial and here is the Slack community. dbt is what you’ll use to take your ocean of raw data, transform it into tables that fit the dimensional modeling standard, and apply robust testing to those transformations.
If you have a little extra cash for this endeavor, I recommend buying The Data Warehouse Toolkit and reading the first four chapters to really dig deep into dimensional modeling. If you’re trying to stay absolutely no-cost, you can suss out some blog posts and other resources for free!
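If "transform it into dimensional tables and test the result" sounds abstract, here is a rough sketch of the idea in plain Python. To be clear about what is assumed: this is not dbt (in dbt your models would be SQL SELECT statements and your tests would be declared in YAML), and an in-memory SQLite database stands in for BigQuery. The table names, column names, and the four rows of voter data are all invented for illustration. The one convention borrowed from dbt is that a test is a query that returns the offending rows, so an empty result means the test passes.

```python
import sqlite3


def build_models(conn):
    """Turn the raw table into one dimension table and one fact table."""
    conn.executescript("""
        -- Dimension: one row per county, with a generated surrogate key.
        CREATE TABLE dim_county (
            county_key INTEGER PRIMARY KEY,
            county     TEXT
        );
        INSERT INTO dim_county (county)
        SELECT DISTINCT county FROM raw_registrations ORDER BY county;

        -- Fact: one row per registration, pointing at the dimension.
        CREATE TABLE fct_registrations AS
        SELECT r.voter_id,
               d.county_key,
               COALESCE(r.party, 'UNAFFILIATED') AS party,
               r.registered_on
        FROM raw_registrations AS r
        JOIN dim_county AS d ON d.county = r.county;
    """)


def failing_rows(conn, sql):
    """Run a check query; an empty list means the check passes."""
    return conn.execute(sql).fetchall()


# Hypothetical raw data: every name and value below is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_registrations (
        voter_id INTEGER, county TEXT, party TEXT, registered_on TEXT
    );
    INSERT INTO raw_registrations VALUES
        (1, 'Kings', 'DEM', '2019-03-01'),
        (2, 'Kings', 'REP', '2019-03-02'),
        (3, 'Erie',  'DEM', '2019-03-02'),
        (4, 'Erie',  NULL,  '2019-03-05');
""")
build_models(conn)

# Each check selects the rows that would violate the rule.
checks = {
    "dim_county.county is unique":
        "SELECT county FROM dim_county GROUP BY county HAVING COUNT(*) > 1",
    "fct_registrations.county_key is not null":
        "SELECT voter_id FROM fct_registrations WHERE county_key IS NULL",
    "fct_registrations.party is not null":
        "SELECT voter_id FROM fct_registrations WHERE party IS NULL",
}
for name, sql in checks.items():
    status = "PASS" if not failing_rows(conn, sql) else "FAIL"
    print(f"{status}: {name}")
```

The habit worth taking away is the split between models that build tables and checks that hunt for bad rows. It carries over to dbt even though the syntax there is different.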
FOURTH You’ll build out your final reporting using the free tier of Mode Analytics – note that in order to stay within their free tier, you may need to reduce your final reporting tables to “Thousands of Rows” – take this as an extra challenge for your transformation layer, and an opportunity to additionally leverage the power of dbt!
FIFTH Make sure you document the journey – I always recommend blog posts, but a well-documented GitHub repo will probably be more interesting to, and more likely to be reviewed by, most technical hiring managers.
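One common way to shrink a reporting table is to aggregate it to a coarser grain before it ever reaches the reporting tool. Here is a minimal sketch of that idea, again using in-memory SQLite as a stand-in for your warehouse. The table names, columns, and the 50,000 synthetic rows are assumptions made up for this example, and in a real project the GROUP BY would live in a dbt model rather than a Python function.

```python
import sqlite3
from datetime import date, timedelta


def build_daily_summary(conn):
    """Roll the fact table up to one row per day and county."""
    conn.executescript("""
        DROP TABLE IF EXISTS rpt_daily_registrations;
        CREATE TABLE rpt_daily_registrations AS
        SELECT registered_on, county, COUNT(*) AS registrations
        FROM fct_registrations
        GROUP BY registered_on, county;
    """)
    row = conn.execute(
        "SELECT COUNT(*) FROM rpt_daily_registrations"
    ).fetchone()
    return row[0]


# Synthetic fact table: 50,000 invented rows over 4 counties and 365 days.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fct_registrations "
    "(voter_id INTEGER, county TEXT, registered_on TEXT)"
)
counties = ["Kings", "Erie", "Albany", "Monroe"]
start = date(2019, 1, 1)
rows = [
    (i, counties[i % 4], (start + timedelta(days=i % 365)).isoformat())
    for i in range(50_000)
]
conn.executemany("INSERT INTO fct_registrations VALUES (?, ?, ?)", rows)

summary_rows = build_daily_summary(conn)
print(f"fact rows: {len(rows)}, reporting rows: {summary_rows}")
```

The trade-off is that the reporting table can only answer questions at the grain you chose (here, per day and per county), so pick the grain based on the charts you actually plan to build.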
At the end, your process would look something like this:
I recognize that the above glosses over a lot of the work that is behind this proposal – a dedicated person already working full time, putting in some time on nights and weekends, could probably get through the above in six months. It’s not a short trip, but if you’re looking to make a move, this is one way to do it.
The need for Analytics Engineers is only growing, even if the job title itself is still only starting to gain steam – I hope you’ll give it a try!