Category: General

The Emergent Import of the Tangible for the Remote Workplace

(what a title!)

The longer I lead remote teams, the clearer it becomes to me that there are things which take on an outsized importance within a remote team – especially within an all-remote organization.

(an aside: I appreciate the definitional challenge of calling an organization “all remote” – remote from what, exactly? “Distributed” is an alternate term that I like better, but it has yet to gain a ton of traction.)

Some of the things that take on an outsized, and sometimes surprising, importance that I’ve written about in the past:

Phatic Communication

Eating On Camera

What’s starting to appear more and more to me these days is how important, or perhaps how impactful, the tangible becomes.

What do I mean when I say “the tangible”?

Over the course of the last year, we’ve built out an in-house ETL solution that we call SQLT – pronounced “Sequel Tee” – and had to work really hard to gain buy-in from folks within the organization to start using it, rather than defaulting to other solutions that were more familiar or more comfortable, for whatever reason.

Once we had a couple dozen folks who had committed code using SQLT, I had some stickers made, and I mailed them to those who had a commit on their record – they were a surprise; no one knew they were coming. I trucked down to the post office and sent a dozen envelopes to three different continents.

I have a few left! They look like this:


…and they were a great hit! Folks really appreciated them!

I wonder whether, in a co-located environment, it would have had much of an impact, since in an in-person workplace you have many, nearly constant, tangible artifacts of your work – an office, a desk, a cafeteria.

In a distributed workplace, the tangible is much rarer: your workplace is your home, or a cafe, or another shared space – areas that aren’t exclusively the place for work, and often serve many duties.

You don’t often interact with your colleagues in the flesh: this is part of what makes regular meetups or conferences an important source of connection and re-connection.

So too, I think, receiving something you can hold, something that comes in the mail, from your otherwise largely ephemeral colleagues, takes on an outsized impact in a distributed environment.

I used to send my teammates postcards on their birthdays and their Automattic hiring anniversaries (naturally, Automattiversaries) – something that would probably seem a bit odd in a co-located workplace, but in a distributed one, it really felt like a special token of recognition, a tangible touchstone of the time and the work we’d done together.

Due to the intentionality that phatic communication requires when working remotely, it’s easy for distributed workers to fall into communication patterns that are totally professional: interactions can be purely work focused, transactional, without the kind of socially pleasant borders and decoration that you get for free in a co-located environment.

Asking your colleagues to eat lunch on camera can feel a bit awkward or out of place. After all, in a co-located environment, which we still have in our brains as the default, it would be odd to ask for! But part of the distributed organization’s success relies upon us recognizing the things we no longer get for free, what we maybe took for granted in a co-located office, and how we might replace them, or improve upon them.

Like that eating-on-camera piece, I think a birthday or work anniversary postcard would be strange in a traditional office – but in a distributed workplace it is not only not strange, it may be quite important.

These injections of the tangible help remind us that our colleagues and relationships are real – a postcard or a goofy sticker, by existing between our fingers, offers a kind of proof that our colleagues, too, are real and tangible.



Become an Analytics Engineer!

OK, so let’s get something out of the way up front – yes, I wanted to be a data scientist.

But you know what? Once, I also wanted to be a professional coffee roaster.

These jobs (and aspirations) are similar primarily insofar as my desire to have them took a nosedive once I got a real glimpse of what doing them was like.

If you like the idea of working with data, if you see yourself as someone who has ambitions or aspirations of working in the data space, you should read this article from Dan Friedman – Data Science: Reality Doesn’t Meet Expectations.

I work closely with data scientists – in some ways I genuinely envy their approach to work, and the way that they can find impact within organizations. I am super glad they are out there and I am so grateful for the insights and thoughtfulness they bring to the table – but that job’s not for me!

The job that I’ve found suits my nature, allows me to have a lot of impact, and work on important and interesting problems, is a new one – the Analytics Engineer!

Job titles in data and in tech are hard – do we really need a new one? The Analytics Engineer is an emergent term that describes an area of work folks have been operating in for a while now, but one for which modern tooling and third-party solutions have created a rising need.

NB, not everyone knows that they need an Analytics Engineer – often you’ll see job descriptions for titles like Data Analyst, Business Analyst, Data Engineer, even Data Scientist – but the work that will be expected is Analytics Engineering work.

That work is more technical than that of a strictly Excel-based analyst – no disrespect to Excel; sufficiently advanced Excel is indistinguishable from software engineering, in my opinion – but you will need some SQL chops to be effective as an Analytics Engineer. It’s less statistically heavy than a data science role. It requires literacy in data engineering but, in most cases, not necessarily the chops to originate an Airflow DAG. Strong opinions about data architecture are helpful but, often, you can learn that on the job!
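To make “SQL chops” a little more concrete, here’s a minimal sketch of a bread-and-butter Analytics Engineering transformation – deduplicating a raw change log down to one current row per user with a window function. The table and data are invented, and it runs through Python’s built-in sqlite3 so you can try it anywhere:

```python
import sqlite3

# Invented example data: a raw log with multiple rows per user.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_user_events (user_id INTEGER, plan TEXT, updated_at TEXT);
INSERT INTO raw_user_events VALUES
  (1, 'free',     '2020-01-01'),
  (1, 'premium',  '2020-03-15'),
  (2, 'business', '2020-02-10');
""")

# Keep only the most recent row per user via a window function.
latest_plans = conn.execute("""
    SELECT user_id, plan
    FROM (
        SELECT user_id, plan,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id
                   ORDER BY updated_at DESC
               ) AS rn
        FROM raw_user_events
    )
    WHERE rn = 1
    ORDER BY user_id
""").fetchall()

print(latest_plans)  # each user appears once, with their latest plan
```

If reading that query feels comfortable, you’re most of the way to the SQL bar for this kind of role.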

As I talk more with folks about this kind of work, and as we struggle to find qualified candidates for our own teams, I realized that I’ve repeated the same advice probably a half dozen times: sometimes to friends, at least once to an Uber driver, over Slack and in person. When this happens, I take it as a strong signal that I ought to put up a blog post!

So here it is: this is my guide to how you can become a competitive candidate for Analytics Engineering roles (even if they’re hiring for the wrong job title!)

One of the challenges to gaining the kind of experience you need in order to become a competitive candidate is that much of the best-in-class tooling for this kind of work is either hard to use alone or prohibitively expensive – something like Airflow is a great solution and very broadly used, but it’s going to be a challenge to set up locally to use with toy data. Looker is a very common tool for this kind of work, but is terribly expensive for an individual to use as an educational tool.

So, this set of suggestions is meant to be used in reality by anyone – you should be able to follow this advice at low or no cost.

Yes, if a job description is looking for Airflow ETL experience or Looker modeling experience, you won’t have exactly that – BUT as someone hiring into a role with exactly that wording in our job description, I also recognize that the free tooling below is eminently transferable to the tooling that we use in-house. You can mention that you accomplished the same tasks with a different tool and that the skills are laterally transferable in the cover letter – a cover letter with that kind of attention to detail is already ahead of the pack.

Here’s your stack:

FIRST you have to find some free data that you’re interested in. That second part should not be neglected – if you want to see this project through to its completion (and gain your Competitive Candidate merit badge!), it is absolutely imperative that you make choices that make it as easy as possible for you to stay motivated!

Are you interested in food? See if you can get historical data from your local agricultural co-ops or agencies. I’m interested in local politics, so I FOIA-requested the voter registration data for the entire State of New York – it came on a CD!


Being interested in the data you’re using is going to make a big difference when it comes to understanding it, modeling it, and then building some reporting – especially if the only end consumer is you! Bonus points if it is a streaming source of regularly-updated data, like web traffic or an ecommerce application.

SECOND I recommend using BigQuery as your data storage solution – they have good docs, they have a free plan, and they integrate really easily with the other parts of the data stack. If you have another solution you prefer, that’s fine too!

THIRD You must learn the excellent and open-source dbt from your friends and mine at Fishtown. Here’s the tutorial and here is the Slack community. dbt is what you’ll use to take your ocean of raw data, transform it into tables that fit the dimensional modeling standard, and apply robust testing to those transformations.

If you have a little extra cash for this endeavor, I recommend buying The Data Warehouse Toolkit and reading the first four chapters to really dig deep into dimensional modeling. If you’re trying to stay absolutely no-cost, you can suss out some blog posts and other resources for free!
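As a taste of what those first chapters cover, here’s a minimal sketch of the dimensional-modeling idea – table and data invented, SQL run against Python’s in-memory SQLite so it stays no-cost: one wide raw extract gets split into a fact table of measurements and a dimension table of descriptive attributes, then recombined in a star join for reporting:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Invented raw extract: one wide table mixing facts and attributes.
CREATE TABLE raw_orders (
    order_id INTEGER, customer_email TEXT, customer_country TEXT,
    amount REAL, ordered_at TEXT
);
INSERT INTO raw_orders VALUES
  (100, 'ada@example.com',  'UK', 25.0, '2020-01-05'),
  (101, 'ada@example.com',  'UK', 40.0, '2020-02-11'),
  (102, 'brin@example.com', 'US', 15.0, '2020-02-12');

-- Dimension: one row per customer, descriptive attributes only.
CREATE TABLE dim_customers AS
SELECT DISTINCT customer_email AS customer_key, customer_country AS country
FROM raw_orders;

-- Fact: one row per order, measurements plus a key into the dimension.
CREATE TABLE fct_orders AS
SELECT order_id, customer_email AS customer_key, amount, ordered_at
FROM raw_orders;
""")

# A reporting query over the little star schema: revenue by country.
revenue = conn.execute("""
    SELECT d.country, SUM(f.amount)
    FROM fct_orders f
    JOIN dim_customers d USING (customer_key)
    GROUP BY d.country
    ORDER BY d.country
""").fetchall()
print(revenue)
```

In dbt, each of those CREATE TABLE … AS SELECT statements would be its own model file, with tests (uniqueness, not-null) layered on top.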

FOURTH You’ll build out your final reporting using the free tier of Mode Analytics – note that in order to stay within their free tier, you may need to reduce your final reporting tables to “Thousands of Rows” – take this as an extra challenge for your transformation layer, and an opportunity to additionally leverage the power of dbt!

FIFTH Make sure you document the journey – I always recommend blog posts, but a well-documented GitHub repo will probably be more interesting to, and more likely to be reviewed by, most technical hiring managers.

At the end, your process would look something like this:

I recognize that the above glosses over a lot of the work that is behind this proposal – a dedicated person already working full time, putting in some time nights and weekends, could probably get through the above in six months. It’s not a short trip but, if you’re looking to make a move, this is one way to do it.

The need for Analytics Engineers is only growing, even if the job title itself is still only starting to gain steam – I hope you’ll give it a try!

I’m Giving Video Content a Try!

As y’all may recall, last year I was lucky enough to spend some time working with the fine folks at Locally Optimistic to produce and run some AMA content for them – they ended up being more similar to traditional interviews, but folks seemed to enjoy them!

You can find those all here!

These were well received, and generated a TON of insight for folks working in the data and analytics space – but I had a few things I wanted to try doing a little differently:

  • They could be more discoverable: it was tough to know which guests talked about what, and at about an hour long each, they were a big bite of content if there was only one thing the viewer was interested in – even with YouTube’s search function, it’s likely folks were leaving before the parts they were interested in arrived.
  • They needed a little more social support: I tweeted about each one, but different parts and points of the conversation could probably have warranted their own outreach.
  • The live format – where we’d schedule them, invite members of the community to join, and then post afterward – was a bit tough to coordinate, and we never really got the community engagement during the calls that we had hoped for.

So, I’m putting together some videos that hopefully are a step in the right direction – I’ll chat with similar folks, luminaries in the data and analytics space, and then publish the entire conversation, but also smaller chunks (ideally one per topic) which can be posted separately so that folks who are only interested in, say, data career ladders, can easily find and watch only that piece.

I still absolutely have a lot to learn – both about being a data professional as well as producing and sharing video content! – but, I’m giving it a try! I’m also hoping to use this energy to help carry me into blogging more, once more – but that’s a perennial hope, isn’t it?

With no further ado, here is the first full-length conversation, with my friends Stephen and Emilie – I think you’re going to like it!

Crowd Sourcing Organizational Improvement

Subtitle: It’s Called the Analytics Road Show!

Here’s the situation: our company has a data organization – it’s probably kind of like your company’s data organization: it has some data engineers, it has some governance experts, it has some analysts, some developers.

We’ve been making great strides in doing the right work, and getting better at delivering that work quickly, accurately, and in communication and consultation with our stakeholders.

But, that feels like table stakes, right? One lesson that really rings out to me from my time before I worked in tech, a lesson from one of the owners of the chain of bakeries where I taught over one hundred baristas how to do latte art:

You can’t work on the business if you’re always working in the business.

(I believe this idea originates with the book The E-Myth? Correct me if I’m wrong on that one, though!)

This is something I’ve been cogitating on a lot these days: not just, how do we do what we do, and do it well? But, how do we improve the improvement? How do we improve our processes, our structure, the whole way we think about and engage with our data, with our measurements – even how we engage with one another within the organization?

So – I think I need to get outside of the organization to get greater insights here: I’m taking this show on the road. I’m calling it the Analytics Road Show. I have a deep and abiding love of chatting with folks – some might uncharitably call it nosiness – which I am hoping to leverage into a bunch of sit-down sessions with folks working in similar organizations but not mine.

Getting outside the building is a key part of this endeavor: I need to get at this with a beginner’s mind. So then, dear readers, where can I find folks willing to talk with ol’ SAO?

I have the great fortune of being a member of the Locally Optimistic Slack community (you should join us!), and when I dropped this into the #nyc channel:

… I got a serious no-joke resounding response. So, here goes nothing! July 15th and 16th (that’s next Monday and Tuesday!) I’ll be heading down the mighty Hudson to have coffees, lunches, and mid-level IPAs with some brand new friends in NYC.

I am really looking forward to this, as well as recording my thoughts in standard blog-post format for y’all – and internal action plans for my colleagues.

The Pitch for Looker at Automattic

Automattic, and the part of Automattic where I work, has a lot of data.

A really tremendous amount of information: so much information it’s passed the point of being helpful, and has started to become a hindrance.

Where do you find the information you’re looking for? How do you know which potential source of information is correct? Do you want the numbers that accounting uses? The numbers that the billing system uses? Maybe a combination of the two – but wait, shouldn’t those numbers be the same?

One of the big goals, part of the vision of my current team, Marketing Data, is to figure out how we can be better ambassadors of our information to our colleagues. How we can serve as translators, or maybe sherpas (sherpii?), on this journey we’re all on together.

The importance of this goal is kind of abstract: we hire a lot of folks who are really brilliant, who shine in their areas of expertise and are poised to create explosive value for their team, their division, and the company at large.

For many of these folks, their background isn’t particularly technical: they may be super Excel savvy (which is, I believe, equivalent to programming in many ways!), but when it comes to directly querying and manipulating raw data with SQL or other querying languages, it’s too much – and I am sympathetic to that.

The way I see it, we’re putting two obstacles between these folks and their ability to realize their own greatness – and to maximize the velocity of their impact, how quickly they’re able to go from curiosity to insight to revenue change.

The first obstacle is technical: it doesn’t make sense for them to learn the ins and outs of our data as well as become fluent in querying languages. So they have to make requests of data professionals or software engineers to get the information in their hands that will allow them to maximize their value – and in some sense, their own personal growth.

The second obstacle is social: as a good colleague, no one wants to feel like they’re wasting the time of the folks they work with. But this is the way that we’re making our comrades feel – maybe not intentionally, but implicitly. If, as my colleague Demet often says, only one out of any ten A/B tests will produce results, and those tests have to be analyzed by hand, the nine without results will also leave the analysis-requester feeling a bit sheepish.

When there exists a piece of friction between curiosity and insight, when a professional has to ask a question of an analyst or engineer before they can validate their curiosity and pursue insight, we attach a tiny (sometimes not so tiny) cost onto that question. Great marketers ask tons of questions, because they know that it’s only through curiosity and exploration that they can find the kinds of insights they need to discover explosive growth.

When that friction exists because they can’t explore their curiosity directly, but only through other folks, we’re doing them a disservice, and reducing their ability to do their work.

Therefore, any good pursuit of that overall goal – being good ambassadors or translators for our data to less-technical members of our professional community – will work to break down those two obstacles.

Enter Looker.

We tried a few different Business Intelligence tools, and there were a few contenders for our attention, but I kept hearing about the potential of Looker, and especially notably, the importance of its modeling layer.

It took me a shamefully long time to understand what this thing is – I here humbly tip my hat to Matt Mazur and Stephen Levin and the rest of the good folks in the #Measure Slack channel for being so patient and generous with their explanations – and it turns out they were right, the particular features of the modeling layer really are a game changer.

Here’s why:

Imagine that you’re onboarding a new member of your data team: part of that onboarding will be cultural and social, but a big part of it will be about relationships between different pieces of data – probably mostly tables, but maybe also different sources. You’d say things like:

This table contains every user id, hopefully only once, along with some facts about each customer.

This one is every transaction, with a receipt id, and a total paid amount. The user id is in here too, but many times, since a single customer can have many transactions. If we want to join this table to that table, we need to remember that many-to-one type relationship, or we’ll have problems.

… and so on. In a sense you’re trying to take all of your own understanding about your data, and the way that it all clicks together to form a cohesive whole, and communicate that understanding into someone else’s brain.
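That many-to-one caution can be made concrete. Here’s a toy sketch (invented tables and data, in Python against in-memory SQLite) of the fan-out problem the onboarding explanation above is warning about:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per customer, with a customer-level fact.
CREATE TABLE customers (user_id INTEGER, signup_fee REAL);
INSERT INTO customers VALUES (1, 10.0), (2, 10.0);

-- Many rows per customer: one per transaction.
CREATE TABLE transactions (receipt_id INTEGER, user_id INTEGER, total_paid REAL);
INSERT INTO transactions VALUES (500, 1, 5.0), (501, 1, 5.0), (502, 2, 5.0);
""")

# Wrong: user 1's signup fee is counted once per transaction (fan-out).
naive = conn.execute("""
    SELECT SUM(c.signup_fee)
    FROM customers c JOIN transactions t USING (user_id)
""").fetchone()[0]

# Right: collapse transactions to one row per user *before* joining.
correct = conn.execute("""
    SELECT SUM(c.signup_fee)
    FROM customers c
    JOIN (SELECT user_id FROM transactions GROUP BY user_id) t
      USING (user_id)
""").fetchone()[0]

print(naive, correct)  # the naive join inflates the total
```

Every data team carries dozens of little landmines like this in its collective head – which is exactly the knowledge-transfer problem at hand.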

In a Matrix-style future, you’d be able to plug a cable directly into your new hire and sort of transmit that understanding straight into their head, so they could behave as though they had all of your hard-won knowledge – along with that of everyone else who made use of that data, because, why not?

Well, that’s sort of the modeling layer.

The modeling layer is a way of defining, through code, all of the relationships and nuances of your existing data buildout, and then presenting them to folks in a useful way, with those relationships in place. They can ask questions and explore the data as though they had the sum total understanding of everyone who built the modeling layer.

In some sense you already have a modeling layer – it’s just in your head, and can only be shared by explaining it to other folks. What Looker does is it gives everyone on your team – everyone in your company – the same powers as Rogue from X-Men.
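To make that less abstract, here’s a toy sketch of the modeling-layer concept in Python – this is not LookML, and every name in it is invented; it just shows the shape of the idea: relationships are declared once, in code, by the person who knows the data, and join SQL is generated from that declaration for anyone who asks a question:

```python
# A hypothetical, toy model definition: the knowledge that transactions
# join to customers many-to-one lives here, written down exactly once.
MODEL = {
    "base_table": "transactions",
    "joins": {
        "customers": {"on": "transactions.user_id = customers.user_id",
                      "relationship": "many_to_one"},
    },
}

def explore(fields, model=MODEL):
    """Generate the SQL for a question, using the declared relationships."""
    sql = f"SELECT {', '.join(fields)} FROM {model['base_table']}"
    for table, join in model["joins"].items():
        # Only join in tables the question actually touches.
        if any(f.startswith(table + ".") for f in fields):
            sql += f" LEFT JOIN {table} ON {join['on']}"
    return sql

# The person asking the question picks fields; the join logic comes free.
query = explore(["transactions.total_paid", "customers.country"])
print(query)
```

A real modeling layer does vastly more (aggregate handling, symmetric aggregates, access control), but the core trade is the same: write the relationships down once, and everyone downstream gets them for free.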

The modeling layer can act like a superpower. Folks don’t have to understand how the data is stored, or how the tables relate to one another; they don’t have to wake up in a cold sweat with the date/time doc from Apache Hive seared into their subconscious (Is month MM or mm??) – they just have to use a nice, clean GUI with solid built-in viz. And they don’t have to ask anyone for help.

Looker gets us past our two big obstacles – once your data is modeled (which is not easy or fast but it is worth it), there is no longer a technical requirement for folks to explore the data, and there is, in the majority of cases, no social obstacle either.

Thus far it has been the right choice for us, and it’s something I look forward to working with for a long time.