Author: Simon

Crowd Sourcing Organizational Improvement

Subtitle: It’s Called the Analytics Road Show!

Here’s the situation: our company has a data organization – it’s probably kind of like your company’s data organization: it has some data engineers, it has some governance experts, it has some analysts, some developers.

We’ve been making great strides in doing the right work, and getting better at delivering that work quickly, accurately, and in communication and consultation with our stakeholders.

But, that feels like table stakes, right? One lesson that really rings out to me from my time before I worked in tech, a lesson from one of the owners of the chain of bakeries where I taught over one hundred baristas how to do latte art:

You can’t work on the business if you’re always working in the business.

(I believe this idea originates with the book The E-Myth? Correct me if I’m wrong on that one, though!)

This is something I’ve been cogitating on a lot these days: not just, how do we do what we do, and do it well? But, how do we improve the improvement? How do we improve our processes, our structure, the whole way we think about and engage with our data, with our measurements – even how we engage with one another within the organization?

So – I think I need to get outside of the organization to get greater insights here: I’m taking this show on the road. I’m calling it the Analytics Road Show. I have a deep and abiding love of chatting with folks – some might uncharitably call it nosiness – which I am hoping to leverage into a bunch of sit-down sessions with folks working in similar organizations but not mine.

Getting outside the building is a key part of this endeavor: I need to get at this with a beginner’s mind. So then, dear readers, where can I find folks willing to talk with ol’ SAO?

I have the great fortune of being a member of the Locally Optimistic Slack community (you should join us!), and when I dropped this into the #nyc channel:

… I got a serious no-joke resounding response. So, here goes nothing! July 15th and 16th (that’s next Monday and Tuesday!) I’ll be heading down the mighty Hudson to have coffees, lunches, and mid-level IPAs with some brand new friends in NYC.

I am really looking forward to this, as well as recording my thoughts in standard blog-post format for y’all – and internal action plans for my colleagues.

I’m Doing Live Video Interviews!

This post is a very exciting announcement for me, so I won’t do the typical online-content thing where I tell you a big, narrative tale about me and my values before I actually do the announcing – I’ll do that after.

This coming Monday – TOMORROW – June 24th, at 4PM Eastern, I’ll be doing the first of many live-streamed, Ask-Me-Anything-style video interviews with professionals working at the intersection of data and analytics!

(If you’re reading this and want to make a Google Calendar event right this minute, you know what, here’s the Zoom link: https://brooklyndata.zoom.us/j/501489762 )

This first session I’ll be sitting down with my friend and yours, the singular Matt Mazur: once my colleague at Automattic, then a member of the team behind the eminent customer-support software Help Scout, and now a free agent, applying his immense experience and insight to problems of analysis and data management for a number of companies, all of which are lucky to have him.

I’m putting this interview together in partnership with the Locally Optimistic team, whom I have gotten to know over the last few months and who have, honestly, consistently impressed me!

I first joined the Locally Optimistic community via their blog, as I think is also the case for many of the current members of that Slack instance. As its membership has grown, it’s been a really excellent source of insight and camaraderie: I got to meet a few folks in person at a Looker meetup in NYC (I’m just a drive up the Hudson, remember), as well as at the Marketing Analysis and Data Science conference out in San Francisco, earlier this year.

Ever since I shuttered my podcast about hop farming (more about that here), I’ve missed the kind of social access that doing regular interviews can offer: I am by nature an inquisitive person (some might uncharitably say nosy), and having access to a socially acceptable way to totally pepper someone with questions was in so many ways a rewarding experience for me.

In some ways, Trellis to Table (the hop podcast referenced above) was about connecting small groups and individuals involved in small-scale hop farming, and helping them to share value: by interviewing this totally novel little crew of twenty-something first-time farmers in Minnesota, their lessons and energy could leapfrog to the lifetime farmers in Upstate NY, in South Carolina, and suddenly this value had exploded across a network that didn’t even exist before – that was the big motivation for me, by the end.

I think in some ways the intersection of software engineering, data analysis, and business intelligence is in a similar place – there’s a good post about this new type of professional, the Analytics Engineer, on LO – there is this really large, and growing, community of folks whose work doesn’t yet have a clear set of job titles, or a clear sense of what their career progression might look like.

In tapping the Locally Optimistic community for exciting, interesting folks to engage in these video conversations, we can start to create a better shared understanding of our work, and what our work looks like, and how we can get better both as individuals but also as a community of practice.

I’m very excited to get back into the interview game: it’s something I really enjoy, and I hope that y’all are able to get a lot out of it as well.

Matt and I will talk about his professional journey, which has taken him from an officer in the Air Force, to leading an analytics team, to starting his own software business and becoming a business intelligence consultant.

We’ll also explore the world of internal organizational communication, working with non-data teams, and having an impact as a data analyst.

As one last reminder, this first session is this coming Monday – TOMORROW – June 24th, at 4PM Eastern.

Here’s the Zoom link: https://brooklyndata.zoom.us/j/501489762

If you want to be super cool, I am also going to be trying to live-stream this via my Twitch channel, which I am literally creating just for this series (!) here: My Real Not a Joke Twitch Stream

The Pitch for Looker at Automattic

Automattic, and the part of Automattic where I work, WordPress.com, has a lot of data.

A really tremendous amount of information: so much information it’s passed the point of being helpful, and has started to become a hindrance.

Where do you find the information you’re looking for? How do you know which potential source of information is correct? Do you want the numbers that accounting uses? The numbers that the billing system uses? Maybe a combination of the two – but wait, shouldn’t those numbers be the same?

One of the big goals, part of the vision of my current team, Marketing Data, is to figure out how we can be better ambassadors of our information to our colleagues. How we can serve as translators, or maybe sherpas (sherpii?), on this journey we’re all on together.

The importance of this goal is kind of abstract: we hire a lot of folks who are really brilliant, who shine in their areas of expertise and are poised to create explosive value for their team, their division, and the company at large.

For many of these folks, their background isn’t particularly technical: they may be super Excel savvy (which is, I believe, equivalent to programming in many ways!), but when it comes to directly querying and manipulating raw data with SQL or other querying languages, it’s too much – and I am sympathetic to that.

The way I see it is, we’re putting two obstacles between these folks and their ability to realize their own greatness – and to help maximize the velocity of their impact, how quickly they’re able to go from curiosity to insight to revenue change.

The first obstacle is technical: it doesn’t make sense for them to learn the ins and outs of our data as well as becoming fluent in querying languages. So, they have to make requests of data professionals or software engineers to get the information in their hands that will allow them to maximize their value – and in some sense, their own personal growth.

The second obstacle is social: as a good colleague, no one wants to feel like they’re wasting the time of the folks they work with. But this is the way we’re making our comrades feel – maybe not intentionally, but implicitly. If, as my colleague Demet often says, only one out of any ten A/B tests will produce results, and those tests have to be analyzed by hand, then the nine without results carry the added cost of leaving the analysis-requester feeling a bit sheepish.

When there exists a piece of friction between curiosity and insight, when a professional has to ask a question of an analyst or engineer before they can validate their curiosity and pursue insight, we attach a tiny (sometimes not so tiny) cost onto that question. Great marketers ask tons of questions, because they know that it’s only through curiosity and exploration that they can find the kinds of insights they need to discover explosive growth.

When that friction exists because they can’t explore their curiosity directly, but only through other folks, we’re doing them a disservice, and reducing their ability to do their work.

Therefore, any good pursuit of that overall goal – being good ambassadors or translators for our data to less-technical members of our professional community – will work to break down those two obstacles.

Enter Looker.

We tried a few different business intelligence tools, and there were a few contenders for our attention, but I kept hearing about the potential of Looker and, most notably, the importance of its modeling layer.

It took me a shamefully long time to understand what this thing is – I here humbly tip my hat to Matt Mazur and Stephen Levin and the rest of the good folks in the #Measure Slack channel for being so patient and generous with their explanations – and it turns out they were right, the particular features of the modeling layer really are a game changer.

Here’s why:

Imagine that you’re onboarding a new member of your data team: part of that onboarding will be cultural and social but a big part of it will be about relationships between different pieces of data – probably mostly tables but maybe also different sources. You’d say things like:

This table contains every user id, hopefully only once, along with some facts about each customer.

This one is every transaction, with a receipt id, and a total paid amount. The user id is in here too, but many times, since a single customer can have many transactions. If we want to join this table to that table, we need to remember that many-to-one type relationship, or we’ll have problems.

… and so on. In a sense you’re trying to take all of your own understanding about your data, and the way that it all clicks together to form a cohesive whole, and communicate that understanding into someone else’s brain.
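To make that many-to-one caution concrete, here’s a small sketch of the classic “fan-out” problem such a join can cause. The table and column names are invented for illustration, run against an in-memory SQLite database:

```python
import sqlite3

# Hypothetical tables: one row per user, many transactions per user.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, signup_credit REAL);
CREATE TABLE transactions (receipt_id INTEGER PRIMARY KEY, user_id INTEGER, total_paid REAL);
INSERT INTO users VALUES (1, 10.0);
INSERT INTO transactions VALUES (100, 1, 25.0), (101, 1, 40.0);
""")

# Naive join: the single user row "fans out" to one row per transaction,
# so summing a user-level column double-counts it.
naive = conn.execute("""
    SELECT SUM(u.signup_credit)
    FROM users u JOIN transactions t ON u.user_id = t.user_id
""").fetchone()[0]  # 20.0, though the true credit is 10.0

# Safer: aggregate the many-side first, then join one-to-one.
correct = conn.execute("""
    SELECT SUM(u.signup_credit)
    FROM users u
    JOIN (SELECT user_id, SUM(total_paid) AS lifetime_paid
          FROM transactions GROUP BY user_id) t
      ON u.user_id = t.user_id
""").fetchone()[0]  # 10.0
```

This is exactly the sort of hard-won knowledge you’d otherwise have to transmit to every new hire by hand.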

In a Matrix-style future, you’d be able to plug a cable directly into your new hire and transmit that understanding straight into their head, so they could behave as though they had all of your hard-won knowledge – along with everyone else’s who made use of that data, because, why not?

Well, that’s sort of the modeling layer.

The modeling layer is a way of defining, through code, all of the relationships and nuances of your existing data buildout, and then presenting them to folks in a useful way, with those relationships in place. They can ask questions and explore the data as though they had the sum total understanding of everyone who built the modeling layer.
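As a toy illustration (this is not Looker’s actual LookML, just a Python sketch with invented names), a modeling layer amounts to declaring each relationship once, in code, and letting every query be assembled from that shared declaration:

```python
# A toy model: each join is declared once, with its cardinality, and
# every query built from the model reuses that knowledge automatically.
MODEL = {
    "base_table": "users",
    "joins": {
        "transactions": {
            "on": "users.user_id = transactions.user_id",
            "relationship": "one_to_many",  # one user, many receipts
        },
    },
}

def build_query(select, joins=()):
    """Assemble SQL from the declared model, so the person exploring
    never has to remember the join keys or cardinality themselves."""
    sql = f"SELECT {', '.join(select)} FROM {MODEL['base_table']}"
    for name in joins:
        join = MODEL["joins"][name]
        sql += f" LEFT JOIN {name} ON {join['on']}"
    return sql

query = build_query(["users.user_id", "transactions.total_paid"],
                    joins=["transactions"])
```

In real Looker, the model also knows how to avoid the fan-out problems that come with one-to-many joins; the point here is just that the relationship lives in code, once, instead of in someone’s head.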

In some sense you already have a modeling layer – it’s just in your head, and can only be shared by explaining it to other folks. What Looker does is it gives everyone on your team – everyone in your company – the same powers as Rogue from X-Men.

The modeling layer can literally act like a superpower. Folks don’t have to understand how the data is stored, or how the tables relate to one another; they don’t have to wake up in a cold sweat with the date/time doc from Apache Hive seared into their subconscious (Is month MM or mm??) – they just have to use a nice, clean GUI with solid built-in viz. And, they don’t have to ask anyone for help.

Looker gets us past our two big obstacles – once your data is modeled (which is neither easy nor fast, but it is worth it), there is no longer a technical requirement for folks to explore the data, and there is, in the majority of cases, no social obstacle either.

Thus far it has been the right choice for us, and it’s something I look forward to working with for a long time.

SAObattical: Return and Reflection

This post is my reflection on my sabbatical and also lots of pictures. I’m adding a Read More so it doesn’t clutter up the feed for folks who aren’t interested.

The most important thing is that I am grateful for a company with such a benefit, grateful that I get to work here, and thankful for a team that continued to succeed in my absence.


Source & Medium: A Medium Sized Dilemma

Subtitle: Source, Medium, Attribution, Stale Information, and The Future of Data

Here’s our situation – we want to be able to slice reporting and dashboards by a number of dimensions, including source and medium.

MARDAT (the team I’m lucky enough to be working with) is working to make this kind of thing a simple exercise in curiosity and (dare I say) wonder. It’s really interesting to me, and has become more and more clear over the last year or so, how important enabling curiosity is. One of the great things that Google Analytics and other business intelligence tools can do is open the door to exploration and semi-indulgent curiosity fulfillment.

You can imagine, if you’re a somewhat non-technical member of a marketing or business development team, you’re really good at a lot of things. Your experience gives you a sense of intuition and interest in the information collected by and measured by your company’s tools.

If the only way you have access to that information is by placing a request, for another person to go do 30 minutes, two hours, three hours of work, that represents friction in the process, that represents some latency, and you’re going to find yourself disinclined to place that kind of request if you’re not fairly certain that there’s a win there – it pushes back on curiosity. It reduces your ability to access and leverage your expertise.

This is a bad thing!

That’s a little bit of a digression – let’s talk about Source and Medium. Source and Medium are defined pretty readily by most blogs and tools: these are buckets that we place our incoming traffic in. Wherever people were right before they arrived at our websites: that’s Source and Medium.

We assign other things too – campaign name, keyword, all sorts of things. My dilemma here actually applies to the entire category of things we tag our customers with, but it’s quicker to just say, Source and Medium.

Broadly, Source is the origin (Google, another website, Twitter, and so forth) and Medium is the category (organic, referral, etc) – if this is all new to you I recommend taking a spin through this Quora thread for a little more context.
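For illustration, here’s roughly how a landing URL or referrer might get bucketed into source and medium. The fallback rules here are simplified inventions for the sketch, not Google Analytics’ actual logic:

```python
from urllib.parse import urlparse, parse_qs

def classify(landing_url, referrer=None):
    """Return a (source, medium) pair for one arrival."""
    qs = parse_qs(urlparse(landing_url).query)
    # Explicit campaign tagging wins, if present.
    if "utm_source" in qs:
        return qs["utm_source"][0], qs.get("utm_medium", ["(not set)"])[0]
    # Otherwise fall back to the referrer (rules invented for illustration).
    if referrer:
        host = urlparse(referrer).netloc
        if "google." in host:
            return "google", "organic"
        return host, "referral"
    return "(direct)", "(none)"

tagged = classify("https://wordpress.com/?utm_source=newsletter&utm_medium=email")
referred = classify("https://wordpress.com/", referrer="https://twitter.com/some/post")
```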

What I am struggling with is this: for a site like WordPress.com, where folks may come and go many times before signing up, and may enjoy our free product for a while before making a purchase, at what point do you say, “OK, THIS is the Source and Medium for this person!”

Put another way: when you make a report, say, for all sales in May, and you say to the report, “Split up all sales by Source and Medium,” what do you want that split to tell you?

Here are some things it might tell you:

  • The source and medium for the very first page view we can attribute back to that customer, regardless of how long ago that page view was.
  • The source and medium for a view of a page we consider an entry page (landing pages, home page, etc.), regardless of how long ago that page view was.
  • The source and medium for the very first page view within a certain window of time (7 days, 30 days, 1 year).
  • The source and medium for the first entry page (landing page, homepage) within a certain window of time (7 days, 30 days, 1 year).
  • The source and medium for the visit that resulted in a signup, rather than the first-ever visit.
  • The source and medium for the visit that resulted in a conversion, rather than the first-ever visit.
  • The source and medium for an arrival based on some other criteria (first arrival of all time, OR first arrival since being idle for 30 days, something like that).
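A few of those options can be sketched as a single attribution function. The touch log, dates, and rule names below are all invented for illustration:

```python
from datetime import datetime, timedelta

# Invented touch log: each visit carries its own source/medium.
touches = [
    {"ts": datetime(2012, 6, 1), "source": "friendster", "medium": "referral"},
    {"ts": datetime(2019, 6, 20), "source": "google", "medium": "cpc"},
    {"ts": datetime(2019, 6, 27), "source": "(direct)", "medium": "(none)"},
]

def attribute(touches, conversion_ts, rule="first_touch", window=None):
    """Pick one touch per the chosen rule and optional lookback window."""
    eligible = [t for t in touches if t["ts"] <= conversion_ts]
    if window is not None:
        eligible = [t for t in eligible if conversion_ts - t["ts"] <= window]
    if not eligible:
        return None
    if rule == "first_touch":
        return min(eligible, key=lambda t: t["ts"])
    if rule == "last_touch":
        return max(eligible, key=lambda t: t["ts"])
    raise ValueError(rule)

conv = datetime(2019, 6, 28)
first = attribute(touches, conv)["source"]                             # 'friendster'
windowed = attribute(touches, conv, window=timedelta(days=30))["source"]  # 'google'
last = attribute(touches, conv, rule="last_touch")["source"]           # '(direct)'
```

Same sale, three different rules, three different answers – which is exactly the dilemma.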

It feels like at some point Source and Medium should go bad, right? If someone came to the site seven years ago, via Friendster or Plurk or something, signed up for a free site, and then came back last week via AdWords, we wouldn’t want to assign Friendster | Referral to that sale, right?

Maybe we have to create more dynamic Source/Medium assignment: have one for “First Arrival,” one for “Signup,” one for “Purchase.” Maybe even something like Source/Medium for “Return After 60+ Days Idle.”
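That per-milestone idea could look something like this sketch (the milestone names, dates, and touch data are invented): for each milestone event, stamp the most recent touch that preceded it.

```python
from datetime import datetime

# Invented touch log: each visit carries its own source/medium.
touches = [
    {"ts": datetime(2012, 6, 1), "source": "friendster", "medium": "referral"},
    {"ts": datetime(2019, 6, 20), "source": "google", "medium": "cpc"},
]

def stamp_milestones(touches, milestones):
    """For each milestone, record the most recent prior touch, so every
    behavior gets its own Source/Medium rather than one lifetime value."""
    stamps = {}
    for name, ts in milestones.items():
        prior = [t for t in touches if t["ts"] <= ts]
        stamps[name] = max(prior, key=lambda t: t["ts"]) if prior else None
    return stamps

stamps = stamp_milestones(touches, {
    "first_arrival": datetime(2012, 6, 1),
    "signup": datetime(2012, 6, 2),
    "purchase": datetime(2019, 6, 21),
})
# The signup stays credited to Friendster, but the purchase goes to AdWords.
```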

In the long run, it feels like having a sense of what sources are driving each of those behaviors more or less effectively would be helpful, and could help build insights – but I also feel a little crazy: does no one else have this problem with Source and Medium?