Data without structure looks like noise

Searching for patterns is a challenge for data detectives. When you find a solid pattern and validate it, the data falls into place. Read: "Everything Starts Out Looking Like a Toy" #191

Mar 25, 2024

Hi, I’m Greg 👋! I write weekly product essays, including system “handshakes”, the expectations for workflow, and the jobs to be done for data. What is Data Operations? was the first post in the series.

This week’s toy: the 3D-printed “Brewintosh”. It’s amazing to see the dedication and care this YouTuber put into building an emulated Mac Plus. Is it the same machine you could buy in 1986? No. Does it evoke the feeling? Absolutely! Edition 191 of this newsletter is here - it’s March 25, 2024.

Brought to you by Apollo.io, an all-in-one solution providing RevOps teams with access to data, enrichment, outreach, call intelligence, scoring, calendaring, and email automation. They integrate seamlessly with your CRM and existing workflows. The quality of Apollo’s data is unmatched: they are ranked #1 for contact and company data accuracy on G2 with over 6,000 reviews and a 4.8-star rating. Learn more …

If you have a comment or are interested in sponsoring, hit reply.

The Big Idea

A short long-form essay about data things

⚙️ Data without structure looks like noise

“These numbers look wrong. Can you take a look?”

When you read these words as an ops practitioner, your brain is already swinging into motion with questions:

Have we seen this pattern before?
What exactly is happening?
Is this still happening or is it a one-time event?
How important is it to fix, and how soon?
Is it related to any other issues?

Approaching the information with a structured inquiry gives you a shot at answering the original question. The goal? Identify the problem, remediate it, and fix the cause so it can’t happen again.

Triage and Diagnosis

Triage is the first step when identifying a data problem.

Ask yourself: is this a critical problem that is impacting production? Is it a transient reporting problem that will be fixed automatically? Or is it a known bug that has cropped up again?

What you’re doing is checking whether you’ve seen the pattern before. When you match a known pattern with a known solution, you also have a set of next steps and a way to confirm whether you solved it. For example, when one kind of revenue shows up temporarily during an ETL data ingestion job, you know how long that normally lasts. If it’s still there after that time frame, you know it’s a bug.
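Here’s a minimal sketch of that time-frame check in Python. The two-hour grace window and the function name are assumptions for illustration; use whatever window matches how long your own ingestion job takes to settle.

```python
from datetime import datetime, timedelta

# Assumption: the ETL job normally cleans up temporary revenue within 2 hours.
GRACE_WINDOW = timedelta(hours=2)

def classify_anomaly(first_seen, now, grace_window=GRACE_WINDOW):
    """Label an odd record as 'transient' while it is inside the grace
    window, and as 'bug' once it has outlived the window."""
    age = now - first_seen
    return "transient" if age <= grace_window else "bug"

# Hypothetical timestamps for a record that appeared during ingestion.
first_seen = datetime(2024, 3, 25, 9, 0)
during_job = classify_anomaly(first_seen, datetime(2024, 3, 25, 10, 0))
next_day = classify_anomaly(first_seen, datetime(2024, 3, 26, 9, 0))
```

An hour in, the record is still a known transient. A day later, it has earned a ticket.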

Here are some questions to ask when you encounter a “one-off” problem:

What information did you expect?
What did you observe?
Based on observation, what data changed (or unexpectedly stayed the same)?

When you find a familiar pattern, it helps to have a query or procedure to test the outcome and confirm you solved it. If the query returns results, you still have a problem to fix. When your debug query returns no results, you’re in good shape.
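A debug query like this might look something like the sketch below. The table, columns, and the rule itself (a closed-won deal should always have a positive amount) are made up for the example, and it uses an in-memory SQLite database so it runs anywhere.

```python
import sqlite3

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE opportunities (id INTEGER PRIMARY KEY, stage TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO opportunities (id, stage, amount) VALUES (?, ?, ?)",
    [(1, "closed_won", 1200.0), (2, "closed_won", None), (3, "open", None)],
)

# The debug query encodes "what did you expect?" as a rule and returns
# only the records that break it.
DEBUG_QUERY = """
    SELECT id FROM opportunities
    WHERE stage = 'closed_won' AND (amount IS NULL OR amount <= 0)
    ORDER BY id
"""

def find_problem_records(connection):
    """Return the ids of records that violate the expectation."""
    return [row[0] for row in connection.execute(DEBUG_QUERY)]

problem_ids = find_problem_records(conn)
needs_fix = len(problem_ids) > 0  # an empty list means you're in good shape
```

The useful property is that "healthy" has a precise meaning: the query comes back empty.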

If it’s new, it’s key to find out what’s happening and build a procedure to identify, remediate, and validate the fix.

Using your one-off problem identification steps above, build a query to see if you have more records in that state. Is your “one-off problem” still happening or did it happen only once?

If your query returns only the records you’ve already found, you’re in luck: the problem probably isn’t spreading. If the count keeps growing, it’s still happening.
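One way to answer "still happening or only once?" is to run the same query at two points in time and compare the results. This is a rough sketch, assuming each run gives you a list of record ids; the function name and the sample ids are hypothetical.

```python
def compare_snapshots(earlier_ids, later_ids):
    """Compare two runs of the same debug query.

    New ids in the later run suggest the problem is still happening.
    Ids that dropped out were fixed (or cleaned up) in between.
    """
    earlier, later = set(earlier_ids), set(later_ids)
    new_ids = sorted(later - earlier)
    resolved_ids = sorted(earlier - later)
    return {
        "still_happening": bool(new_ids),
        "new_ids": new_ids,
        "resolved_ids": resolved_ids,
    }

# Hypothetical results from Monday's and Tuesday's runs.
one_time = compare_snapshots([2, 7], [2, 7])
ongoing = compare_snapshots([2, 7], [2, 7, 9])
```

Comparing ids rather than raw counts catches the case where one record was fixed and a new one broke, which leaves the count unchanged.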

Move into analysis

Ok, detective, you’ve found a new problem. Since you have a query that finds the issue, you also have a hypothesis to validate: the conditions a record needs to be in to cause the problem.

The rest of the data looks like noise at the moment. We’ve found a potential solution but don’t know whether it will stop it from happening again.

What to do? Attempt to solve the problem.

The dumb way to describe this: negate the conditions that cause that record to show in your debugging query.

You want to make the structure of your data obvious so you can see what makes a record show up in the query. If causing the problem has small consequences, you might cause it intentionally to create a test case for your fix. If the problem is more serious or harder to undo, you might not want to create another instance of it.
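If you do decide to cause the problem on purpose, a scratch database is a safe place to do it. The sketch below is one way to build that test case with an in-memory SQLite database. The schema, the rule, and the "needs_review" remediation are all assumptions for illustration, not a recommendation for your data.

```python
import sqlite3

def make_scratch_db():
    """Create a throwaway in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE opportunities (id INTEGER PRIMARY KEY, stage TEXT, amount REAL)"
    )
    return conn

def find_problem_records(conn):
    """Debug query: closed-won deals without a positive amount."""
    rows = conn.execute(
        "SELECT id FROM opportunities "
        "WHERE stage = 'closed_won' AND (amount IS NULL OR amount <= 0) "
        "ORDER BY id"
    )
    return [row[0] for row in rows]

def apply_fix(conn):
    """Hypothetical remediation: negate one condition of the debug query
    by moving broken records out of 'closed_won' for review."""
    cursor = conn.execute(
        "UPDATE opportunities SET stage = 'needs_review' "
        "WHERE stage = 'closed_won' AND (amount IS NULL OR amount <= 0)"
    )
    return cursor.rowcount

conn = make_scratch_db()
conn.execute("INSERT INTO opportunities VALUES (1, 'closed_won', 500.0)")  # healthy
conn.execute("INSERT INTO opportunities VALUES (2, 'closed_won', NULL)")   # broken on purpose

before = find_problem_records(conn)  # the problem is reproduced
fixed_count = apply_fix(conn)
after = find_problem_records(conn)   # empty if the fix works
```

The healthy record is there on purpose too: a fix that clears the debug query by damaging good records isn’t a fix.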

What’s the takeaway? When you find a data problem, check whether it matches one you’ve seen before. When it’s a new one, use the same method: detect it, test a solution, and validate that the solution fixes the record.

Links for Reading and Sharing

These are links that caught my 👀

1/ Planning vs Strategy - Roger Martin explains very simply: what’s the difference between planning vs strategy?

2/ Startup Wisdom - “What you need in a startup idea, and all you need, is something your friends actually want.” Paul G’s essay on “How to Start Google” is a great read. It’s not just about starting a specific company, but about how to view the world as a maker.

3/ Coding is going away - Or is it? The prevalence of AI-based Copilot tools makes you think traditional coding will disappear. For some very structured examples, that’s probably true - writing a simple script to do a simple thing - but for more complicated logic, read this. Or take a look at the “Developer Autonomy Scale” by Matt Wensing, and start by determining what not to do in a coding situation.

What to do next

Hit reply if you’ve got links to share, data stories, or want to say hello.

The next big thing always starts out being dismissed as a “toy.” - Chris Dixon