Building Data Intuition for Marketers

The more I interact with seemingly sophisticated companies that fat-finger their attempts at personalization, the more I appreciate the efforts of companies that really get me. You would think I’m referring to independent corner coffee shops, but it’s when a behemoth enterprise somehow manages to establish a relationship with me that I’m truly impressed. As a marketer, I know that the first prerequisite to “getting” a customer is a robust, accurate, and densely populated customer database. Increasingly, these data are built on the backs of consumers’ digital activities: the sites they visit, their musical habits, or whether they are a home remodeler or a fly fisherman.

The Face Validity Test

However, even with all of that data, the feeling of “you get me” is still elusive. One reason is that too many sophisticated companies forget the basics because they are so sophisticated. Marketers assume that the models and AI components they get from data scientists are right if they show a lift. Likewise, data scientists assume marketers know their customers at an intuitive level. The real customer is all too often lost in the middle. So, grab some coffees, and ask your data or data science partner out on a “data date.” Your goal is to get to know each other—and the real customer hidden in the data.

Here are three conversation starters:

1. Get and examine a tidy data extract

It’s impossible to get an intuitive understanding of data stored in normalized relational tables, or spread out in JSON files in a data lake. A solution is the tidy data extract:

  • Each row is a customer, and each column is something about that customer. The data will start out in a relational database, and that’s OK. But you want something you can see, touch, feel and eyeball quickly.
  • You don’t want a sample of 1 million records (if your customer base is that large), but you also don’t want a sample of 10 records. With too many records, both you and Excel will be overwhelmed; with too few, you won’t see enough variation in the data to really get it.
  • This should be a point-in-time view, and it doesn’t have to be real-time.

Go left to right, top to bottom. See the column headings. Eyeball the data. Does it mean anything to you? Are you capturing what you thought you were capturing? Do you need to ask for clarification? Are there many blank fields? Producing this kind of extract should be easy for your data partner. It might seem like a “no-brainer” to them, but you’ll immediately see texture that isn’t apparent in models, charts, or deciles.
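
If your data partner wants a starting point, the extract can be sketched in a few lines of Python. Treat this as an illustration under stated assumptions rather than a recipe: the `customers` and `orders` tables, their field names, and the `tidy_extract` helper are all hypothetical stand-ins for whatever your own database holds.

```python
import random
from collections import defaultdict


def tidy_extract(customers, orders, sample_size, seed=0):
    """Return one flat row per sampled customer.

    Assumptions (hypothetical, for illustration only):
    - `customers` is a list of dicts, each with a "customer_id" key
    - `orders` is a list of dicts with "customer_id" and "amount" keys
    """
    # Roll the order table up to one summary per customer.
    totals = defaultdict(lambda: {"order_count": 0, "total_spend": 0.0})
    for order in orders:
        summary = totals[order["customer_id"]]
        summary["order_count"] += 1
        summary["total_spend"] += order["amount"]

    # A fixed seed means the same sample can be pulled again later.
    rng = random.Random(seed)
    sampled = rng.sample(customers, min(sample_size, len(customers)))

    # Each row is a customer; each column is something about that customer.
    rows = []
    for customer in sampled:
        summary = totals.get(
            customer["customer_id"], {"order_count": 0, "total_spend": 0.0}
        )
        rows.append({**customer, **summary})
    return rows
```

Pick a `sample_size` somewhere between the two extremes described above, save the rows as a CSV, and you have a file you can open in Excel and read left to right, top to bottom.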

2. Get the data dictionary

Yes, you just opened the documentation can of sour gummy worms. I guess that’s why data partners often make a face when they hear “data dictionary.” Still, it’s worth asking for one, and if one doesn’t exist, there’s a quick-and-dirty way to get one started.

Take the entire first row of headings from your customer data extract and transpose it onto a new spreadsheet. Define each of the column headings as best you can in plain English, with whatever limited knowledge you have. Give it a once-over and pass it on to your data-keeping friend for “review and clarification.” Kindly ask her to fill in the remaining blanks and add any other useful information. You may have to go through a few iterations of this back-and-forth, but do your best to make it thorough and keep it simple.

The effort of writing down the definitions is where the value lies. This is college study habits 101: study the information by writing it down. What does this field mean? What is the acceptable range of values? You’ll find yourself full of insights after an exercise like this, ready to test them in customer-facing go-to-market efforts.
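
If you would rather not transpose by hand, the starter sheet can be generated. The sketch below is one possible way to do it; the helper names and the four column names ("field", "definition", "allowable_values", "notes") are my own assumptions, not a standard layout.

```python
import csv
import io


def dictionary_skeleton(headers):
    """Turn the extract's column headings into one row per field.

    The blank columns are hypothetical prompts for you and your data
    partner to fill in; rename or extend them as needed.
    """
    return [
        {"field": header, "definition": "", "allowable_values": "", "notes": ""}
        for header in headers
    ]


def to_csv_text(rows):
    """Render the skeleton as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The code only saves the copy-and-paste step. The definitions still have to be written by a person, which is the point of the exercise.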

3. Understand data coverage

Whereas a data dictionary is an English-sentence version of what the data is supposed to show, data coverage metrics uncover the truth: the percentage of customer records that actually have data populated for a given field. Any data scientist will tell you that missing data is the bane of their existence. A variable looks solid, and then 20% of the records turn out to be NAs. None of the options is good in this case (remove the records, interpolate, take the average, or set to zero for numerics).

This is why it’s critical to find the gaps and begin to understand the strengths and weaknesses of your data asset. As you work through this understanding, you can start with the basics, as in “what’s filled out vs. what’s not” and put a percentage to it. Here are some examples.

  • Text fields are usually the most basic. For example, [Customer_Email_Address] might have a 76% coverage rate. Coverage doesn’t imply deliverability, but it’s a start.
  • Factor fields, like “segment” or “gender,” are also easy to understand. In a database, these will still be stored as text, but they are different in practice: essentially pick-lists that have a set of allowable values. In some cases, you’ll find a value like “unknown” that is distinct from an actual missing record. This might mean something different, so make sure you understand whether “unknown gender” means the customer doesn’t want us to know or we never asked.
  • Continuous variables like income are trickier. Your data friend might claim that 100% of records have income populated, but what if most of these are 0? To avoid these kinds of surprises, ask for a distribution of the data, like a histogram, or at the very least statistics like mean, median, and percentiles (usually the 10th and 90th percentiles are good).
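
The three checks above can be sketched in plain Python. This is a rough illustration, not a definitive method: the helper names are hypothetical, and it assumes the extract is a list of dicts in which a blank shows up as `None` or an empty string. Your own data may mark blanks differently.

```python
import statistics
from collections import Counter

# Assumption: these are the values that count as "not filled out".
MISSING = (None, "")


def coverage(rows, field):
    """Share of rows (0.0 to 1.0) where `field` is populated."""
    filled = sum(1 for row in rows if row.get(field) not in MISSING)
    return filled / len(rows)


def value_counts(rows, field):
    """Count each pick-list value, keeping true blanks separate
    from explicit values such as "unknown"."""
    return Counter(
        "<blank>" if row.get(field) in MISSING else row[field] for row in rows
    )


def numeric_summary(rows, field):
    """Mean, median, 10th/90th percentiles, and the share of zeros.

    Needs at least two populated values for the percentiles.
    """
    values = [row[field] for row in rows if row.get(field) not in MISSING]
    deciles = statistics.quantiles(values, n=10)  # nine cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "p10": deciles[0],
        "p90": deciles[8],
        "share_zero": sum(1 for v in values if v == 0) / len(values),
    }
```

Note that a zero counts as populated in `coverage`, which is exactly how an income field can report 100% coverage and still be mostly empty in practice. The `share_zero` figure is there to surface that case.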

These three exercises are just starter topics. Grabbing a coffee (or a conference room) with your data counterpart every month or so to get your collective hands dirty will go a long way toward creating customized-feeling, one-on-one interactions with customers. It is this raw feel, more than models or deciles, that will drive the intuition you need as a direct marketer. Remember, it’s not always fancy models and AI that drive authentic, real-feeling experiences. Spending time at “ground level” with your customer data isn’t necessarily sexy, but it drives real-world insights.