By necessity, marketers have demanded stats for viewability and guarantees of brand safety for their digital media buys. Now it’s time for them to require similar rigour for their data.
Overall, marketers should be asking whether they’re reaching the right audience. To define “right,” you have to answer another question: who are you targeting? The answer may not be as simple as it seems.
Here’s an example: is your target a 25-44-year-old female travel enthusiast, or a 25-44-year-old female who has indicated strong travel intent? They may sound similar, but these are entirely different audience groups, and targeting each yields different campaign results. Marketers who pick the wrong one will be sceptical about data performance. More importantly, the validity of the target audience will remain in question.
So how can marketers get a better grip on data accuracy?
Start asking the right questions.
Typically you hear these condemnations of third-party data providers: a) it’s a black box; b) it doesn’t work; and c) it’s expensive. These assumptions come from asking the wrong questions. What marketers should not be asking is: Where does the data come from? Does it work? What kind of data is it?
Let’s look at these queries. One of the most common questions marketers ask concerns provenance, i.e. where the data is derived from. That’s because most unconsciously link data origin with data quality. It’s a common mental shortcut, but it is spurious and misleading.
Knowing where the data came from often says nothing about its accuracy. In the previous example, contextual data from travel editorial is a much weaker signal than data from a travel meta-search site. Someone reading editorial related to travel is very likely interested in the topic; someone actually searching for flights, however, is taking an action that suggests stronger intent.
Or is it?
That depends. Flight searchers may merely be pricing options against alternatives. And what about location and commuter data? Can that be linked to intent? We don’t know, because unless a validation layer is introduced to measure these data sets, we can’t vouch for their accuracy with any real confidence. It’s good to know the cooking ingredients, but the success of the dish depends on many variables and taste preferences.
To demystify the above, the right and only question to ask is: “How accurate is the data?” That demands full transparency and disclosure on measurement upfront. Using our example, if there’s an 80% correlation between people who search flights and those who eventually book them, that gives marketers a tangible framework to work against. It also lets them bake in data cost and correlate it with performance uplift. There is no need for multivariate testing or simple A/B bake-offs. Data simply works by design.
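To make the “bake in data cost” idea concrete, here is a minimal sketch of how a disclosed accuracy figure could be folded into media-cost maths. All numbers and the `effective_cpm` helper are illustrative assumptions, not figures from any real campaign.

```python
# Sketch: folding a disclosed accuracy figure into media-cost calculations.
# All numbers are hypothetical, for illustration only.

def effective_cpm(media_cpm: float, data_cpm: float, on_target_rate: float) -> float:
    """Cost per thousand *accurately targeted* impressions.

    media_cpm      -- base cost of the inventory per 1,000 impressions
    data_cpm       -- incremental cost of the third-party segment
    on_target_rate -- disclosed accuracy, e.g. 0.8 for an 80% correlation
                      between flight searchers and eventual bookers
    """
    return (media_cpm + data_cpm) / on_target_rate

# A pricier segment with a disclosed 80% accuracy...
with_disclosure = effective_cpm(media_cpm=4.00, data_cpm=1.00, on_target_rate=0.80)
# ...versus cheaper data whose accuracy we can only assume is 50%.
without = effective_cpm(media_cpm=4.00, data_cpm=0.50, on_target_rate=0.50)

print(round(with_disclosure, 2))  # 6.25
print(round(without, 2))          # 9.0
```

On these assumed numbers, the more expensive but more accurate segment is actually the cheaper buy per correctly targeted impression, which is the point of demanding disclosure upfront.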
But there’s a caveat: it’s hard to find data players who share such statistics upfront, let alone vouch for or guarantee them.
Bake accuracy into your upfront benchmarks.
Marketers should turn the measurement framework upside down. They should input accuracy benchmarks as a probability or on-target percentage, and they should do it before the campaign goes live, instead of in a post-mortem analysis.
To mandate accuracy, you need to transform the thinking around an audience construct. That requires the upfront work of getting the facts right. It means determining how many people, out of the entire universe of consumers, intend to travel next week. Is it 2% of the population or 20%? Setting this benchmark is paramount. It defines reach and sets the target against which to measure the effectiveness of data.
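The benchmark above can be sketched as a simple calculation: estimate the incidence rate from a seed survey, then project it onto the addressable universe to cap realistic reach. The survey counts and universe size below are invented for illustration.

```python
# Sketch: turning a seed survey into an upfront reach benchmark.
# All figures are hypothetical.

def incidence_rate(positive_responses: int, total_responses: int) -> float:
    """Fraction of the surveyed universe matching the behaviour,
    e.g. 'intends to travel next week'."""
    return positive_responses / total_responses

def expected_reach(universe_size: int, rate: float) -> int:
    """Project the incidence rate onto the addressable universe."""
    return round(universe_size * rate)

# Hypothetical 12,000-person seed survey; 960 intend to travel next week.
rate = incidence_rate(960, 12_000)
print(f"{rate:.0%}")                     # 8%
# Against an assumed 50M-consumer universe, realistic reach is capped at ~4M.
print(expected_reach(50_000_000, rate))  # 4000000
```

Knowing that ceiling before launch is what lets a marketer judge whether a segment claiming far greater reach is plausible at all.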
While there are third-party frameworks and methodologies to measure the accuracy of demographic attributes (like age, gender and household income), behavioural data is both difficult and very expensive to measure and guarantee in real-time.
There are no major players doing it at scale, which makes marketers think twice about the feasibility of accurate measurement as a whole. Ideally, you want reliable answers to questions like “Are you still in the market for this product?” and “Do you intend to travel abroad next week?” Validating these questions requires an “always on” assessment to build audience segments and cull stale data.
Technical limitations of real-time interoperability with some activation channels will impact reach, frequency and the accuracy of “expired” users. As a result, there will be wasted media budget and inconsistent targeting. Ouch!
A new framework for accurate measurement.
What’s the solution? The industry needs to redefine the framework for delivering data accuracy measurement.
For this to work it requires three components: a) qualified seed data; b) a scalable and dense training set; and c) data science with the validation component.
Let’s look at each. The qualified seed is the most accurate, unambiguous and verified data set possible, sometimes also called a truth set. Usually, survey companies obtain it, at considerable cost, by gathering about 10,000-15,000 online responses. In the previous example, that would be people who will travel next week.
The training set requires vastness and a dense data marketplace. Having 4 billion global IDs (cookies, TV, mobile device, IoT IDs etc) isn't enough if you have shallow data attributes. Depth at scale is paramount.
You need a robust set of variables plus a large amount of data to build strong data models.
Machine learning, artificial intelligence and lookalike modelling (ML/AI/LAL) are all used to build bespoke data models with on-target benchmarks as an input. The most important part is validation of the expanded (scaled) audience, which is typically done with a panel, first-party deterministic data, or a combination of both.
With the latter, we would pose the same question (“Are you travelling next week?”) to the larger group. With statistical significance (think number of positive responses versus the total sample), you can derive an on-target percentage as an output. This yields better efficacy, reduces waste and makes for a happy client.
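The validation step described above reduces to a standard proportion estimate with a margin of error. Here is a minimal sketch; the sample sizes are hypothetical, and the margin uses the textbook normal approximation at 95% confidence rather than any vendor-specific method.

```python
import math

# Sketch of the validation step: re-ask the seed question to a sample of the
# scaled audience and report the on-target percentage with a margin of error.
# Sample counts are hypothetical.

def on_target(positives: int, sampled: int, z: float = 1.96):
    """On-target rate plus a 95% normal-approximation margin of error."""
    p = positives / sampled
    margin = z * math.sqrt(p * (1 - p) / sampled)
    return p, margin

# Hypothetically, 2,000 members of the expanded audience answer the
# validation question and 1,640 confirm they are travelling next week.
rate, moe = on_target(1_640, 2_000)
print(f"on-target: {rate:.1%} +/- {moe:.1%}")  # on-target: 82.0% +/- 1.7%
```

Reporting the rate together with its margin is what separates a defensible accuracy disclosure from a bare marketing claim.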
As an industry, are we ready to shift and start transacting on accuracy as a new currency of 2019?
It all starts with you.
Ask for upfront data accuracy measurement disclosures across the entire data supply chain. It is a data revolution. And it’s one that’s been a long time coming.
Evgeny Popov is vice president, data solutions, Lotame APAC.