Wednesday, November 22, 2017

Do Customer Data Platforms Need Identity Matching? The Answer May Surprise You.

I spend a lot of time with vendors trying to decide whether they are, or should be, a Customer Data Platform. I also spend a lot of time with marketers trying to decide which CDPs might be right for them. One topic that’s common in both discussions is whether a CDP needs to include identity resolution – that is, the ability to decide which identifiers (name/address, phone number, email, cookie ID, etc.) belong to the same person.

It seems like an odd question. After all, the core purpose of a CDP is to build a unified customer database, which requires connecting those identifiers so data about each customer can be brought together. So surely identity resolution is required.

Turns out, not so much. There are actually several reasons.

- Some marketers don’t need it. Companies that deal only in a single channel often have just one identifier per customer.  For example, Web-only companies might use just a cookie ID.  True, channel-specific identifiers sometimes change (e.g., cookies get deleted).  But there may be no practical way to link old and new identifiers when that happens, or marketers may simply not care.  A more common situation is companies have already built an identity resolution process, often because they’re dealing with customers who identify themselves by logging in or who transact through accounts. Financial institutions, for example, often know exactly who they’re dealing with because all transactions are associated with an account that’s linked to a customer's master record (or perhaps not linked because the customer prefers it that way). Even when identity resolution is complicated,  mature companies often (well, sometimes) have mature processes to apply a customer ID to all data before it reaches the CDP. In any of these cases, the CDP can use the ID it’s given and not need an identity resolution process of its own.
- Some marketers can only use it if it’s perfect. Again, think of a financial institution: it can’t afford to guess who’s trying to take money out of an account, so it requires the customer to identify herself before making a transaction. In many other circumstances, absolute certainty isn’t required but a false association could be embarrassing or annoying enough that the company isn’t willing to risk it. In those cases, all that’s needed is an ability to “stitch” together identifiers based on definite connections. That might mean two devices are linked because they both sent emails using the same email address, or an email and phone number linked because someone entered them both into a registration form. Almost every CDP has this sort of “deterministic” linking capability, which is so straightforward that it barely counts as identity resolution in the broader sense.

- Specialized software already exists. The main type of matching that CDPs do internally – beyond simple stitching – is “fuzzy” matching.  This applies rules to decide when two similar-looking records really refer to the same person. It's most commonly applied to names and postal addresses, which are often captured inconsistently from one source to the next. It might sometimes be applied to other types of data, such as different forms of an email address (e.g. and The technology for this sort of matching gets very complicated very quickly, and it’s something that specialized vendors offer either for purchase or as a service. So CDP vendors can quite reasonably argue they needn’t build this for themselves but should simply integrate an external product.

- Much identity resolution requires external data. This is the heart of the matter.  Most of the really interesting identity resolution today involves linking different devices or linking across channels when there’s no known connection. This sort of “probabilistic” linking is generally done by vendors who capture huge amounts of behavioral data by tracking visitors to popular Web sites or users of popular mobile applications, or by gathering deterministic links from many different sources. They then build giant databases (or "graphs" if you want to sound trendy) with these connections.  Even matching of offline names and addresses usually requires external data, both to standardize the inputs (to make fuzzy matching more accurate) and to incorporate information such as address and name changes that cannot be known by inspecting the data itself.  In all these situations, marketers need to use the external vendors’ data to find connections that don’t exist within the marketers’ own, much more limited information. If the external vendor provides matching functions in addition to the data, the CDP is relieved of the need to do the matching internally.

In short, there’s a surprisingly strong case that identity resolution isn’t a required feature in a CDP.  All the CDP really needs is basic stitching and connections to external services for more advanced approaches.  As cross-device and cross-channel matching become more important, CDPs will be more reliant on external vendors no matter what capabilities they’ve built for themselves. One important qualifier is the CDP implementation team still needs expertise in matching, so they can help clients set it up properly. But while it’s great to find a CDP vendor with its own matching technology, lack of that technology shouldn’t exclude a vendor from being considered a CDP.

Thursday, November 09, 2017

No, Users Shouldn't Write Their Own Software

Salesforce this week announced “myEinstein” self-service artificial intelligence features to let non-technical users build predictive models and chatbots. My immediate reaction was that's a bad idea: top-of-the-head objections include duplicated effort, wasted time, and the potential for really bad results. I'm sure I could find other concerns if I thought about it, but today’s world brings a constant stream of new things to worry about, so I didn’t bother. But then today’s news described an “Everyone Can Code” initiative from Apple, which raised essentially the same issue in even clearer terms: should people create their own software?

I thought this idea had died a well-deserved death decades ago. There was a brief period when people thought that “computer literacy” would join reading, writing, and arithmetic as basic skills required for modern life. But soon they realized that you can run a computer using software someone else wrote!* That made the idea of everyone writing their own programs seem obviously foolish – specifically because of duplicated effort, wasted time, and the potential for really bad results. It took IT departments much longer to come around the notion of buying packaged software instead of writing their own but even that battle has now mostly been won. Today, smart IT groups only create systems to do things that are unique to their business and provide significant competitive advantage.

But the idea of non-technical workers creating their own systems isn't just about packaged vs. self-written software. It generally arises from a perception that corporate systems don’t meet workers’ needs: either because the corporate systems are inadequate or because corporate IT is hard to work with and has other priorities. Faced with such obstacles to getting their jobs done, the more motivated and technically adept users will create their own systems, often working with tools like spreadsheets that aren’t really appropriate but have the unbeatable advantage of being available.

Such user-built systems frequently grow to support work groups or even departments, especially at smaller companies. They’re much disliked by corporate IT, sometimes for turf protection but mostly because they pose very real dangers to security, compliance, reliability, and business continuity. Personal development on a platform like myEinstein poses many of the same risks, although the data within Salesforce is probably more secure than data held on someone’s personal computer or mobile phone.

Oddly enough, marketing departments have been a little less prone to this sort of guerilla IT development than some other groups. The main reason is probably that modern marketing revolves around customer data and customer-facing systems, which are still managed by a corporate resource (not necessarily IT: could be Web development, marketing ops, or an outside vendor). In addition, the easy availability of Software as a Service packages has meant that even rogue marketers are using software built by professionals. (Although once you get beyond customer data to things like planning and budgeting, it’s spreadsheets all the way.)

This is what makes the notion of systems like myEinstein so dangerous (and I don’t mean to pick on Salesforce in particular; I’m sure other vendors have similar ideas in development). Because those systems are directly tied into corporate databases, they remove the firewall that (mostly) separated customer data and processes from end-user developers. This opens up all sorts of opportunities for well-intentioned workers to cause damage.

But let’s assume there are enough guardrails in place to avoid the obvious security and customer treatment risks. Personal systems have a more fundamental problem: they’re personal. That means they can only manage processes that are within the developer’s personal control. But customer experiences span multiple users, departments, and systems. This means they must be built cooperatively and deployed across the enterprise. The IT department doesn't have to be in charge but some corporate governance is needed. It also means there’s significant complexity to manage, which requires some sort of trained professionals need to oversee the process. The challenges and risks of building complex systems are simply too great to let individual users create them on their own.

None of this should be interpreted to suggest that AI has no place in marketing technology. AI can definitely help marketers manage greater complexity, for example by creating more detailed segmentations and running more optimization tests than humans can manage by themselves. AI can also help technology professionals by taking over tasks that require much skill but limited creativity: for example, see Qubole, which creates an “autonomous data platform" that is “context-aware, self-managing, and self-learning”. I still have little doubt that AI will eventually manage end-to-end customer experiences with little direct human input (although still under human supervision and, one hopes, with an occasional injection of human insight). Indeed, recent discussions of AI systems that create other AI systems suggest autonomous marketing systems might be closer than it seems.

Of course, self-improving AI is the stuff of nightmares for people like Nick Bostrom, who suspect it poses an existential threat to humanity. He may well be right but it’s still probably inevitable that marketers will unleash autonomous marketing systems as soon as they’re able. At that point, we can expect the AI to quickly lock out any personally developed myEinstein-type systems because they won’t properly coordinate with the AI’s grand scheme. So perhaps that problem will solve itself.

Looking still further ahead, if the computers really take over most of our work, people might take up programming purely as an amusement. The AIs would presumably tolerate this but carefully isolate the human-written programs from systems that do real work, neatly reversing the “AI in a box” isolation that Bostrom and others suggest as a way to keep the AIs from harming us. It doesn’t get much more ironic than that: everyone writing programs that computers ignore completely. Maybe that’s the future Apple’s “Everyone Can Code” is really leading up to.

*Little did we know.  It turned out that far from requiring a new skill, computers reduced the need for reading, writing, and math.

Monday, November 06, 2017

TrenDemon and Adinton Offer Attribution Options

I wrote a couple weeks ago about the importance of attribution as a guide for artificial intelligence-driven marketing. One implication was I should pay more attention to attribution systems. Here’s a quick look at two products that tackle different parts of the attribution problem: content measurement and advertising measurement.


Let’s start with TrenDemon. Its specialty is measuring the impact of marketing content on long B2B sales cycles. It does this by placing a tag on client Web sites to identify visitors and track the content they consume, and then connecting client CRM systems to find which visitor companies ultimately made a purchase (or reached some other user-specified goal). Visitors are identified by company using their IP address and as individuals by tracking cookies.

TrenDemon does a bit more than correlate content consumption and final outcomes. It also identifies when each piece of content is consumed, distinguishing between the start, middle, and end of the buying journey. It also looks at other content metrics such as how many people read an item, how much time they spend with it, and how many read something else after they’re done. These and other inputs are combined to generate an attribution score for each item. The system uses the score to identify the most effective items for each journey stage and to recommend which items should be presented in the future.

Pricing for TrenDemon starts at $800 per month. The system was launched in early 2015 and is currently used by just over 100 companies.


Next we have Adinton, a Barcelona-based firm that specializes in attribution for paid search and social ads. Adinton has more than 55 clients throughout Europe, mostly selling travel and insurance online. Such purchases often involve multiple Web site visits but still have a shorter buying cycle than complex B2B transactions.

Adinton has pixels to capture Web ad impressions as well as Web site visits. Like TrenDemon, it tracks site visitors over time and distinguishes between starting, middle, and finishing clicks. It also distinguishes between attributed and assisted conversions. When possible, it builds a unified picture of each visitor across devices and channels.

The system uses this data to calculate the cost of different types of click types, which it combines to create a “true” cost per action for each ad purchase. It compares this with the clients’ target cost per actions to determine where they are over- or under-investing.

Adinton has API connections to gather data from Google AdWords, Facebook Ads, Bing Ads, AdRoll, RocketFuel, and other advertising channels. An autobidding system can currently adjust bids in AdWords and will add Facebook and Bing adjustments in the near future. The system also does keyword research and click fraud identification. Pricing is based on number of clicks and starts as low as $299 per month for attribution analysis, with additional fees for autobidding and click fraud modules. Adinton was founded in 2013.  It launched its first product in 2014 although attribution came later.

Further Thoughts

These two products are chosen almost at random, so I wouldn’t assign any global significance to their features. But it’s still intriguing that both add a first/middle/last buying stage to the analysis. It’s also interesting that they occupy a middle ground between totally arbitrary attribution methodologies, such as first touch/last touch/fractional credit, and advanced algorithmic methods that attempt to calculate the true incremental impact of each touch. (Note that neither TrenDemon nor Adinton’s summary metric is presented as estimating incremental value.)

 Of course, without true incremental value, neither system can claim to develop an optimal spending allocation. One interpretation might be that few marketers are ready for a full-blown algorithmic approach but many are open to something more than the clearly-arbitrary methods. So perhaps systems like TrenDemon and Adinton offer a transitional stage for marketers (and marketing AI systems) that will eventually move to a more advanced approach.

 An alternative view would be the algorithmic methods will never be reliable enough to be widely accepted.  This would see these intermediate systems as about as far as most marketers ever will or should go towards measuring marketing program impact. Time will tell.