Wednesday, February 27, 2008

ParAccel Enters the Analytical Database Race

As I’ve now written more times than I care to admit, specialized analytical databases are very much in style. In addition to my beloved QlikView, market entrants include Alterian, SmartFocus, QD Technology, Vertica, 1010data, Kognitio, Advizor and Polyhedra, not to mention established standbys including Teradata and Sybase IQ. Plus you have to add appliances like Netezza, Calpont, Greenplum and DATAllegro. Many of these run on massively parallel hardware platforms; several use columnar data structures and in-memory data access. It’s all quite fascinating, but after a while even I tend to lose interest in the details.

None of which dimmed my enthusiasm when I learned about yet another analytical database vendor, ParAccel. Sure enough, ParAccel is a massively parallel, in-memory-capable, SQL-compatible columnar database, which pretty much ticks all the boxes on my list. Run by industry veterans, the company seems to have refined many of the details that will let it scale linearly with large numbers of processors and extreme data volumes. One point that seemed particularly noteworthy was that the standard data loader can handle 700 GB per hour; load speed is a major stumbling block for many columnar systems, and this is vastly faster than most. And that’s just the standard loader, which passes all data through a single node: for really large volumes, the work can be shared among multiple nodes.
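To put that 700 GB per hour figure in perspective, here is a back-of-the-envelope sketch in Python. The single-node rate is ParAccel’s published claim; the assumption that spreading the load across multiple nodes scales throughput roughly linearly is mine, purely for illustration.

```python
# Rough load-time arithmetic for a columnar bulk loader.
# 700 GB/hour is the single-node rate ParAccel claims; the linear
# multi-node speedup below is an illustrative assumption, not a spec.

SINGLE_NODE_RATE_GB_PER_HOUR = 700

def load_hours(data_gb, loader_nodes=1):
    """Estimate hours to bulk-load data_gb gigabytes of raw data."""
    return data_gb / (SINGLE_NODE_RATE_GB_PER_HOUR * loader_nodes)

print(round(load_hours(1_000), 1))      # ~1.4 hours for 1 TB through one node
print(round(load_hours(10_000, 4), 1))  # ~3.6 hours for 10 TB across 4 loader nodes
```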

Still, if ParAccel had one particularly memorable claim to my attention, it was having blown past previous records for several of the TPC-H analytical query benchmarks run by the Transaction Processing Performance Council. The TPC process is grueling and many vendors don’t bother with it, but it still carries some weight as one of the few objective performance standards available. While other winners had beaten the previous marks by a few percentage points, ParAccel's improvement was on the order of 500%.

When I looked at the TPC-H website for details, it turned out that ParAccel’s winning results have since been bested by yet another massively parallel database vendor, EXASOL, based in Nuremberg, Germany. (Actually, TPC still lists ParAccel as best in the 300 GB category, but apparently only because EXASOL has run just the 100 GB and 1 TB tests.) Still, none of the other analytic database vendors seem to have attempted the TPC-H process, so I’m not sure how impressed to be by ParAccel’s performance. Sure, it clearly beats the pants off Oracle, DB2 and SQL Server, but any columnar database should be able to do that.

One insight I did gain from my look at ParAccel was that in-memory doesn’t need to mean small. I’ll admit to being used to conventional PC servers, where 16 GB of memory is a lot and 64 GB is definitely pushing it. The massively parallel systems are a whole other ballgame: ParAccel’s 1 TB test ran on a 48 node system. At a cost of maybe $10,000 per node, that’s some pretty serious hardware, so this is not something that will replace QlikView under my desk any time soon. And bear in mind that even a terabyte isn’t really that much these days: as a point of reference, the TPC-H benchmark goes up to 30 TB. Try paying for that much memory, massively parallel or not. The good news is that ParAccel can work with on-disk as well as in-memory data, although the performance won’t be quite as exciting. Hence the term "in-memory-capable".
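For what it’s worth, the arithmetic behind that "serious hardware" remark looks roughly like this. Both figures, $10,000 per node and 48 nodes per terabyte, are ballpark numbers taken from the paragraph above, not vendor quotes.

```python
# Ballpark hardware cost for keeping data in memory, using the rough
# figures cited above: ~$10,000 per node and the 48-node / 1 TB
# TPC-H configuration. Both numbers are illustrative estimates.

COST_PER_NODE = 10_000
NODES_PER_TB = 48  # inferred from ParAccel's 1 TB benchmark setup

def in_memory_hardware_cost(terabytes):
    return terabytes * NODES_PER_TB * COST_PER_NODE

print(in_memory_hardware_cost(1))   # ~$480,000 for the 1 TB test rig
print(in_memory_hardware_cost(30))  # ~$14.4 million to hold 30 TB in memory
```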

Hardware aside, ParAccel itself is not especially cheap either. The entry price is $210,000, which buys licenses for five nodes and a terabyte of data. Each additional node costs $40,000 and each additional terabyte $10,000. An alternative pricing scheme doesn’t charge for nodes but costs $1,000 per GB, which is also a good bit of money. Subscription pricing is available, but any way you slice it, this is not a system for small businesses.
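Here is a quick sketch of how the two published pricing schemes compare, using only the list figures quoted above; actual negotiated prices would doubtless differ.

```python
# Compare ParAccel's node-based and per-gigabyte list prices,
# using only the figures quoted in the text. Purely illustrative.

def node_based_price(nodes, terabytes):
    """$210,000 covers 5 nodes and 1 TB; extras run $40K/node and $10K/TB."""
    return (210_000
            + max(0, nodes - 5) * 40_000
            + max(0, terabytes - 1) * 10_000)

def per_gb_price(gigabytes):
    """Alternative scheme: $1,000 per GB, with no charge for nodes."""
    return gigabytes * 1_000

print(node_based_price(10, 2))   # $420,000 for 10 nodes and 2 TB
print(per_gb_price(2_000))       # $2,000,000 for the same 2 TB, any node count
```

On these list figures, the per-gigabyte option only looks attractive when data volumes are small relative to the number of nodes.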

So is ParAccel the cat’s meow of analytical databases? Well, maybe, but only because I’m not sure what “the cat’s meow” really means. It’s surely an alternative worth considering for anyone in the market. Perhaps more significant, the company raised $20 million in December 2007, which may make it more commercially viable than most. Even in a market as rarefied as this one, commercial considerations will ultimately be more important than pure technical excellence.

Thursday, February 14, 2008

What's New at DataFlux? I Thought You'd Never Ask.

What with it being Valentine’s Day and all, you probably didn’t wake up this morning asking yourself, “I wonder what’s new with DataFlux?” That, my friend, is where you and I differ. Except that I actually asked myself that question a couple of weeks ago, and by now have had time to get an answer. Which turns out to be rather interesting.

DataFlux, as anyone still reading this probably knew already, is a developer of data quality software and is owned by SAS. DataFlux’s original core technology was a statistical matching engine that automatically analyzes input files and generates sophisticated keys which are similar for similar records. This has now been supplemented by a variety of capabilities for data profiling, analysis, standardization and verification, using reference data and rules in addition to statistical methods. The original matching engine is now just one component within a much larger set of solutions.
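DataFlux’s engine is statistical and far more sophisticated than anything I could sketch here, but the basic idea of match keys, deriving keys that come out the same (or nearly so) for similar records, can be shown with a toy example. This is entirely my own illustration, not DataFlux code:

```python
# Toy illustration of match keys: normalize a name-and-address record
# so that near-duplicates collapse onto the same key. Real matching
# engines (DataFlux included) use statistical and phonetic techniques
# far beyond this simple normalization.

import re

def match_key(name, street):
    def normalize(text):
        text = re.sub(r"[^a-z0-9 ]", "", text.lower())  # lowercase, strip punctuation
        return " ".join(sorted(text.split()))            # ignore word order
    return normalize(name) + "|" + normalize(street)

print(match_key("Smith, John", "123 Main St."))   # john smith|123 main st
print(match_key("John Smith", "123 MAIN ST"))     # john smith|123 main st  (same key)
```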

In fact, and this is what I find interesting, much of DataFlux’s focus is now on the larger issue of data governance. This has more to do with monitoring data quality than simple matching. DataFlux tells me the change has been driven by organizations that face increasing pressures to prove they are doing a good job with managing their data, for reasons such as financial reporting and compliance with government regulations.

The new developments also encompass product information and other types of non-name-and-address data, usually labeled as “master data management”. DataFlux reports that non-customer data is the fastest growing portion of its business. DataFlux is well suited to non-traditional matching applications because its statistical approach does not rely on topic-specific rules and reference bases. Of course, DataFlux does use rules and reference information when appropriate.

The other recent development at DataFlux has been the creation of “accelerators”, which are prepackaged rules, processes and reports for specific tasks. DataFlux started offering these in 2007 and now lists one each for customer data quality, product data quality, and watchlist compliance. More are apparently on the way. Applications like this are a very common development in a maturing industry, as companies that started by providing tools gain enough experience to understand the applications commonly built with those tools. The next step, which DataFlux hasn’t reached yet, is to become even more specific by developing packages for particular industries. The benefit of these applications is that they save clients work and allow quicker deployments.

Back to governance. DataFlux’s movement in that direction is an interesting strategy because it offers a possible escape from the commoditization of its core data quality functions. Major data quality vendors including Firstlogic and Group 1 Software, plus several of the smaller ones, have been acquired in recent years, and matching functions are now embedded within many enterprise software products. Even though there have been some intriguing new technical approaches from vendors like Netrics and Zoomix, this is a hard market to penetrate based on better technology alone. It seems that DataFlux moved into governance more in response to customer requests than due to proactive strategic planning. But even so, they have done well to recognize and seize the opportunity when it presented itself. Not everyone is quite so responsive. The question now is whether other data quality vendors will take a similar approach or whether this will be a long-term point of differentiation for DataFlux.

Wednesday, February 06, 2008

Red Herring CMO Conference: What Do Marketers Really Want?

I’ve spent the last two days at Red Herring Magazine’s CMO8 conference, which was both excellent and a welcome change from my usual focus on the nuts and bolts of marketing systems. Reflecting Red Herring’s own readership, the speakers and attendees were largely from technology companies, and many presentations were about business-to-business rather than business-to-consumer marketing. My particular favorites included AMD’s Stephen DiFranco, Ingram Micro’s Carol Kurimsky, and Lenovo’s Deepak Advani. DiFranco discussed how AMD manages to survive in the face of Intel’s overwhelming market position by focusing on gaining shelf space at retail and through other channel partners. Kurimsky described how her firm uses analytics to identify the best channel partners for her suppliers. Advani gave an overview of how Lenovo very systematically managed the transition of the ThinkPad brand from IBM’s ownership while simultaneously building Lenovo’s own brand. Although each described different challenges and solutions, they all shared a very strategic, analytical approach to marketing issues.

Personally I found this anything but surprising. All the good marketers I’ve ever known have been disciplined, strategic thinkers. Apparently, this is not always the case in the technology industry. Many in this audience seemed to accept the common stereotype of marketers as too often focused on creativity rather than business value. And, despite the prominent exception of Advani’s presentation, there seemed to be considerable residual skepticism of the value provided by brand marketing in particular.

The group did seem heartened at the notion, repeated several times, that a new generation of analytically-minded marketers is on the rise. Yet I have to wonder: the worst-attended session I saw was an excellent panel on marketing performance measurement. This featured Alex Eldemir of MarketingNPV, Stephen DiMarco of Compete, Inc., Laura Patterson of VisionEdge Marketing, and Michael Emerson of Aprimo. The gist of the discussion was that CMOs must find out what CEOs want from them before they can build measures that prove they are delivering it. Obvious as this seems, it is apparently news to many marketers. Sadly, the panel had no magic bullet for the most fundamental problems of marketing measurement: that it’s often not possible to measure the actual impact of any particular marketing program, and that sophisticated analytics are too expensive for many marketers to afford. The best advice along those lines boiled down to “get over it” (delivered, of course, much more politely): accept the limitations and do the best you can with the resources at hand.

So, if the attendees voted with their feet to avoid the topic of marketing performance measurement, where did they head instead? I’d say the subject that most engaged them was social networks and how these can be harnessed for marketing purposes. Ironically, that discussion was much more about consumer marketing than business-to-business, although a few panelists did mention using social networks for business contacts (e.g. LinkedIn) and for business-oriented communities. The assembled experts did offer concrete suggestions for taking advantage of social networks, including viral campaigns, “free” market research, building brand visibility, creating inventory for online advertising, and creating specialized communities. They proposed a few new metrics, such as the number of conversations involving a product and the number of exposures to a viral message. But I didn’t sense much connection between the interest in social networks and any desire for marketers to be more analytical. Instead, I think the interest was driven by something more basic—an intuition that social networks present a new way to reach people in a world where finding an engaged audience is increasingly difficult. For now, the challenge is understanding how people behave in this new medium and how you can manage that behavior in useful ways. The metrics will come later.

Anyway, it was a very fun and stimulating two days. We now return to my regularly scheduled reality, which is already in progress.