Sunday, September 16, 2007

Tableau Software Makes Good Visualization Easy

I took a close look recently at Tableau data visualization software. I liked Tableau a lot, even though it wasn’t quite what I expected. I had thought of it as a way to build aesthetically-correct charts, according to the precepts set down by Edward Tufte and like-minded visualization gurus such as Stephen Few. But even though Tableau follows many of these principles, it is less for building charts than interactive data exploration.

This is admittedly a pretty subtle distinction, since the exploration is achieved through charts. What I mean is that Tableau is designed to make it very easy to see the results of changing one data element at a time, for example to find whether a particular variable helps to predict an outcome. (That’s a little vague: the example that Tableau uses is analyzing the price of a condominium, and adding variables like square footage, number of rooms, number of baths, location, etc. to see if they explain differences in the sales price.) What makes Tableau special is it automatically redraws the graphs as the data changes, often producing a totally different format, The formats are selected according to the fore-mentioned visualization theories, and for the most part are quite effective.

It may be worth diving a bit deeper into those visualization techniques, although I don’t claim to be an expert. You’ve probably heard some of the gurus’ common criticisms: ‘three dimensional’ bars that don’t mean anything; pie charts and gauges that look pretty but show little information given the space they take up; radar charts that are fundamentally incomprehensible. The underlying premise is that humans are extremely good at finding patterns in visual data, so that is what charts should be used for—not to display specific information, which belongs in tables of numbers. Building on this premise, research shows that people find patterns more easily in certain types of displays: shapes, graduations of color, and spatial relationships work well, but not reading numbers, making subtle size comparisons (e.g., slices in a pie chart), or looking up colors in a key. This approach also implies avoiding components that convey no information, such as the shadows on those ‘3-d’ bar charts, since these can only distract from pattern identification.

In general, these principles work well, although I have trouble with some of the rules that result. For example, grids within charts are largely forbidden, on the theory that charts should only show relative information (patterns) and you don’t need a grid to know whether one bar is higher than another. My problem with that one, in fact, it’s often difficult to compare two bars that are not immediately adjacent, and a grid can help. A grid can also provide a useful reference point, such as showing ‘freezing’ on a temperature chart. The gurus might well allow grid lines in some of those circumstances.

On the other hand, the point about color is very well taken. Americans and Europeans often use red for danger and green for good, but there is nothing intuitive about those—they depend on cultural norms. In China, red is a positive color. Worse, the gurus point out, a significant portion of the population is color-blind and can’t distinguish red from green anyway. They suggest that color intensity is a better way to show gradations, since people naturally understand a continuum from light to dark (even though it may not be clear which end of the scale is good or bad). They also suggest muted rather than bright colors, since it’s easier to see subtle patterns when there is less color contrast. In general, they recommend against using color to display meaning (say, to identify regions on a bar chart) because it takes conscious effort to interpret. Where different items must be shown on the same chart, they would argue that differences in shape are more easily understood.

As I say, Tableau is consistent with these principles, although it does let users make other choices if they insist. There is apparently some very neat technology inside Tableau that builds the charts using a specification language rather than conventional configuration parameters. But this is largely hidden from users, since the graphs are usually designed automatically. It may have some effect on how easily the system can switch from one format to another, and on the range of display options.

The technical feature that does impact Tableau users is its approach to data storage. Basically, it doesn’t have one: that is, it relies on external data stores to hold the information it requires, and issues queries against those sources as required. This was a bit of a disappointment to me, since it means Tableau’s performance really relies on the external systems. Not that that’s so terrible—you could argue (as Tableau does) that this avoids loading data into a proprietary format, making it easier to access the information you need without pre-planning. But it also means that Tableau can be painfully slow when you’re working with large data sets, particularly if they haven’t been optimized for the queries you’re making. In a system designed to encourage unplanned “speed of thought” data exploration, I consider this a significant drawback.

That said, let me repeat that I really liked Tableau. Query speed will be an issue in only some situations. Most of the time, Tableau will draw the required data into memory and work with it there, giving near-immediate response. And if you really need quick response from a very large database, technical staff can always apply the usual optimization techniques. For people with really high-end needs, Tableau already works with the Hyperion multidimensional database and is building an adapter for the Netezza high speed data appliance.

Of course, looking at Tableau led me to compare it with QlikTech. This is definitely apples to oranges: one is a reporting system and the other is a data exploration tool; one has its own database and the other doesn’t. I found that with a little tweaking I could get QlikView to produce many of the same charts as Tableau, although it was certainly more work to get there. I’d love to see the Tableau interface connected with the QlikView data engine, but suspect the peculiarities of both systems make this unlikely. (Tableau queries rely on advanced SQL features; QlikView is not a SQL database.) If I had to choose just one, I would pick the greater data access power and flexibility of QlikTech over the easy visualizations of Tableau. But Tableau is cheap enough—$999 to $1,799 for a single user license, depending on the data sources permitted—that I see no reason most people who need them couldn’t have both.