Sunday, April 12, 2009

PivotLink: Flexible On-Demand Business Intelligence

I did a Webinar recently (click here for slides) about on-demand business intelligence systems, sponsored by Birst. It boiled to two key points:

- most of the work in business intelligence is in assembling the underlying database, even though the term “BI systems” often refers to the query and reporting tools (a.k.a. the “presentation layer”).

- vendor strategies to simplify BI include: using simpler interfaces, automation or pre-built solutions to make conventional technology easier; or using inherently more efficient alternative technologies such as in-memory and columnar databases.

Naturally, the full Webinar was jammed with much other wisdom (you missed Time-Traveling Elvis and the Reese’s Peanut Butter Cup Fallacy). But those two points alone provide a useful framework for considering business intelligence systems in general.

I bring this up because I’m finally writing about PivotLink, which I looked at more than a month ago. It turns out that my framework helps to put PivotLink into perspective.

Here’s the thing. PivotLink is an on-demand business intelligence product. Its most interesting technical feature is an in-memory columnar database. If you follow the industry, you know that type of database is about the geek-sexiest thing out there right now. I myself find it totally fascinating and had a grand time digging into the details.

But the rest of the world doesn’t care if it’s geek-ilicious:* they want to know how PivotLink can help them. Here’s where the framework comes in, since it clarifies what PivotLink does and doesn’t do. Or, to put that a bit more formally, it shows which elements of a complete business intelligence system PivotLink provides.

The answer to that being, PivotLink works largely at the presentation layer. It can import data from multiple sources and join tables on common keys. But it won’t do the complicated transformations and fuzzy matching needed for serious data integration. This means that PivotLink must either work with data that's already been processed into a conventional data warehouse or can be usefully analyzed in its original state.

There’s actually more of this naturally-analyzable data than you might think. Purchase transactions, a common source for PivotLink, are a good example. The obstacle to working with these has often been the size of the data sets, which meant lots of expensive hardware and lots of (how to put this delicately?) deliberately-paced IT support. These are exactly the barriers that on-demand systems overcome.

This brings us back to PivotLink's technology. In-memory, columnar databases are especially well suited for on-demand business intelligence because they compress data tightly (10% of the original volume. according to PivotLink), read only the columns required for a particular query (providing faster response) and don’t require need special schemas or preaggregated data cubes (requiring less skill to set up and modify).

But even columnar systems vary in their details. PivotLink sits towards the flexible end of the spectrum, with support for incremental updates, many-to-many table relationships, and abilities to add new columns and merge data along different query paths without reloading it. The system also allows calculations during the data load and within queries, caches data in memory and further compresses it after an initial query, and supports user- and group-level security at the row, column or individual cell levels. Not all column-based systems can say the same.

On the other hand, PivotLink does not support standard SQL queries and doesn’t run on a massively parallel (“shared nothing”) architecture. Both limits are typical of older columnar databases, a reminder that PivotLink began life in 1998 as SeaTab Software. Although shared-nothing architectures are generally more scalable, PivotLink is already running data sets with more than 10 billion rows in its current configuration. Response is very fast: according to the company, one client with several billion rows of point-of-sale data runs a nightly update and then executes a summary report in under one minute. Still, PivotLink recognizes the benefits of shared-nothing systems and plans to move to that architecture by the end of 2009.

Lack of SQL compatibility means users must rely on PivotLink’s tools for queries and reports. These let administrators import data from CSV, TXT and Excel files and map them to PivotLink tables. (The actual storage is columnar, with different compression techniques applied to different data types. But to the user, the data looks like it’s organized in tables.) The system reads the source data and makes suggestions about field types and names, which the user can accept or revise.

Users then define queries against the tables. Each query contains a selected set of columns, which are identified either as a header (i.e., dimension) or metric. When queries involve multiple tables, the user also specifies the columns to join on. Each report is written against one query.

The distinction between tables and queries is important in PivotLink, because it provides much of the system’s flexibility. The same column can be a dimension in one query and a metric in another, and the same tables can be related on different keys in different queries. All this happens without reloading the data or running any aggregations. The metadata used to build reports is defined by the associated query, not the underlying tables.

Reports are built by dragging headers and metrics into place on a grid, defining drill paths, and selecting tabular or chart formats. Reports can also rank and sort results and select the top or bottom rows for each rank. For example, a report could rank the top ten stores per region by revenue. Users can combine several reports into a dashboard.

End-users can apply filters and drill into the selected elements within a report. However, PivotLink does not apply filters for one report to the rest of the dashboard, in the way of QlikView or ADVIZOR. This feature is also on the agenda for 2009.

PivotLink clients can import data, define queries and build reports without help from the company. PivotLink said it takes a couple of days to train an administrator to load the data and build queries, a day or two to train a user to build reports and dashboards, and minutes to hours to learn to use the reports. Based on what they showed me, that sounds about right. You can also find out for yourself by signing up for a free trial.

Pricing of PivotLink is based on the size of the deployment, taking into account data volume and types of users. It starts around $3,000 per month for 50 million rows. When I spoke with the company in early March, they had about 60 clients supporting over 6,000 users, and had doubled their volume in the past year.

-----------------------------------------------------------------------------------------
* geek-alicious? is it possible to misspell a word that doesn’t exist?

No comments: