
Thursday, August 18, 2011

Enterprise BI and agility

I'm talking to QlikTech about their new version, coming soon. I'll be publishing the results in the BI Verdict. They are focusing more and more on these bigger accounts. I think this story fits pretty well in my informal series of posts on the subject of agile BI. By coincidence I have already discussed QlikView in a previous post.

QlikTech has what it calls a "land and expand" policy, which means getting a single department on the tool and expanding from there. Actually, all BI companies that can deliver departmental solutions have something similar. The reason for this is simple: the cost of sales for selling to a company that is already using the tool is much lower than for a completely new customer. In fact, a lot of BI tools spread through companies from department to department this way.

Now QlikTech is concentrating more on enterprise accounts. So it's interesting to see that the company is moving away from the previous claim that the tool is a replacement for a data warehouse. I think that any attempt on their part to compete as an enterprise solution would just distract them from their end users.

A lot of BI companies go through a similar life cycle as they grow. Most start out as ways to create departmental solutions, which tend to be faster, more agile projects. As they get bigger management tends to concentrate on larger accounts, which means making sure they are acceptable to the IT department. But IT is more interested in keeping processes running than in agile development. As a result, the products tend to become more complex and less suitable to agile solutions.

This is a big issue for QlikView right now because the company has grown so quickly in recent years, although at the moment it seems to be backing away from radical changes in the tool. But the same dilemma applies to any BI tool that is growing.

Thursday, June 30, 2011

Column oriented databases are not the same as in-memory databases

In recent years, thanks not least to aggressive marketing by QlikTech (or Qlik Technologies, as they are now often called), Tableau and Tibco Spotfire, columnar databases and in-memory databases have become very fashionable. Microsoft's VertiPaq engine, which is behind the PowerPivot product, is a good example of a tool that came in on the wave of this trend.

One result of this is that there seems to be some confusion about what the terms "in-memory" and "column oriented" mean, and features of one are often attributed to the other.

Just to be perfectly clear: A columnar database is not necessarily in-memory, and an in-memory database is not necessarily columnar.

In-memory is a somewhat vague term, since, as Nigel Pendse likes to point out, all databases have to hold data in memory to process it -- the CPU cannot directly access the hard drive. However, I would say that unlike some other tools, IBM Cognos TM1 and QlikView are in-memory. These products load everything into memory before they do anything. If there is not enough memory to fit the entire data set, the load fails and that's that. The same applies to SAP HANA. But unlike QlikView and HANA, TM1 is a multi-dimensional database.

The loading behavior of an in-memory database is quite different from that of the MOLAP engine in Analysis Services, which is fundamentally disk-based but has sophisticated paging abilities to keep as much of the data as possible in memory, or from the column oriented Spotfire, which attempts to load everything but falls back on paging if there is not enough memory.

Columnar is a much clearer and simpler term. It simply means that the data is stored by column instead of by row. There are a large number of analytic databases with this architecture, such as Exadata, SAND, Greenplum, Aster, or Sybase IQ, just to name a few. Some, like Vertica and VertiPaq, even refer to their columnar architecture in their names. Some columnar databases are in-memory, but many are designed to deal with huge amounts of data, up to the petabyte range, and cannot possibly hold it all in memory.
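The row-versus-column distinction is easy to show in a few lines of code. Here is a minimal sketch (plain Python, with a made-up three-column table, not any particular product's storage format): a row store keeps each record's values together, while a column store keeps each attribute's values together, so summing a single column only has to touch that column.

```python
# Hypothetical sales table: (region, product, amount)
rows = [
    ("EMEA", "widget", 100),
    ("APAC", "widget", 250),
    ("EMEA", "gadget", 75),
]

# Row-oriented layout: one tuple per record.
row_store = rows

# Column-oriented layout: one list per attribute.
col_store = {
    "region":  [r[0] for r in rows],
    "product": [r[1] for r in rows],
    "amount":  [r[2] for r in rows],
}

# Aggregating one column reads a single contiguous list in a column store...
total_columnar = sum(col_store["amount"])

# ...but must walk every complete record in a row store.
total_row_wise = sum(r[2] for r in row_store)

print(total_columnar, total_row_wise)  # 425 425
```

Note that nothing in this layout says where the lists live: either structure can sit in RAM or on disk, which is exactly why "columnar" and "in-memory" are independent properties.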

By the way, what got me off on this rant is this blog post about Endeca Latitude 2, which actually equates the two technologies, and a LinkedIn discussion the author started (which is private, so I can't link it here) with the title "Is Data Modeling Dead?"

The idea that in-memory databases kill data modeling comes from the fact that columnar databases are often used to discover hierarchies, and a whole generation of so-called "agile" in-memory database tools use this method. But in-memory multi-dimensional databases are still around and still very useful for analyzing data with well-defined structures, such as financial data.