Thursday, June 30, 2011

Column oriented databases are not the same as in-memory databases

In recent years, thanks not least to aggressive marketing by QlikTech (or Qlik Technologies as the are now often called) Tableau and Tibco Spotfire, columnar databases and in-memory databases have become very fashionable. Microsoft's VertiPaq engine, which is behind the PowerPivot product, is a good example of a tool that came in on the wave of this trend.

One of the results of these is that there seems to be some confusion about what the terms "in-memory" and "column oriented" mean, and attributes of one are often attributed to the other.

Just to be perfectly clear: A columnar database is not necessarily in-memory, and an in-memory database is not necessarily columnar.

In-memory is a somewhat vague term, since, as Nigel Pendse likes to point out, all databases have to hold data in memory to process it -- the CPU cannot directly access the hard drive. However, I would say that unlike some other tools, IBM Cognos TM1 and QlikView are in-memory. These products load everything into memory before they do anything. If there is not enough memory to fit the entire data set, the load fails and that's that. The same applies to SAP HANA. But unlike QlikView and HANA, TM1 is a multi-dimensional database.

The loading behavior of an in-memory database is much different to the MOLAP engine in Analysis Services, which is fundamentally disk-based but has sophisticated paging abilities to keep as much as the data as possible in memory, or the column oriented Spotfire, which attempts to load everything but uses paging if there is not enough memory.

Columnar is a much clearer and simpler term. It simply means that the data is stored by column instead of by row. There are a large number of analytic databases with this architecture, such as Exadata, SAND, Greenplum, Aster, or Sybase IQ, just to name a few. Some, like Vertica and VertiPaq, even refer to their columnar architecture in their names. Some columnar databases are in-memory, but many are designed to deal with huge amounts of data, up to the petabyte range, and cannot possibly hold it all in memory.

By the way, what got me off on this rant is actually this blog about Endeca Latitude 2 which actually equates the two technologies, and a Linked-In discussion the author started (which is private, so I can't link it here) with the title "Is Data Modeling Dead?"

The idea in memory databases kill data modelling comes from the fact that columnar databases are often used to discover hierarchies, and a whole generation of so-called "agile" in-memory database tools use this method. But in-memory multi-dimensional databases are still around and still very useful for analyzing data on well defined structures such as financial data.

7 comments:

Elad Israeli said...

Looks like you and I have very similar ranting styles :-)

http://elasticube.blogspot.com/2010/09/in-memory-bi-is-not-future-its-past.html

Elad
Founder
SiSense

Anonymous said...

Hi Barney, I read your blog with great interest as you seem to be on to something but I am puzzled by your conclusion. I'm not sure what you mean in your last paragraph which says columnar databases discover hierarchies. And not sure how that makes them "model-less" or different from in-memory databases.

Wayne

Unknown said...

Wayne
The point here is that the difference between columnar and other types of databases (including multidimensional databases) has nothing to do with the issue of whether a database in in memory or not. About the hierarchies, I responded here:

http://bi-trends.blogspot.com/2011/07/discovering-hierarchies-in-columnar.html

Anonymous said...

Hey would you mind stating which blog platform you're working with? I'm loοκіng to staгt my oωn blog
soon but I'm having a tough time deciding between BlogEngine/Wordpress/B2evolution and Drupal. The reason I ask is because your design seems different then most blogs and I'm looking fοr sоmеthing cοmpletely unique.

P.S Sorry for being off-topic but І had to
ask!

Мy ωeb page :: http://naturalbreastenlargementstory.weebly.com/

Unknown said...

Hey there, i was able to learn so much about tableau and how to pass the certification test by taking help from the below mentioned website.
https://www.takethiscourse.net/tableau-certification/

IT said...

I like your post very much. It is very much useful for my research. I hope you to share more info about this. Keep posting Qlikview Online Training

Jack sparrow said...

I am so proud of you and your efforts and work make me realize that anything can be done with patience and sincerity. Well I am here to say that your work has inspired me without a doubt. Here is i want to share about c sharp training with Free Bundle videos and c sharp training online .


Post a Comment

Please leave a comment