Tuesday, May 12, 2009

Bill Inmon Vs. Ralph Kimball

There is a continuing debate among Data Warehouse professionals as to which data warehousing approach to take; Inmon or Kimball. Bill Inmon is considered as the father of Data Warehousing and Kimball according to me is the genius in Data Warehousing. I might already sound biased but that's what my experience says.

When Bill Inmon was designing or was coming up with the concept of data warehousing, disk space (the single most important component of data warehousing after the data itself) was quite expensive. So any methodology that he has comeup with has to take disk space into consideration giving more importance to it than to data itself. On the other hand, Kimballs approach ignores (to an extent) the issue of disk space, 'coz the disk space got cheaper, and is greared fully towards data and its usage. This is the reason we see more data warehouse designers flocking to kimball's methodology instead of Inmons. Also, Inmons methodology is more expensive to develop and implement than Kimball methodology.

For those of you who don't know, Inmons methodology means creation of a normalized data warehouse and Kimball's methodology is creation of Dimensional data warehouse. When designing a data warehouse, the Architect has to decide the methodology. I feel that there is still a lot of confusion among the data warehousing community on which approach to take. I personally would always go for creation of a Dimensional Model. The simple reason being that its easy to develop, easy to modularize, and mostly easy to change direction of the data warehouse per business needs. Normalized method, concentrates more on the data integrity than data usability. Dimensional model, concentrates on the usability while providing data integrity. For every Normalized data warehouse there must exist a dimensional data mart to support reporting or a OLAP or MOLAP tool. So, why not create the base model as a dimensional model?

Ok, enough of praising the dimensional modeling. There are data warehouses that failed as a result of designing them as dimensional models. One has to be careful then designing the dimensional model, as there is more room to make mistakes, and more often than not, mistakes do happen. One of the most common mistakes is the use of Surrogate Keys or rather not using Surrogate Keys. There are some designers that believed that surrogate keys are not required because their source system generate a unique key always. Big mistake. This is the first step towards the failure of the dimensional model. Another issue that I have seen is when designing the bridge tables, designers are tempted to use natural keys instead of surrogate keys. This will result in eventual failure of the model over a period of time, and also gives very bad query performance.

One has to be very careful when designing a dimensional model because there are more chances of making mistakes with the dimensional model than with a normalized model. 

Bottom line, for a dimensional model to be successful, a designer has to have a detailed understanding of the underlying subject area. Else, the project is a disaster waiting to happen...

No comments: