Wednesday, May 20, 2009

ETL tools in the world of data warehousing

ETL stands for Extract Transform Load. In a typical data warehouse we Extract the data from the source systems, Transform the data by applying any business rules, and Load the data into the target, hence the word ETL. 

There are many ETL tools in the market, resulting in confusion of which tool to use. Some people who are hardcore programmers believe that there is no need of an ETL tool in the development process of a data warehouse, as any programming language can be used to write ETL code. Though this may be true, there are many downsides to developing a data warehouse without an ETL tool. Many oracle shops tend to use PL/SQL for developing the code. This is feasible. However, over a period of time TCO (Total Cost of Ownership) of code based data warehouse is going to be very high in comparision to an ETL tool. Also, making changes and maintaining a code based data warehouse is going to get very difficult. Hence it is a very prudent approach to identify a ETL tool that fits the budget and then proceed. Any data warehouse developed using code based ETL is a short sighted approach, and is a disaster waiting to happen... for the people maintaining it.

If you are a SQL Server shop, then its a no-brainer to use SSIS (Sql Server Integration Services) which is very close to Informatica in terms of the approach. For other database platform users there are ETL tools like Informatica, Data Stage, ODI (formerly sunopsis), Abnitio e.t.c

Bottom line...If you are designing a data warehouse or planning to, make no mistake... consider an ETL tool. 

Consider yourself warned...

No comments: