Geeks With Blogs
Random Musing Putting it together one piece at a time...

SCD (Slow Changing Dimension) has been a very common scenario for data warehouses and there are many established and proven techniques that can be used to solve the problems. This includes the usage of SCD Type II (Insert only) or SCD Type I (Update only) methodology. When we start to look at designing a system that caters for real time information, it's very common that we will meet the same problems, with the exception that the dimension doesn't change slowly, but rapidly. Moreover, they normally are very large in size.

Adopting the SCD Type II methodology will often mean that we will need to insert new records into the same dimensional table. This is fine and works well if the table is not too large and that the changes doesn't happen frequently. However, if the table is some multi-million size large and changes happens every few minutes, we will need a better solution than storing them in one large SCD table.

The solution to this is to break out that dimensional table into sub dimensional tables which are rapidly changing in nature and add another foreign key into the fact table to reference that key. This technique can be applied to any dimensional attribute or to any more dimensions so that loading and data insertion can be kept to a manageable size.

For example, if there's a Customer Dimension table

Customer Key        Name     Country
1001             John USA

If John changed his state to Australia, we can justify this as a SCD Type II (for historical recording) and insert the table so that we have

Customer Key        Name     Country
1001             John USA
1002            John     Australia

and in the fact table we can have something like

Fact Table
Fact_ID            Customer Key    Field1
1             1001        $10
2            1002        $10

and there we have our typical SCD Type II. However, if this Customer Dimension table is multi-million rows in size with changes happening every few minutes, and that you need to have an ETL process that runs it for every 15 minutes, it will an issue if this table is to be updated as a normal TYPE II. The solution is to split it up into manageable size.

Customer Dimension
Customer Key    Name
1001        John

Country Dimension
Country Key    Country
2001        USA
2002        Australia

Then the fact table will be

Fact ID    Customer Key    Country Key    Field1
1    1001        2001        $10
2    1001        2002        $10

With this technique, you can then further split your dimensions into different groups and have the database manage a smaller group of data insertions for fields that change and fields that doesn't change as often. On top of that, you can subgroup the attributes to record ranges instead of absolute figures.

However, even with this technique, if and when possible, a denormalized table will always be better than an over-normalized tables schema in a data warehouse


Posted on Monday, September 3, 2007 9:34 AM Business Intelligence | Back to top

Comments on this post: Designing for Slowly Changing Dimension and Rapid Changing Dimensions

No comments posted yet.
Your comment:
 (will show your gravatar)

Copyright © Nestor | Powered by: