Join Two Type-2 Tables and Rebuild History

I was recently tasked with building a new table from two Type-2 source tables, and not only maintaining their history but rebuilding it. Let's look at the best way to join two Type-2 tables together and then, more specifically, how to join them while preserving the logical history of changes as though they were one table.

Joining two Type-2 tables with history maintained.

The Setup:

Imagine you have two tables, typically Type-2 dimensions, that persist historical records of changes: one captures your customer's name over time, and the other tracks your customer's phone number changes over time. In some cases, a customer may have no historical record in one table or the other for a given period, which leaves the name or phone number unknown for that customer. When rebuilding history, you have to account for these new periods in which something hasn't happened yet.

Now imagine that a customer changed her name three times, and only on the third change did she also provide a phone number. That means the third historical entry of her name could potentially turn into five records, depending on when the phone number inserts and updates occurred.
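To make the splitting concrete, here is a minimal sketch of one common way to merge two Type-2 histories. It is not necessarily the method used in the full post. The sketch collects every boundary date from both tables, pairs each boundary with the next one to form intervals, and then left joins each table back onto those intervals. It runs on Python's built-in `sqlite3`, and it rests on a few assumptions: the table and column names (`customer_name`, `customer_phone`, `start_date`, `end_date`) are hypothetical, validity ranges are half-open `[start_date, end_date)`, and `'9999-12-31'` marks the current row.

```python
import sqlite3

# Hypothetical schema: half-open validity ranges [start_date, end_date),
# with '9999-12-31' marking the current row. ISO dates compare correctly as text.
SCHEMA = """
CREATE TABLE customer_name  (customer_id INTEGER, name  TEXT, start_date TEXT, end_date TEXT);
CREATE TABLE customer_phone (customer_id INTEGER, phone TEXT, start_date TEXT, end_date TEXT);
"""

REBUILD_HISTORY_SQL = """
WITH points AS (
    -- every date on which either table starts or ends a version
    SELECT customer_id, start_date AS d FROM customer_name
    UNION SELECT customer_id, end_date   FROM customer_name
    UNION SELECT customer_id, start_date FROM customer_phone
    UNION SELECT customer_id, end_date   FROM customer_phone
),
intervals AS (
    -- pair each boundary with the next boundary for the same customer
    SELECT p.customer_id, p.d AS start_date,
           (SELECT MIN(q.d) FROM points q
             WHERE q.customer_id = p.customer_id AND q.d > p.d) AS end_date
    FROM points p
)
SELECT i.customer_id, i.start_date, i.end_date, n.name, ph.phone
FROM intervals i
LEFT JOIN customer_name n
       ON n.customer_id = i.customer_id
      AND n.start_date <= i.start_date AND i.start_date < n.end_date
LEFT JOIN customer_phone ph
       ON ph.customer_id = i.customer_id
      AND ph.start_date <= i.start_date AND i.start_date < ph.end_date
WHERE i.end_date IS NOT NULL                       -- drop the open-ended last boundary
  AND (n.name IS NOT NULL OR ph.phone IS NOT NULL) -- drop gaps where neither table has a row
ORDER BY i.customer_id, i.start_date
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO customer_name VALUES (?, ?, ?, ?)", [
    (1, "Ann A", "2020-01-01", "2021-01-01"),
    (1, "Ann B", "2021-01-01", "2022-01-01"),
    (1, "Ann C", "2022-01-01", "9999-12-31"),
])
conn.executemany("INSERT INTO customer_phone VALUES (?, ?, ?, ?)", [
    (1, "555-0100", "2022-06-01", "2023-01-01"),
    (1, "555-0199", "2023-01-01", "9999-12-31"),
])

rows = conn.execute(REBUILD_HISTORY_SQL).fetchall()
for row in rows:
    print(row)
```

With this sample data, the third name version ("Ann C") comes back as three rows. The first has a `None` phone number for the period before any phone existed, and each of the other two carries one phone number. The sketch does not collapse adjacent intervals that happen to carry identical values. You would add that step if one source table can end and restart a version without any actual change.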

Continue reading →

Every Other Week Flag for Date Dimension or Calendar Table

Data set of Every Other Week Flag using a Date Dimension

Regardless of your RDBMS, you'll find more than a handful of scripts online for adding a Date Dimension or Calendar Table to your schema. One problem I've seen is that they lack an every-other-week column. So, let's explore a common SQL Server script and how to add an Every Other Week flag to our date dimension.

This method is written for Microsoft's SQL Server, but it uses ANSI-standard SQL and will work with any date dimension that has an integer day-of-week column, and I've never seen one that doesn't…
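The core arithmetic can be sketched without the full script. The sketch uses the integer day-of-week column to back each date up to the start of its week, counts whole weeks from an anchor week, and sets the flag when that count is even. This is one possible approach written in Python for illustration, not necessarily the one in the full post. Several pieces are hypothetical: the anchor date, the Sunday = 1 … Saturday = 7 numbering (SQL Server's default `DATEFIRST 7`), and the row layout.

```python
from datetime import date, timedelta

def day_of_week(d):
    """Integer day-of-week with Sunday = 1 ... Saturday = 7 (assumed numbering)."""
    return d.isoweekday() % 7 + 1

def build_date_dim(start, num_days):
    """A tiny stand-in for a date dimension: one row per date plus its day-of-week."""
    return [{"date": start + timedelta(days=i),
             "day_of_week": day_of_week(start + timedelta(days=i))}
            for i in range(num_days)]

def add_every_other_week_flag(rows, anchor):
    """Flag = 1 for the anchor's week and every second week before/after it, else 0."""
    anchor_week_start = anchor - timedelta(days=day_of_week(anchor) - 1)
    for row in rows:
        # Back up to the Sunday that starts this row's week, using the day-of-week column.
        week_start = row["date"] - timedelta(days=row["day_of_week"] - 1)
        weeks_apart = (week_start - anchor_week_start).days // 7
        # Python's % is non-negative for a positive divisor, so weeks before the anchor work too.
        row["every_other_week"] = 1 if weeks_apart % 2 == 0 else 0
    return rows

dim = add_every_other_week_flag(build_date_dim(date(2024, 1, 7), 28), anchor=date(2024, 1, 7))
for row in dim[::7]:
    print(row["date"], row["every_other_week"])
```

This prints the four Sundays from 2024-01-07 to 2024-01-28 with flags alternating 1, 0, 1, 0. If you port the same arithmetic to SQL, keep in mind that SQL Server's `%` operator takes the sign of the dividend. Choosing an anchor on or before the dimension's first date avoids negative week counts.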

Continue reading →

Properly Implementing Dynamic Dimension Descriptions

The topic of dynamic dimension descriptions isn't new, but the method I've come up with is a hybrid of several other methodologies, which makes it a great subject for a post. Using codes and descriptions in your warehouse dimensions is standard practice, and some use cases even require displaying a code instead of a description. Examples include displaying USA instead of United States of America, or a "TPS" report type that the business recognizes more readily than the formal "Testing Procedure Specification" description. In any case, you'll want a method that lets you add and modify codes and their descriptions without having to perform risky manual updates to the warehouse.

In this post, I will discuss a rigid yet tolerant way of properly implementing dynamic dimension descriptions without ever directly modifying a warehouse dimension table. Instead, we'll implement a lookup table that end users can insert into and update freely, along with a robust ETL process that uses this same lookup table to perform description updates and even Type-2 historical tracking, if necessary.
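Here is a minimal sketch of the general idea, using Python's built-in `sqlite3` and hypothetical table names (`code_lookup`, `dim_country`, `stg_orders`). End users only ever touch the lookup table. The ETL process handles everything else in three steps: it seeds unknown codes from staging with a placeholder description, pushes edited descriptions into the dimension as a Type-1 overwrite, and inserts new members. The placeholder text is an assumption, and the Type-2 variant mentioned above is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- End users may INSERT/UPDATE this lookup table freely.
CREATE TABLE code_lookup (code TEXT PRIMARY KEY, description TEXT NOT NULL);
-- Only the ETL process writes to the warehouse dimension.
CREATE TABLE dim_country (country_key INTEGER PRIMARY KEY, code TEXT UNIQUE, description TEXT NOT NULL);
-- Staging data arriving from the source system.
CREATE TABLE stg_orders (order_id INTEGER, country_code TEXT);
""")

def run_etl(conn):
    # 1. Codes seen in staging but missing from the lookup get a placeholder description.
    conn.execute("""
        INSERT INTO code_lookup (code, description)
        SELECT DISTINCT s.country_code, 'Unknown (' || s.country_code || ')'
        FROM stg_orders s
        WHERE s.country_code IS NOT NULL
          AND NOT EXISTS (SELECT 1 FROM code_lookup l WHERE l.code = s.country_code)""")
    # 2. Type-1 overwrite: push edited descriptions from the lookup into the dimension.
    conn.execute("""
        UPDATE dim_country
        SET description = (SELECT l.description FROM code_lookup l WHERE l.code = dim_country.code)
        WHERE EXISTS (SELECT 1 FROM code_lookup l
                      WHERE l.code = dim_country.code AND l.description <> dim_country.description)""")
    # 3. Add dimension members for codes the dimension has not seen yet.
    conn.execute("""
        INSERT INTO dim_country (code, description)
        SELECT l.code, l.description FROM code_lookup l
        WHERE NOT EXISTS (SELECT 1 FROM dim_country d WHERE d.code = l.code)""")
    conn.commit()

def dimension(conn):
    return dict(conn.execute("SELECT code, description FROM dim_country ORDER BY code"))

conn.execute("INSERT INTO code_lookup VALUES ('USA', 'United States of America')")
conn.executemany("INSERT INTO stg_orders VALUES (?, ?)", [(1, "USA"), (2, "CAN")])
run_etl(conn)
print(dimension(conn))  # {'CAN': 'Unknown (CAN)', 'USA': 'United States of America'}

# An end user fixes the placeholder in the lookup table, never in the dimension itself.
conn.execute("UPDATE code_lookup SET description = 'Canada' WHERE code = 'CAN'")
run_etl(conn)
print(dimension(conn))  # {'CAN': 'Canada', 'USA': 'United States of America'}
```

Because the unknown code lands in the lookup table automatically, facts can be loaded immediately. The business can supply the real description later without anyone editing the warehouse dimension by hand.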

Dynamic Dimension Descriptions Using An ETL Lookup Table

Dynamic Dimension Descriptions diagram displaying how to load new, unknown codes.

Continue reading →

RI | Referential Integrity In The Data Warehouse

Referential Integrity (RI) in the Data Warehouse is a controversial topic among BI professionals: you're either all for it or all against it. Keeping the focus solely on RI in the Data Warehouse, I want to discuss some pros and cons to help you make a more informed decision, one that will have a long-lasting impact on your warehouse, development lifecycle, and production maintenance.

Meme: Handcuffed like database Referential Integrity constraints.

You need to understand that referential integrity is a decision, not a standard. Blindly implementing RI because you're "supposed to" is a presumptuous, neglectful response to a complex problem. I think everyone will agree that leaving constraints to the application layer is an apprentice mistake; nevertheless, when it comes to data, it may be our best choice. Still, table-driven constraints are always preferred, as long as they make sense!
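To make the two options concrete, here is a small sketch using Python's built-in `sqlite3` with hypothetical table names. A table-driven FOREIGN KEY makes the database reject an orphaned fact row at load time. Without the constraint, the row loads silently, and the ETL layer has to find orphans itself. One assumption to note: SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set on the connection. The sketch illustrates the trade-off and does not argue for either choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY clauses only when this is on
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
-- Table-driven RI: the database itself rejects orphaned fact rows.
CREATE TABLE fact_sales_ri (
    customer_key INTEGER NOT NULL REFERENCES dim_customer (customer_key),
    amount REAL);
-- No constraint: the ETL/application layer must catch orphans itself.
CREATE TABLE fact_sales_etl (customer_key INTEGER NOT NULL, amount REAL);
INSERT INTO dim_customer VALUES (1, 'Ann');
""")

def load_with_constraint(customer_key, amount):
    """Return True if the row loaded, False if the FOREIGN KEY constraint rejected it."""
    try:
        conn.execute("INSERT INTO fact_sales_ri VALUES (?, ?)", (customer_key, amount))
        return True
    except sqlite3.IntegrityError:
        return False

def find_orphans():
    """ETL-layer check: fact keys with no matching dimension row."""
    return [key for (key,) in conn.execute("""
        SELECT DISTINCT f.customer_key
        FROM fact_sales_etl f
        LEFT JOIN dim_customer d ON d.customer_key = f.customer_key
        WHERE d.customer_key IS NULL""")]

print(load_with_constraint(1, 10.0))   # True
print(load_with_constraint(99, 5.0))   # False: customer 99 does not exist
conn.execute("INSERT INTO fact_sales_etl VALUES (99, 5.0)")  # loads silently, no constraint
print(find_orphans())                  # [99]
```

The constrained table stops bad data at the door, but it also stops the load. The unconstrained table keeps the load moving, but data quality then depends on the ETL check running every time.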

When does it make sense to have Referential Integrity?

Continue reading →

Catch-All Join To A Lookup Dimension

I recently ran into an interesting problem that I'd like to share, along with how I resolved it. The solution involves a catch-all join to a lookup dimension table.

ERD Diagram of wildcard lookup status table.

Imagine having many employees who work in many departments. Each department has its own way of determining an employee's status: some departments use the status code given in the source system, others rely solely on the department the employee is from, and others use a combination of both! Oh, and the fun bit: this status logic can change…

Continue reading →