I recently was tasked with building a new table with two Type-2 tables as a source and, not only maintain the history but, rebuild the history. Let’s look at the best way to join two type-2 tables together and then, more specifically, how to join the two historized tables together while preserving the logical history of changes as though they were one.
Imagine you have two tables, typically type-2 dimensions, which have persisted historical records of changes: One for capturing your customer’s name over time, and another for tracking your customer’s phone number changes over time. In some cases, you may not have a historical record in one table, or the other, leaving an unknown name or phone number captured for a given customer. When re-building history you have to take care of these new periods of time where something hasn’t happened yet.
Now imagine that a customer changed her name three times and only on that third time did she also provide a phone number. That means, our third historical entry of her name change could potentially turn into five records, depending on when the phone number insert/update occurred.
Look for something ridiculous? Not that long ago I used to think that the best interview questions were one where the candidate could answer the question in many different but accurate ways, rather than just one perfect way. Which, at the time, would have weeded out many incompetent colleagues, consultants, and managers. It seemed fool-proof!
Fast forward to today, after much more experienced in interviewing and matured in my expectations of nervous people. These people are just people – they just mostly suck. Like, foundationally terrible candidates, four out of every five. The cure is that while they may not perform like you expect, they may show other signs – much like how I thought the ability to Open Notepad was a single testament to skill – there is one great trait to look for: Passionate Interest! No matter the questions you ask a candidate if they get excited, geeky, inspired, passionate about the answer, then you’ve got something special.
My argument here is that one who would answer the questions the way you’d ideally-expect versus someone who may underperform, yet still be a contender, will still be equally expressive in a positively geeky way. While the true underperformers will break down in many obvious ways, the common denominator between them will undoubtedly be their disdain for coming to a solution; An eagerness to say something, rather than feeling something. These are the simple clues to look for. Look for something ridiculous and genuine. These are the people who will be there to bounce ridiculous ideas off of and fuel the team.
Nobody wants to work on a team where people only share details when you ask for them. I want to work on teams where people share information because they think it will be helpful even though you didn’t ask for it.
Regardless of your RDBMS you’ll find more than a handful of scripts online to add a Date Dimension or Calendar Table to your schema. One problem I’ve seen is they lack an every-other-week column. So, let’s explore a common SQL Server script and how to add an Every Other Week flag to our date dimension.
This method is written for Microsoft’s SQL Server, but it is ANSI standard and will work with any date dimension which has an integer day-of-week column, which I’ve never seen one that doesn’t…
In ETL, we often have to load many targets from a common set of base tables. Inevitably the targets are different enough that we have to create multiple queries or views to populate the many outputs of data. Which is fine, except now you’ve got yourself a maintenance nightmare, one that is avoidable. I’d like to share a trick with you to take a single query that can be recursively modified to dynamically change its structure to get different outputs. I call it, dynamic querying using block quotes.
The business case:
The sales department would like to take a single report that already exists and split it into two reports. The additional report will require different fitlers, aggregates, columns, and even joins!
When dealing with transactional data often there are many levels of granularity lying within. Finding these granularities exposes how your data is shaped as it accumulates and helps paint a better picture of what I like to call Lifes within the data. In this post, I want to share a technique I use to find data patterns which will be beneficial for everyone from the analyst to the architect.
Why do I refer to these data patterns as Lifes?
I haven’t found anything transactional in nature that doesn’t have some sort of recurring theme, with a distinct beginning and end, that couldn’t tell a story. It is these finite beginning/ends, start/stops, on/offs that paint the picture that is the “life” of the data. The life of these stories often have many sub-narratives and are interwoven within a single holistic life of the data. A great example is the familiar case of a customer purchase history. The customer is the holistic life of the data, their purchase orders, individual line items, and even a particular line item purchased over several purchase orders are all examples of sub-narratives within the story of a single customer.
For my Business Objects (BOBJ) SAP Data Services (BODS) folks out there, this post will indeed be helpful at some point during your exposure to the ETL tool. You’ll eventually find yourself digging through the Data Services repository trying to locate details about your jobs, reports, or anything that makes your dynamic processing easier. There are loads of forum posts on the subject and even a few nice posts that attempt to make sense of what the schema has to offer. But, there’s one thing that I couldn’t find and am sharing with you today: How to find BODS Workflow to Dataflow to Target Metadata for your target tables within BODS.
In my case, I needed to retrieve the number of rows inserted/discarded/deleted/updated for all target tables within all dataflows within a particular workflow. Since my job routine is divided into workflows, I didn’t need to take the granularity any farther but is certainly possible. Still, in my case, I had the need to retrieve metadata about what my workflow just did with having many dataflows within it all running in parallel. This parallelism introduced the need to dynamically pull these statistics from the repository rather than after each dataflow execution.
The topic of dynamic dimension descriptions isn’t new, but the method for which I’ve come up with is a hybrid of several other methodologies, making for a great subject piece. Using codes and descriptions in your warehouse dimensions is standard practice, even in some cases the use-case may require displaying a code versus a description, for example: displaying USA instead of United States of America, or even in situations where the business understands what a “TPS” report type is more so than the formal “Testing Procedure Specification” description. In any case, you’ll want a method that enables the ability to add and modify additional codes and their descriptions without having to perform risky manual updates to the warehouse.
In this post, I will discuss a rigid yet tolerant way for properly implementing dynamic dimension descriptions without ever directly modifying a warehouse dimensional table. Instead, we’ll implement a lookup table that an end user can insert and update freely, along with a robust ETL process that uses this same lookup table to perform description updates and even type 2 historical tracking, if necessary.
Dynamic Dimension Descriptions Using An ETL Lookup Table
Just for fun: Let’s look at some interesting ways to solve a multi-column pivot or unpivot (aka crosstab). Some methods are not ANSI standard and others simply won’t work on certain RDBMSs, but we’ll play around with some methods that I came up with using SQL Server 2016.