SQL Foo | Creative Use of Null Values

One of my favorite uses of set-data manipulation involves using NULL values to my advantage; from NULLIF to COALESCE, we’ll explore some creative use of null values. These tips & tricks aren’t just a way to convert a NULL to another value, they’re a multi-purpose, insanely powerful way to massage and combine data.

NULL Galaxy - Creative use of null values
NULL Galaxy – Creative use of null values

Level Set: What is a NULL?

In SQL a NULL value isn’t a value at all – it’s lack of value. It’s a value that is indicative of not having a value. Think of it as if you asked someone a question but they didn’t respond, their response was a NULL value. So, therefore you cannot compare someone’s non-response to someone else’s non-response, while they seem like the same answer they’re likely not even the same question!

There are some exceptions to this rule (thanks, Microsoft) but, like most of my blog posts, I try to stick with ANSI standard rules. While some RDBMSs treat NULL differently and even have switches that can be set during runtime to alter how NULL logic works, we’re not going to go there. Just assume ANSI 99.

Creative use of NULL within a join:

Since we all know that NULL = NULL doesn’t shouldn’t return true because a NULL isn’t a literal. This creates havoc with old code and with some RDBMSs. Take this join for example:

Here’s the same thing but instead we’ll use coalesce to our advantage:

Notice that we simply just replace any NULLs with a valid value, in this case, the string “<NULL>”. The idea here is to pick a value that isn’t likely in the data otherwise, you’ll cartesian.  For numbers, I’ll often use -1. For dates, I’ll often use 12/31/1899 (the day before 1/1/1900). However, this trick ONLY works if you’re 100% sure the chosen not-null value isn’t in the data.

But what if any value is possible and you just caused a cartesian?

This is the best way to join two data sets when any value is possible on the join condition. Typically this occurs when you join on something that users can input within a free-text field. It’s rare, but this is the solution.

Data Cleansing technique using NULLs to our advantage:

I often find myself needing to convert empty-set data to another value. As we know, NULL in the data warehouse can be a bane, especially when dealing with dim & fact relationships and aggregates. Because of this, we often need to convert missing data into something more consistent, like “UNKNOWN”. But there’s a really cool trick to use instead of writing a nasty case statement. Let’s take a look at the wrong way:

Now, let’s look at the same method using a cleaner approach:

This is one of my favorite uses of simultaneously converting a NULL along with some other value. I find myself using this technique in many different ways but always with the intent to quickly manipulate data back down to a NULL then back up to a single, consistent value again.

Solving division-by-zero errors using NULL values:

Take this simple division as an example:

The first query will result in a “Divide by zero” error, while the second result will return a NULL value. This beats adding a where clause to check if the value will be zero first, which in some cases the optimizer will place outside the order of operations anyway, causing the same error

T-SQL Only – Pivot and Concatenate using Coalesce:

This will take the column names within the database and concatenate them together in a pipe separated list. This is a powerful way to quickly create lists of data which are helpful in SQL Generators and dynamic functionality within SQL server. This is possible within other RDBMSs but is much messier and wanted to call out the simplicity of SQL server’s T-SQL.