Recently, I had a situation where there were a multitude of ways a set of events could occur to a particular data point over time – where I needed to know what every possible pattern of events had occurred to that single data point. In this post, I’ll walk through the scenario of when you’d want to do something like this and how find patterns in a dataset.
Imagine you have a source system which allows a customer to interact with your front-end application, like updating their profile, and you want to know how they update their profile and all the different patterns in which they go about interacting with your system. To do that, you’ll need to recursively join to your data and build that pattern, in a set-based way, achieving superior performance with a very large dataset.
A common gripe I have when reviewing and maintaining other people’s SQL, aside from the obvious readability due to poor formatting, is their lack of aliases. In this post, I’d like to talk about why you should always reference aliases. The one major reason why, and the one minor reason when not to.
When maintaining and improving upon legacy code it’s easy to redefine standards, or define nonexistent standards, which have cascading effects on the pipeline. Take something as simple as an ill-conceived naming convention which, in hindsight, turned out to not make sense over time. At some point the benefits of rewriting lots of code outweigh simply sticking with the bad convention. In these cases consistency over standards become the convention.
One of my favorite uses of set-data manipulation involves using NULL values to my advantage; from NULLIF to COALESCE, we’ll explore some creative use of null values. These tips & tricks aren’t just a way to convert a NULL to another value, they’re a multi-purpose, insanely powerful way to massage and combine data.
Level Set: What is a NULL?
In SQL a NULL value isn’t a value at all – it’s lack of value. It’s a value that is indicative of not having a value. Think of it as if you asked someone a question but they didn’t respond, their response was a NULL value. So, therefore you cannot compare someone’s non-response to someone else’s non-response, while they seem like the same answer they’re likely not even the same question!
Null logical comparison
There are some exceptions to this rule (thanks, Microsoft) but, like most of my blog posts, I try to stick with ANSI standard rules. While some RDBMSs treat NULL differently and even have switches that can be set during runtime to alter how NULL logic works, we’re not going to go there. Just assume ANSI 99.