Recently, I had a situation where there were a multitude of ways a set of events could occur to a particular data point over time – where I needed to know what every possible pattern of events had occurred to that single data point. In this post, I’ll walk through the scenario of when you’d want to do something like this and how find patterns in a dataset.
Imagine you have a source system which allows a customer to interact with your front-end application, like updating their profile, and you want to know how they update their profile and all the different patterns in which they go about interacting with your system. To do that, you’ll need to recursively join to your data and build that pattern, in a set-based way, achieving superior performance with a very large dataset.
CTEs are a powerful feature of modern RDBMSs which allow you to do some very creative things with set-based data. Some systems even allow you to nest them inside of themselves for even more crazy, creative solutions. Let’s discuss when to use a CTE.
The word “common” from the acronym CTE (Common Table Expression) means you want to use a query more than once — because it’s common.
A common gripe I have when reviewing and maintaining other people’s SQL, aside from the obvious readability due to poor formatting, is their lack of aliases. In this post, I’d like to talk about why you should always reference aliases. The one major reason why, and the one minor reason when not to.
When maintaining and improving upon legacy code it’s easy to redefine standards, or define nonexistent standards, which have cascading effects on the pipeline. Take something as simple as an ill-conceived naming convention which, in hindsight, turned out to not make sense over time. At some point the benefits of rewriting lots of code outweigh simply sticking with the bad convention. In these cases consistency over standards become the convention.
In ETL, we often have to load many targets from a common set of base tables. Inevitably the targets are different enough that we have to create multiple queries or views to populate the many outputs of data. Which is fine, except now you’ve got yourself a maintenance nightmare, one that is avoidable. I’d like to share a trick with you to take a single query that can be recursively modified to dynamically change its structure to get different outputs. I call it, dynamic querying using block quotes.
The business case:
The sales department would like to take a single report that already exists and split it into two reports. The additional report will require different fitlers, aggregates, columns, and even joins!
Before! The comma separator before field name is always preferred. There, that was easy.
You’ve come here to either win an argument with a coworker — in which case I hope you’re here to find proof for having the comma separator before field names, or you’re doing it wrong — or you’re here to learn. In either case, the comma comes before field names. So, allow me to justify when and, more importantly, why I use one variation over the other:
Example of both comma separators
Example of a comma before & after the field name.
You’ll notice that both these queries are very much identical, with the exception of the placement of the delimiting comma between each field in the select clause of course.
I’m aware that many of my readers of this site likely already know how to write sql to generate sql, or sql generators, but it’s not really much of a data blog if I don’t mention it. It’s also important to mention the time value this trick represents, with a few minutes of time, you can generate thousands of lines of code that would otherwise take hours to write. Let’s chat about one of the first things I learned about SQL which, no doubt, blew my mind:
Generate SQL with SQL.
Or to put another way, generating SQL statements via a single SQL command. Really cool stuff if you haven’t seen this before, keep reading!
Let’s say you wanted to find the number of badges for each type of badge in a table of badges. This is quite easy with a simple query, but let’s see how we can do it by writing SQL with SQL:
Writing SQL with SQL example INPUT.
+'''AS VARCHAR(50)), COUNT(1) FROM BADGES WHERE BADGES.NAME = '''