SQL | Find Patterns In A Dataset

Find Data Patterns

Recently, I had a situation where there were a multitude of ways a set of events could occur to a particular data point over time – where I needed to know what every possible pattern of events had occurred to that single data point. In this post, I’ll walk through the scenario of when you’d want to do something like this and how find patterns in a dataset.

Imagine you have a source system which allows a customer to interact with your front-end application, like updating their profile, and you want to know how they update their profile and all the different patterns in which they go about interacting with your system. To do that, you’ll need to recursively join to your data and build that pattern, in a set-based way, achieving superior performance with a very large dataset.

Let’s look at the data:

Yup, this example is in MySQL 8…

So, looking at the data set above, we can see that the pattern for ID #1 is “A, B,C” and the pattern for ID #2 is “A,E”. But first, we have to prepare the data and give it some sort of order to work with. Typically, you’d want to order this data by some sort of event-indicator, like a timestamp; For this example, we’ll just use the order of the alphabet. Take a look:

Recursively Find Patterns In A Dataset:

Next, we want to use a recursive CTE (Common Table Expression) to join back to itself to concatenate the previous pattern to the next row of data. So, if we look at ID #1 and it’s 3 rows of data. The first row’s pattern is “A”, the second row is “A,B” and the third and final row is “A,B,C” — which is the pattern we’re looking for at this datapoint. Here’s the query to achieve that:

To get this to work on Postgres, simply change the CONCAT() to ||

The output of the above query will return three records for each of the three IDs in our example, along with its corresponding pattern. Below is the output and if you click here, you can find a Fiddle with this entire example.

Hope someone finds this helpful.