SQL | Find Patterns In A Dataset

Find Data Patterns

Recently, I had a situation where a set of events could happen to a particular data point over time in a multitude of ways, and I needed to know every pattern of events that had occurred to that single data point. In this post, I’ll walk through a scenario where you’d want to do something like this and how to find patterns in a dataset.

Imagine you have a source system which allows a customer to interact with your front-end application, like updating their profile, and you want to know how they update their profile and all the different patterns in which they go about interacting with your system. To do that, you’ll need to recursively join to your data and build that pattern, in a set-based way, achieving superior performance with a very large dataset.
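As a minimal sketch of that idea, here is a recursive CTE run through Python’s built-in sqlite3 module. The table `profile_events`, its columns, and the sample rows are all made up for illustration, and it assumes each customer’s events are numbered 1, 2, 3… with no gaps. It is not necessarily how the full post builds the pattern.

```python
import sqlite3

# Hypothetical sample data: one row per event, ordered per customer by event_seq.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE profile_events (customer_id INTEGER, event_seq INTEGER, event_name TEXT);
INSERT INTO profile_events VALUES
    (1, 1, 'login'), (1, 2, 'edit_email'), (1, 3, 'save'),
    (2, 1, 'login'), (2, 2, 'edit_photo'), (2, 3, 'save'),
    (3, 1, 'login'), (3, 2, 'edit_email'), (3, 3, 'save');
""")

PATTERN_SQL = """
WITH RECURSIVE chain (customer_id, event_seq, pattern) AS (
    -- Anchor: each customer's first event starts the pattern.
    SELECT customer_id, event_seq, event_name
    FROM profile_events
    WHERE event_seq = 1
    UNION ALL
    -- Recursive step: join back to the data and append the next event.
    SELECT e.customer_id, e.event_seq, c.pattern || ' > ' || e.event_name
    FROM chain AS c
    JOIN profile_events AS e
      ON e.customer_id = c.customer_id
     AND e.event_seq = c.event_seq + 1
),
full_pattern AS (
    -- Keep only the completed chain (the row at each customer's last event).
    SELECT c.customer_id, c.pattern
    FROM chain AS c
    WHERE c.event_seq = (SELECT MAX(e.event_seq)
                         FROM profile_events AS e
                         WHERE e.customer_id = c.customer_id)
)
SELECT pattern, COUNT(*) AS customers
FROM full_pattern
GROUP BY pattern
ORDER BY customers DESC, pattern;
"""

def find_patterns(connection):
    """Return (pattern, customer_count) rows, most common pattern first."""
    return connection.execute(PATTERN_SQL).fetchall()

patterns = find_patterns(conn)
for pattern, customers in patterns:
    print(customers, pattern)
```

The whole thing stays set-based: every customer’s chain grows by one event per recursion step, rather than looping customer by customer.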

Continue reading →

When To Use A CTE – And When Not To

When to use a CTE

Common Table Expressions

CTEs are a powerful feature of modern RDBMSs which allow you to do some very creative things with set-based data. Some systems even allow you to nest them inside of themselves for even more crazy, creative solutions. Let’s discuss when to use a CTE.

The word “common” from the acronym CTE (Common Table Expression) means you want to use a query more than once — because it’s common.
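To make the “more than once” point concrete, here is a small sketch, again using Python’s sqlite3 module. The `orders` table and its rows are invented for the example. The CTE `customer_totals` is defined once and then referenced twice: once as the main row source and once to compute the average it is compared against.

```python
import sqlite3

# Hypothetical sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
INSERT INTO orders VALUES (1, 10), (1, 30), (2, 5), (3, 100);
""")

REUSE_SQL = """
WITH customer_totals AS (
    -- Defined once...
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
)
SELECT t.customer_id, t.total
FROM customer_totals AS t                           -- ...used here
WHERE t.total > (SELECT AVG(total)
                 FROM customer_totals)              -- ...and used again here
ORDER BY t.customer_id;
"""

# Customers whose total spend is above the average customer total.
above_average = conn.execute(REUSE_SQL).fetchall()
print(above_average)
```

If the query only needed `customer_totals` in one place, a plain subquery would do the same job.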

Continue reading →

ETL, DDL, & Self-Documenting Code Generator

gSheet Code Generator

Use Google Sheets to automate your data pipeline development:

  • ETL Generator
  • DDL Generator
  • DML Generator
  • Documentation Generator
  • Code Generator

This isn’t the first time I’ve talked about a code generator or SQL Generation on this blog, but it’s worth discussing again because I wanted to talk about a recent project where I upped the ante on not just generating SQL, but generating the DDL & DML to support an entire ETL pipeline — all while self-documenting everything!

Throughout the project I was able to quickly test different indexing strategies without writing a single line of code.
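Here is a rough sketch of the general idea, not the actual generator from the project. It assumes the sheet has been exported as rows with hypothetical `table`, `column`, `type`, and `indexed` fields, and it emits CREATE TABLE and CREATE INDEX statements from them. Flipping an `indexed` flag in the sheet is what would change the indexing strategy without touching the code.

```python
import sqlite3

# Hypothetical rows, as they might look after exporting a sheet.
sheet_rows = [
    {"table": "customer", "column": "customer_id", "type": "INTEGER", "indexed": True},
    {"table": "customer", "column": "email", "type": "TEXT", "indexed": False},
    {"table": "customer", "column": "updated_at", "type": "TEXT", "indexed": True},
]

def generate_ddl(rows):
    """Build CREATE TABLE / CREATE INDEX statements from sheet-style rows."""
    # Group the column definitions by table, preserving sheet order.
    tables = {}
    for row in rows:
        tables.setdefault(row["table"], []).append(row)

    statements = []
    for table, cols in tables.items():
        col_defs = ",\n    ".join(f'{c["column"]} {c["type"]}' for c in cols)
        statements.append(f"CREATE TABLE {table} (\n    {col_defs}\n);")
        # One single-column index per row flagged as indexed.
        for c in cols:
            if c["indexed"]:
                statements.append(
                    f'CREATE INDEX ix_{table}_{c["column"]} ON {table} ({c["column"]});'
                )
    return statements

ddl = generate_ddl(sheet_rows)
print("\n".join(ddl))

# Sanity check: the generated DDL runs against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("\n".join(ddl))
```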

Continue reading →

It’s Performance Review Time!

Unimpressed cat is unimpressed.

It’s Performance Review Time

It’s almost that wonderful time of year when you’ll wait till the very last minute and blab on about all the things you did, offer some excuses for what you didn’t, and hope you didn’t leave out anything important. It’s performance review time, so I wanted to share a couple of things I do that help me prepare and draft the best possible self-evaluation to maximize my reward.

Continue reading →

Consistency Over Standards

If our number one standard is consistency over standards then reviewing code becomes clearer.
Our number one standard is consistency over standards.

When maintaining and improving upon legacy code, it’s easy to redefine standards, or define nonexistent standards, which have cascading effects on the pipeline. Take something as simple as an ill-conceived naming convention which, in hindsight, turned out not to make sense over time. At some point the benefits of rewriting lots of code outweigh simply sticking with the bad convention. In these cases, consistency over standards becomes the convention.

Continue reading →

Women In Tech | Speed-Mentoring

Women In Tech Speed-Mentoring

I recently went to a forum hosted by Women In Technology, which invited a large group of young women, from high schoolers to those in their first internship out of college, to meet one-on-one with tech professionals, ask questions, and interact with some of the industry’s best (not self-promoting; there were many fine, far more seasoned engineers at this event). Think speed-dating, but for gathering knowledge, which I’ve dubbed Women In Tech Speed-Mentoring.


The event itself was unfortunately quite short, for the mentees anyway, so each of us only spoke one-on-one with three aspiring tech geeks (plus a few chats while waiting in the pizza line). Some were declared computer science majors, while others were still unsure of which direction to go in this broad world of tech. I, of course, promoted data; I promoted the hell out of data! However, in doing so, I was taken aback by a couple of things and pleasantly surprised by others, all of which gave me a chance to really connect with them, I hope. These surprises made me want to write this post…

Continue reading →