Use Google Sheets to automate your data pipeline development:
This isn’t the first time I’ve talked about a code generator or SQL Generation on this blog, but it’s worth discussing again because I wanted to talk about a recent project where I upped the ante on not just generating SQL, but generating the DDL & DML to support an entire ETL pipeline — all while self-documenting everything!
Throughout the project I was able to quickly test different indexing strategies without writing a single line of code.
It’s almost that wonderful time of year when you’ll wait till the very last minute and blab on about all things you did, some excuses why you didn’t, and hope you didn’t leave out anything important. It’s performance review time so, I wanted to share a couple things that I do which help me be prepared and draft the best possible self-evaluation to maximize my reward.
A common gripe I have when reviewing and maintaining other people’s SQL, aside from the obvious readability due to poor formatting, is their lack of aliases. In this post, I’d like to talk about why you should always reference aliases. The one major reason why, and the one minor reason when not to.
When maintaining and improving upon legacy code it’s easy to redefine standards, or define nonexistent standards, which have cascading effects on the pipeline. Take something as simple as an ill-conceived naming convention which, in hindsight, turned out to not make sense over time. At some point the benefits of rewriting lots of code outweigh simply sticking with the bad convention. In these cases consistency over standards become the convention.
I recently went to a forum hosted by Women In Technology where they invited a large group of young women, some in high school up to some in their first internship out of college, who were there to meet one-on-one with tech professionals to ask questions and interact with some of the industry’s best (not self-promoting; there were many fine, far-more-seasoned-engineers at this event). Think speed-dating but for gathering knowledge, which I’ve dubbed: Women In Tech Speed-Mentoring.
I, of course, promoted data; I promoted the hell out of data!
The event itself was unfortunately quite short, for the mentees anyway, for which we all only spoke one-on-one with three aspiring tech geeks (plus a few chats while waiting in the pizza line). Some were declared computer science majors while others were still unsure of a direction to go in this broad world of tech. I, of course, promoted data; I promoted the hell out of data! However, in doing so, I was taken back by a couple things and pleasantly surprised at others — all of which gave me a chance to really connect with them, I hope. These surprises made me want to write this post…
After a lot of swearing I decided to make this post, dynamic column names in SSRS 2005, to help the internet and give back to the huddled masses who still have to support this old reporting services instance from 2005. In this version, as all of my dead-end googling revealed, there is no way to dynamically or ordinal-positionally set the column names in your reports — which is insane! I mean, what year were Microsoft in, in 2005?
Anway, like I said, after a lot of swearing, I came up with a creative way to achieve dynamic column names in SSRS 2005 and this post will share how you can implement the same solution.
Don’t even get me started on dynamic data sources…
I’ve recently challenged myself to build a few extensions for the Chrome browser. I recently finished my first one, still working on the others, and was so proud of the results I thought I would make a post about it. The extensions is called Tab Colorizer and, if it wasn’t obvious, is a chrome extension which allows you to customize and change the color of your open tabs.
I recently was tasked with building a new table with two Type-2 tables as a source and, not only maintain the history but, rebuild the history. Let’s look at the best way to join two type-2 tables together and then, more specifically, how to join the two historicized tables together while preserving the logical history of changes as though they were one.
Imagine you have two tables, typically type-2 dimensions, which have persisted historical records of changes: One for capturing your customer’s name over time, and another for tracking your customer’s phone number changes over time. In some cases, you may not have a historical record in one table, or the other, leaving an unknown name or phone number captured for a given customer. When re-building history you have to take care of these new periods of time where something hasn’t happened yet.
Now imagine that a customer changed her name three times and only on that third time did she also provide a phone number. That means, our third historical entry of her name change could potentially turn into five records, depending on when the phone number insert/update occurred.