Interview Question | Weed Out Non Technical Candidates

Weed out non technical candidates by asking how to open notepad.
Weed out non technical candidates by asking how to open notepad.

I’ve since evolved a little since I posted this and would like to hone my focus of this post to a clearer target: Look For Ridiculous instead.

This is my hands-down, go-to interview question to weed out non-technical candidates. I have had such great success with this interview question and have shared it with other colleagues who have experienced equal success that I think it deserves a place on this blog. I’ve talked about this before, a little over a year ago is when I made my first public announcement of my new-found favorite interview question:

How do you open notepad on windows?

I typically preface the question with a short disclaimer like, “This question is a bit unorthodox and there really isn’t a wrong answer, having more than one answer is also equally acceptable…” This sets the stage for the actual question which I’ll phrase like “On a windows machine, regardless of version specifics, how do YOU open windows notepad?”

Continue reading →

SQL Foo | Count within a case statement

Just the other day I was asked to help a peer solve an SQL problem they were having; they were trying to count within a case statement which was proving to be problematic. They had several columns within the query, most using an analytical function to find the sum, min or max of various fields – but there was one instance where she needed to count the number of occurrences of a situation, a situation that required several other fields to determine, and without adding those other fields to the GROUP BY would break the logic.

Counting like a ninja with SQL.
Like a ninja, count within a case statement.

How to count within a case statement?

Continue reading →

Interview Question | How to find all duplicates in a table?

One of my favorite interview questions is to hand the candidate a marker and ask them to write out how to find all duplicates in a table. This should be straightforward and weeds out anyone who struggles with SQL; and even if they don’t struggle with SQL, this will be a good way to gauge where they’re at. But, it doesn’t stop there!

I’ve since evolved a little since I posted this and would like to hone my focus of this post to a clearer target: Look For Ridiculous instead.

After they successfully give an answer, typically one involving grouping by the business key having a count greater than one, which I’ll show in the first example below, that is a go-to correct response to this question. But, I throw them a curve ball, I’ll say “Great! Show me another way.” Then, I ask for another, and another, and another… with great power comes great responsibility being the interviewer is nefariously fun!

How to find all duplicates in a table.
How to find all duplicates in a table.

So, let’s take a look at some ways on how to find all duplicates in a table by exploring all the ways that I’ve come up with. Here’s our test data we’ll be working with, which has 2 sets of duplicates:

The goal is to learn how to find all the duplicates in a table and return only 2 rows of data, ie: that have more than one identical row, even though there are 5 rows of duplicates (three dupes of one row and two of another).

Continue reading →

RI | Referential Integrity In The Data Warehouse

Referential Integrity in the Data Warehouse is a controversial topic amongst BI Professionals; you’re either all for it or all against it. While keeping the focus on RI, solely with regard to the Data Warehouse, I want to discuss some pros and cons to help you make a better, more informed decision that will have long-lasting impacts to your warehouse, development lifecycle, and production maintenance.

Meme handcuffed like database Referential Integrity constraints.
Meme: Handcuffed like database Referential Integrity constraints.

You need to understand that referential integrity is a decision, not a standard. Blindly implementing RI because you’re “supposed to” is a presumptuous, neglectful response to a complex problem. I think that everyone will agree that leaving constraints at the application layer is an apprentice mistake, nevertheless, when it comes to data it may be our best choice. Yet, still, having table-driven constraints is always preferred — if it makes sense!

When does it make sense to have Referential Integrity?

Continue reading →

Row Number Window Function | First Occurrence In A Series

Using the Row Number Window Function to flag records if that row is the first occurrence in a series is a performant way to extract more value from your data. By using a Partition By clause we can group chunks of data together while ordering them to figure out which is the first in that series of data. Row Number, with the addition of the Over clause, allows us to achieve this without the use of a subquery by simply wrapping the function in a case statement.

A meme of how we put the fun in window functions!
A meme of how we put the fun in row number window functions!

An example of how to find the first occurrence in a series

Continue reading →

How To Scrape Links From Reddit Comments

I recently had the need to retrieve all the links from a particular subreddit, consolidate them, and keep a running list of new links that get added. Knowing how to scrape links from Reddit comments versus a Reddit’s posting is what this article is about. While I won’t go into the detail of storing the links into a database, I want to instead focus on the meat of the python script:

How to scrape links from Reddit Comments?

Python script used to scrape links from subreddit comments.
Python script used to scrape links from subreddit comments.

Continue reading →

Catch-All Join To A Lookup Dimension

I recently ran into an interesting problem that I’d like to share and show how I resolved it. The solution involves a catch-all join to a lookup dimension table.

ERD Diagram of wildcard lookup status table.
ERD Diagram of wildcard lookup status table.

Imagine having many employees that work in many departments. Each department has their own way of determining the employee’s status; Some departments use the status code that was given in the source system, other departments rely solely on the department they’re from and others use a combination of both! Oh yeah, the fun bit, this status logic can change…

Continue reading →