One of my favorite interview questions is to hand the candidate a marker and ask them to write out how to find all duplicates in a table. This should be straightforward and weeds out anyone who struggles with SQL; and even if they don’t struggle with SQL, this will be a good way to gauge where they’re at. But, it doesn’t stop there!
I’ve since evolved a little since I posted this and would like to hone my focus of this post to a clearer target: Look For Ridiculous instead.
After they successfully give an answer, typically one involving grouping by the business key having a count greater than one, which I’ll show in the first example below, that is a go-to correct response to this question. But, I throw them a curve ball, I’ll say “Great! Show me another way.” Then, I ask for another, and another, and another… with great power comes great responsibility being the interviewer is nefariously fun!

So, let’s take a look at some ways on how to find all duplicates in a table by exploring all the ways that I’ve come up with. Here’s our test data we’ll be working with, which has 2 sets of duplicates:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
CREATE TABLE FACT_ORDERS ( CUSTOMER_KEY INTEGER, ORDER_DATE_KEY INTEGER, ORDER_AMOUNT DECIMAL(15,2), ETL_LOAD_DATE DATE ); INSERT INTO FACT_ORDERS SELECT 1, 20160901, 812.19, CAST('2016-09-17' AS DATE); INSERT INTO FACT_ORDERS SELECT 4, 20160901, 5169.23, CAST('2016-09-17' AS DATE); INSERT INTO FACT_ORDERS SELECT 3, 20160901, 5169.23, CAST('2016-09-17' AS DATE); INSERT INTO FACT_ORDERS SELECT 4, 20160902, 5169.23, CAST('2016-09-17' AS DATE); INSERT INTO FACT_ORDERS SELECT 3, 20160901, 123.46, CAST('2016-09-17' AS DATE); INSERT INTO FACT_ORDERS SELECT 2, 20160901, 5169.23, CAST('2016-09-17' AS DATE); INSERT INTO FACT_ORDERS SELECT 3, 20160901, 123.46, CAST('2016-09-17' AS DATE); INSERT INTO FACT_ORDERS SELECT 1, 20160902, 812.19, CAST('2016-09-17' AS DATE); INSERT INTO FACT_ORDERS SELECT 1, 20160901, 5169.23, CAST('2016-09-17' AS DATE); INSERT INTO FACT_ORDERS SELECT 2, 20160901, 5169.23, CAST('2016-09-17' AS DATE); INSERT INTO FACT_ORDERS SELECT 2, 20160901, 5169.23, CAST('2016-09-17' AS DATE); |
The goal is to learn how to find all the duplicates in a table and return only 2 rows of data, ie: that have more than one identical row, even though there are 5 rows of duplicates (three dupes of one row and two of another).