Let’s chat about the distinct vs group by clause: I often see the misuse of GROUP BY with no aggregates when a DISTINCT would suffice. However, I more often see the misuse of a DISTINCT when a GROUP BY is more practical. I wanted to talk about some, maybe not-so-obvious, reasons to use one over the other. I won’t go into what the functional differences are, but more so the practical differences.
But first, is there really a technological difference between using DISTINCT vs a GROUP BY (sans aggregates, of course)? The quick answer is that it depends on the system, there could be differences between the optimizer on one of the many DBMS out there. However, in my tests, the likely more accurate answer is simply, No.
SELECT DISTINCT AGE FROM USERS;
SELECT AGE FROM USERS GROUP BY AGE;
These two queries not only produce the exact same result set, they produce the exact same query plans too. To see for yourself, take a look here. (Tested on SQL Server, Oracle, and Teradata)
Functionally, if used properly, either method is acceptable. But, allow me to create two scenarios that I think highlight two probable use cases and arguments for using one over the other:
- Conveying a thought through a query example without the end-user having to execute the query to understand said thought.
- Sharing a query with the intent to compound upon.
The first example is a method I often use when communicating with people where I need to illustrate a point, back it up with some evidence and move forward with the dialogue. By giving the recipient a query that says SELECT DISTINCT UTENSILS FROM UTENSIL_DRAWER; while explaining that we have no more spoons is a great way to convey your point. This is even clearer if your audience is less technical.
The second example is more likely, and one I try to use as a preference over the other. If you send a query to someone for them to expand upon, you don’t want them misguided by a DISTINCT when a GROUP BY makes the query writing less error-prone. Let’s say I shared this query instead SELECT UTENSILS FROM UNTENSIL_DRAWER GROUP BY UTENSILS; and then added: “We may be low on forks too.” This query is ripe for the user to begin modifying and easily determine a count of each utensil.
I know, it’s pedantic. However, I’ve seen the error of beginner SQL writers to simply “Just do a distinct…” and I’ve seen their eyes opened when properly using a group by to ever-evolving queries.
Have you ever had to troubleshoot a query to simply find they left a DISTINCT clause?