Let’s chat about the distinct vs group by clause: I often see the misuse of GROUP BY with no aggregates when a DISTINCT would suffice. However, I more often see the misuse of a DISTINCT when a GROUP BY is more practical. I wanted to talk about some, maybe not-so-obvious, reasons to use one over the other. I won’t go into what the functional differences are, but more so the practical differences.
But first, is there really a technological difference between using DISTINCT vs a GROUP BY (sans aggregates, of course)? The quick answer is that it depends on the system, there could be differences between the optimizer on one of the many DBMS out there. However, in my tests, the likely more accurate answer is simply, No.
0 1 2 3 |
SELECT DISTINCT AGE FROM USERS; SELECT AGE FROM USERS GROUP BY AGE; |
These two queries not only produce the exact same result set, they produce the exact same query plans too. To see for yourself, take a look here. (Tested on SQL Server, Oracle, and Teradata)
