DISTINCT – Not as Sensitive as You Think

If you’ve ever used DISTINCT in a mysql query, you’ll want to pay very close attention to what I am going to say next.  It is not case-sensitive. “This” and “this” are equal, and it will only return one of them.  When you’re hashing strings, these two strings produce different results.  If you want hashes for both strings, that creates a problem.

This came to my attention as I was searching through passwords.  Every query was returning exactly one result.  It was mixed-case, as well.  I knew that in the original list, there was a password with different case.  For example, I’d look for “password” and the only result returned would be “Password”.  After some research, it was clear that DISTINCT is not case-sensitive.

The solution to this problem appears to come from this page, where it is suggested to use a query such as the following:

SELECT DISTINCT(BINARY name) AS Name FROM X;

or

select distinct colname COLLATE sql_latin1_general_cp1_cs_as 
from tablename;

So now, I’m re-importing all of the data.  Once that’s done, I’ll re-create my unique password table and re-generate the hashes.  When I am finished with that, we’ll take a look at how the hashes look. I’d also be curious to see what different cases people use.  That would be an interesting exercise in anticipating permutations.