Solution to DISTINCT Case Problem

First off, take a look at my post from yesterday.  I was given to understand that DISTINCT is not case-sensitive.  The issue at hand actually has less to do with DISTINCT itself, and more to do with the character set and collation that are used when comparing strings.  The best way to describe this comes from the MySQL documentation:

Suppose that we have an alphabet with four letters: A, B, a, b. We give each letter a number: A = 0, B = 1, a = 2, b = 3. The letter A is a symbol, the number 0 is the encoding for A, and the combination of all four letters and their encodings is a character set.
Suppose that we want to compare two string values, A and B. The simplest way to do this is to look at the encodings: 0 for A and 1 for B. Because 0 is less than 1, we say A is less than B. What we’ve just done is apply a collation to our character set. The collation is a set of rules (only one rule in this case): “compare the encodings.” We call this simplest of all possible collations a binary collation.
But what if we want to say that the lowercase and uppercase letters are equivalent? Then we would have at least two rules: (1) treat the lowercase letters a and b as equivalent to A and B; (2) then compare the encodings. We call this a case-insensitive collation. It is a little more complex than a binary collation.
In real life, most character sets have many characters: not just A and B but whole alphabets, sometimes multiple alphabets or eastern writing systems with thousands of characters, along with many special symbols and punctuation marks. Also in real life, most collations have many rules, not just for whether to distinguish lettercase, but also for whether to distinguish accents (an “accent” is a mark attached to a character as in German Ö), and for multiple-character mappings (such as the rule that Ö = OE in one of the two German collations).

So it’s more about how the table is configured to sort strings.  This is the collation.  In my case, the table is set to have a collation of “utf8_unicode_ci”.  When I grab all of the unique passwords, then, I can use a utf8 binary collation:

select distinct password collate utf8_bin from `table`;

This will maintain all of the upper-case and lower-case permutations of a given password.  By doing this, I can generate all of the hashes of the absolutely unique passwords known to have been used by someone from the data set that I have.
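To illustrate the difference with a toy sketch (Python here, not MySQL internals — the sample passwords are made up, and a real case-insensitive collation does more than lowercasing), a case-insensitive comparison collapses casing variants the way `lower()` does, while a binary comparison keeps every distinct byte sequence:

```python
# Hypothetical illustration of case-insensitive vs. binary "DISTINCT".
passwords = ["Password", "password", "PASSWORD", "hunter2"]

# Roughly what a case-insensitive collation (like utf8_unicode_ci) does:
# casing variants compare equal, so they collapse to one entry.
ci_distinct = {p.lower() for p in passwords}

# Roughly what a binary collation (like utf8_bin) does:
# exact bytes are compared, so every casing survives.
cs_distinct = set(passwords)

print(len(ci_distinct))  # 2
print(len(cs_distinct))  # 4
```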

So the next time you use DISTINCT in a query, you may want to make sure you’re getting what you think you are.

DISTINCT – Not as Sensitive as You Think

If you’ve ever used DISTINCT in a MySQL query, you’ll want to pay very close attention to what I am going to say next.  It is not case-sensitive: “This” and “this” are considered equal, and it will return only one of them.  When you’re hashing strings, though, those two strings produce different hashes.  If you want hashes for both strings, that’s a problem.
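A quick sketch of why that matters for hashing (MD5 is used here purely as an example):

```python
import hashlib

# "This" and "this" are equal under a case-insensitive collation,
# but their hashes are completely different.
h1 = hashlib.md5("This".encode()).hexdigest()
h2 = hashlib.md5("this".encode()).hexdigest()
print(h1 == h2)  # False
```

So if DISTINCT silently drops one casing, you never get its hash at all.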

This came to my attention as I was searching through passwords.  Every query was returning exactly one result.  It was mixed-case, as well.  I knew that in the original list, there was a password with different case.  For example, I’d look for “password” and the only result returned would be “Password”.  After some research, it was clear that DISTINCT is not case-sensitive.

The solution to this problem appears to come from a page suggesting a query such as the following:

select distinct colname COLLATE sql_latin1_general_cp1_cs_as 
from tablename;

So now, I’m re-importing all of the data.  Once that’s done, I’ll re-create my unique password table and re-generate the hashes.  When I am finished with that, we’ll take a look at how the hashes look. I’d also be curious to see what different cases people use.  That would be an interesting exercise in anticipating permutations.

More and More Data

Man, this server screams.  I’ve been able to do in a week on the new server what took two months on the old one.  On the old server, I had about 1.7 Billion records.  I’ve moved those over to the new server.  Additionally, I have acquired and imported about 984 Million more records from 34 other data breaches.  I’m still missing some of the ones I’d like, but I’ll keep looking.  Of the total of roughly 2.7 Billion records, about 500 Million are unique.  In terms of hashes, I have MD5, NTLM, SHA1, and the MySQL PASSWORD() hash for 400 Million of those.  Now, I just have to do those hashes for the remaining 100 Million.  Once those are all done, I’ll take a look at what other types of hashes would be useful.  I was also thinking about taking a look at the most common usernames.

Getting excited for BSidesSLC and HackWest, which are coming up soon.  I was also approached by 74rku5 of HackWest to be a Pros vs Joes team captain.  I told him I would, so we’ll see how much trouble we can get ourselves into.  Sounds fun.  And a little scary (see Gabriel Ryan’s tweet).

Lots of Data Dumps – Server Finally Finished

Wow, there have been a lot of breaches.  I’ve been processing data from the Breach Compilation database.  As I’ve been doing that, I have also been finding, researching, and downloading others.  320 Million here, 100 Million there.  So far, I have 56 databases to work with.  Who knows how many hundreds of millions of records that is going to be.  Then, I have to pull out just the unique passwords.  That’s really what I’m interested in.  I have zero interest in usernames, email addresses, or really anything else associated with each password.  For the time being, I’m just interested in hashing all the unique passwords I can find that have been dumped.  After that, I’ll generate all the hashes I can think of for weak password audits and hash lookups.
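The dedup step itself is simple in principle.  Here’s a minimal Python sketch; the `email:password` line format and the sample lines are assumptions, since real dumps come in all sorts of formats and encodings:

```python
# Minimal sketch of pulling unique passwords out of dump lines.
def unique_passwords(lines):
    """Yield each password exactly once, preserving case."""
    seen = set()
    for line in lines:
        # Split on the first colon; everything after it is the password.
        _, _, password = line.rstrip("\n").partition(":")
        if password and password not in seen:
            seen.add(password)
            yield password

dump = ["a@example.com:hunter2", "b@example.com:Hunter2", "c@example.com:hunter2"]
print(list(unique_passwords(dump)))  # ['hunter2', 'Hunter2']
```

For billions of records an in-memory set won’t cut it, of course — that’s what the database server is for.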

As one example of a huge database, Troy Hunt of Have I Been Pwned recently released a collection of just about half a billion SHA1-hashed passwords.  He previously released one containing over 300 Million passwords, also hashed with SHA1.  Many databases available right now contain hashed passwords.  But for a quick and easy way to get plain-text versions of some of them, you could take a look at sites that publish cracked hashes.  They have lists of plain-text versions of some of the hashed passwords in many of these databases.

But as for me, I’m finally finished building my database server.  It’s a Dell PowerEdge r710 with 96GB RAM, 6 x 500GB Mushkin Reactor SSDs in a RAID 10, and 16 x Intel(R) Xeon(R) X5560 @ 2.80GHz cores.  My other database server has created NTLM hashes for 382M of the 400M passwords that I currently have.  Once it’s finished, I’ll move everything over to the new server, and things should go much more quickly from there.  For the moment, though, I’m prepping the other 50+ data dumps to be imported into the new server.

CTF Practice

Last year, at BSidesSLC, there was a capture-the-flag (CTF) contest.  Having never done one before, let alone practiced, I was hesitant to enter.  However, 74rku5 invited me to try anyway.  Even having started 4 hours late, my efforts were rewarded with third place and a Bash Bunny.  Very fun toy.  It does some amazingly cool things.  There’s even a github repo for all the payloads.

This year, I’m practicing to get ready ahead of time.  To do this, I’m working through challenges from a CTF practice site.  Their challenges in many cases feel similar to the ones at last year’s CTF.  Going through them is entertaining; sometimes it’s incredibly frustrating, but getting the answer right is quite rewarding.  There are other great sites for this, too.

If you’re interested in CTF, you might want to take a look at those sites.  As you go through challenges, you’ll find useful tools online.  As you do that, bookmark the tools you’ve found most useful.  ASCII tables, different types of converters, and other tools that help you solve the challenges can be useful in a CTF contest.  Try it out!

The Easiest Metasploit Guide You’ll Ever Read

It’s finally finished!  “The Easiest Metasploit Guide You’ll Ever Read” is a guide for folks who are “good with computers.”  It targets those who would like to know how to use Metasploit but haven’t had much direction on where to start.  This guide covers the installation of Kali Linux, Metasploitable 2, and Nessus.  It explains how to use Metasploit and Nessus together to exploit Metasploitable 2.  We also look at how to determine which Metasploit module to use to exploit vulnerabilities in Metasploitable 2.

Download Your Copy Here

For folks who don’t care for PDFs,

View the HTML Version Here

AI to Identify Cipher from Ciphertext

One of the most difficult parts of cracking a ciphertext is knowing which cipher was used to generate it.  What if there was a tool that would use text analysis to attempt to identify the cipher?  One of the interesting projects I’m working on at the moment is just that: an artificial intelligence that uses statistical probability to try and identify the cipher that was used to generate a given ciphertext.

The basic idea is that it performs character frequency analysis and calculates the index of coincidence, with other types of data points to be added later.  It uses these data points to calculate the probability of each candidate cipher given the observed evidence, using Naive Bayes to make the decision.  When it’s operational, and if it has any kind of accuracy, I might put it up somewhere for folks to use.
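The index of coincidence is one of the simpler data points.  Here’s a small Python sketch of how it can be computed; the sample text is just an illustration:

```python
from collections import Counter

def index_of_coincidence(text):
    """IC = sum of n_i*(n_i-1) / (N*(N-1)), counting letters only.

    Monoalphabetic ciphers (like a patristocrat) preserve English's
    IC (~0.066); polyalphabetic ones (like vigenere) flatten it
    toward the uniform value (~0.038).
    """
    letters = [c for c in text.upper() if c.isalpha()]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))

print(round(index_of_coincidence("attack at dawn, attack at dusk"), 3))
```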

It’s been quite the challenging project, though.  Right now, it handles base64, railfence, patristocrat, and vigenere.  Obviously, I have quite a ways to go in adding ciphers, and there are many other data points I’d like to add as well.

I’ll have to put it up on github when it’s more user-friendly.  Then, people can play with it, help me improve its accuracy, add more ciphers, and add more data points.

Strengths and Weaknesses of Authentication Factors


In preparation for HackWest, I was reading through some of the talks that will be given.  One stood out to me as interesting: Sherrie Cowley’s talk on breaking multi-factor authentication.  I wanted to see what strengths and weaknesses I could come up with about the different types.  I’m sure much more research could be put into this, and I’m betting that Sherrie has done quite a bit.  But here’s what I came up with.

Something You Know

This generally refers to passwords, so I’ll focus on that.  Here are a couple of things that I feel are weaknesses of passwords:

  1. Password complexity policies are generally insufficient.  Companies attempt to force complex passwords by requiring that they be at least eight characters long and include 3 of the following 4 categories:
    • Upper-case letters
    • Lower-case letters
    • Numbers
    • Special characters (like !@#$%^&*()_+-=)

The problem here is that “P@ssword!” satisfies this policy.  However, it’s probably one of the first entries in a brute-force password list.  If I had to pick between complexity and length, I’d pick length.  Many are familiar with the XKCD comic:

Password Strength


2.  As the comic suggests, people often pick passwords that are hard to remember but easy for computers to guess.  One thing that helps is to select a complete sentence.  This way, you have a long password that also complies with the 3-of-4 rule described in point 1.

3. People often re-use passwords in multiple places.  To solve this one, use a password management tool, such as LastPass or KeePass.  Make sure you use a different password for each and every thing.

4.  Passwords have to be stored.  And even if they are hashed, that means they can be dumped.  In my opinion, the only effective way to store them is as a salted hash, where the salt is randomly generated and unique for each user and stored alongside the hash.  The salt’s job is to make precomputed hash lookups useless.
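As a sketch of what salted storage can look like in practice, here’s a Python example using the standard library’s PBKDF2; the iteration count and salt size are illustrative, not a recommendation:

```python
import hashlib
import os

def hash_password(password, salt=None, iterations=200_000):
    """Salted, slow hash: a random per-user salt is stored next to
    the digest so the password can be verified later."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest

def verify_password(password, salt, digest, iterations=200_000):
    """Re-derive with the stored salt and compare."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations) == digest

salt, digest = hash_password("hunter2")
print(verify_password("hunter2", salt, digest))  # True
print(verify_password("Hunter2", salt, digest))  # False
```

Because each user’s salt differs, identical passwords produce different digests, and a single precomputed lookup table can’t cover everyone.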

Strengths I came up with for using passwords:

  1. Passwords can usually be changed rather easily.  And I would recommend that you do.  It’s not so much that someone will guess it, in my opinion.  It’s that it may have been compromised through a password hash hack or something similar.  I know that this happens all the time, because I have a database of 1.3 Billion dumped passwords.  All I have to do is hash them however I want, and then compare hashes, and I have the original password.  So, change it often.
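That hash-then-compare lookup is easy to sketch.  A toy Python example with a made-up wordlist, using unsalted MD5 as the target hash type:

```python
import hashlib

# Hash the wordlist once, then any leaked unsalted MD5 can be
# reversed with a simple dictionary lookup.
wordlist = ["password", "Password", "letmein", "hunter2"]
lookup = {hashlib.md5(w.encode()).hexdigest(): w for w in wordlist}

leaked = hashlib.md5("letmein".encode()).hexdigest()
print(lookup.get(leaked))  # letmein
```

Scale the wordlist up to a billion dumped passwords and you can see why changing passwords after a breach matters.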

Something You Are

This generally refers to biometrics.  It includes things like fingerprints, retinal scans, facial recognition, or even DNA.

Here are some weaknesses that I came up with for this:

  1. The biggie here is that you cannot change them.  If someone finds out how to replicate any of these, you have a big problem.
  2. It’s not as hard as you might think.  Fingerprint and facial-recognition spoofing have both been publicly demonstrated.

To be honest, I could not readily come up with strengths for biometrics, except that people think replicating them is hard, so they don’t try.  But that one’s pretty weak, since it has been shown to be easier than people expect.

Something You Have

This would be something like a private SSH key, a one-time password (OTP), a physical key, or an RFID card.  I felt like this was probably the strongest factor.

Here are the strengths that I came up with:

  1. They are very easy to change.  As a matter of fact, a one-time password changes every 30-60 seconds.  If you lose a physical key, you do have to re-pin the locks, but it’s doable.
  2. You do not have to remember anything.  You just pull them out and use them when you need them.
  3. They are fairly difficult to figure out, especially the longer they are.  If you use an 8192-bit RSA private SSH key, you are in very good shape.  And an attacker would only have around 1.5 to 3 minutes to brute-force a one-time password before it expires.  That alone may not be enough, but an OTP is usually used as part of a multi-factor setup, which is what makes it very hard.
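For the curious, time-based one-time passwords typically follow RFC 6238 (TOTP): HMAC over a counter derived from the current time, truncated to a short decimal code.  A minimal Python sketch is below; the secret is a made-up example, and real implementations exchange base32-encoded shared secrets:

```python
import hashlib
import hmac
import struct
import time

def totp(secret, timestep=30, digits=6, now=None):
    """RFC 6238-style TOTP: HMAC-SHA1 over the time-step counter,
    dynamically truncated to a short decimal code."""
    counter = int((now if now is not None else time.time()) // timestep)
    msg = struct.pack(">Q", counter)  # 8-byte big-endian counter
    mac = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = mac[-1] & 0x0F           # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# Within one 30-second window the code is stable; after it, it changes.
print(totp(b"supersecret", now=59))
print(totp(b"supersecret", now=60))
```

Because the counter advances every 30 seconds, a captured code is useless almost immediately — which is exactly the property described above.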

Weaknesses for this one:

  1. It’s only as good as the security you use to protect it.  If you post your private key online, as Adobe recently did, well, that isn’t good.  You have to keep it locked down.
  2. Physical keys are very easy to duplicate.
  3. RFID cards are very easy to duplicate.
  4. If you’re dealing with keyed locks, they can be easy to pick (but not always).

So, that’s what I came up with.  We’ll have to see what Sherrie says on the matter.

160M NTLM Hashes & Intro to Metasploit Progress

This hashing project could probably use a sponsor.  My hardware is taking quite a while to generate the hashes.  It’s been hashing for just over a week, and we’re at about 160M NTLM hashes.  At that rate, it’ll be finished in about two more weeks.  When it’s done, we’ll have to see what we can do with it.  But for the time being, I plan on using it to test for weak passwords.

Also, I think I’m getting pretty close to completing the first draft of “The Easiest Metasploit Guide You’ll Ever Read.”  I’ll post it here and mirror it somewhere for folks who might be interested in getting familiar with Metasploit.  I’ll tell you up front that I do not consider myself any kind of expert with it.  However, writing this guide (and making sure it’s correct) has taught me quite a bit about how to use Metasploit.  At the time of this writing, it weighs in at about 78 pages.  It has links to downloads, exact commands to run, explanations of what we’re doing, and screenshots.

73.2M Hashes and Metasploit for Beginners

The BreachCompilation database weighs in at 1.4 Billion records.  Out of that, there are about 400 Million unique passwords.  That is a lot of password reuse.  The goal right now is to create NTLM hashes for those 400M passwords.  This will aid in weak password audits.  So far, we’ve hashed about 73.2M of those passwords.  So, we have a ways to go, but we’re making some good progress.

Also, once upon a time, I wrote “The Easiest Linux Guide You’ll Ever Read.”  Now, I’m working on one called “The Easiest Metasploit Guide You’ll Ever Read.”  In it, I walk the reader through setting up a lab in VMware Workstation, including Metasploitable 2, Nessus, and Kali Linux.  We’ll look at Nessus, some basic scanning, and how to use that information to help us use Metasploit.  Then, we will go through compiling that information and researching which exploits to use.  We’ll even get a root shell or two along the way.  When it’s ready, I’ll post it for anyone who would like to learn Metasploit from the ground up.