BreachCompilation Analysis

Database dumps containing user credentials are nothing new.  One of the more recent contains 1.4 billion email addresses and passwords.  This database is referred to as “BreachCompilation.”  More information about it can be found at the end of this post.  A few weeks ago, I started to wonder what kinds of interesting information I could derive from this database.  Was this for hacking?  Of course not.  It was for research.  I wanted to answer the following questions:

  1. How much password reuse occurs?
  2. Could I build a password list similar to RockYou?
  3. Could I build hash lists to test for weak passwords?

With 1.4 billion credentials to deal with, I’d need a pretty beefy server to import them into.  Since I really only had one choice, I went with my Dell PowerEdge 2950.  At the time, it had 16 GB of RAM in it.  When I finally finished getting everything imported 7 days later, my order of 64 GB of RAM arrived.  With the extra memory, I was able to re-optimize my MariaDB installation and throw an index on the password field.  That only took a few hours, compared to the week it took to import the data.

At this point, I am pulling out all of the unique passwords along with how many times each one was used.  That’s all going into its own table.  This will answer questions number 1 and 2 (above) for me.  It will also give a foundation from which I can answer question number 3.
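To make the counting step concrete, here is a minimal Python sketch of the same idea on a tiny scale.  It is not the MariaDB query I am actually running.  It assumes each record is a line of the form email:password, and the function name count_passwords and the sample data are made up for illustration.  A Counter like this holds everything in memory, so it only suits samples, not the full 1.4 billion rows.

```python
from collections import Counter


def count_passwords(lines):
    """Count how often each password appears in 'email:password' lines.

    The 'email:password' layout is an assumption for this sketch.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Split on the first colon only, since passwords may contain colons.
        _email, sep, password = line.partition(":")
        if not sep or not password:
            continue  # skip malformed or empty-password lines
        counts[password] += 1
    return counts


# Hypothetical sample data, not taken from the real dump.
sample = [
    "alice@example.com:123456",
    "bob@example.com:hunter2",
    "carol@example.com:123456",
    "broken-line-without-separator",
]

counts = count_passwords(sample)

# Passwords used by more than one account (question 1).
reused = {pw: n for pw, n in counts.items() if n > 1}

# Passwords ordered from most to least common (question 2).
wordlist = [pw for pw, _ in counts.most_common()]
```

Sorting by count is what makes the result resemble a RockYou-style list: the most common passwords come first, so they get tried first.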

The idea there is that initially, I will have the list of unique passwords along with the count of how many times each is used.  However, I can take each password and generate an MD5 hash from it.  Or a SHA-1 hash.  Or SHA-256.  Or any hash for which I can find an algorithm.
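As a rough sketch of the hashing step, Python's standard hashlib module can produce all three digests from one password.  The helper name hash_password and the choice of UTF-8 encoding are my assumptions here.  A real dump may have hashed the passwords in a different encoding, which would change the digests.

```python
import hashlib


def hash_password(password, algorithms=("md5", "sha1", "sha256")):
    """Return a dict mapping algorithm name to the hex digest of the password.

    Assumes the password is hashed as UTF-8 bytes with no salt.
    """
    data = password.encode("utf-8")
    return {name: hashlib.new(name, data).hexdigest() for name in algorithms}


digests = hash_password("password")
for name, digest in digests.items():
    print(f"{name}: {digest}")
```

These are plain, unsalted hashes, which is the only case where a precomputed table like this is useful.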

Then, when doing a pentest, if I get a password hash dump, I could look up the hashes in my database.  Should I find a matching hash, I can then match it back to the actual plain-text password.
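The lookup itself can be sketched as a simple hash-to-password mapping.  In my case this would be an indexed database column rather than a Python dict, and the names build_lookup and crack are invented for this example.  It also assumes the dumped hashes are unsalted, since a salted hash will never match a precomputed one.

```python
import hashlib


def build_lookup(passwords, algorithm="md5"):
    """Map each password's hex digest back to its plain text."""
    return {
        hashlib.new(algorithm, pw.encode("utf-8")).hexdigest(): pw
        for pw in passwords
    }


def crack(hashes, lookup):
    """Return the dumped hashes that have a known plain-text password."""
    found = {}
    for h in hashes:
        # Hex digests may be upper- or lowercase in a dump; normalize first.
        plain = lookup.get(h.strip().lower())
        if plain is not None:
            found[h] = plain
    return found


# Hypothetical data: a tiny password list and a made-up hash dump.
lookup = build_lookup(["password", "123456"])
dumped = ["5F4DCC3B5AA765D61D8327DEB882CF99", "0" * 32]
found = crack(dumped, lookup)
```

Any hash that is missing from the table simply stays unresolved, which is itself a hint that the password is not one of the common ones.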

So we’ll see how it goes.  Wish me luck.

BreachCompilation details: