Wow, there have been a lot of breaches. I’ve been processing data from the Breach Compilation database, and along the way I’ve been finding, researching, and downloading others. 320 million here, 100 million there. So far, I have 56 databases to work with, and who knows how many hundreds of millions of records that is going to be. From all of that, I have to pull out just the unique passwords, because that’s really what I’m interested in. I have zero interest in usernames, email addresses, or anything else associated with each password. For the time being, I’m just interested in hashing all the unique passwords I can find in these dumps. After that, I’ll generate all the hashes I can think of for weak password audits and hash lookups.
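To give a rough idea of what "pulling out just the unique passwords" looks like, here is a minimal sketch in Python. It assumes each dump line has the form `identifier:password` (the separator and the function name `unique_passwords` are my assumptions for illustration; real dumps vary a lot and need per-file handling). It is not the exact process I run, just the general shape of it.

```python
from typing import Iterable, Set


def unique_passwords(lines: Iterable[str], sep: str = ":") -> Set[str]:
    """Collect the distinct passwords from lines shaped like 'identifier<sep>password'.

    The line format is an assumption for this sketch; adjust `sep` per dump.
    """
    seen: Set[str] = set()
    for line in lines:
        line = line.rstrip("\r\n")
        # Split on the FIRST separator only, so passwords that themselves
        # contain the separator character are kept intact.
        _identifier, found, password = line.partition(sep)
        # Skip lines with no separator or an empty password.
        if found and password:
            seen.add(password)
    return seen


if __name__ == "__main__":
    sample = [
        "alice@example.com:hunter2\n",
        "bob@example.com:hunter2\n",
        "carol@example.com:pa:ss\n",
        "line without a separator\n",
    ]
    for pw in sorted(unique_passwords(sample)):
        print(pw)
```

An in-memory set like this only works for small files. At hundreds of millions of records, the deduplication would more realistically be pushed into the database or an on-disk sort.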
As one example of a huge database, Troy Hunt of Have I Been Pwned recently released a collection of just about half a billion SHA-1-hashed passwords. He previously released one containing over 300 million passwords, also hashed with SHA-1. Many of the databases available right now contain hashed passwords rather than plain text. For a quick and easy way to get plain-text versions of some of them, take a look at what hashes.org has to offer. They have lists of plain-text versions of some of the hashed passwords in many of these databases.
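Checking a candidate password against a list of SHA-1 hashes like that is simple enough to sketch. The snippet below assumes the list is a text file with one uppercase hex SHA-1 hash per line, optionally followed by `:count` (that layout, and the helper names `sha1_upper`, `load_hashes`, and `is_known`, are assumptions for this example, so check the actual file you download).

```python
import hashlib
from typing import Iterable, Set


def sha1_upper(password: str) -> str:
    """Return the SHA-1 of the UTF-8 encoded password as uppercase hex."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def load_hashes(lines: Iterable[str]) -> Set[str]:
    """Read hashes from lines shaped like 'HASH' or 'HASH:count' (assumed layout)."""
    hashes: Set[str] = set()
    for line in lines:
        line = line.strip()
        if line:
            # Keep only the hash part; ignore any trailing ':count'.
            hashes.add(line.split(":", 1)[0].upper())
    return hashes


def is_known(password: str, known_hashes: Set[str]) -> bool:
    """True if the password's SHA-1 appears in the loaded hash set."""
    return sha1_upper(password) in known_hashes


if __name__ == "__main__":
    # A two-line stand-in for a real hash list; the counts are made up.
    known = load_hashes([
        "5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8:1\n",
        "DA39A3EE5E6B4B0D3255BFEF95601890AFD80709:1\n",
    ])
    print(is_known("password", known))
```

This is the same idea behind a hash lookup table: hash every known plain-text password once, then match dumped hashes against the results instead of cracking each one.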
But as for me, I’m finally finished building my database server. It’s a Dell PowerEdge R710 with 96 GB of RAM, six 500 GB Mushkin Reactor SSDs in a RAID 10, and 16 Intel Xeon X5560 cores at 2.80 GHz. My other database server has created NTLM hashes for 382 million of the 400 million passwords that I currently have. Once it’s finished, I’ll move everything over to the new server, and things should go much more quickly from there. For the moment, though, I’m prepping the other 50+ data dumps to be imported into the new server.
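For anyone wondering what an NTLM hash actually is: it is the MD4 digest of the password encoded as UTF-16LE. The sketch below is not the code running on my server. It is a small, slow, pure-Python illustration that implements MD4 by hand (following RFC 1320), because many current Python builds no longer expose MD4 through `hashlib`. The function names `md4` and `ntlm_hash` are just names I picked for this example.

```python
import struct

MASK = 0xFFFFFFFF


def _rotl(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def md4(data: bytes) -> bytes:
    """Pure-Python MD4 (RFC 1320). For illustration only; it is slow."""
    # Padding: 0x80, zeros up to 56 mod 64, then the bit length (64-bit LE).
    msg = data + b"\x80"
    msg += b"\x00" * ((56 - len(msg) % 64) % 64)
    msg += struct.pack("<Q", (len(data) * 8) & 0xFFFFFFFFFFFFFFFF)

    state = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
    for offset in range(0, len(msg), 64):
        x = struct.unpack("<16I", msg[offset:offset + 64])
        a, b, c, d = state

        # Round 1: F(b,c,d) = (b & c) | (~b & d), words in order 0..15.
        for i in range(16):
            f = (b & c) | (~b & d)
            a = _rotl((a + f + x[i]) & MASK, (3, 7, 11, 19)[i % 4])
            a, b, c, d = d, a, b, c

        # Round 2: G is the majority function, words 0,4,8,12,1,5,...
        for i in range(16):
            k = (i % 4) * 4 + i // 4
            g = (b & c) | (b & d) | (c & d)
            a = _rotl((a + g + x[k] + 0x5A827999) & MASK, (3, 5, 9, 13)[i % 4])
            a, b, c, d = d, a, b, c

        # Round 3: H = b ^ c ^ d, words in the order fixed by RFC 1320.
        order = (0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15)
        for i in range(16):
            h = b ^ c ^ d
            a = _rotl((a + h + x[order[i]] + 0x6ED9EBA1) & MASK, (3, 9, 11, 15)[i % 4])
            a, b, c, d = d, a, b, c

        state = [(s + v) & MASK for s, v in zip(state, (a, b, c, d))]

    return struct.pack("<4I", *state)


def ntlm_hash(password: str) -> str:
    """NTLM hash: MD4 over the UTF-16LE encoding of the password, as hex."""
    return md4(password.encode("utf-16-le")).hex()


if __name__ == "__main__":
    print(ntlm_hash("password"))
```

For hundreds of millions of passwords you would want a native MD4 implementation rather than something like this, but the steps are the same.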