Sunday, July 15, 2012

154 Billion NTLM/sec on 10 hashes

It's a good day when you see the following on 10 hashes:

Yes, that's 154B - as in Billion.  It was done entirely with AMD hardware, and involved 9x6990, 4x6970, 4x5870, 2x5970, and 1x7970 - for a total of 31 GPU cores in 6 physical systems.  We had another 11 cards with 15 GPU cores left over - we didn't have systems to put them in (mostly nVidia).
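As a quick sanity check on those numbers, here is a small sketch that tallies the cards and GPU cores listed above. It relies on the 6990 and 5970 being dual-GPU cards and the rest single-GPU. The per-core figure is only a rough average I'm deriving from the headline number, not something that was measured per card; the variable names are just for illustration.

```python
# Cards used in the run, as listed above: model -> (card count, GPU cores per card).
# The 6990 and 5970 are dual-GPU cards; the 6970, 5870, and 7970 are single-GPU.
cards = {
    "6990": (9, 2),
    "6970": (4, 1),
    "5870": (4, 1),
    "5970": (2, 2),
    "7970": (1, 1),
}

total_cards = sum(count for count, _ in cards.values())
total_cores = sum(count * cores for count, cores in cards.values())

# Headline aggregate rate in NTLM/sec. Dividing by core count gives only a
# rough average, since the card generations differ a lot in speed.
aggregate_rate = 154e9
avg_per_core = aggregate_rate / total_cores

print(f"{total_cards} cards, {total_cores} GPU cores")
print(f"~{avg_per_core / 1e9:.2f}B NTLM/sec per GPU core (rough average)")
```

The slower 5870s and the faster 7970 will sit well below and above that average respectively, so treat it as a ballpark only.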

For more details, read on...

This morning, @jmgosney and I met up to work on the networking code for the new Multiforcer framework and do some serious stress testing.  I've been working on that code recently, and while I can do some testing in my development environment, it usually takes going big to expose certain types of bugs (which I most certainly did find).

This is a good way to start a day:

After putting all the GPUs we had into the systems we had (one board & one power supply were acting up and could not be used), this was the stack left over:

The original plan was to use all the AMD cards, and fill in space with nVidia, but we unfortunately did not have enough room for all the AMD cards.  Amusingly, one of my boards wouldn't find the hard drive controllers with 4 dual GPU cards installed.

We also had a few remote systems that were helping out.  There were supposed to be a few more, but they didn't pan out, so we were roughly 8 GPUs/12 cores short of where we were hoping to be.

The server was an EC2 m1.small node, since I wanted to test the server at internet-scale latencies, and on a relatively low resource platform.  We did not use any EC2 GPU nodes for this test, but may in the future...

After a good bit of troubleshooting, code updates, and pushing binaries around, we finally hit success, as observed above.  Also, as noted below.

Please remember, these are NOT single hash speeds - these are on a list of 10 hashes, over the internet...

There are still a few improvements left to make, including some (surprise) threading & mutex issues.  But other than a few edge cases, things worked amazingly well!

Also, there's no reason at all that nVidia cards couldn't have been helping.  Even though they're slower, an nVidia GPU is still better than a CPU!

If you want to play with this, it's currently in SVN.  I'll be polishing off a few more bugs with the network code and then doing a release before Defcon.

If you have more questions, you can also find me at my talk at Defcon - I will be presenting!


  1. create repo on github? i am eager to see the code!

  2. The code is in SVN at Sourceforge.

  3. Awesome. Can you please post some pics of the equipment and additional hardware details? I'm one of the many not lucky enough to go to defcon this year.

    1. Mike - I posted some pictures here: