Yes, that's 154B - as in Billion. It was done entirely with AMD hardware, and involved 9x6990, 4x6970, 4x5870, 2x5970, and 1x7970 - for a total of 31 GPU cores in 6 physical systems. We had another 11 cards with 15 GPU cores left over - we didn't have systems to put them in (mostly nVidia).
This is a good way to start a day:
After putting all the GPUs we had into the systems we had (one board & one power supply were acting up and couldn't be used), this was the stack left over:
The original plan was to use all the AMD cards and fill in the remaining space with nVidia, but unfortunately we didn't have enough room for all the AMD cards. Amusingly, one of my boards wouldn't detect the hard drive controllers with 4 dual-GPU cards installed.
We also had a few remote systems that were helping out. There were supposed to be a few more, but they didn't pan out, so we were roughly 8 GPUs/12 cores short of where we were hoping to be.
The server was an EC2 m1.small node, since I wanted to test the server at internet-scale latencies, and on a relatively low resource platform. We did not use any EC2 GPU nodes for this test, but may in the future...
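Testing against an m1.small over the internet matters because round-trip latency has to be amortized: each work unit a client fetches needs to represent enough keyspace that the fetch time is noise compared to the compute time. As a rough back-of-envelope sketch (the rates and latency figures here are illustrative assumptions, not measured values from this run):

```python
# Back-of-envelope work-unit sizing for a high-latency cracking server.
# All constants below are illustrative assumptions, not measured values.
CLUSTER_RATE = 154e9       # hashes/sec, the headline number from this run
RTT_SECONDS = 0.1          # assumed ~100 ms internet round trip to the server
MAX_OVERHEAD = 0.001       # keep fetch latency under 0.1% of unit runtime

# A work unit must take at least this long to compute, or latency dominates.
min_unit_seconds = RTT_SECONDS / MAX_OVERHEAD

# Minimum candidates per work unit at the full cluster rate.
min_unit_hashes = CLUSTER_RATE * min_unit_seconds

print(f"Minimum work unit: ~{min_unit_hashes:.2e} candidates "
      f"({min_unit_seconds:.0f}s of compute)")
```

The point is just that a cheap, distant server is fine as long as work units are big: the clients spend their time hashing, not waiting on the network.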
After a good bit of troubleshooting, code updates, and pushing binaries around, we finally hit success, as shown above - and as noted below.
Please remember, these are NOT single hash speeds - these are on a list of 1000 hashes, over the internet...
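Cracking against a list of 1000 hashes costs almost nothing extra compared to a single hash, because each candidate's hash only needs one lookup against the whole target set rather than 1000 comparisons. A minimal sketch of the idea (using MD5 and a Python set purely for illustration; the hash function, helper names, and wordlist here are assumptions, not this project's actual code):

```python
import hashlib

# Illustrative target list: 1000 real targets would work identically,
# since set membership lookup is O(1) regardless of list size.
targets = {hashlib.md5(p.encode()).hexdigest()
           for p in ["letmein", "dragon", "trustno1"]}

def crack(candidates, target_set):
    """Hash each candidate once and check it against the whole target set."""
    found = {}
    for candidate in candidates:
        digest = hashlib.md5(candidate.encode()).hexdigest()
        if digest in target_set:       # one lookup, not one compare per hash
            found[digest] = candidate
    return found

results = crack(["password", "letmein", "qwerty", "dragon"], targets)
for digest, plaintext in results.items():
    print(f"{digest}:{plaintext}")
```

On a GPU the same idea is usually implemented with a sorted list or bitmap filter in device memory, but the asymptotics are the same: the per-candidate cost barely changes as the hash list grows.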
There are still a few improvements left to make, including some (surprise) threading & mutex issues. But other than a few edge cases, things worked amazingly well!
Also, there's no reason at all that nVidia cards couldn't have been helping. Even though they're slower, an nVidia GPU is still better than a CPU!
If you want to play with this, it's currently in SVN. I'll be polishing off a few more bugs with the network code and then doing a release before Defcon.
If you have more questions, you can also find me in my talk at Defcon - I will be presenting!