Getting sighax
Presentation by Myria
 We know the format of the decrypted signature, but now we need to find a signature matching that pattern.
 This is bruteforcing RSA, but astronomically easier than it normally would be, thanks to Nintendo fails.
Brute Force
 Backofnapkin math suggests approximately $2^{43}$ signatures searched per match.
 sighax was said by derrek to take six months per signature, and there are six signature types.
 Screw that.
RSA  A Review
 Signatures are made by powering by secret $d$:
 Verification is done by powering by $e$ (65537):
 If $v = m$, $s$ is a valid signature for $m$.
The Naive Method
 Calculate $m = s^{65537} \mod n$ for random $s$ until you get a useful $m$.
 Very slow.
 17 multiplies and 17 mods per attempt.
The Negation Trick
 When checking $m$, also check $m$.
 If $m$ is a match, then $s$ is its signature.
 Subtraction is much faster than multiplication.
 This works because:
 $s=(1)s$
 ${(ab)}^e=(a^e)(b^e)$ for any $a$, $b$, $e$
 ${(s)}^{65537}={(1)}^{65537}s^{65537}$
$=(s^{65537})=m$.
 ~1.9x performance boost.
RSA Multiplicative Attack
 If $s_a$ and $s_b$ are signatures for $a$ and $b$, then $s_as_b \pmod n$ is a signature for $ab \pmod n$.
 ${(s_as_b)}^e = {(s_a)}^e{(s_b)}^e = ab$
 Such RSA attacks normally stopped by "padding".
 Getting a validlypadded message this way is extremely unlikely.
 But padding is what Nintendo messed up!
Using Multiplicative Attack
 Select random root $r$ such that $1 < r < n$.
 Calculate multiplier $k = r^{65537} \mod n$.
 Set $x = r$ and $y = k$.
 Do forever:
 $x$ = ($x$ * $r$) $\mod n$;
 $y$ = ($y$ * $k$) $\mod n$;
 if (IsMatch($y$)) return $x$;
 if (IsMatch($n  y$)) return $n  x$;
Doubling Performance Again
We only need $x$ if we found a match, so defer it:
 Select random root $r$ such that $1 < r < n$.
 Calculate multiplier $k = r^{65537} \mod n$.
 Set $y = k$ and $z = 0$.

Do forever:
 $z$++;
 $y$ = ($y$ * $k$) $\mod n$;
 if (IsMatch($y$)) return $r^z \mod n$;
 if (IsMatch($n  y$)) return $r^z \mod n$;
Are We There Yet?
 Final powmod much faster than calculating $x$.
 So now we're down to 1 mulmod per 2 checks.
 Can we do better?
 Of course!
Shenanigans
 We require any $k$ such that $k = r^{65537} \mod n$ and we know $r$.
 Multiplying by any $k$ whose $r$ is known works.
 (Negation trick: for $k=1$, $r=1$.)
 Do we have a $k$ for which multiplication is easier?
 It turns out we do: legitimate signatures.
Fun with PKCS #1
 PKCS #1 signature format:
 0001FFFFFFFF...FFFF00<garbage>
 Fast multiplication:
 Fact: $(a  b)x = ax  bx$
 0x0001FFFFFFFF...FFFF00...00 + garbage = 0x000200000000...000  inv_garbage
 Multiplying by 0x00020...000 is a left shift: fast!
 inv_constant is 408 bits.
 Replace 2048x2048 multiply with a 2048x408 one.
GPUs Are Space Heaters
This brute force is perfect for GPUs.
 Algorithm is same thing repeated massively.
 Few conditionals during main loop.
 No communication needed among operations.
 No need to "cover" the keyspace.
 Entire algorithm fits in GPU registers.
 My cat likes the warmth.
GPU Implementation
 CUDA or OpenCL?
 CUDA tools are better.
 The current best compute cards are Nvidia.
 Amazon AWS and Windows Azure use Nvidia.
 CUDA has addwithcarry asm!
GPU Implementation
 Keeping register count down
 Algorithm is registerbound.
 Use Montgomery reduction GPU algorithm by Kaiyong Zhao:
 Can't use PKCS #1 trick, but GPUs win bigly.
Hardware
 Home Desktops
 3x GeForce GTX 1080 Ti
 1x GeForce GTX 1080
 1x GeForce GTX 980 Ti
Hardware
 The Cloudâ„˘
 Amazon AWS "p2.8xlarge" instance
 8 Tesla K80 GPUs
 32 Xeon CPU cores (might as well use!)
 K80 is half as fast as 1080, but there are 8!
 ~230M checks per second combined GPU + CPU
 US$7/hour (yikes!)
Results
 All 6 signatures in ~9 days.
 ~8,000,000,000,000 checks per signature.
 So much for 6 months.
 Bruteforcer program is opensource: