Getting sighax

Presentation by Myria

  • We know the format of the decrypted signature, but now we need to find a signature matching that pattern.
  • This is brute-forcing RSA, but astronomically easier than it normally would be, thanks to Nintendo fails.

Brute Force

  • Back-of-napkin math suggests approximately $2^{43}$ signatures searched per match.
  • sighax was said by derrek to take six months per signature, and there are six signature types.
  • Screw that.

RSA - A Review

  • Signatures are made by powering by secret $d$:
    • $s = m^d \mod n$
  • Verification is done by powering by $e$ (65537):
    • $v = s^e \mod n$
  • If $v = m$, $s$ is a valid signature for $m$.

The Naive Method

  • Calculate $m = s^{65537} \mod n$ for random $s$ until you get a useful $m$.
  • Very slow.
    • 17 multiplies and 17 mods per attempt.

The Negation Trick

  • When checking $m$, also check $-m$.
  • If $-m$ is a match, then $-s$ is its signature.
  • Subtraction is much faster than multiplication.
  • This works because:
    • $-s=(-1)s$
    • ${(ab)}^e=(a^e)(b^e)$ for any $a$, $b$, $e$
    • ${(-s)}^{65537}={(-1)}^{65537}s^{65537}$
      $=-(s^{65537})=-m$.
  • ~1.9x performance boost.

RSA Multiplicative Attack

  • If $s_a$ and $s_b$ are signatures for $a$ and $b$, then $s_as_b \pmod n$ is a signature for $ab \pmod n$.
    • ${(s_as_b)}^e = {(s_a)}^e{(s_b)}^e = ab$
  • Such RSA attacks normally stopped by "padding".
    • Getting a validly-padded message this way is extremely unlikely.
    • But padding is what Nintendo messed up!

Using Multiplicative Attack

  1. Select random root $r$ such that $1 < r < n$.
  2. Calculate multiplier $k = r^{65537} \mod n$.
  3. Set $x = r$ and $y = k$.
  4. Do forever:
    1. $x$ = ($x$ * $r$) $\mod n$;
    2. $y$ = ($y$ * $k$) $\mod n$;
    3. if (IsMatch($y$)) return $x$;
    4. if (IsMatch($n - y$)) return $n - x$;

Doubling Performance Again

We only need $x$ if we found a match, so defer it:

  1. Select random root $r$ such that $1 < r < n$.
  2. Calculate multiplier $k = r^{65537} \mod n$.
  3. Set $y = k$ and $z = 0$.
  4. Do forever:
    1. $z$++;
    2. $y$ = ($y$ * $k$) $\mod n$;
    3. if (IsMatch($y$)) return $r^z \mod n$;
    4. if (IsMatch($n - y$)) return $-r^z \mod n$;

Are We There Yet?

  • Final powmod much faster than calculating $x$.
  • So now we're down to 1 mul-mod per 2 checks.
  • Can we do better?
  • Of course!

Shenanigans

  • We require any $k$ such that $k = r^{65537} \mod n$ and we know $r$.
    • Multiplying by any $k$ whose $r$ is known works.
    • (Negation trick: for $k=-1$, $r=-1$.)
  • Do we have a $k$ for which multiplication is easier?
  • It turns out we do: legitimate signatures.

Fun with PKCS #1

  • PKCS #1 signature format:
    • 0001FFFFFFFF...FFFF00<garbage>
  • Fast multiplication:
    • Fact: $(a - b)x = ax - bx$
    • 0x0001FFFFFFFF...FFFF00...00 + garbage = 0x000200000000...000 - inv_garbage
    • Multiplying by 0x00020...000 is a left shift: fast!
    • inv_constant is 408 bits.
    • Replace 2048x2048 multiply with a 2048x408 one.

GPUs Are Space Heaters

This brute force is perfect for GPUs.

  • Algorithm is same thing repeated massively.
  • Few conditionals during main loop.
  • No communication needed among operations.
  • No need to "cover" the keyspace.
  • Entire algorithm fits in GPU registers.
  • My cat likes the warmth.

GPU Implementation

  • CUDA or OpenCL?
    • CUDA tools are better.
    • The current best compute cards are Nvidia.
    • Amazon AWS and Windows Azure use Nvidia.
    • CUDA has add-with-carry asm!

GPU Implementation

Hardware

  • Home Desktops
    • 3x GeForce GTX 1080 Ti
    • 1x GeForce GTX 1080
    • 1x GeForce GTX 980 Ti

Hardware

  • The Cloud™
    • Amazon AWS "p2.8xlarge" instance
    • 8 Tesla K80 GPUs
    • 32 Xeon CPU cores (might as well use!)
    • K80 is half as fast as 1080, but there are 8!
    • ~230M checks per second combined GPU + CPU
    • US$7/hour (yikes!)

Results

vitcory shibe doge

@SciresM, @Myriachan, Normmatt, @TuxSH, @Hedgeberg