Getting sighax
Presentation by Myria
- We know the format of the decrypted signature, but now we need to find a signature matching that pattern.
- This is brute-forcing RSA, but astronomically easier than it normally would be, thanks to Nintendo fails.
Brute Force
- Back-of-napkin math suggests approximately $2^{43}$ signatures searched per match.
- sighax was said by derrek to take six months per signature, and there are six signature types.
- Screw that.
RSA - A Review
- Signatures are made by powering by secret $d$:
- Verification is done by powering by $e$ (65537):
- If $v = m$, $s$ is a valid signature for $m$.
The Naive Method
- Calculate $m = s^{65537} \mod n$ for random $s$ until you get a useful $m$.
- Very slow.
- 17 multiplies and 17 mods per attempt.
The Negation Trick
- When checking $m$, also check $-m$.
- If $-m$ is a match, then $-s$ is its signature.
- Subtraction is much faster than multiplication.
- This works because:
- $-s=(-1)s$
- ${(ab)}^e=(a^e)(b^e)$ for any $a$, $b$, $e$
- ${(-s)}^{65537}={(-1)}^{65537}s^{65537}$
$=-(s^{65537})=-m$.
- ~1.9x performance boost.
RSA Multiplicative Attack
- If $s_a$ and $s_b$ are signatures for $a$ and $b$, then $s_as_b \pmod n$ is a signature for $ab \pmod n$.
- ${(s_as_b)}^e = {(s_a)}^e{(s_b)}^e = ab$
- Such RSA attacks normally stopped by "padding".
- Getting a validly-padded message this way is extremely unlikely.
- But padding is what Nintendo messed up!
Using Multiplicative Attack
- Select random root $r$ such that $1 < r < n$.
- Calculate multiplier $k = r^{65537} \mod n$.
- Set $x = r$ and $y = k$.
- Do forever:
- $x$ = ($x$ * $r$) $\mod n$;
- $y$ = ($y$ * $k$) $\mod n$;
- if (IsMatch($y$)) return $x$;
- if (IsMatch($n - y$)) return $n - x$;
Doubling Performance Again
We only need $x$ if we found a match, so defer it:
- Select random root $r$ such that $1 < r < n$.
- Calculate multiplier $k = r^{65537} \mod n$.
- Set $y = k$ and $z = 0$.
-
Do forever:
- $z$++;
- $y$ = ($y$ * $k$) $\mod n$;
- if (IsMatch($y$)) return $r^z \mod n$;
- if (IsMatch($n - y$)) return $-r^z \mod n$;
Are We There Yet?
- Final powmod much faster than calculating $x$.
- So now we're down to 1 mul-mod per 2 checks.
- Can we do better?
- Of course!
Shenanigans
- We require any $k$ such that $k = r^{65537} \mod n$ and we know $r$.
- Multiplying by any $k$ whose $r$ is known works.
- (Negation trick: for $k=-1$, $r=-1$.)
- Do we have a $k$ for which multiplication is easier?
- It turns out we do: legitimate signatures.
Fun with PKCS #1
- PKCS #1 signature format:
- 0001FFFFFFFF...FFFF00<garbage>
- Fast multiplication:
- Fact: $(a - b)x = ax - bx$
- 0x0001FFFFFFFF...FFFF00...00 + garbage = 0x000200000000...000 - inv_garbage
- Multiplying by 0x00020...000 is a left shift: fast!
- inv_constant is 408 bits.
- Replace 2048x2048 multiply with a 2048x408 one.
GPUs Are Space Heaters
This brute force is perfect for GPUs.
- Algorithm is same thing repeated massively.
- Few conditionals during main loop.
- No communication needed among operations.
- No need to "cover" the keyspace.
- Entire algorithm fits in GPU registers.
- My cat likes the warmth.
GPU Implementation
- CUDA or OpenCL?
- CUDA tools are better.
- The current best compute cards are Nvidia.
- Amazon AWS and Windows Azure use Nvidia.
- CUDA has add-with-carry asm!
GPU Implementation
- Keeping register count down
- Algorithm is register-bound.
- Use Montgomery reduction GPU algorithm by Kaiyong Zhao:
- Can't use PKCS #1 trick, but GPUs win bigly.
Hardware
- Home Desktops
- 3x GeForce GTX 1080 Ti
- 1x GeForce GTX 1080
- 1x GeForce GTX 980 Ti
Hardware
- The Cloud™
- Amazon AWS "p2.8xlarge" instance
- 8 Tesla K80 GPUs
- 32 Xeon CPU cores (might as well use!)
- K80 is half as fast as 1080, but there are 8!
- ~230M checks per second combined GPU + CPU
- US$7/hour (yikes!)
Results
- All 6 signatures in ~9 days.
- ~8,000,000,000,000 checks per signature.
- So much for 6 months.
- Brute-forcer program is open-source: