Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
duyuefeng0708
GitHub Repository: duyuefeng0708/Cryptography-From-First-Principle
Path: blob/main/foundations/04-number-theory-rsa/sage/04b-extended-euclidean-algorithm.ipynb
483 views
unlisted
Kernel: SageMath 10.5

The Extended Euclidean Algorithm

Module 04b | Number Theory and RSA

You can find the gcd. But can you express it as a linear combination of the inputs?

Motivating Question: You know from 04a that gcd(17,60)=1\gcd(17, 60) = 1. But can you find integers s,ts, t such that 17s+60t=117s + 60t = 1? And if you can, why would that be enormously useful?

Here's a hint: if 17s+60t=117s + 60t = 1, then reducing both sides modulo 60 gives 17s1(mod60)17s \equiv 1 \pmod{60}. That means ss is the multiplicative inverse of 17 mod 60. In Module 01b we saw that such inverses exist whenever gcd(a,n)=1\gcd(a, n) = 1, but we never showed how to compute them. This notebook fills that gap.

Objectives

By the end of this notebook you will be able to:

  1. State Bezout's identity and explain why it guarantees that gcd(a,b)\gcd(a,b) is a linear combination of aa and bb

  2. Execute the extended Euclidean algorithm using a table (forward pass + backward substitution)

  3. Use the extended GCD to compute modular inverses

  4. Connect this to RSA: the private key is literally computed by the extended GCD

Prerequisites

  • Completion of 04a: Divisibility and the Euclidean Algorithm, you should be comfortable with the division algorithm and running Euclid's algorithm step by step.

  • Familiarity with modular arithmetic from Module 01, in particular, the concept of multiplicative inverses and units.

Bezout's Identity

Theorem (Bezout's Identity). For any integers a,ba, b (not both zero), there exist integers s,ts, t such that

gcd(a,b)=sa+tb.\gcd(a, b) = s \cdot a + t \cdot b.

The integers ss and tt are called Bezout coefficients.

This is a remarkable existence result: the gcd of aa and bb can always be written as a linear combination of aa and bb with integer coefficients. The extended Euclidean algorithm is a constructive proof, it doesn't just tell you ss and tt exist, it computes them.

Let's see this in action before we learn the algorithm.

# SageMath's xgcd(a, b) returns (gcd, s, t) such that gcd = s*a + t*b a, b = 252, 105 g, s, t = xgcd(a, b) print(f'gcd({a}, {b}) = {g}') print(f'Bezout coefficients: s = {s}, t = {t}') print(f'Verification: ({s})*{a} + ({t})*{b} = {s*a + t*b}')
# Let's try several pairs to build intuition pairs = [(48, 18), (35, 15), (17, 60), (100, 37), (56, 21)] for a, b in pairs: g, s, t = xgcd(a, b) print(f'gcd({a}, {b}) = {g} = ({s})*{a} + ({t})*{b} = {s*a + t*b}')

Checkpoint: Look at the row for (17,60)(17, 60). You should see gcd(17,60)=1\gcd(17, 60) = 1 with some coefficients s,ts, t. Before moving on, verify by hand that s17+t60=1s \cdot 17 + t \cdot 60 = 1.

Now reduce both sides mod 60. What do you get? What does ss represent in modular arithmetic?

The Algorithm: Forward Pass (Review)

The extended Euclidean algorithm has two phases. The first phase is just the regular Euclidean algorithm you learned in 04a, we compute a sequence of divisions:

a=q1b+r1a = q_1 \cdot b + r_1b=q2r1+r2b = q_2 \cdot r_1 + r_2r1=q3r2+r3r_1 = q_3 \cdot r_2 + r_3\vdots

until we reach a remainder of 0. The last non-zero remainder is gcd(a,b)\gcd(a, b).

Let's trace through gcd(252,105)\gcd(252, 105) as a refresher.

# Forward pass: regular Euclidean algorithm, saving all steps a, b = 252, 105 steps = [] while b != 0: q, r = a // b, a % b steps.append((a, q, b, r)) print(f'{a} = {q} * {b} + {r}') a, b = b, r print(f'\ngcd = {a}')

The Algorithm: Backward Substitution

The key insight: every remainder rir_i in the forward pass can be rewritten as ri=aprevqibprevr_i = a_{\text{prev}} - q_i \cdot b_{\text{prev}}. Since apreva_{\text{prev}} and bprevb_{\text{prev}} are themselves remainders from earlier steps, we can substitute backwards until we express gcd(a,b)\gcd(a, b) as a combination of the original aa and bb.

Worked example: gcd(252,105)\gcd(252, 105)

Forward pass gave us:

  1. 252=2105+42252 = 2 \cdot 105 + 42

  2. 105=242+21105 = 2 \cdot 42 + 21

  3. 42=221+042 = 2 \cdot 21 + 0

The gcd is 2121, appearing in step 2. Now substitute backwards:

From step 2: 21=10524221 = 105 - 2 \cdot 42

From step 1: 42=252210542 = 252 - 2 \cdot 105

Substitute: 21=1052(2522105)=1052252+4105=(2)252+510521 = 105 - 2 \cdot (252 - 2 \cdot 105) = 105 - 2 \cdot 252 + 4 \cdot 105 = (-2) \cdot 252 + 5 \cdot 105

So s=2s = -2, t=5t = 5, and indeed (2)252+5105=504+525=21(-2) \cdot 252 + 5 \cdot 105 = -504 + 525 = 21.

# Verify the backward substitution s, t = -2, 5 print(f'({s}) * 252 + ({t}) * 105 = {s * 252 + t * 105}') # Cross-check with SageMath g, s_sage, t_sage = xgcd(252, 105) print(f'SageMath gives: s = {s_sage}, t = {t_sage}') print(f'Note: Bezout coefficients are NOT unique (different valid pairs exist).')

Common mistake: "The extended GCD is just the GCD plus extra bookkeeping." The insight is deeper than that. The extended GCD proves constructively that gcd(a,b)\gcd(a,b) is always a linear combination of aa and bb. This existence result is what makes modular inverses computable, and it's not obvious at all that such a representation should exist.

The Table Method

Backward substitution works but is error-prone for large examples. The table method computes the Bezout coefficients alongside the forward pass, avoiding back-substitution entirely.

We maintain a table with columns qiq_i, rir_i, sis_i, tit_i where each row satisfies the invariant:

ri=sia+tibr_i = s_i \cdot a + t_i \cdot b

Initialization (two special rows before the algorithm starts):

StepqqrrssttInvariant check
0,aa10a=1a+0ba = 1 \cdot a + 0 \cdot b
1,bb01b=0a+1bb = 0 \cdot a + 1 \cdot b

Recurrence: At each step i2i \geq 2, compute qi=ri2/ri1q_i = \lfloor r_{i-2} / r_{i-1} \rfloor and then:

ri=ri2qiri1r_i = r_{i-2} - q_i \cdot r_{i-1}si=si2qisi1s_i = s_{i-2} - q_i \cdot s_{i-1}ti=ti2qiti1t_i = t_{i-2} - q_i \cdot t_{i-1}

When ri=0r_i = 0, the previous row gives gcd=ri1\gcd = r_{i-1} with coefficients si1s_{i-1}, ti1t_{i-1}.

Why does this work?

The invariant ri=sia+tibr_i = s_i \cdot a + t_i \cdot b holds at every step. Proof by induction:

  • Base cases: Row 0: r0=a=1a+0br_0 = a = 1 \cdot a + 0 \cdot b. Row 1: r1=b=0a+1br_1 = b = 0 \cdot a + 1 \cdot b.

  • Inductive step: If ri2=si2a+ti2br_{i-2} = s_{i-2} \cdot a + t_{i-2} \cdot b and ri1=si1a+ti1br_{i-1} = s_{i-1} \cdot a + t_{i-1} \cdot b, then

ri=ri2qiri1=(si2qisi1)a+(ti2qiti1)b=sia+tib.r_i = r_{i-2} - q_i \cdot r_{i-1} = (s_{i-2} - q_i s_{i-1}) \cdot a + (t_{i-2} - q_i t_{i-1}) \cdot b = s_i \cdot a + t_i \cdot b. \quad \checkmark

So the ss and tt columns are built by the exact same recurrence as the rr column, just with different starting values. That's the entire trick.

# Extended GCD: table method with full trace def extended_gcd_table(a, b): """Compute gcd(a, b) and Bezout coefficients using the table method. Prints the full table and verifies the invariant at each step.""" # Header print(f'Extended GCD of {a} and {b}') print('Step q r s t Check: s*a + t*b') # Initialize: row 0 and row 1 r_prev, r_curr = a, b s_prev, s_curr = 1, 0 t_prev, t_curr = 0, 1 # Print initial rows print(f'{0} -- {r_prev} {s_prev} {t_prev} {s_prev*a + t_prev*b}') print(f'{1} -- {r_curr} {s_curr} {t_curr} {s_curr*a + t_curr*b}') step = 2 while r_curr != 0: q = r_prev // r_curr r_new = r_prev - q * r_curr s_new = s_prev - q * s_curr t_new = t_prev - q * t_curr check = s_new * a + t_new * b print(f'{step} {q} {r_new} {s_new} {t_new} {check}') assert check == r_new, 'Invariant violated!' r_prev, r_curr = r_curr, r_new s_prev, s_curr = s_curr, s_new t_prev, t_curr = t_curr, t_new step += 1 print(f'Result: gcd({a}, {b}) = {r_prev}') print(f'Bezout: ({s_prev}) * {a} + ({t_prev}) * {b} = {s_prev*a + t_prev*b}') return r_prev, s_prev, t_prev g, s, t = extended_gcd_table(252, 105)

Checkpoint: Look at the table above. For each row, verify mentally that the "Check" column equals rr. This is the invariant ri=si252+ti105r_i = s_i \cdot 252 + t_i \cdot 105 in action.

Now, before running the next cell, try filling in the table for gcd(17,60)\gcd(17, 60) by hand:

Stepqqrrsstt
0,1710
1,6001
2????
3????

Wait, 17<6017 < 60, so the first quotient is 0 and the algorithm swaps them. Keep going from there.

# Check your hand computation! g, s, t = extended_gcd_table(17, 60)

From Extended GCD to Modular Inverses

This is where everything connects. Suppose gcd(a,n)=1\gcd(a, n) = 1. Then the extended GCD gives us:

sa+tn=1s \cdot a + t \cdot n = 1

Reduce both sides modulo nn:

sa1(modn)s \cdot a \equiv 1 \pmod{n}

So smodns \bmod n is the multiplicative inverse of aa modulo nn.

That's it. No trial and error. No searching. The extended GCD directly computes the modular inverse.

Back in Module 01b, we established that aa is a unit in Z/nZ\mathbb{Z}/n\mathbb{Z} if and only if gcd(a,n)=1\gcd(a, n) = 1. Now we finally have the algorithm that computes the inverse: run the extended GCD and read off ss.

# Concrete example: find 17^(-1) mod 60 a, n = 17, 60 g, s, t = xgcd(a, n) print(f'xgcd({a}, {n}) = ({g}, {s}, {t})') print(f'So: ({s}) * {a} + ({t}) * {n} = {s*a + t*n}') print() inv = s % n # reduce s modulo n to get the canonical representative print(f'{a}^(-1) mod {n} = {inv}') print(f'Verification: {a} * {inv} = {a * inv} = {a * inv // n} * {n} + {a * inv % n}') print(f'So {a} * {inv}{a * inv % n} (mod {n})')

So 17153(mod60)17^{-1} \equiv 53 \pmod{60}, because 17×53=901=15×60+117 \times 53 = 901 = 15 \times 60 + 1.

SageMath also provides inverse_mod(a, n) as a convenience function.

# SageMath's convenience function for modular inverse print(f'inverse_mod(17, 60) = {inverse_mod(17, 60)}') print(f'inverse_mod(3, 26) = {inverse_mod(3, 26)}') print(f'inverse_mod(7, 11) = {inverse_mod(7, 11)}') print() # What happens when no inverse exists? try: inverse_mod(6, 15) except ZeroDivisionError as e: print(f'inverse_mod(6, 15) raises an error: {e}') print(f'Because gcd(6, 15) = {gcd(6, 15)} ≠ 1')

Common mistake: "I can just try all values from 1 to n1n-1 until I find the inverse." For small nn, sure. But RSA uses nn with 2048+ bits (617\approx 617 digits). The extended GCD runs in O(log2n)O(\log^2 n), it's practically instant even for cryptographic sizes. Brute force would take longer than the age of the universe.

Crypto Connection: RSA Private Keys

Foreshadowing: In RSA, you choose two large primes p,qp, q, set n=pqn = p \cdot q, and pick a public exponent ee with gcd(e,φ(n))=1\gcd(e, \varphi(n)) = 1. The private key is

d=e1modφ(n)d = e^{-1} \bmod \varphi(n)

The extended Euclidean algorithm is literally the algorithm that generates RSA private keys. When you generate an RSA key pair, under the hood, the software runs xgcd(e, phi_n) and extracts the Bezout coefficient.

We'll see this in full detail in 04e. For now, let's see a tiny example.

# Toy RSA key generation, the extended GCD in action p, q = 61, 53 n = p * q phi_n = (p - 1) * (q - 1) e = 17 # public exponent (must satisfy gcd(e, phi_n) = 1) print(f'p = {p}, q = {q}') print(f'n = p*q = {n}') print(f'phi(n) = (p-1)(q-1) = {phi_n}') print(f'e = {e}, gcd(e, phi(n)) = {gcd(e, phi_n)}') print() # Compute private key d = e^(-1) mod phi(n) using extended GCD g, d, _ = xgcd(e, phi_n) d = d % phi_n print(f'Private key d = e^(-1) mod phi(n) = {d}') print(f'Verify: e * d mod phi(n) = {e * d} mod {phi_n} = {(e * d) % phi_n}') print() # Quick encryption/decryption test m = 42 # plaintext c = power_mod(m, e, n) # encrypt: c = m^e mod n m_dec = power_mod(c, d, n) # decrypt: m = c^d mod n print(f'Plaintext: {m}') print(f'Ciphertext: {c}') print(f'Decrypted: {m_dec}')

Exercises

Exercise 1 (Worked)

Use the extended Euclidean algorithm (table method) to find gcd(161,28)\gcd(161, 28) and express it as a linear combination s161+t28s \cdot 161 + t \cdot 28.

# Exercise 1: Worked solution # Step-by-step extended GCD table for gcd(161, 28) g, s, t = extended_gcd_table(161, 28) print(f'\nFinal answer: gcd(161, 28) = {g}') print(f'Bezout coefficients: s = {s}, t = {t}') print(f'Check: ({s}) * 161 + ({t}) * 28 = {s * 161 + t * 28}')

Since gcd(161,28)=7\gcd(161, 28) = 7 (not 1), 161 and 28 are not coprime, and neither has an inverse modulo the other. But the Bezout equation s161+t28=7s \cdot 161 + t \cdot 28 = 7 still holds, the extended GCD always works, whether or not the gcd is 1.

Exercise 2 (Guided)

Find 231mod8923^{-1} \bmod 89 using the extended GCD. Then verify your answer.

Hint: 8989 is prime, so gcd(23,89)=1\gcd(23, 89) = 1 is guaranteed.

# Exercise 2: Guided a, n = 23, 89 # Step 1: Run the extended GCD # TODO: use xgcd(a, n) to find g, s, t # g, s, t = ... # Step 2: Extract the modular inverse # TODO: compute inv = s % n (reduce to canonical representative in {0, ..., n-1}) # inv = ... # Step 3: Verify # TODO: check that a * inv ≡ 1 (mod n) # print(f'{a} * {inv} mod {n} = {(a * inv) % n}') # Step 4: Cross-check with SageMath's inverse_mod # TODO: print(f'inverse_mod({a}, {n}) = {inverse_mod(a, n)}')

Exercise 3 (Independent)

In a toy RSA setup, you're given:

  • p=47p = 47, q=71q = 71

  • Public exponent e=79e = 79

Tasks:

  1. Compute n=pqn = p \cdot q and φ(n)=(p1)(q1)\varphi(n) = (p-1)(q-1)

  2. Verify that gcd(e,φ(n))=1\gcd(e, \varphi(n)) = 1 (so ee is a valid public exponent)

  3. Compute the private key d=e1modφ(n)d = e^{-1} \bmod \varphi(n) using the extended GCD

  4. Encrypt the message m=100m = 100 as c=memodnc = m^e \bmod n

  5. Decrypt cc as m=cdmodnm' = c^d \bmod n and verify m=mm' = m

# Exercise 3: Your code here

Summary

ConceptKey idea
Bezout's identityFor any integers a,ba, b, there exist s,ts, t with gcd(a,b)=sa+tb\gcd(a,b) = s \cdot a + t \cdot b
Backward substitutionExpress the gcd as a linear combination by substituting remainders back through the Euclidean algorithm steps
The table methodComputes Bezout coefficients alongside the gcd using the same recurrence (ri=ri2qiri1r_i = r_{i-2} - q_i r_{i-1}) applied to all three columns (r,s,t)(r, s, t)
Modular inversesIf gcd(a,n)=1\gcd(a, n) = 1, then sa+tn=1s \cdot a + t \cdot n = 1 implies sa1(modn)s \equiv a^{-1} \pmod{n}
RSA connectionThe private key d=e1modφ(n)d = e^{-1} \bmod \varphi(n) is computed by running the extended GCD on ee and φ(n)\varphi(n)

Crypto preview: We now know how to compute modular inverses. In the next notebook, we'll learn about Euler's totient function φ(n)\varphi(n) and Fermat's little theorem, the why behind RSA's correctness: why does medm(modn)m^{ed} \equiv m \pmod{n}?

Next: Euler's Totient and Fermat's Theorem