GitHub Repository: duyuefeng0708/Cryptography-From-First-Principle
Path: blob/main/foundations/04-number-theory-rsa/sage/04b-extended-euclidean-algorithm.ipynb
⁴⁸³ views

unlisted

Kernel: SageMath 10.5

The Extended Euclidean Algorithm

Module 04b | Number Theory and RSA

You can find the gcd. But can you express it as a linear combination of the inputs?

Motivating Question: You know from 04a that $\gcd(17, 60) = 1$ . But can you find integers $s, t$ such that $17s + 60t = 1$ ? And if you can, why would that be enormously useful?
Here's a hint: if $17s + 60t = 1$ , then reducing both sides modulo 60 gives $17s \equiv 1 \pmod{60}$ . That means $s$ is the multiplicative inverse of 17 mod 60. In Module 01b we saw that such inverses exist whenever $\gcd(a, n) = 1$ , but we never showed how to compute them. This notebook fills that gap.

Objectives

By the end of this notebook you will be able to:

State Bezout's identity and explain why it guarantees that $\gcd(a,b)$ is a linear combination of $a$ and $b$
Execute the extended Euclidean algorithm using a table (forward pass + backward substitution)
Use the extended GCD to compute modular inverses
Connect this to RSA: the private key is literally computed by the extended GCD

Prerequisites

Completion of 04a: Divisibility and the Euclidean Algorithm, you should be comfortable with the division algorithm and running Euclid's algorithm step by step.
Familiarity with modular arithmetic from Module 01, in particular, the concept of multiplicative inverses and units.

Bezout's Identity

Theorem (Bezout's Identity). For any integers $a, b$ (not both zero), there exist integers $s, t$ such that

\gcd(a, b) = s \cdot a + t \cdot b.

The integers $s$ and $t$ are called Bezout coefficients.

This is a remarkable existence result: the gcd of $a$ and $b$ can always be written as a linear combination of $a$ and $b$ with integer coefficients. The extended Euclidean algorithm is a constructive proof, it doesn't just tell you $s$ and $t$ exist, it computes them.

Let's see this in action before we learn the algorithm.

In [ ]:

# SageMath's xgcd(a, b) returns (gcd, s, t) such that gcd = s*a + t*b
a, b = 252, 105
g, s, t = xgcd(a, b)
print(f'gcd({a}, {b}) = {g}')
print(f'Bezout coefficients: s = {s}, t = {t}')
print(f'Verification: ({s})*{a} + ({t})*{b} = {s*a + t*b}')

In [ ]:

# Let's try several pairs to build intuition
pairs = [(48, 18), (35, 15), (17, 60), (100, 37), (56, 21)]
for a, b in pairs:
    g, s, t = xgcd(a, b)
    print(f'gcd({a}, {b}) = {g}  =  ({s})*{a} + ({t})*{b}  =  {s*a + t*b}')

Checkpoint: Look at the row for $(17, 60)$ . You should see $\gcd(17, 60) = 1$ with some coefficients $s, t$ . Before moving on, verify by hand that $s \cdot 17 + t \cdot 60 = 1$ .
Now reduce both sides mod 60. What do you get? What does $s$ represent in modular arithmetic?

The Algorithm: Forward Pass (Review)

The extended Euclidean algorithm has two phases. The first phase is just the regular Euclidean algorithm you learned in 04a, we compute a sequence of divisions:

a = q_1 \cdot b + r_1

b = q_2 \cdot r_1 + r_2

r_1 = q_3 \cdot r_2 + r_3

\vdots

until we reach a remainder of 0. The last non-zero remainder is $\gcd(a, b)$ .

Let's trace through $\gcd(252, 105)$ as a refresher.

In [ ]:

# Forward pass: regular Euclidean algorithm, saving all steps
a, b = 252, 105
steps = []
while b != 0:
    q, r = a // b, a % b
    steps.append((a, q, b, r))
    print(f'{a} = {q} * {b} + {r}')
    a, b = b, r
print(f'\ngcd = {a}')

The Algorithm: Backward Substitution

The key insight: every remainder $r_i$ in the forward pass can be rewritten as $r_i = a_{\text{prev}} - q_i \cdot b_{\text{prev}}$ . Since $a_{\text{prev}}$ and $b_{\text{prev}}$ are themselves remainders from earlier steps, we can substitute backwards until we express $\gcd(a, b)$ as a combination of the original $a$ and $b$ .

Worked example: $\gcd(252, 105)$

Forward pass gave us:

$252 = 2 \cdot 105 + 42$
$105 = 2 \cdot 42 + 21$
$42 = 2 \cdot 21 + 0$

The gcd is $21$ , appearing in step 2. Now substitute backwards:

From step 2: $21 = 105 - 2 \cdot 42$

From step 1: $42 = 252 - 2 \cdot 105$

Substitute: $21 = 105 - 2 \cdot (252 - 2 \cdot 105) = 105 - 2 \cdot 252 + 4 \cdot 105 = (-2) \cdot 252 + 5 \cdot 105$

So $s = -2$ , $t = 5$ , and indeed $(-2) \cdot 252 + 5 \cdot 105 = -504 + 525 = 21$ .

In [ ]:

# Verify the backward substitution
s, t = -2, 5
print(f'({s}) * 252 + ({t}) * 105 = {s * 252 + t * 105}')

# Cross-check with SageMath
g, s_sage, t_sage = xgcd(252, 105)
print(f'SageMath gives: s = {s_sage}, t = {t_sage}')
print(f'Note: Bezout coefficients are NOT unique (different valid pairs exist).')

Common mistake: "The extended GCD is just the GCD plus extra bookkeeping." The insight is deeper than that. The extended GCD proves constructively that $\gcd(a,b)$ is always a linear combination of $a$ and $b$ . This existence result is what makes modular inverses computable, and it's not obvious at all that such a representation should exist.

The Table Method

Backward substitution works but is error-prone for large examples. The table method computes the Bezout coefficients alongside the forward pass, avoiding back-substitution entirely.

We maintain a table with columns $q_i$ , $r_i$ , $s_i$ , $t_i$ where each row satisfies the invariant:

r_i = s_i \cdot a + t_i \cdot b

Initialization (two special rows before the algorithm starts):

Step	$q$	$r$	$s$	$t$	Invariant check
0	,	$a$	1	0	$a = 1 \cdot a + 0 \cdot b$
1	,	$b$	0	1	$b = 0 \cdot a + 1 \cdot b$

Recurrence: At each step $i \geq 2$ , compute $q_i = \lfloor r_{i-2} / r_{i-1} \rfloor$ and then:

r_i = r_{i-2} - q_i \cdot r_{i-1}

s_i = s_{i-2} - q_i \cdot s_{i-1}

t_i = t_{i-2} - q_i \cdot t_{i-1}

When $r_i = 0$ , the previous row gives $\gcd = r_{i-1}$ with coefficients $s_{i-1}$ , $t_{i-1}$ .

Why does this work?

The invariant $r_i = s_i \cdot a + t_i \cdot b$ holds at every step. Proof by induction:

Base cases: Row 0: $r_0 = a = 1 \cdot a + 0 \cdot b$ . Row 1: $r_1 = b = 0 \cdot a + 1 \cdot b$ .
Inductive step: If $r_{i-2} = s_{i-2} \cdot a + t_{i-2} \cdot b$ and $r_{i-1} = s_{i-1} \cdot a + t_{i-1} \cdot b$ , then

r_i = r_{i-2} - q_i \cdot r_{i-1} = (s_{i-2} - q_i s_{i-1}) \cdot a + (t_{i-2} - q_i t_{i-1}) \cdot b = s_i \cdot a + t_i \cdot b. \quad \checkmark

So the $s$ and $t$ columns are built by the exact same recurrence as the $r$ column, just with different starting values. That's the entire trick.

In [ ]:

# Extended GCD: table method with full trace
def extended_gcd_table(a, b):
    """Compute gcd(a, b) and Bezout coefficients using the table method.
    Prints the full table and verifies the invariant at each step."""
    # Header
    print(f'Extended GCD of {a} and {b}')
    print('Step  q  r  s  t  Check: s*a + t*b')    
    # Initialize: row 0 and row 1
    r_prev, r_curr = a, b
    s_prev, s_curr = 1, 0
    t_prev, t_curr = 0, 1
    
    # Print initial rows
    print(f'{0}  --  {r_prev}  {s_prev}  {t_prev}  {s_prev*a + t_prev*b}')
    print(f'{1}  --  {r_curr}  {s_curr}  {t_curr}  {s_curr*a + t_curr*b}')
    
    step = 2
    while r_curr != 0:
        q = r_prev // r_curr
        r_new = r_prev - q * r_curr
        s_new = s_prev - q * s_curr
        t_new = t_prev - q * t_curr
        
        check = s_new * a + t_new * b
        print(f'{step}  {q}  {r_new}  {s_new}  {t_new}  {check}')
        assert check == r_new, 'Invariant violated!'
        
        r_prev, r_curr = r_curr, r_new
        s_prev, s_curr = s_curr, s_new
        t_prev, t_curr = t_curr, t_new
        step += 1
    
    print(f'Result: gcd({a}, {b}) = {r_prev}')
    print(f'Bezout: ({s_prev}) * {a} + ({t_prev}) * {b} = {s_prev*a + t_prev*b}')
    return r_prev, s_prev, t_prev

g, s, t = extended_gcd_table(252, 105)

Checkpoint: Look at the table above. For each row, verify mentally that the "Check" column equals $r$ . This is the invariant $r_i = s_i \cdot 252 + t_i \cdot 105$ in action.
Now, before running the next cell, try filling in the table for $\gcd(17, 60)$ by hand:
Step $q$ $r$ $s$ $t$
0 , 17 1 0
1 , 60 0 1
2 ? ? ? ?
3 ? ? ? ?
Wait, $17 < 60$ , so the first quotient is 0 and the algorithm swaps them. Keep going from there.

Step	$q$	$r$	$s$	$t$
0	,	17	1	0
1	,	60	0	1
2	?	?	?	?
3	?	?	?	?

In [ ]:

# Check your hand computation!
g, s, t = extended_gcd_table(17, 60)

From Extended GCD to Modular Inverses

This is where everything connects. Suppose $\gcd(a, n) = 1$ . Then the extended GCD gives us:

s \cdot a + t \cdot n = 1

Reduce both sides modulo $n$ :

s \cdot a \equiv 1 \pmod{n}

So $s \bmod n$ is the multiplicative inverse of $a$ modulo $n$ .

That's it. No trial and error. No searching. The extended GCD directly computes the modular inverse.

Back in Module 01b, we established that $a$ is a unit in $\mathbb{Z}/n\mathbb{Z}$ if and only if $\gcd(a, n) = 1$ . Now we finally have the algorithm that computes the inverse: run the extended GCD and read off $s$ .

In [ ]:

# Concrete example: find 17^(-1) mod 60
a, n = 17, 60
g, s, t = xgcd(a, n)
print(f'xgcd({a}, {n}) = ({g}, {s}, {t})')
print(f'So: ({s}) * {a} + ({t}) * {n} = {s*a + t*n}')
print()

inv = s % n  # reduce s modulo n to get the canonical representative
print(f'{a}^(-1) mod {n} = {inv}')
print(f'Verification: {a} * {inv} = {a * inv} = {a * inv // n} * {n} + {a * inv % n}')
print(f'So {a} * {inv} ≡ {a * inv % n} (mod {n})')

So $17^{-1} \equiv 53 \pmod{60}$ , because $17 \times 53 = 901 = 15 \times 60 + 1$ .

SageMath also provides inverse_mod(a, n) as a convenience function.

In [ ]:

# SageMath's convenience function for modular inverse
print(f'inverse_mod(17, 60) = {inverse_mod(17, 60)}')
print(f'inverse_mod(3, 26)  = {inverse_mod(3, 26)}')
print(f'inverse_mod(7, 11)  = {inverse_mod(7, 11)}')
print()

# What happens when no inverse exists?
try:
    inverse_mod(6, 15)
except ZeroDivisionError as e:
    print(f'inverse_mod(6, 15) raises an error: {e}')
    print(f'Because gcd(6, 15) = {gcd(6, 15)} ≠ 1')

Common mistake: "I can just try all values from 1 to $n-1$ until I find the inverse." For small $n$ , sure. But RSA uses $n$ with 2048+ bits ( $\approx 617$ digits). The extended GCD runs in $O(\log^2 n)$ , it's practically instant even for cryptographic sizes. Brute force would take longer than the age of the universe.

Crypto Connection: RSA Private Keys

Foreshadowing: In RSA, you choose two large primes $p, q$ , set $n = p \cdot q$ , and pick a public exponent $e$ with $\gcd(e, \varphi(n)) = 1$ . The private key is
$d = e^{-1} \bmod \varphi(n)$
The extended Euclidean algorithm is literally the algorithm that generates RSA private keys. When you generate an RSA key pair, under the hood, the software runs xgcd(e, phi_n) and extracts the Bezout coefficient.
We'll see this in full detail in 04e. For now, let's see a tiny example.

In [ ]:

# Toy RSA key generation, the extended GCD in action
p, q = 61, 53
n = p * q
phi_n = (p - 1) * (q - 1)
e = 17  # public exponent (must satisfy gcd(e, phi_n) = 1)

print(f'p = {p}, q = {q}')
print(f'n = p*q = {n}')
print(f'phi(n) = (p-1)(q-1) = {phi_n}')
print(f'e = {e}, gcd(e, phi(n)) = {gcd(e, phi_n)}')
print()

# Compute private key d = e^(-1) mod phi(n) using extended GCD
g, d, _ = xgcd(e, phi_n)
d = d % phi_n
print(f'Private key d = e^(-1) mod phi(n) = {d}')
print(f'Verify: e * d mod phi(n) = {e * d} mod {phi_n} = {(e * d) % phi_n}')
print()

# Quick encryption/decryption test
m = 42  # plaintext
c = power_mod(m, e, n)  # encrypt: c = m^e mod n
m_dec = power_mod(c, d, n)  # decrypt: m = c^d mod n
print(f'Plaintext:  {m}')
print(f'Ciphertext: {c}')
print(f'Decrypted:  {m_dec}')

Exercises

Exercise 1 (Worked)

Use the extended Euclidean algorithm (table method) to find $\gcd(161, 28)$ and express it as a linear combination $s \cdot 161 + t \cdot 28$ .

In [ ]:

# Exercise 1: Worked solution
# Step-by-step extended GCD table for gcd(161, 28)
g, s, t = extended_gcd_table(161, 28)
print(f'\nFinal answer: gcd(161, 28) = {g}')
print(f'Bezout coefficients: s = {s}, t = {t}')
print(f'Check: ({s}) * 161 + ({t}) * 28 = {s * 161 + t * 28}')

Since $\gcd(161, 28) = 7$ (not 1), 161 and 28 are not coprime, and neither has an inverse modulo the other. But the Bezout equation $s \cdot 161 + t \cdot 28 = 7$ still holds, the extended GCD always works, whether or not the gcd is 1.

Exercise 2 (Guided)

Find $23^{-1} \bmod 89$ using the extended GCD. Then verify your answer.

Hint: $89$ is prime, so $\gcd(23, 89) = 1$ is guaranteed.

In [ ]:

# Exercise 2: Guided
a, n = 23, 89

# Step 1: Run the extended GCD
# TODO: use xgcd(a, n) to find g, s, t
# g, s, t = ...

# Step 2: Extract the modular inverse
# TODO: compute inv = s % n (reduce to canonical representative in {0, ..., n-1})
# inv = ...

# Step 3: Verify
# TODO: check that a * inv ≡ 1 (mod n)
# print(f'{a} * {inv} mod {n} = {(a * inv) % n}')

# Step 4: Cross-check with SageMath's inverse_mod
# TODO: print(f'inverse_mod({a}, {n}) = {inverse_mod(a, n)}')

Exercise 3 (Independent)

In a toy RSA setup, you're given:

$p = 47$ , $q = 71$
Public exponent $e = 79$

Tasks:

Compute $n = p \cdot q$ and $\varphi(n) = (p-1)(q-1)$
Verify that $\gcd(e, \varphi(n)) = 1$ (so $e$ is a valid public exponent)
Compute the private key $d = e^{-1} \bmod \varphi(n)$ using the extended GCD
Encrypt the message $m = 100$ as $c = m^e \bmod n$
Decrypt $c$ as $m' = c^d \bmod n$ and verify $m' = m$

In [ ]:

# Exercise 3: Your code here

Summary

Concept	Key idea
Bezout's identity	For any integers $a, b$ , there exist $s, t$ with $\gcd(a,b) = s \cdot a + t \cdot b$
Backward substitution	Express the gcd as a linear combination by substituting remainders back through the Euclidean algorithm steps
The table method	Computes Bezout coefficients alongside the gcd using the same recurrence ( $r_i = r_{i-2} - q_i r_{i-1}$ ) applied to all three columns $(r, s, t)$
Modular inverses	If $\gcd(a, n) = 1$ , then $s \cdot a + t \cdot n = 1$ implies $s \equiv a^{-1} \pmod{n}$
RSA connection	The private key $d = e^{-1} \bmod \varphi(n)$ is computed by running the extended GCD on $e$ and $\varphi(n)$

Crypto preview: We now know how to compute modular inverses. In the next notebook, we'll learn about Euler's totient function $\varphi(n)$ and Fermat's little theorem, the why behind RSA's correctness: why does $m^{ed} \equiv m \pmod{n}$ ?

Next: Euler's Totient and Fermat's Theorem