GitHub Repository: duyuefeng0708/Cryptography-From-First-Principle
Path: blob/main/foundations/05-discrete-log-diffie-hellman/sage/05f-pohlig-hellman.ipynb
⁴⁸³ views

unlisted

Kernel: SageMath 10.5

Notebook 05f: The Pohlig-Hellman Algorithm

Module 05. The Discrete Logarithm and Diffie-Hellman

Motivating Question. BSGS solves the DLP in $O(\sqrt{n})$ time for any group of order $n$ . But what if $n$ factors into small primes, say $n = 2^4 \cdot 3^3 \cdot 5 = 2160$ ? Could we somehow solve the DLP in each small prime-power subgroup separately and then combine the results? The Pohlig-Hellman algorithm does exactly this, reducing the DLP in a group of order $n$ to DLPs in groups of order equal to each prime factor of $n$ . The cost is $O(\sum_i e_i(\log n + \sqrt{q_i}))$ where $n = \prod q_i^{e_i}$ , dramatically faster when all $q_i$ are small.

Prerequisites. You should be comfortable with:

The Chinese Remainder Theorem (Module 04d)
Baby-step giant-step (notebook 05e)
Subgroups and element order (notebook 05b)

Learning objectives. By the end of this notebook you will be able to:

Explain why smooth group orders make the DLP easy.
Project a DLP instance into prime-order subgroups.
Solve sub-DLPs and combine with CRT.
Implement Pohlig-Hellman from scratch.
Explain why safe primes resist this attack.

1. Smooth Numbers and Vulnerable Groups

Definition. An integer $n$ is $B$ -smooth if all its prime factors are $\le B$ .

If the group order $n = p - 1$ is smooth, the DLP can be broken efficiently. Let us compare two primes of similar size.

In [ ]:

# A smooth-order group vs a safe-prime group
p_smooth = 433    # p-1 = 432 = 2^4 * 3^3
p_safe = 467      # p-1 = 466 = 2 * 233 (233 is prime)

print(f"Smooth:     p = {p_smooth}, p-1 = {p_smooth-1} = {factor(p_smooth-1)}")
print(f"  Largest prime factor: {max(q for q, _ in factor(p_smooth-1))}")
print()
print(f"Safe prime: p = {p_safe}, p-1 = {p_safe-1} = {factor(p_safe-1)}")
print(f"  Largest prime factor: {max(q for q, _ in factor(p_safe-1))}")
print()
print(f"Pohlig-Hellman cost (smooth): O(sqrt(2) + sqrt(3)) = O(tiny)")
print(f"Pohlig-Hellman cost (safe):   O(sqrt(233)) = O({isqrt(233)})")

Checkpoint 1. If $p - 1 = 2^{10} \cdot 3^5 \cdot 5^3 \cdot 7^2 = 259,459,200$ , what is the largest prime factor? What would the BSGS cost be for the full group vs the largest subgroup?

2. The Key Idea: Projection into Subgroups

Suppose $n = |G| = q_1^{e_1} \cdot q_2^{e_2} \cdots q_k^{e_k}$ . We want to find $x$ such that $g^x = h$ .

Step 1: Project. For each prime power $q_i^{e_i}$ , define: $g_i = g^{n / q_i^{e_i}}, \quad h_i = h^{n / q_i^{e_i}}$

Then $g_i$ has order $q_i^{e_i}$ , and $g_i^{x_i} = h_i$ where $x_i = x \bmod q_i^{e_i}$ .

Step 2: Solve. Solve each small DLP: $g_i^{x_i} = h_i$ in a group of order $q_i^{e_i}$ .

Step 3: Combine. Use the Chinese Remainder Theorem to recover $x$ from $\{x_i\}$ .

Bridge from Module 04. This is a direct application of the CRT from notebook 04d! There we used CRT to solve systems of congruences. Here we use it to combine DLP solutions from different subgroups.

In [ ]:

# Demonstrate the projection step
p = 433   # p-1 = 432 = 2^4 * 3^3
n = p - 1
g = Mod(primitive_root(p), p)
x_secret = 137
h = g^x_secret

print(f"p = {p}, n = p-1 = {n} = {factor(n)}")
print(f"Secret x = {x_secret}")
print(f"g = {int(g)}, h = g^x = {int(h)}")
print()

# Project into each prime-power subgroup
for q, e in factor(n):
    qi_ei = q^e
    exp = n // qi_ei
    gi = g^exp
    hi = h^exp
    xi = x_secret % qi_ei
    
    print(f"Prime power {q}^{e} = {qi_ei}:")
    print(f"  g_i = g^({n}/{qi_ei}) = g^{exp} = {int(gi)}  [order {multiplicative_order(gi)}]")
    print(f"  h_i = h^{exp} = {int(hi)}")
    print(f"  x mod {qi_ei} = {xi}")
    print(f"  Check: g_i^{xi} = {int(gi^xi)} == h_i? {gi^xi == hi}")
    print()

3. Solving Sub-DLPs

Each projected DLP is in a group of order $q_i^{e_i}$ . For prime $q_i$ (when $e_i = 1$ ), we just run BSGS in a group of order $q_i$ . For prime powers ( $e_i > 1$ ), we can further decompose using a "lifting" technique, but for simplicity we'll use BSGS on the full prime-power subgroup.

In [ ]:

def baby_step_giant_step(g, h, n):
    """Solve g^x = h in a group of order n."""
    m = isqrt(n) + 1
    table = {}
    gj = g^0
    for j in range(m):
        table[gj] = j
        gj = gj * g
    g_inv_m = g^(-m)
    gamma = h
    for i in range(m):
        if gamma in table:
            return (i * m + table[gamma]) % n
        gamma = gamma * g_inv_m
    return None

# Solve each sub-DLP
p = 433
n = p - 1
g = Mod(primitive_root(p), p)
x_secret = 137
h = g^x_secret

print(f"Solving sub-DLPs for x = {x_secret} mod {n} = {factor(n)}")
print()

residues = []   # (x_i, q_i^e_i) pairs for CRT
for q, e in factor(n):
    qi_ei = q^e
    exp = n // qi_ei
    gi = g^exp
    hi = h^exp
    
    xi = baby_step_giant_step(gi, hi, qi_ei)
    residues.append((xi, qi_ei))
    
    print(f"  x mod {qi_ei} ({q}^{e}) = {xi}  "
          f"[BSGS in group of order {qi_ei}, cost O(sqrt({qi_ei})) = O({isqrt(qi_ei)+1})]")

print(f"\nResidues for CRT: {residues}")

Misconception alert. "Pohlig-Hellman always makes the DLP easy." No! It only helps when the group order has small prime factors. If $n$ is prime (or has a large prime factor), Pohlig-Hellman reduces to a single BSGS in a group of that large order, no speedup.

4. CRT Combination

We now have $x \equiv x_i \pmod{q_i^{e_i}}$ for each prime power. The CRT gives us the unique $x \in [0, n)$ .

In [ ]:

# Combine using CRT
remainders = [r for r, m in residues]
moduli = [m for r, m in residues]

x_recovered = CRT_list(remainders, moduli)

print(f"CRT input:")
for r, m in residues:
    print(f"  x ≡ {r} (mod {m})")
print(f"\nCRT output: x = {x_recovered}")
print(f"Secret was: x = {x_secret}")
print(f"Match? {x_recovered == x_secret}")
print(f"Verify: g^{x_recovered} = {int(g^x_recovered)} == h = {int(h)}? {g^x_recovered == h}")

5. Complete Implementation

In [ ]:

def pohlig_hellman(g, h, n):
    """
    Solve g^x = h in a group of order n using Pohlig-Hellman.
    Requires: n is known and g has order n.
    """
    residues = []
    moduli = []
    
    for q, e in factor(n):
        qi_ei = q^e
        exp = n // qi_ei
        gi = g^exp      # project generator
        hi = h^exp      # project target
        xi = baby_step_giant_step(gi, hi, qi_ei)
        residues.append(xi)
        moduli.append(qi_ei)
    
    return CRT_list(residues, moduli)

# Test on several examples
p = 433
g = Mod(primitive_root(p), p)

print(f"p = {p}, p-1 = {factor(p-1)}")
for x_test in [0, 1, 42, 137, 431]:
    h = g^x_test
    x_found = pohlig_hellman(g, h, p - 1)
    print(f"  x = {x_test}: Pohlig-Hellman returns {x_found}, correct? {x_found == x_test}")

Checkpoint 2. What happens if you run pohlig_hellman on a safe prime like $p = 467$ (where $p - 1 = 2 \cdot 233$ )? What are the subgroup orders? Is there a speedup over plain BSGS?

In [ ]:

# Checkpoint 2, safe prime
p = 467
g = Mod(primitive_root(p), p)
x_test = 200
h = g^x_test

print(f"p = {p}, p-1 = {factor(p-1)}")
print(f"Subgroups: order 2 and order 233")
print(f"BSGS in order-233 subgroup: O(sqrt(233)) = O({isqrt(233)+1}) operations")
print(f"BSGS on full group: O(sqrt(466)) = O({isqrt(466)+1}) operations")
print(f"Speedup: only ~{isqrt(466)//isqrt(233)}x. Pohlig-Hellman barely helps!")
print()
x_found = pohlig_hellman(g, h, p - 1)
print(f"Result: x = {x_found}, correct? {x_found == x_test}")

6. Performance Comparison

Let us compare Pohlig-Hellman, BSGS, and brute force on smooth vs safe primes.

In [ ]:


# Find a smooth prime and a safe prime of similar size
# Smooth: p-1 has only small factors
# We'll use p = 2^16 * 3^2 * 5 + 1 = 2949121? Let's find one
def find_smooth_prime(target_bits, B):
    """Find a prime p near 2^target_bits where p-1 is B-smooth."""
    small_primes = list(primes(2, B + 1))
    for _ in range(10000):
        n = 1
        while n.nbits() < target_bits:
            n *= small_primes[randint(0, len(small_primes)-1)]
        if is_prime(n + 1):
            return n + 1
    return None

p_smooth = find_smooth_prime(20, 7)
if p_smooth is None:
    p_smooth = 746497  # 746496 = 2^7 * 3^2 * 11 * 59, fallback

# Safe prime of similar size
p_safe = next_prime(p_smooth)
while not is_prime((p_safe - 1) // 2):
    p_safe = next_prime(p_safe + 1)

print(f"Smooth prime: p = {p_smooth}, p-1 = {factor(p_smooth - 1)}")
print(f"Safe prime:   p = {p_safe}, p-1 = {factor(p_safe - 1)}")
print()

for label, p_test in [("Smooth", p_smooth), ("Safe", p_safe)]:
    g = Mod(primitive_root(p_test), p_test)
    x = randint(2, p_test - 2)
    h = g^x
    
    start = walltime()
    x_ph = pohlig_hellman(g, h, p_test - 1)
    t_ph = (walltime() - start) * 1000
    
    start = walltime()
    x_bsgs = baby_step_giant_step(g, h, p_test - 1)
    t_bsgs = (walltime() - start) * 1000
    
    print(f"{label}: PH = {t_ph:.2f} ms, BSGS = {t_bsgs:.2f} ms, "
          f"PH speedup = {t_bsgs/t_ph:.1f}x" if t_ph > 0 else f"{label}: both fast")

7. Why Safe Primes Resist Pohlig-Hellman

For a safe prime $p = 2q + 1$ with $q$ prime:

$p - 1 = 2q$ has only two prime factors: $2$ and $q$ .
The DLP in the order- $2$ subgroup is trivial (only check 2 elements).
The DLP in the order- $q$ subgroup requires $O(\sqrt{q})$ work.
Since $q \approx p/2$ , the overall cost is $O(\sqrt{p/2})$ , essentially no better than generic BSGS.

This is why safe primes are used for Diffie-Hellman parameters in practice (e.g., RFC 3526).

Crypto foreshadowing. The same principle applies to elliptic curves (Module 06): we choose curves where the group order is prime (or nearly prime), so Pohlig-Hellman provides no advantage. A curve with a smooth group order would be cryptographically useless.

In [ ]:

# Visualise: DLP difficulty vs smoothness of p-1

primes_list = list(primes(100, 1000))
largest_factors = []
for p in primes_list:
    largest_factors.append(max(q for q, _ in factor(p - 1)))

G = point(list(zip(primes_list, largest_factors)), size=20, alpha=0.7, color='blue')
G += plot((x-1)/2, (x, 100, 1000), color='red', linestyle='--', alpha=0.5,
          legend_label='$(p-1)/2$ (safe prime line)')
G.show(figsize=[12, 5],
       axes_labels=['Prime p', 'Largest prime factor of p-1'],
       title='Largest prime factor of $p-1$: higher = more resistant to Pohlig-Hellman',
       gridlines=True)

print("Primes near the red line are safe primes (most resistant).")
print("Primes near the bottom have smooth p-1 (most vulnerable).")

Checkpoint 3. Among primes $p < 100$ , which one has the smoothest $p - 1$ (smallest largest prime factor)? Which is a safe prime?

8. Exercises

Exercise 1 (Worked): Pohlig-Hellman by Hand

Problem. Solve $2^x \equiv 41 \pmod{97}$ using Pohlig-Hellman.

Solution. $p - 1 = 96 = 2^5 \cdot 3$ .

Subgroup of order $32$ ( $2^5$ ): project $g' = 2^{96/32} = 2^3$ , $h' = 41^3$ . Solve $g'^{x_1} = h'$ .
Subgroup of order $3$ : project $g' = 2^{96/3} = 2^{32}$ , $h' = 41^{32}$ . Solve $g'^{x_2} = h'$ .
CRT: $x \equiv x_1 \pmod{32}$ , $x \equiv x_2 \pmod{3}$ .

In [ ]:

# Exercise 1, verification
p = 97; g = Mod(2, p); h = Mod(41, p)
n = 96
print(f"p = {p}, n = {n} = {factor(n)}")

# Find secret for reference
x_true = discrete_log(h, g)
print(f"True answer: x = {x_true}")
print()

# Pohlig-Hellman
x_ph = pohlig_hellman(g, h, n)
print(f"Pohlig-Hellman: x = {x_ph}")
print(f"Match? {x_ph == x_true}")
print(f"Verify: 2^{x_ph} mod 97 = {int(g^x_ph)}")

Exercise 2 (Guided): Attack on a Smooth-Order Group

Problem. A careless implementer uses $p = 2^{20} \cdot 3 \cdot 5 + 1 = 15728641$ for Diffie-Hellman (this is prime!).

Factor $p - 1$ and identify why this is vulnerable.
Generate a DH exchange (pick random $a, b$ , compute $A = g^a, B = g^b$ ).
As Eve, use Pohlig-Hellman to recover Alice's secret $a$ from $A = g^a$ .
Compute the shared secret.

Hint: p - 1 = 15728640 = 2^20 * 3 * 5. The largest prime factor is just 5!

In [ ]:

# Exercise 2, fill in the TODOs
p = 15728641
assert is_prime(p)

# TODO 1: Factor p-1 and show it's smooth
# print(f"p-1 = {p-1} = {factor(p-1)}")

# TODO 2: Set up DH
# g = Mod(primitive_root(p), p)
# a = randint(2, p-2); b = randint(2, p-2)
# A = g^a; B = g^b
# true_secret = B^a

# TODO 3: Eve recovers a using Pohlig-Hellman
# a_eve = pohlig_hellman(g, A, p - 1)
# print(f"Eve recovered a = {a_eve}, true a = {a}")

# TODO 4: Eve computes shared secret
# eve_secret = B^a_eve
# print(f"Eve's shared secret:  {eve_secret}")
# print(f"True shared secret:   {true_secret}")
# print(f"Match? {eve_secret == true_secret}")

Exercise 3 (Independent): Measuring Smoothness Vulnerability

Problem.

Write a function pohlig_hellman_cost(n) that estimates the total BSGS cost: $\sum_{q^e \| n} \lceil\sqrt{q^e}\rceil$ .
For primes $p$ in the range $[10^6, 10^6 + 1000]$ , compute this cost and the "full BSGS" cost $\lceil\sqrt{p-1}\rceil$ .
Plot the ratio. Which primes are most vulnerable? Can you identify the safe primes?

In [ ]:

# Exercise 3, write your solution here

Summary

Concept	Key Fact
Smooth order	If $n = p-1$ has only small prime factors, DLP is easy
Projection	Map DLP into subgroups: $g_i = g^{n/q_i^{e_i}}$ , $h_i = h^{n/q_i^{e_i}}$
Sub-DLP	Solve $g_i^{x_i} = h_i$ in group of order $q_i^{e_i}$ via BSGS
CRT	Combine $x \equiv x_i \pmod{q_i^{e_i}}$ to recover $x$
Cost	$O(\sum \sqrt{q_i^{e_i}})$ , fast when all $q_i$ are small
Defence	Use safe primes $p = 2q+1$ so $p-1$ has a large prime factor

This completes Module 05! We have built a full picture of the discrete logarithm: from definition (05a), through generators (05b), to the Diffie-Hellman protocol (05c), its theoretical security (05d), and the two main attacks (05e, 05f).

Next module: Module 06. Elliptic Curves