Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
duyuefeng0708
GitHub Repository: duyuefeng0708/Cryptography-From-First-Principle
Path: blob/main/foundations/05-discrete-log-diffie-hellman/sage/05a-the-discrete-log-problem.ipynb
483 views
unlisted
Kernel: SageMath 10.5

Notebook 05a: The Discrete Logarithm Problem

Module 05. The Discrete Logarithm and Diffie-Hellman


Motivating Question. In the real numbers, if someone tells you 2x=10242^x = 1024, you solve x=log21024=10x = \log_2 1024 = 10 instantly. But what if someone tells you 3x13(mod17)3^x \equiv 13 \pmod{17}? You can't just take a logarithm, the group structure of Z/pZ\mathbb{Z}/p\mathbb{Z}^* scrambles the usual patterns. This "discrete logarithm" problem is believed to be fundamentally hard for large primes, and that hardness is the foundation of Diffie-Hellman, ElGamal, DSA, and elliptic curve cryptography.


Prerequisites. You should be comfortable with:

  • Modular arithmetic and cyclic groups (Module 01)

  • The multiplicative group Z/pZ\mathbb{Z}/p\mathbb{Z}^* and Euler's totient (Module 04)

Learning objectives. By the end of this notebook you will be able to:

  1. State the discrete logarithm problem precisely.

  2. Distinguish the DLP from ordinary (real-valued) logarithms.

  3. Compute discrete logs by brute force and understand why brute force fails for large groups.

  4. Use SageMath's discrete_log() to solve DLP instances.

  5. Appreciate the asymmetry that makes DLP useful for cryptography: exponentiation is fast, but inversion (the discrete log) is slow.

1. Ordinary vs Discrete Logarithms

Over the real numbers, the exponential function f(x)=bxf(x) = b^x is a bijection from R\mathbb{R} to R>0\mathbb{R}_{>0}, so it has a well-defined inverse: the logarithm logb\log_b.

SettingGivenFindMethod
R\mathbb{R}2x=10242^x = 1024x=10x = 10x=log2(1024)x = \log_2(1024), closed-form formula
Z/pZ\mathbb{Z}/p\mathbb{Z}^*gxh(modp)g^x \equiv h \pmod{p}x=?x = ?No known efficient general method!

The key difference: in Z/pZ\mathbb{Z}/p\mathbb{Z}^*, the exponential function "wraps around" modulo pp. The smooth, monotone structure that makes real logarithms easy is destroyed by the modular reduction.

# Real logarithm: smooth, predictable # Left: continuous exponential G1 = plot(2.0^x, (x, 0.1, 16), color='blue', thickness=2) G1 += text('Real: $2^x$ (smooth, invertible)', (8, 50000), fontsize=12) # Right: discrete exponential mod p, scrambled! p = 23 g = Mod(5, p) x_disc = list(range(0, p - 1)) y_disc = [int(g^xi) for xi in x_disc] G2 = point(list(zip(x_disc, y_disc)), size=30, color='red') G2 += text(f'Discrete: $5^x \\mod {p}$ (scrambled!)', ((p-2)/2, p+1), fontsize=12) # Show side by side graphics_array([G1, G2]).show(figsize=[12, 4])

Look at the right plot: the values jump around unpredictably. Given a yy-value, you cannot "read off" the xx from the plot, you would have to check each point. This visual scrambling is the heart of the DLP's difficulty.

2. The Formal Definition

Definition (Discrete Logarithm Problem). Let GG be a cyclic group of order nn, and let gg be a generator. Given hGh \in G, find an integer xx with 0x<n0 \le x < n such that gx=h.g^x = h.

We write x=logghx = \log_g h and call xx the discrete logarithm of hh with respect to base gg.

The most common concrete setting is G=Z/pZG = \mathbb{Z}/p\mathbb{Z}^* (the multiplicative group modulo a prime pp), which has order p1p - 1.


Misconception alert. "The discrete log might not exist." If gg is a generator of the cyclic group GG, then every element hGh \in G is a power of gg, so the discrete log always exists and is unique modulo G|G|. The problem is not existence, it is finding it efficiently.

# Concrete example: DLP in Z/23Z* p = 23 g = Mod(5, p) # 5 is a primitive root mod 23 (we'll prove this in 05b) # Build the complete "discrete log table" print(f"Powers of g = {g} in Z/{p}Z*:") print("x | g^x")print("-" * 14) for x in range(p - 1): print(f"{x} | {int(g^x)}")

Checkpoint 1. Using the table above, find log58(mod23)\log_5 8 \pmod{23} (i.e., find xx such that 5x8(mod23)5^x \equiv 8 \pmod{23}). Scan the right column for 8 and read off xx. Could you do this if pp had 300 digits?

The simplest approach: try x=0,1,2,x = 0, 1, 2, \ldots until gx=hg^x = h. This requires at most G|G| multiplications.

For G=p122|G| = p - 1 \approx 22, this is instant. For G22048|G| \approx 2^{2048}, the number of operations exceeds the number of atoms in the observable universe (2266\approx 2^{266}).

def discrete_log_brute(g, h, group_order): """ Find x such that g^x = h by trying all x in [0, group_order). Returns x if found, None otherwise. """ power = g^0 # start at g^0 = 1 for x in range(group_order): if power == h: return x power = power * g return None # Small example p = 23 g = Mod(5, p) h = Mod(8, p) x = discrete_log_brute(g, h, p - 1) print(f"log_{5}({8}) mod {p} = {x}") print(f"Verification: 5^{x} mod {p} = {int(g^x)}")
# Brute force works fine for small groups p = 23 g = Mod(5, p) print("Solving all DLPs in Z/23Z*:") for target in range(1, p): h = Mod(target, p) x = discrete_log_brute(g, h, p - 1) print(f" log_5({target}) = {x} [check: 5^{x} = {int(g^x)}]")

4. The One-Way Function: Fast Forward, Slow Backward

The cryptographic magic is the asymmetry between exponentiation and discrete log:

OperationComplexity2048-bit prime
Compute gxmodpg^x \bmod p (square-and-multiply)O(logx)O(\log x) multiplications~2048 multiplications, milliseconds
Find xx from gxmodpg^x \bmod p (best known)O(exp(c(logp)1/3(loglogp)2/3))O(\exp(c \cdot (\log p)^{1/3} (\log \log p)^{2/3}))Sub-exponential but still infeasible

This makes exponentiation a one-way function (under the DLP assumption): easy to compute, hard to invert.

Let us see this asymmetry in action.

# Exponentiation: fast even for large numbers p = next_prime(2^64) g = Mod(primitive_root(p), p) x_secret = randint(2, p - 2) start = walltime() h = g^x_secret exp_time = walltime() - start print(f"p has {p.ndigits()} digits") print(f"Exponentiation: g^x mod p computed in {exp_time*1000:.3f} ms") # Brute force: painfully slow even for "small" cryptographic sizes p_small = next_prime(10^7) g_small = Mod(primitive_root(p_small), p_small) x_small = randint(2, p_small - 2) h_small = g_small^x_small start = walltime() x_found = discrete_log_brute(g_small, h_small, p_small - 1) brute_time = walltime() - start print(f"\np_small has {p_small.ndigits()} digits ({p_small - 1} elements in group)") print(f"Brute force: found x in {brute_time*1000:.1f} ms") print(f"Correct? {x_found == x_small}")

Checkpoint 2. If brute force on a 7-digit prime takes about 1 second, roughly how long would brute force take on a 20-digit prime? (Hint: the group order grows with pp, and brute force is O(p)O(p).)

5. Visualising the Scramble

Let us build a more detailed picture of why discrete logs are hard. We will map out which exponent maps to which group element and look for patterns.

# Visualise the permutation induced by x -> g^x mod p p = 47 g = Mod(primitive_root(p), p) exponents = list(range(p - 1)) values = [int(g^xi) for xi in exponents] # Plot 1: x -> g^x (the "scramble") G1 = point(list(zip(exponents, values)), size=15, color='blue') G1 = G1.plot() G1 += text(f'$g^x \\mod {p}$ (g={int(g)})', ((p-2)/2, p+3), fontsize=11) # Plot 2: histogram of values (should be uniform) # Build bar chart manually G2 = Graphics() counts = {} for v in values: counts[v] = counts.get(v, 0) + 1 for val in sorted(counts.keys()): G2 += polygon2d([(val-0.4, 0), (val-0.4, counts[val]), (val+0.4, counts[val]), (val+0.4, 0)], fill=True, color='green', alpha=0.7, edgecolor='black') G2 += text('Distribution of $g^x \\mod p$', (p/2, 1.5), fontsize=11) # Plot 3: differences between consecutive powers diffs = [(values[i+1] - values[i]) % p for i in range(len(values)-1)] G3 = point(list(zip(range(len(diffs)), diffs)), size=8, color='red') G3 += text('Consecutive differences $g^{x+1} - g^x$', (len(diffs)/2, p+3), fontsize=11) graphics_array([G1, G2, G3]).show(figsize=[15, 4])

Observations:

  • The scatter plot looks essentially random, no visible pattern linking xx to gxmodpg^x \bmod p.

  • The histogram is perfectly uniform: every value 1,2,,p11, 2, \ldots, p-1 appears exactly once (since gg is a generator).

  • The consecutive differences are also erratic, knowing gxg^x gives no useful information about gx+1g^{x+1} without computing it.

This "pseudorandom" behaviour is what makes the DLP hard: there is no shortcut that exploits structure in the output.

6. SageMath's discrete_log()

SageMath has a built-in discrete_log() function that uses a combination of algorithms (baby-step giant-step, Pohlig-Hellman, index calculus) to solve DLPs much faster than brute force. We will study these algorithms individually in notebooks 05e and 05f.


Bridge from Module 04. In Module 04 we used power_mod(g, x, p) to compute gxmodpg^x \bmod p efficiently. Now discrete_log() does the inverse: given the result hh, it recovers xx. Notice the asymmetry, power_mod is always fast, while discrete_log becomes slow as pp grows.

# discrete_log(h, g) finds x such that g^x = h p = next_prime(10^9) g = Mod(primitive_root(p), p) x_secret = randint(2, p - 2) h = g^x_secret print(f"p = {p} ({p.ndigits()} digits)") print(f"g = {g}") print(f"h = g^x = {h}") print(f"(x_secret = {x_secret})") start = walltime() x_found = discrete_log(h, g) elapsed = walltime() - start print(f"\ndiscrete_log found: x = {x_found}") print(f"Correct? {x_found == x_secret}") print(f"Time: {elapsed*1000:.1f} ms")
# Scaling experiment: how does solve time grow with the group size? print("Bits p digits Time (ms)") for bits in [16, 20, 24, 28, 32, 36, 40]: p_test = next_prime(2^bits) g_test = Mod(primitive_root(p_test), p_test) x_test = randint(2, p_test - 2) h_test = g_test^x_test start = walltime() x_result = discrete_log(h_test, g_test) t = (walltime() - start) * 1000 assert x_result == x_test, "discrete_log returned wrong answer!" print(f"{bits} {p_test.ndigits()} {t:>12.1f}")

Notice how the time grows rapidly with the number of bits. At cryptographic sizes (2048 bits), even SageMath's sophisticated algorithms cannot solve the DLP in reasonable time.


Crypto foreshadowing. This one-way property is the bedrock of:

  • Diffie-Hellman key exchange (notebook 05c): Alice publishes gag^a, Bob publishes gbg^b; neither can recover the other's secret exponent, yet both can compute gabg^{ab}.

  • ElGamal encryption: the public key is h=gxh = g^x; encrypting is easy, but decrypting without xx requires solving the DLP.

  • DSA / Schnorr signatures (Module 09): signing uses the secret exponent, verification uses only the public power.

  • Elliptic curve cryptography (Module 06): the same DLP idea, but in elliptic curve groups where the problem is even harder per bit.

7. Exercises

Exercise 1 (Worked): Manual DLP in a Small Group

Problem. In Z/11Z\mathbb{Z}/11\mathbb{Z}^* with generator g=2g = 2, find log27\log_2 7 (i.e., find xx such that 2x7(mod11)2^x \equiv 7 \pmod{11}).

Solution. Compute successive powers of 2 modulo 11:

xx2xmod112^x \bmod 11
01
12
24
38
45
510
69
77

So log27=7\log_2 7 = 7 in Z/11Z\mathbb{Z}/11\mathbb{Z}^*.

# Exercise 1, verification p = 11 g = Mod(2, p) h = Mod(7, p) x = discrete_log_brute(g, h, p - 1) print(f"log_2(7) mod 11 = {x}") print(f"Check: 2^{x} mod 11 = {int(g^x)}")

Exercise 2 (Guided): Timing the One-Way Function

Problem. For a 50-bit prime pp:

  1. Compute h=gxmodph = g^x \bmod p for a random xx, and time it.

  2. Recover xx using discrete_log(), and time it.

  3. Compute the ratio: how many times slower is the "backward" direction?

Hint: Use next_prime(2^50) for pp and walltime() for timing.

# Exercise 2, fill in the TODOs p = next_prime(2^50) g = Mod(primitive_root(p), p) x_secret = randint(2, p - 2) # TODO 1: Time the forward direction (exponentiation) # start = walltime() # h = ??? # forward_time = walltime() - start # TODO 2: Time the backward direction (discrete log) # start = walltime() # x_found = ??? # backward_time = walltime() - start # TODO 3: Print results and compute the ratio # print(f"Forward: {forward_time*1000:.3f} ms") # print(f"Backward: {backward_time*1000:.1f} ms") # print(f"Ratio: {backward_time/forward_time:.0f}x slower")

Exercise 3 (Independent): DLP Existence and Uniqueness

Problem.

  1. In Z/13Z\mathbb{Z}/13\mathbb{Z}^*, let g=2g = 2. Build the complete discrete log table (i.e., for each h{1,2,,12}h \in \{1, 2, \ldots, 12\}, find log2h\log_2 h). Verify that every element hh has a unique discrete log in {0,1,,11}\{0, 1, \ldots, 11\}.

  2. Now try g=3g = 3 in Z/13Z\mathbb{Z}/13\mathbb{Z}^*. Is g=3g = 3 a generator? Build its power table and explain what you observe.

  3. What happens if you try to compute log32(mod13)\log_3 2 \pmod{13} using brute force with only the powers of 3? Does it exist?

# Exercise 3, write your solution here

Summary

ConceptKey Fact
Discrete log problemGiven g,hg, h in a cyclic group, find xx such that gx=hg^x = h
ExistenceIf gg generates the group, loggh\log_g h always exists and is unique mod $
Brute forceTry all xx: $O(
One-way asymmetryExponentiation is O(logx)O(\log x); discrete log is believed hard (sub-exponential best known for Z/pZ\mathbb{Z}/p\mathbb{Z}^*)
Cryptographic useDLP hardness underlies DH, ElGamal, DSA, and ECC

The DLP is only hard when the group is chosen carefully. In the next notebook, we will study primitive roots and generators, the elements that make Z/pZ\mathbb{Z}/p\mathbb{Z}^* cyclic and give the DLP its full strength.


Next: 05b. Primitive Roots and Generators