GitHub Repository: duyuefeng0708/Cryptography-From-First-Principle
Path: blob/main/foundations/05-discrete-log-diffie-hellman/sage/05e-baby-step-giant-step.ipynb
Kernel: SageMath 10.5

Notebook 05e: Baby-Step Giant-Step

Module 05. The Discrete Logarithm and Diffie-Hellman


Motivating Question. Brute-force DLP takes $O(n)$ time where $n = |G|$. Can we do better without exploiting any special structure of the group? Shanks' baby-step giant-step (BSGS) algorithm answers yes: it solves any DLP in $O(\sqrt{n})$ time and $O(\sqrt{n})$ space. This is the classic time-space tradeoff in cryptanalysis.


Prerequisites. You should be comfortable with:

  • The discrete logarithm problem (notebook 05a)

  • Modular arithmetic (Module 01)

Learning objectives. By the end of this notebook you will be able to:

  1. Explain the key idea behind BSGS: decompose $x = im + j$.

  2. Build the baby-step lookup table and perform giant steps.

  3. Implement BSGS from scratch and verify correctness.

  4. Analyse the $O(\sqrt{n})$ time and space complexity.

  5. Compare BSGS performance to brute force experimentally.

1. The Key Idea

We want to find $x$ such that $g^x = h$ in a group of order $n$.

Set $m = \lceil \sqrt{n} \rceil$. Write $x = im + j$ where $0 \le j < m$ and $0 \le i < m$.

Then:

$$g^x = g^{im + j} = h \quad \Longleftrightarrow \quad g^j = h \cdot g^{-im} = h \cdot (g^{-m})^i$$
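This equivalence is easy to check numerically. Here is a quick plain-Python sanity check (outside Sage, using `pow` with a negative exponent for the modular inverse, Python 3.8+) with the same parameters used in the worked example below: $p = 101$, $g = 2$, $x = 42$.

```python
# Check g^x = h  <=>  g^j = h * (g^{-m})^i  where x = i*m + j
p, g = 101, 2              # group Z/101Z^*, g = 2 is a primitive root
m = 11                     # ceil(sqrt(100))
x = 42
i, j = divmod(x, m)        # x = i*m + j  ->  i = 3, j = 9
h = pow(g, x, p)           # h = g^x

lhs = pow(g, j, p)                  # baby-step value g^j
rhs = h * pow(g, -i * m, p) % p     # giant-step value h * (g^{-m})^i
print(i, j, lhs, rhs)               # prints: 3 9 7 7
```

Both sides reduce to $g^{x - im} = g^j$, so any collision between a giant-step value and a baby-step value pins down $(i, j)$ and hence $x$.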

Strategy:

  1. Baby steps: Precompute and store $g^j$ for $j = 0, 1, \ldots, m-1$ in a hash table.

  2. Giant steps: For $i = 0, 1, \ldots, m-1$, compute $h \cdot (g^{-m})^i$ and check if it appears in the table.

  3. If $h \cdot (g^{-m})^i = g^j$, then $x = im + j$.

| Phase | Operations | Storage |
|---|---|---|
| Baby steps | $m$ multiplications | $m$ table entries |
| Giant steps | up to $m$ multiplications + lookups | 0 extra |
| Total | $O(\sqrt{n})$ | $O(\sqrt{n})$ |
# Visualise the decomposition x = i*m + j
n = 100              # group order
m = isqrt(n) + 1     # = 11
print(f"Group order n = {n}")
print(f"Step size m = ceil(sqrt({n})) = {m}")
print()

# Show the grid: every x in [0, n) maps to a unique (i, j)
print(f"x = i * {m} + j")
for x in [0, 1, 10, 11, 42, 99]:
    i, j = x // m, x % m
    print(f"{x} = {i} * {m} + {j}")

Checkpoint 1. If $n = 10000$, what is $m$? How many baby steps and (at most) how many giant steps are needed? Compare with brute force's 10000 steps.

2. Worked Example: $p = 101$

Let us trace BSGS step by step for $p = 101$, $g = 2$ (a primitive root), and a target $h = g^{42}$.

# Setup
p = 101
n = p - 1                 # = 100
g = Mod(primitive_root(p), p)
x_secret = 42
h = g^x_secret
m = isqrt(n) + 1          # = 11
print(f"p = {p}, g = {int(g)}, h = g^{x_secret} = {int(h)}")
print(f"Group order n = {n}, step size m = {m}")

# Phase 1: Baby steps, build table {g^j : j}
print(f"\n=== Phase 1: Baby Steps ===")
baby = {}
gj = Mod(1, p)            # g^0
for j in range(m):
    baby[gj] = j
    print(f"  g^{j} = {int(gj)}  [stored]")
    gj *= g
print(f"\nBaby-step table: {len(baby)} entries")
# Phase 2: Giant steps, check h * (g^{-m})^i
print("=== Phase 2: Giant Steps ===")
g_inv_m = g^(-m)          # precompute g^{-m}
print(f"g^(-m) = g^(-{m}) = {int(g_inv_m)}")
print()
gamma = h                 # start with h * (g^{-m})^0 = h
found = False
for i in range(m):
    in_table = gamma in baby
    status = f"MATCH! j = {baby[gamma]}" if in_table else "not in table"
    print(f"  i={i}: h * g^(-{m}*{i}) = h * g^(-{m*i}) = {int(gamma)}  [{status}]")
    if in_table:
        j = baby[gamma]
        x = i * m + j
        print(f"\n  => x = i*m + j = {i}*{m} + {j} = {x}")
        print(f"     Verify: g^{x} = {int(g^x)}  (should be {int(h)})")
        found = True
        break
    gamma *= g_inv_m
if not found:
    print("No match found (shouldn't happen if g is a generator)")

Notice: we found $x = 42$ after only $m + i = 11 + 3 = 14$ total group operations, not 42.

3. Complete Implementation

Let us package BSGS into a clean function.

def baby_step_giant_step(g, h, n):
    """
    Solve g^x = h in a group of order n using BSGS.
    Returns x, or None if no solution found.
    """
    m = isqrt(n) + 1

    # Baby steps: build table g^j -> j
    table = {}
    gj = g^0
    for j in range(m):
        table[gj] = j
        gj = gj * g

    # Giant steps: check h * (g^{-m})^i
    g_inv_m = g^(-m)
    gamma = h
    for i in range(m):
        if gamma in table:
            x = i * m + table[gamma]
            return x % n
        gamma = gamma * g_inv_m
    return None

# Test
p = 101
g = Mod(primitive_root(p), p)
for x_test in [0, 1, 42, 73, 99]:
    h = g^x_test
    x_found = baby_step_giant_step(g, h, p - 1)
    print(f"  x = {x_test}: BSGS returns {x_found}, correct? {x_found == x_test}")

Misconception alert. "BSGS needs to know the exact group order $n$." Technically, an upper bound suffices. If you use $m = \lceil \sqrt{N} \rceil$ for any $N \ge n$, the algorithm still works (the table might be slightly larger than necessary, but correctness is preserved).
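As a quick sanity check of this claim, here is a minimal plain-Python BSGS (a sketch, not the Sage function above; `bsgs` and its parameters are illustrative names) run once with the exact order and once with a much looser bound:

```python
from math import isqrt

def bsgs(g, h, p, N):
    """Solve g^x = h (mod p); N only needs to be an upper bound on the order of g."""
    m = isqrt(N) + 1
    table = {}
    e = 1
    for j in range(m):
        table.setdefault(e, j)   # keep the smallest j if values start repeating
        e = e * g % p
    step = pow(g, -m, p)         # g^{-m} mod p (Python 3.8+)
    gamma = h % p
    for i in range(m):
        if gamma in table:
            return i * m + table[gamma]
        gamma = gamma * step % p
    return None

h = pow(2, 42, 101)
print(bsgs(2, h, 101, 100))      # exact order n = 100
print(bsgs(2, h, 101, 10**6))    # loose bound N = 10^6: same answer, bigger table
```

With an upper bound, the returned exponent is correct modulo the true order of $g$; the only cost is the larger table, which Exercise 3 below asks you to quantify.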

4. Performance: BSGS vs Brute Force

Let us compare the actual running times.

def discrete_log_brute(g, h, n):
    """Brute-force discrete log: try all x in [0, n)."""
    power = g^0
    for x in range(n):
        if power == h:
            return x
        power = power * g
    return None

print("Bits   n        sqrt(n)   Brute (ms)   BSGS (ms)   Speedup")
for bits in [10, 14, 18, 20, 22]:
    p_test = next_prime(2^bits)
    n_test = p_test - 1
    g_test = Mod(primitive_root(p_test), p_test)
    x_test = randint(2, p_test - 2)
    h_test = g_test^x_test

    # Brute force
    start = walltime()
    x_brute = discrete_log_brute(g_test, h_test, n_test)
    t_brute = (walltime() - start) * 1000

    # BSGS
    start = walltime()
    x_bsgs = baby_step_giant_step(g_test, h_test, n_test)
    t_bsgs = (walltime() - start) * 1000

    assert x_brute == x_bsgs, f"Mismatch: brute={x_brute}, bsgs={x_bsgs}"
    speedup = t_brute / t_bsgs if t_bsgs > 0 else float('inf')
    print(f"{bits}   {n_test}   {isqrt(n_test)}   {t_brute:>12.1f}   {t_bsgs:>12.1f}   {speedup:>8.1f}x")

Checkpoint 2. At 22 bits, how does the speedup compare to $\sqrt{n} \approx \sqrt{2^{22}} = 2^{11} = 2048$? Why might the actual speedup differ from the theoretical ratio?

5. Visualising Baby and Giant Steps

Let us visualise how the two sets of steps "meet" to find the answer.

p = 47
n = p - 1
g = Mod(primitive_root(p), p)
x_secret = 31
h = g^x_secret
m = isqrt(n) + 1

# Compute baby steps
baby_vals = []
gj = g^0
for j in range(m):
    baby_vals.append((j, int(gj)))
    gj *= g

# Compute giant steps
giant_vals = []
g_inv_m = g^(-m)
gamma = h
match_i, match_j = None, None
baby_dict = {v: j for j, v in baby_vals}   # group element -> baby index j
for i in range(m):
    giant_vals.append((i, int(gamma)))
    # check if gamma matches any baby step (look up the keys, i.e. the values g^j)
    if int(gamma) in baby_dict and match_i is None:
        match_i, match_j = i, baby_dict[int(gamma)]
    gamma *= g_inv_m

G = Graphics()

# Plot baby steps
baby_j = [j for j, v in baby_vals]
baby_v = [v for j, v in baby_vals]
G += point(list(zip(baby_j, baby_v)), color='blue', size=40,
           legend_label='Baby steps $g^j$', zorder=3)

# Plot giant steps (offset x by m+1 to separate visually)
giant_i = [i for i, v in giant_vals]
giant_v = [v for i, v in giant_vals]
G += point([(i + m + 1, v) for i, v in zip(giant_i, giant_v)],
           color='red', size=40, marker='s',
           legend_label='Giant steps $h \\cdot g^{-im}$', zorder=3)

# Mark the match
if match_i is not None:
    match_val = int(g^match_j)
    G += line([(-1, match_val), (2*m + 2, match_val)], color='green',
              linestyle='--', alpha=0.5, legend_label=f'Match at value {match_val}')
    G += text(f'x = {match_i}*{m}+{match_j} = {match_i*m+match_j}',
              (match_j + 1, match_val + 2), fontsize=10, color='green')

G.show(figsize=[12, 5], axes_labels=['Step index', 'Group element value'],
       title=f'BSGS for $g^x = h$ in $\\mathbb{{Z}}/{p}\\mathbb{{Z}}^*$  (x={x_secret})')

6. Memory vs Time

BSGS uses $O(\sqrt{n})$ memory for the hash table. For a 128-bit group, that is $\sqrt{2^{128}} = 2^{64}$ entries; at 16 bytes per entry, about 256 exabytes. This is infeasible!

| Group size | $\sqrt{n}$ | Memory (16 bytes/entry) |
|---|---|---|
| $2^{32}$ | $2^{16}$ = 65,536 | 1 MB |
| $2^{64}$ | $2^{32}$ ≈ 4 billion | 64 GB |
| $2^{128}$ | $2^{64}$ | 256 EB (infeasible) |
| $2^{256}$ | $2^{128}$ | absurdly large |
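The table rows are quick to re-derive. A small plain-Python check of the arithmetic (in binary units, so the 128-bit row comes out as $2^{64} \cdot 16 = 2^{68}$ bytes = 256 EiB):

```python
# Baby-step table memory: sqrt(n) = 2^(bits/2) entries at 16 bytes each
units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
for bits in [32, 64, 128]:
    exp = bits // 2 + 4          # log2 of total bytes: 2^(bits/2) entries * 2^4 bytes
    q, r = divmod(exp, 10)       # pick the largest binary unit that fits
    print(f"2^{bits}-element group: {2**r} {units[q]}")
# prints:
#   2^32-element group: 1 MiB
#   2^64-element group: 64 GiB
#   2^128-element group: 256 EiB
```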

For cryptographic sizes, BSGS alone is not enough. More advanced methods like Pollard's rho algorithm achieve $O(\sqrt{n})$ time with $O(1)$ space, and index calculus methods achieve sub-exponential time for $\mathbb{Z}/p\mathbb{Z}^*$.
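To make the "$O(\sqrt{n})$ time with $O(1)$ space" claim tangible, here is a plain-Python sketch of Pollard's rho for discrete logs (Floyd cycle-finding over the classic three-way partition; all names are illustrative, and the degenerate-collision case falls back to exhaustive checking, which is rare for large groups):

```python
from math import gcd

def pollard_rho_dlog(g, h, p, n):
    """Solve g^x = h (mod p), where g has order n, keeping only O(1) state."""
    def step(x, a, b):
        # Pseudorandom walk on values x = g^a * h^b, branching on x mod 3
        if x % 3 == 0:
            return x * g % p, (a + 1) % n, b          # multiply by g
        if x % 3 == 1:
            return x * x % p, 2 * a % n, 2 * b % n    # square
        return x * h % p, a, (b + 1) % n              # multiply by h

    x, a, b = g % p, 1, 0       # tortoise starts at g^1
    X, A, B = x, a, b           # hare
    for _ in range(2 * n):
        x, a, b = step(x, a, b)
        X, A, B = step(*step(X, A, B))
        if x == X:
            # g^a h^b = g^A h^B  =>  (b - B) * log_g(h) = A - a  (mod n)
            r, s = (b - B) % n, (A - a) % n
            if r == 0:
                # degenerate relation: fall back to exhaustive verification
                return next((c for c in range(n) if pow(g, c, p) == h % p), None)
            d = gcd(r, n)
            if s % d:
                return None
            # r//d is invertible mod n//d; check the d candidate solutions
            base = s // d * pow(r // d, -1, n // d) % (n // d)
            for k in range(d):
                cand = (base + k * (n // d)) % n
                if pow(g, cand, p) == h % p:
                    return cand
            return None
    return None

print(pollard_rho_dlog(2, pow(2, 42, 101), 101, 100))   # prints 42
```

Only the current tortoise and hare states are stored, in contrast to BSGS's $\sqrt{n}$-entry table; a later notebook treats Pollard's rho properly.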


Bridge to Module 06. For elliptic curve groups, the best known generic algorithm is still $O(\sqrt{n})$ (Pollard's rho). There is no index calculus for general elliptic curves! This is why ECC can use much smaller keys than $\mathbb{Z}/p\mathbb{Z}^*$-based crypto: 256-bit ECC $\approx$ 3072-bit DH.

# BSGS on progressively larger groups (measure memory)
import sys

print("Bits   n         m=sqrt(n)   Table size (KB)")
for bits in [10, 16, 20, 24, 28]:
    p_test = next_prime(2^bits)
    n_test = p_test - 1
    m_test = isqrt(n_test) + 1

    # Build baby-step table to measure memory
    g_test = Mod(primitive_root(p_test), p_test)
    table = {}
    gj = g_test^0
    for j in range(m_test):
        table[gj] = j
        gj *= g_test

    mem_kb = sys.getsizeof(table) / 1024
    print(f"{bits}   {n_test}   {m_test}   {mem_kb:>16.1f}")

Checkpoint 3. If you have 8 GB of RAM, roughly how many bits can your BSGS table handle? (Assume each entry takes about 100 bytes.)

7. Exercises

Exercise 1 (Worked): BSGS by Hand

Problem. Solve $5^x \equiv 20 \pmod{23}$ using BSGS.

Solution. $n = 22$, $m = \lceil\sqrt{22}\rceil = 5$.

Baby steps ($g^j \bmod 23$ for $j = 0, \ldots, 4$):

| $j$ | $5^j \bmod 23$ |
|---|---|
| 0 | 1 |
| 1 | 5 |
| 2 | 2 |
| 3 | 10 |
| 4 | 4 |

Giant steps: we need $g^{-m} = 5^{-5} \bmod 23$. Since $5^5 = 3125 \equiv 3125 - 135 \cdot 23 = 3125 - 3105 = 20 \pmod{23}$, we have $g^{-m} = 20^{-1} \bmod 23$.

Let's compute with SageMath.

# Exercise 1, verification
p = 23; g = Mod(5, p); h = Mod(20, p)
n = 22; m = 5

# Baby steps
print("Baby steps:")
baby = {}
gj = Mod(1, p)
for j in range(m):
    print(f"  g^{j} = {int(gj)}")
    baby[gj] = j
    gj *= g

# Giant steps
print(f"\ng^(-m) = g^(-{m}) = {int(g^(-m))}")
g_inv_m = g^(-m)
gamma = h
print("\nGiant steps:")
for i in range(m):
    in_table = gamma in baby
    print(f"  i={i}: gamma = {int(gamma)}  {'<-- MATCH j=' + str(baby[gamma]) if in_table else ''}")
    if in_table:
        x = i * m + baby[gamma]
        print(f"\n  x = {i}*{m} + {baby[gamma]} = {x}")
        print(f"  Check: 5^{x} mod 23 = {int(g^x)}")
        break
    gamma *= g_inv_m

Exercise 2 (Guided): BSGS with a Larger Group

Problem. Use your baby_step_giant_step function to solve the DLP in $\mathbb{Z}/p\mathbb{Z}^*$ for $p = 10007$.

  1. Pick a random secret $x$ and compute $h = g^x$.

  2. Solve using BSGS and verify.

  3. Time it and compare with discrete_log().

Hint: The step size should be $m \approx 100$.

# Exercise 2, fill in the TODOs
p = 10007
g = Mod(primitive_root(p), p)
x_secret = randint(2, p - 2)
h = g^x_secret

# TODO 1: Solve with your BSGS
# start = walltime()
# x_bsgs = baby_step_giant_step(g, h, p - 1)
# t_bsgs = (walltime() - start) * 1000
# print(f"BSGS: x = {x_bsgs}, time = {t_bsgs:.2f} ms")

# TODO 2: Solve with SageMath's discrete_log
# start = walltime()
# x_sage = discrete_log(h, g)
# t_sage = (walltime() - start) * 1000
# print(f"Sage: x = {x_sage}, time = {t_sage:.2f} ms")

# TODO 3: Verify both are correct
# print(f"Secret: {x_secret}, BSGS correct? {x_bsgs == x_secret}")

Exercise 3 (Independent): Generalised BSGS

Problem.

  1. Modify baby_step_giant_step so that it works even when the group order $n$ is not exactly known: instead, accept an upper bound $N \ge n$ and use $m = \lceil \sqrt{N} \rceil$.

  2. Test your modified version on $\mathbb{Z}/p\mathbb{Z}^*$ where you pass $N = p$ (which is slightly larger than $n = p - 1$).

  3. What is the overhead of using an upper bound instead of the exact order?

# Exercise 3, write your solution here

Summary

| Concept | Key Fact |
|---|---|
| Key idea | Write $x = im + j$ with $m = \lceil\sqrt{n}\rceil$; precompute baby steps, search with giant steps |
| Time | $O(\sqrt{n})$ group operations |
| Space | $O(\sqrt{n})$ hash table entries |
| Generic | Works for any group, no special structure needed |
| Limitation | Memory-bound: infeasible for $n > 2^{60}$ or so |
| Crypto impact | Sets the baseline: DLP in a group of order $n$ can always be solved in $O(\sqrt{n})$ |

BSGS is generic: it works for any group. But if the group order has special structure (many small prime factors), we can do even better. The next notebook introduces the Pohlig-Hellman algorithm, which exploits smooth-order groups.


Next: 05f. Pohlig-Hellman Algorithm