GitHub Repository: duyuefeng0708/Cryptography-From-First-Principle
Path: blob/main/foundations/05-discrete-log-diffie-hellman/sage/05e-baby-step-giant-step.ipynb
Kernel: SageMath 10.5

Notebook 05e: Baby-Step Giant-Step

Module 05. The Discrete Logarithm and Diffie-Hellman


Motivating Question. Brute-force DLP takes $O(n)$ time where $n = |G|$. Can we do better without exploiting any special structure of the group? Shanks' baby-step giant-step (BSGS) algorithm answers yes: it solves any DLP in $O(\sqrt{n})$ time and $O(\sqrt{n})$ space. This is the classic time-space tradeoff in cryptanalysis.


Prerequisites. You should be comfortable with:

  • The discrete logarithm problem (notebook 05a)

  • Modular arithmetic (Module 01)

Learning objectives. By the end of this notebook you will be able to:

  1. Explain the key idea behind BSGS: decompose $x = im + j$.

  2. Build the baby-step lookup table and perform giant steps.

  3. Implement BSGS from scratch and verify correctness.

  4. Analyse the $O(\sqrt{n})$ time and space complexity.

  5. Compare BSGS performance to brute force experimentally.

1. The Key Idea

We want to find $x$ such that $g^x = h$ in a group of order $n$.

Set $m = \lceil \sqrt{n} \rceil$. Write $x = im + j$ where $0 \le j < m$ and $0 \le i < m$.

Then:

$$g^x = g^{im + j} = h \quad \Longleftrightarrow \quad g^j = h \cdot g^{-im} = h \cdot (g^{-m})^i$$
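This equivalence is easy to check numerically. Here is a quick plain-Python sanity check (outside Sage, using `pow` with a negative exponent for the modular inverse, Python 3.8+) with the same parameters used in the worked example below: $p = 101$, $g = 2$, $x = 42$.

```python
# Check g^x = h  <=>  g^j = h * (g^{-m})^i  where x = i*m + j
p, g = 101, 2              # group Z/101Z^*, g = 2 is a primitive root
m = 11                     # ceil(sqrt(100))
x = 42
i, j = divmod(x, m)        # x = i*m + j  ->  i = 3, j = 9
h = pow(g, x, p)           # h = g^x

lhs = pow(g, j, p)                  # baby-step value g^j
rhs = h * pow(g, -i * m, p) % p     # giant-step value h * (g^{-m})^i
print(i, j, lhs, rhs)               # prints: 3 9 7 7
```

Both sides reduce to $g^{x - im} = g^j$, so any collision between a giant-step value and a baby-step value pins down $(i, j)$ and hence $x$.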

Strategy:

  1. Baby steps: Precompute and store $g^j$ for $j = 0, 1, \ldots, m-1$ in a hash table.

  2. Giant steps: For $i = 0, 1, \ldots, m-1$, compute $h \cdot (g^{-m})^i$ and check if it appears in the table.

  3. If $h \cdot (g^{-m})^i = g^j$, then $x = im + j$.

| Phase | Operations | Storage |
|---|---|---|
| Baby steps | $m$ multiplications | $m$ table entries |
| Giant steps | up to $m$ multiplications + lookups | 0 extra |
| Total | $O(\sqrt{n})$ | $O(\sqrt{n})$ |
# Visualise the decomposition x = i*m + j
n = 100              # group order
m = isqrt(n) + 1     # = 11
print(f"Group order n = {n}")
print(f"Step size m = ceil(sqrt({n})) = {m}")
print()

# Show the grid: every x in [0, n) maps to a unique (i, j)
print(f"x = i * {m} + j")
for x in [0, 1, 10, 11, 42, 99]:
    i, j = x // m, x % m
    print(f"{x} = {i} * {m} + {j}")

Checkpoint 1. If $n = 10000$, what is $m$? How many baby steps and (at most) how many giant steps are needed? Compare with brute force's 10000 steps.

2. Worked Example: $p = 101$

Let us trace BSGS step by step for $p = 101$, $g = 2$ (a primitive root), and a target $h = g^{42}$.

# Setup
p = 101
n = p - 1                 # = 100
g = Mod(primitive_root(p), p)
x_secret = 42
h = g^x_secret
m = isqrt(n) + 1          # = 11
print(f"p = {p}, g = {int(g)}, h = g^{x_secret} = {int(h)}")
print(f"Group order n = {n}, step size m = {m}")

# Phase 1: Baby steps, build table {g^j : j}
print(f"\n=== Phase 1: Baby Steps ===")
baby = {}
gj = Mod(1, p)            # g^0
for j in range(m):
    baby[gj] = j
    print(f"  g^{j} = {int(gj)}  [stored]")
    gj *= g
print(f"\nBaby-step table: {len(baby)} entries")
# Phase 2: Giant steps, check h * (g^{-m})^i
print("=== Phase 2: Giant Steps ===")
g_inv_m = g^(-m)          # precompute g^{-m}
print(f"g^(-m) = g^(-{m}) = {int(g_inv_m)}")
print()
gamma = h                 # start with h * (g^{-m})^0 = h
found = False
for i in range(m):
    in_table = gamma in baby
    status = f"MATCH! j = {baby[gamma]}" if in_table else "not in table"
    print(f"  i={i}: h * g^(-{m}*{i}) = h * g^(-{m*i}) = {int(gamma)}  [{status}]")
    if in_table:
        j = baby[gamma]
        x = i * m + j
        print(f"\n  => x = i*m + j = {i}*{m} + {j} = {x}")
        print(f"     Verify: g^{x} = {int(g^x)}  (should be {int(h)})")
        found = True
        break
    gamma *= g_inv_m
if not found:
    print("No match found (shouldn't happen if g is a generator)")

Notice: we found $x = 42$ after only $m + i = 11 + 3 = 14$ total group operations, not 42.

3. Complete Implementation

Let us package BSGS into a clean function.

def baby_step_giant_step(g, h, n):
    """
    Solve g^x = h in a group of order n using BSGS.
    Returns x, or None if no solution found.
    """
    m = isqrt(n) + 1

    # Baby steps: build table g^j -> j
    table = {}
    gj = g^0
    for j in range(m):
        table[gj] = j
        gj = gj * g

    # Giant steps: check h * (g^{-m})^i
    g_inv_m = g^(-m)
    gamma = h
    for i in range(m):
        if gamma in table:
            x = i * m + table[gamma]
            return x % n
        gamma = gamma * g_inv_m
    return None

# Test
p = 101
g = Mod(primitive_root(p), p)
for x_test in [0, 1, 42, 73, 99]:
    h = g^x_test
    x_found = baby_step_giant_step(g, h, p - 1)
    print(f"  x = {x_test}: BSGS returns {x_found}, correct? {x_found == x_test}")

Misconception alert. "BSGS needs to know the exact group order $n$." Technically, an upper bound suffices. If you use $m = \lceil \sqrt{N} \rceil$ for any $N \ge n$, the algorithm still works (the table might be slightly larger than necessary, but correctness is preserved).
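As a quick sanity check of this claim, here is a minimal plain-Python BSGS (a sketch, not the Sage function above; `bsgs` and its parameters are illustrative names) run once with the exact order and once with a much looser bound:

```python
from math import isqrt

def bsgs(g, h, p, N):
    """Solve g^x = h (mod p); N only needs to be an upper bound on the order of g."""
    m = isqrt(N) + 1
    table = {}
    e = 1
    for j in range(m):
        table.setdefault(e, j)   # keep the smallest j if values start repeating
        e = e * g % p
    step = pow(g, -m, p)         # g^{-m} mod p (Python 3.8+)
    gamma = h % p
    for i in range(m):
        if gamma in table:
            return i * m + table[gamma]
        gamma = gamma * step % p
    return None

h = pow(2, 42, 101)
print(bsgs(2, h, 101, 100))      # exact order n = 100
print(bsgs(2, h, 101, 10**6))    # loose bound N = 10^6: same answer, bigger table
```

With an upper bound, the returned exponent is correct modulo the true order of $g$; the only cost is the larger table, which Exercise 3 below asks you to quantify.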

4. Performance: BSGS vs Brute Force

Let us compare the actual running times.

def discrete_log_brute(g, h, n):
    """Brute-force discrete log: try all x in [0, n)."""
    power = g^0
    for x in range(n):
        if power == h:
            return x
        power = power * g
    return None

print("Bits   n        sqrt(n)   Brute (ms)   BSGS (ms)   Speedup")
for bits in [10, 14, 18, 20, 22]:
    p_test = next_prime(2^bits)
    n_test = p_test - 1
    g_test = Mod(primitive_root(p_test), p_test)
    x_test = randint(2, p_test - 2)
    h_test = g_test^x_test

    # Brute force
    start = walltime()
    x_brute = discrete_log_brute(g_test, h_test, n_test)
    t_brute = (walltime() - start) * 1000

    # BSGS
    start = walltime()
    x_bsgs = baby_step_giant_step(g_test, h_test, n_test)
    t_bsgs = (walltime() - start) * 1000

    assert x_brute == x_bsgs, f"Mismatch: brute={x_brute}, bsgs={x_bsgs}"
    speedup = t_brute / t_bsgs if t_bsgs > 0 else float('inf')
    print(f"{bits}   {n_test}   {isqrt(n_test)}   {t_brute:>12.1f}   {t_bsgs:>12.1f}   {speedup:>8.1f}x")

Checkpoint 2. At 22 bits, how does the speedup compare to $\sqrt{n} \approx \sqrt{2^{22}} = 2^{11} = 2048$? Why might the actual speedup differ from the theoretical ratio?

5. Visualising Baby and Giant Steps

Let us visualise how the two sets of steps "meet" to find the answer.

p = 47
n = p - 1
g = Mod(primitive_root(p), p)
x_secret = 31
h = g^x_secret
m = isqrt(n) + 1

# Compute baby steps
baby_vals = []
gj = g^0
for j in range(m):
    baby_vals.append((j, int(gj)))
    gj *= g

# Compute giant steps
giant_vals = []
g_inv_m = g^(-m)
gamma = h
match_i, match_j = None, None
baby_dict = {v: j for j, v in baby_vals}   # group element -> baby index j
for i in range(m):
    giant_vals.append((i, int(gamma)))
    # check if gamma matches any baby step (look up the keys, i.e. the values g^j)
    if int(gamma) in baby_dict and match_i is None:
        match_i, match_j = i, baby_dict[int(gamma)]
    gamma *= g_inv_m

G = Graphics()

# Plot baby steps
baby_j = [j for j, v in baby_vals]
baby_v = [v for j, v in baby_vals]
G += point(list(zip(baby_j, baby_v)), color='blue', size=40,
           legend_label='Baby steps $g^j$', zorder=3)

# Plot giant steps (offset x by m+1 to separate visually)
giant_i = [i for i, v in giant_vals]
giant_v = [v for i, v in giant_vals]
G += point([(i + m + 1, v) for i, v in zip(giant_i, giant_v)],
           color='red', size=40, marker='s',
           legend_label='Giant steps $h \\cdot g^{-im}$', zorder=3)

# Mark the match
if match_i is not None:
    match_val = int(g^match_j)
    G += line([(-1, match_val), (2*m + 2, match_val)], color='green',
              linestyle='--', alpha=0.5, legend_label=f'Match at value {match_val}')
    G += text(f'x = {match_i}*{m}+{match_j} = {match_i*m+match_j}',
              (match_j + 1, match_val + 2), fontsize=10, color='green')

G.show(figsize=[12, 5], axes_labels=['Step index', 'Group element value'],
       title=f'BSGS for $g^x = h$ in $\\mathbb{{Z}}/{p}\\mathbb{{Z}}^*$  (x={x_secret})')

6. Memory vs Time

BSGS uses $O(\sqrt{n})$ memory for the hash table. For a 128-bit group, that is $\sqrt{2^{128}} = 2^{64}$ entries; at 16 bytes per entry, about 256 exabytes. This is infeasible!

| Group size | $\sqrt{n}$ | Memory (16 bytes/entry) |
|---|---|---|
| $2^{32}$ | $2^{16}$ = 65,536 | 1 MB |
| $2^{64}$ | $2^{32}$ ≈ 4 billion | 64 GB |
| $2^{128}$ | $2^{64}$ | 256 EB (infeasible) |
| $2^{256}$ | $2^{128}$ | absurdly large |
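The table rows are quick to re-derive. A small plain-Python check of the arithmetic (in binary units, so the 128-bit row comes out as $2^{64} \cdot 16 = 2^{68}$ bytes = 256 EiB):

```python
# Baby-step table memory: sqrt(n) = 2^(bits/2) entries at 16 bytes each
units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
for bits in [32, 64, 128]:
    exp = bits // 2 + 4          # log2 of total bytes: 2^(bits/2) entries * 2^4 bytes
    q, r = divmod(exp, 10)       # pick the largest binary unit that fits
    print(f"2^{bits}-element group: {2**r} {units[q]}")
# prints:
#   2^32-element group: 1 MiB
#   2^64-element group: 64 GiB
#   2^128-element group: 256 EiB
```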

For cryptographic sizes, BSGS alone is not enough. More advanced methods like Pollard's rho algorithm achieve $O(\sqrt{n})$ time with $O(1)$ space, and index calculus methods achieve sub-exponential time for $\mathbb{Z}/p\mathbb{Z}^*$.
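To make the "$O(\sqrt{n})$ time with $O(1)$ space" claim tangible, here is a plain-Python sketch of Pollard's rho for discrete logs (Floyd cycle-finding over the classic three-way partition; all names are illustrative, and the degenerate-collision case falls back to exhaustive checking, which is rare for large groups):

```python
from math import gcd

def pollard_rho_dlog(g, h, p, n):
    """Solve g^x = h (mod p), where g has order n, keeping only O(1) state."""
    def step(x, a, b):
        # Pseudorandom walk on values x = g^a * h^b, branching on x mod 3
        if x % 3 == 0:
            return x * g % p, (a + 1) % n, b          # multiply by g
        if x % 3 == 1:
            return x * x % p, 2 * a % n, 2 * b % n    # square
        return x * h % p, a, (b + 1) % n              # multiply by h

    x, a, b = g % p, 1, 0       # tortoise starts at g^1
    X, A, B = x, a, b           # hare
    for _ in range(2 * n):
        x, a, b = step(x, a, b)
        X, A, B = step(*step(X, A, B))
        if x == X:
            # g^a h^b = g^A h^B  =>  (b - B) * log_g(h) = A - a  (mod n)
            r, s = (b - B) % n, (A - a) % n
            if r == 0:
                # degenerate relation: fall back to exhaustive verification
                return next((c for c in range(n) if pow(g, c, p) == h % p), None)
            d = gcd(r, n)
            if s % d:
                return None
            # r//d is invertible mod n//d; check the d candidate solutions
            base = s // d * pow(r // d, -1, n // d) % (n // d)
            for k in range(d):
                cand = (base + k * (n // d)) % n
                if pow(g, cand, p) == h % p:
                    return cand
            return None
    return None

print(pollard_rho_dlog(2, pow(2, 42, 101), 101, 100))   # prints 42
```

Only the current tortoise and hare states are stored, in contrast to BSGS's $\sqrt{n}$-entry table; a later notebook treats Pollard's rho properly.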


Bridge to Module 06. For elliptic curve groups, the best known generic algorithm is still $O(\sqrt{n})$ (Pollard's rho). There is no index calculus for general elliptic curves! This is why ECC can use much smaller keys than $\mathbb{Z}/p\mathbb{Z}^*$-based crypto: 256-bit ECC $\approx$ 3072-bit DH.

# BSGS on progressively larger groups (measure memory)
import sys

print("Bits   n         m=sqrt(n)   Table size (KB)")
for bits in [10, 16, 20, 24, 28]:
    p_test = next_prime(2^bits)
    n_test = p_test - 1
    m_test = isqrt(n_test) + 1

    # Build baby-step table to measure memory
    g_test = Mod(primitive_root(p_test), p_test)
    table = {}
    gj = g_test^0
    for j in range(m_test):
        table[gj] = j
        gj *= g_test

    mem_kb = sys.getsizeof(table) / 1024
    print(f"{bits}   {n_test}   {m_test}   {mem_kb:>16.1f}")

Checkpoint 3. If you have 8 GB of RAM, roughly how many bits can your BSGS table handle? (Assume each entry takes about 100 bytes.)

7. Exercises

Exercise 1 (Worked): BSGS by Hand

Problem. Solve $5^x \equiv 20 \pmod{23}$ using BSGS.

Solution. $n = 22$, $m = \lceil\sqrt{22}\rceil = 5$.

Baby steps ($g^j \bmod 23$ for $j = 0, \ldots, 4$):

| $j$ | $5^j \bmod 23$ |
|---|---|
| 0 | 1 |
| 1 | 5 |
| 2 | 2 |
| 3 | 10 |
| 4 | 4 |

Giant steps: we need $g^{-m} = 5^{-5} \bmod 23$. Since $5^5 = 3125 \equiv 3125 - 135 \cdot 23 = 3125 - 3105 = 20 \pmod{23}$, we have $g^{-m} = 20^{-1} \bmod 23$.

Let's compute with SageMath.

# Exercise 1, verification
p = 23; g = Mod(5, p); h = Mod(20, p)
n = 22; m = 5

# Baby steps
print("Baby steps:")
baby = {}
gj = Mod(1, p)
for j in range(m):
    print(f"  g^{j} = {int(gj)}")
    baby[gj] = j
    gj *= g

# Giant steps
print(f"\ng^(-m) = g^(-{m}) = {int(g^(-m))}")
g_inv_m = g^(-m)
gamma = h
print("\nGiant steps:")
for i in range(m):
    in_table = gamma in baby
    print(f"  i={i}: gamma = {int(gamma)}  {'<-- MATCH j=' + str(baby[gamma]) if in_table else ''}")
    if in_table:
        x = i * m + baby[gamma]
        print(f"\n  x = {i}*{m} + {baby[gamma]} = {x}")
        print(f"  Check: 5^{x} mod 23 = {int(g^x)}")
        break
    gamma *= g_inv_m

Exercise 2 (Guided): BSGS with a Larger Group

Problem. Use your baby_step_giant_step function to solve the DLP in $\mathbb{Z}/p\mathbb{Z}^*$ for $p = 10007$.

  1. Pick a random secret $x$ and compute $h = g^x$.

  2. Solve using BSGS and verify.

  3. Time it and compare with discrete_log().

Hint: The step size should be $m \approx 100$.

# Exercise 2, fill in the TODOs
p = 10007
g = Mod(primitive_root(p), p)
x_secret = randint(2, p - 2)
h = g^x_secret

# TODO 1: Solve with your BSGS
# start = walltime()
# x_bsgs = baby_step_giant_step(g, h, p - 1)
# t_bsgs = (walltime() - start) * 1000
# print(f"BSGS: x = {x_bsgs}, time = {t_bsgs:.2f} ms")

# TODO 2: Solve with SageMath's discrete_log
# start = walltime()
# x_sage = discrete_log(h, g)
# t_sage = (walltime() - start) * 1000
# print(f"Sage: x = {x_sage}, time = {t_sage:.2f} ms")

# TODO 3: Verify both are correct
# print(f"Secret: {x_secret}, BSGS correct? {x_bsgs == x_secret}")

Exercise 3 (Independent): Generalised BSGS

Problem.

  1. Modify baby_step_giant_step so that it works even when the group order $n$ is not exactly known: instead, accept an upper bound $N \ge n$ and use $m = \lceil \sqrt{N} \rceil$.

  2. Test your modified version on $\mathbb{Z}/p\mathbb{Z}^*$ where you pass $N = p$ (which is slightly larger than $n = p - 1$).

  3. What is the overhead of using an upper bound instead of the exact order?

# Exercise 3, write your solution here

Summary

| Concept | Key Fact |
|---|---|
| Key idea | Write $x = im + j$ with $m = \lceil\sqrt{n}\rceil$; precompute baby steps, search with giant steps |
| Time | $O(\sqrt{n})$ group operations |
| Space | $O(\sqrt{n})$ hash table entries |
| Generic | Works for any group, no special structure needed |
| Limitation | Memory-bound: infeasible for $n > 2^{60}$ or so |
| Crypto impact | Sets the baseline: DLP in a group of order $n$ can always be solved in $O(\sqrt{n})$ |

BSGS is generic: it works for any group. But if the group order has special structure (many small prime factors), we can do even better. The next notebook introduces the Pohlig-Hellman algorithm, which exploits smooth-order groups.


Next: 05f. Pohlig-Hellman Algorithm