GitHub Repository: duyuefeng0708/Cryptography-From-First-Principle
Path: blob/main/frontier/09-commitments-sigma-protocols/sage/09b-pedersen-commitments.ipynb
⁴⁸³ views

unlisted

Kernel: SageMath 10.5

Pedersen Commitments

Module 09b | Commitments and Sigma Protocols

Can you commit to a secret so that even an all-powerful adversary learns nothing about it?

Motivating Question: In 09a we saw hash-based commitments: $C = H(m \| r)$ . They are computationally hiding, an efficient adversary can't determine $m$ from $C$ . But an adversary with unlimited computing power could brute-force every possible $(m', r')$ and find which message was committed. Can we do better?
Pedersen commitments can. They achieve perfect hiding: even with infinite computational power, the commitment reveals absolutely zero information about $m$ . The price? Binding becomes computational rather than perfect, it relies on the hardness of the discrete logarithm problem (Module 05).

Objectives

By the end of this notebook you will be able to:

Set up Pedersen commitment parameters $(p, q, g, h)$ and explain why $\log_g(h)$ must be unknown
Commit to a message $m$ with randomness $r$ and verify the opening
Prove that Pedersen commitments are perfectly hiding by finding alternative openings
Explain why breaking binding requires solving the DLP
Use the homomorphic property to add commitments without opening them
Extend to vector Pedersen commitments for committing to multiple values at once

Prerequisites

Completion of 09a: Commitment Schemes (hiding and binding properties)
Familiarity with the discrete logarithm problem (05a)
Modular exponentiation in prime-order groups (01d)

The Pedersen Setup

The Pedersen commitment scheme works in a prime-order subgroup of $\mathbb{Z}_p^*$ . The setup requires:

A large prime $p$ such that $p = 2q + 1$ for another prime $q$ (a safe prime)
Two generators $g$ and $h$ of the subgroup of order $q$
Crucially: nobody knows $\log_g(h)$ , the discrete log of $h$ with respect to $g$

Why must $\log_g(h)$ be unknown? We'll see shortly that knowing it would let you break binding.

For pedagogical clarity, we'll start with small primes. In practice, $q$ would be at least 256 bits.

In [ ]:

# === Pedersen Setup ===
# We need a safe prime p = 2q + 1 so that we have a subgroup of prime order q.

# Step 1: Find a safe prime
def find_safe_prime(bits=32):
    """Find a safe prime p = 2q + 1 where q is also prime."""
    while True:
        q = random_prime(2^bits, lbound=2^(bits-1))
        p = 2*q + 1
        if is_prime(p):
            return p, q

p, q = find_safe_prime(32)
print(f'Safe prime: p = {p}')
print(f'Sophie Germain prime: q = {q}')
print(f'Subgroup order: q = {q}')
print(f'Check: p = 2q + 1? {p == 2*q + 1}')

# Step 2: Find generators of the order-q subgroup
# Any element a where a^q = 1 and a != 1 is a generator of the order-q subgroup
def find_generator(p, q):
    """Find a generator of the order-q subgroup of Z_p*."""
    while True:
        a = Mod(randint(2, p-2), p)
        g = a^2  # squaring maps into the order-q subgroup
        if g != 1:
            return g

g = find_generator(p, q)
h = find_generator(p, q)  # independent generator, nobody knows log_g(h)

print(f'\nGenerator g = {g}')
print(f'Generator h = {h}')
print(f'Verify: g^q = {g^q}  (should be 1)')
print(f'Verify: h^q = {h^q}  (should be 1)')

Checkpoint: Why do we compute g = a^2 to get a generator of the order- $q$ subgroup? Think about it: $\mathbb{Z}_p^*$ has order $p - 1 = 2q$ . Squaring any element sends it into the unique subgroup of order $q$ (the quadratic residues). As long as $g \neq 1$ , it generates this entire subgroup because $q$ is prime.

Commit and Reveal

The Pedersen commitment to message $m \in \mathbb{Z}_q$ with randomness $r \in \mathbb{Z}_q$ is:

C = g^m \cdot h^r \mod p

Commit phase: Choose random $r \leftarrow \mathbb{Z}_q$ , compute $C$ , send $C$ to verifier
Reveal phase: Send $(m, r)$ to verifier, who checks $C \stackrel{?}{=} g^m \cdot h^r$

In [ ]:

# === Pedersen Commit and Reveal ===

def pedersen_commit(m, r, g, h, p):
    """Compute Pedersen commitment C = g^m * h^r mod p."""
    return (g^m * h^r)

def pedersen_verify(C, m, r, g, h, p):
    """Verify that C opens to (m, r)."""
    return C == g^m * h^r

# Alice commits to the message m = 42
m = 42
r = randint(1, int(q) - 1)  # random blinding factor

C = pedersen_commit(m, r, g, h, p)
print(f'Message:     m = {m}')
print(f'Randomness:  r = {r}')
print(f'Commitment:  C = {C}')

# Later, Alice reveals (m, r) and Bob verifies
valid = pedersen_verify(C, m, r, g, h, p)
print(f'\nVerification: {valid}')

# Trying to open with a wrong message fails
wrong_m = 99
valid_wrong = pedersen_verify(C, wrong_m, r, g, h, p)
print(f'Wrong message verification: {valid_wrong}')

Perfect Hiding

Here is the key insight that makes Pedersen commitments special.

Claim: For any commitment $C$ and any target message $m'$ , there exists exactly one $r'$ such that $C = g^{m'} \cdot h^{r'}$ .

Why? If $C = g^m \cdot h^r$ , then we need $g^m \cdot h^r = g^{m'} \cdot h^{r'}$ , which gives $h^{r' - r} = g^{m - m'}$ . Setting $r' = r + (m - m') \cdot \log_g(h)^{-1}$ ... but wait, we don't know $\log_g(h)$ ! The point is that such an $r'$ exists mathematically.

This means the commitment $C$ is consistent with every possible message. An adversary looking at $C$ gains literally zero information about $m$ . This is information-theoretic hiding, it holds even against computationally unbounded adversaries.

Let's verify this concretely. Since we're in a teaching setting, we can compute $\log_g(h)$ for our small parameters and demonstrate the alternative opening.

In [ ]:

# === Demonstrating Perfect Hiding ===
# For our small group, we CAN compute log_g(h) to show alternative openings exist.

# Compute the discrete log (only feasible for small parameters!)
log_g_h = discrete_log(h, g)
print(f'log_g(h) = {log_g_h}  (we cheat and compute this for demonstration)')
print(f'Verify: g^{log_g_h} = {g^log_g_h} = h = {h}? {g^log_g_h == h}')
print()

# Original commitment
print(f'Original: C = g^{m} * h^{r}')
print(f'C = {C}')
print()

# For any alternative message m', find r' such that C = g^m' * h^r'
# We need: g^m * h^r = g^m' * h^r'
# So: h^(r'-r) = g^(m-m'), meaning r' = r + (m - m') * log_g(h)^(-1) mod q
for m_prime in [0, 1, 100, 9999]:
    # r' = r + (m - m') * (log_g_h)^(-1) mod q
    r_prime = Mod(r + (m - m_prime) * inverse_mod(log_g_h, q), q)
    C_check = g^(m_prime) * h^(int(r_prime))
    print(f"  m' = {m_prime}, r' = {r_prime} => C = {C_check}  match: {C_check == C}")

print()
print('Every message is equally consistent with C. PERFECT hiding!')

Checkpoint: We just showed that commitment $C$ can be "opened" to $m' = 0, 1, 100, 9999$ (and any other value). Convince yourself: why does this prove that seeing $C$ gives an adversary no information about $m$ ? The key is that the distribution of $C$ is identical regardless of which $m$ was committed, because $r$ was chosen uniformly at random.

Computational Binding

If Pedersen commitments are perfectly hiding, how can they be binding at all?

Common mistake: "If the commitment is perfectly hiding, how can it be binding at all?" The resolution: binding is computational, not information-theoretic. An adversary with unlimited computation could break binding. But an efficient adversary cannot, because doing so requires solving the DLP.

Why breaking binding implies solving the DLP:

Suppose a cheating committer finds two openings $(m_1, r_1)$ and $(m_2, r_2)$ with $m_1 \neq m_2$ for the same commitment $C$ :

g^{m_1} \cdot h^{r_1} = g^{m_2} \cdot h^{r_2}

Then $g^{m_1 - m_2} = h^{r_2 - r_1}$ , which gives:

\log_g(h) = \frac{m_1 - m_2}{r_2 - r_1} \mod q

So breaking binding directly reveals $\log_g(h)$ , it solves the DLP!

In [ ]:

# === Breaking Binding = Solving the DLP ===
# If someone finds two valid openings, they've solved the DLP.

# Suppose we somehow have two openings for the same C:
m1, r1 = m, r  # original opening

# Use the alternative opening we computed above (m'=100)
m2 = 100
r2 = int(Mod(r + (m - m2) * inverse_mod(log_g_h, q), q))

# Verify both open to the same C
print(f'Opening 1: m={m1}, r={r1} => C = {pedersen_commit(m1, r1, g, h, p)}')
print(f'Opening 2: m={m2}, r={r2} => C = {pedersen_commit(m2, r2, g, h, p)}')
print(f'Same C? {pedersen_commit(m1, r1, g, h, p) == pedersen_commit(m2, r2, g, h, p)}')
print()

# Extract log_g(h) from the two openings
recovered_log = Mod((m1 - m2) * inverse_mod(int(Mod(r2 - r1, q)), q), q)
print(f'Recovered log_g(h) = {recovered_log}')
print(f'Actual log_g(h)    = {log_g_h}')
print(f'Match: {int(recovered_log) == int(Mod(log_g_h, q))}')
print()
print('Breaking binding <=> Solving the DLP!')

The Hiding-Binding Duality

In 09a we learned that commitments have two security properties: hiding and binding. Here is a fundamental result:

Scheme	Hiding	Binding
Hash commitment $H(m \| r)$	Computational	Perfectly binding (collision-free)
Pedersen $g^m h^r$	Perfect	Computational (DLP-hard)

A deep theorem states: no commitment scheme can be both perfectly hiding AND perfectly binding. Intuitively:

Perfect hiding means every $C$ is consistent with every $m$ , so alternative openings exist
Perfect binding means only one valid opening exists, so $C$ leaks some information

You must choose one to be perfect and the other computational. Pedersen chooses perfect hiding because in many cryptographic protocols (zero-knowledge proofs, confidential transactions), the prover needs the strongest possible privacy guarantee.

The Homomorphic Property

This is the feature that makes Pedersen commitments a cornerstone of modern cryptography.

Given two commitments:

$C_1 = g^{m_1} \cdot h^{r_1}$
$C_2 = g^{m_2} \cdot h^{r_2}$

Their product is: $C_1 \cdot C_2 = g^{m_1} h^{r_1} \cdot g^{m_2} h^{r_2} = g^{m_1 + m_2} \cdot h^{r_1 + r_2} = \text{Commit}(m_1 + m_2,\; r_1 + r_2)$

Commitments can be added without opening them! Nobody needs to know $m_1$ or $m_2$ individually, the product of the commitments is a valid commitment to the sum of the messages.

In [ ]:

# === Homomorphic Addition of Pedersen Commitments ===

# Alice commits to m1, Bob commits to m2
m1 = 17
r1 = randint(1, int(q) - 1)
C1 = pedersen_commit(m1, r1, g, h, p)

m2 = 25
r2 = randint(1, int(q) - 1)
C2 = pedersen_commit(m2, r2, g, h, p)

print(f'C1 = Commit({m1}, r1) = {C1}')
print(f'C2 = Commit({m2}, r2) = {C2}')
print()

# Multiply the commitments
C_product = C1 * C2

# This should equal Commit(m1 + m2, r1 + r2)
C_sum = pedersen_commit(m1 + m2, (r1 + r2) % int(q), g, h, p)

print(f'C1 * C2           = {C_product}')
print(f'Commit({m1}+{m2}, r1+r2) = {C_sum}')
print(f'Equal? {C_product == C_sum}')
print()
print(f'We computed a commitment to {m1} + {m2} = {m1+m2} WITHOUT knowing the individual values!')

Checkpoint: What happens if you raise a commitment to a scalar $k$ ? We get $C^k = (g^m h^r)^k = g^{km} \cdot h^{kr} = \text{Commit}(km, kr)$ . Pedersen commitments are homomorphic under scalar multiplication too! Together with addition, this makes them linearly homomorphic.

In [ ]:

# === Scalar Multiplication of Commitments ===

k = 3
C_scaled = C1^k
C_direct = pedersen_commit(k * m1, (k * r1) % int(q), g, h, p)

print(f'C1^{k}               = {C_scaled}')
print(f'Commit({k}*{m1}, {k}*r1) = {C_direct}')
print(f'Equal? {C_scaled == C_direct}')
print()

# Linear combination: 3*m1 + 5*m2
a, b = 3, 5
C_lincomb = C1^a * C2^b
C_lincomb_direct = pedersen_commit(
    (a*m1 + b*m2) % int(q), 
    (a*r1 + b*r2) % int(q), 
    g, h, p
)
print(f'{a}*C1 + {b}*C2 (multiplicatively) = {C_lincomb}')
print(f'Commit({a}*{m1}+{b}*{m2}, ...) = {C_lincomb_direct}')
print(f'Equal? {C_lincomb == C_lincomb_direct}')
print(f'\nWe computed Commit({a*m1 + b*m2}) without knowing m1 or m2!')

Visualizing the Commitment Distribution

Let's verify perfect hiding visually. We commit to two different messages $m = 0$ and $m = 1$ many times (with fresh random $r$ each time) and plot the distribution of the resulting commitments. If hiding is perfect, the two distributions should be indistinguishable.

In [ ]:

# === Visualizing Perfect Hiding ===
# Commit to m=0 and m=1 many times and compare distributions

# Use a smaller group for visualization
p_vis = 2 * 227 + 1  # 455 = 2*227+1, and 227 is prime
assert is_prime(p_vis) and is_prime(227)
q_vis = 227

# Find generators of the order-227 subgroup
g_vis = Mod(2, p_vis)^2  # square to get into subgroup
while g_vis^q_vis != 1 or g_vis == 1:
    g_vis = Mod(randint(2, p_vis - 2), p_vis)^2

h_vis = Mod(randint(2, p_vis - 2), p_vis)^2
while h_vis^q_vis != 1 or h_vis == 1 or h_vis == g_vis:
    h_vis = Mod(randint(2, p_vis - 2), p_vis)^2

N = 500
commits_m0 = [int(g_vis^0 * h_vis^randint(0, q_vis - 1)) for _ in range(N)]
commits_m1 = [int(g_vis^1 * h_vis^randint(0, q_vis - 1)) for _ in range(N)]

# Plot as histograms
p1 = histogram(commits_m0, bins=50, color='blue', alpha=0.5, legend_label='m=0')
p2 = histogram(commits_m1, bins=50, color='red', alpha=0.5, legend_label='m=1')
show(p1 + p2, title='Commitment distributions for m=0 vs m=1 (should overlap)', figsize=(10, 4))

Both histograms should look uniformly distributed across the group, identical distributions regardless of the committed message. This is perfect hiding in action: the commitment value carries zero information about $m$ .

Vector Pedersen Commitments

What if you want to commit to multiple values $(m_1, m_2, \ldots, m_n)$ at once? The vector Pedersen commitment generalizes the scheme:

C = g_1^{m_1} \cdot g_2^{m_2} \cdot \ldots \cdot g_n^{m_n} \cdot h^r \mod p

where $g_1, g_2, \ldots, g_n, h$ are independent generators (nobody knows any discrete log relation between them).

This is much more efficient than making $n$ separate commitments: a single group element commits to an entire vector.

In [ ]:

# === Vector Pedersen Commitment ===

def find_n_generators(n, p, q):
    """Find n independent generators of the order-q subgroup."""
    generators = []
    seen = set()
    while len(generators) < n:
        candidate = Mod(randint(2, int(p) - 2), p)^2
        if candidate != 1 and candidate^q == 1 and int(candidate) not in seen:
            generators.append(candidate)
            seen.add(int(candidate))
    return generators

# Commit to a vector of 4 values
n_vals = 4
gens = find_n_generators(n_vals, p, q)  # g_1, ..., g_4
h_vec = find_generator(p, q)            # blinding generator

# Message vector and randomness
messages = [10, 20, 30, 40]
r_vec = randint(1, int(q) - 1)

# Compute vector commitment: C = g_1^m_1 * g_2^m_2 * ... * g_n^m_n * h^r
C_vec = prod(gi^mi for gi, mi in zip(gens, messages)) * h_vec^r_vec

print(f'Message vector: {messages}')
print(f'Vector commitment: C = {C_vec}')

# Verify
C_verify = prod(gi^mi for gi, mi in zip(gens, messages)) * h_vec^r_vec
print(f'Verification: {C_verify == C_vec}')
print()

# The homomorphic property extends to vectors!
messages2 = [5, 10, 15, 20]
r_vec2 = randint(1, int(q) - 1)
C_vec2 = prod(gi^mi for gi, mi in zip(gens, messages2)) * h_vec^r_vec2

# Component-wise sum
msg_sum = [(m1 + m2) % int(q) for m1, m2 in zip(messages, messages2)]
r_sum = (r_vec + r_vec2) % int(q)
C_sum_vec = prod(gi^mi for gi, mi in zip(gens, msg_sum)) * h_vec^r_sum

print(f'Vector 1:  {messages}')
print(f'Vector 2:  {messages2}')
print(f'Sum:       {msg_sum}')
print(f'C1 * C2 = {C_vec * C_vec2}')
print(f'C(sum)  = {C_sum_vec}')
print(f'Homomorphic? {C_vec * C_vec2 == C_sum_vec}')

Exercises

Exercise 1 (Worked)

Task: Set up Pedersen parameters with the safe prime $p = 23$ (so $q = 11$ ). Find generators $g$ and $h$ of the order-11 subgroup. Commit to $m = 7$ with $r = 3$ . Then find an alternative opening $(m', r')$ with $m' = 2$ that produces the same commitment.

This exercise reinforces the perfect hiding property by working through a small concrete example where we can compute $\log_g(h)$ by hand.

In [ ]:

# Exercise 1. Worked Solution

# Step 1: Setup
p_ex = 23       # safe prime: 23 = 2*11 + 1
q_ex = 11       # subgroup order

# Find a generator: square a random element
g_ex = Mod(2, p_ex)^2  # 4
print(f'g = {g_ex}')
print(f'g^11 = {g_ex^11}  (should be 1)')

# List the order-q subgroup generated by g
subgroup = [g_ex^i for i in range(q_ex)]
print(f'Subgroup: {sorted([int(x) for x in subgroup])}')

# Pick h as another generator (different from g)
h_ex = Mod(3, p_ex)^2  # 9
print(f'h = {h_ex}')
print(f'h^11 = {h_ex^11}  (should be 1)')
print()

# Step 2: Commit to m=7 with r=3
m_ex = 7
r_ex = 3
C_ex = g_ex^m_ex * h_ex^r_ex
print(f'C = g^{m_ex} * h^{r_ex} = {g_ex^m_ex} * {h_ex^r_ex} = {C_ex}')
print()

# Step 3: Find log_g(h) by brute force (small group!)
log_g_h_ex = None
for i in range(q_ex):
    if g_ex^i == h_ex:
        log_g_h_ex = i
        break
print(f'log_g(h) = {log_g_h_ex}')
print(f'Check: g^{log_g_h_ex} = {g_ex^log_g_h_ex} = h? {g_ex^log_g_h_ex == h_ex}')
print()

# Step 4: Find r' such that C = g^2 * h^r' (i.e., m'=2)
m_prime = 2
# r' = r + (m - m') * log_g(h)^{-1} mod q
inv_log = inverse_mod(log_g_h_ex, q_ex)
r_prime = Mod(r_ex + (m_ex - m_prime) * inv_log, q_ex)
print(f"Alternative opening: m' = {m_prime}, r' = {r_prime}")

# Verify
C_check = g_ex^m_prime * h_ex^int(r_prime)
print(f"g^{m_prime} * h^{int(r_prime)} = {C_check}")
print(f'Same as original C = {C_ex}? {C_check == C_ex}')
print()
print('Both (m=7, r=3) and (m=2, r={}) open to the same commitment!'.format(int(r_prime)))
print('This is why Pedersen commitments are PERFECTLY hiding.')

Exercise 2 (Guided)

Task: Demonstrate the homomorphic property with a concrete application. Alice commits to her age $m_1$ and Bob commits to his age $m_2$ . Without either revealing their age, compute a commitment to the sum of their ages and verify it.

Use the parameters $p, q, g, h$ from the main setup above.

Hints:

Create two commitments $C_1$ and $C_2$ with different messages and randomness
Compute $C_{\text{sum}} = C_1 \cdot C_2$
Verify that $C_{\text{sum}}$ opens with message $m_1 + m_2$ and randomness $r_1 + r_2 \mod q$

In [ ]:

# Exercise 2. Guided (fill in the TODOs)

# Alice's age (secret)
alice_age = 28
r_alice = randint(1, int(q) - 1)
C_alice = pedersen_commit(alice_age, r_alice, g, h, p)
print(f'Alice publishes her commitment: C_alice = {C_alice}')

# Bob's age (secret)
bob_age = 34
r_bob = randint(1, int(q) - 1)
C_bob = pedersen_commit(bob_age, r_bob, g, h, p)
print(f'Bob publishes his commitment:   C_bob   = {C_bob}')
print()

# TODO: Compute the combined commitment C_sum = C_alice * C_bob
# C_sum = ???

# TODO: What message and randomness does C_sum commit to?
# combined_age = ???
# combined_r = ???

# TODO: Verify by computing Commit(combined_age, combined_r) directly
# C_direct = pedersen_commit(???, ???, g, h, p)

# TODO: Print whether C_sum == C_direct
# print(f'Homomorphic addition works? {???}')

Exercise 3 (Independent)

Task: Implement a "balance proof" system. A bank commits to each customer's balance using Pedersen commitments. Given $n = 5$ customer balances, show that the bank can prove the total deposits equal a claimed amount $T$ by:

Computing individual commitments $C_i = g^{b_i} h^{r_i}$ for each balance $b_i$
Computing the product $C_{\text{total}} = \prod C_i$ (homomorphic sum)
Opening $C_{\text{total}}$ with $m = \sum b_i$ and $r = \sum r_i \mod q$ to prove the total

Choose your own balances and verify the entire flow. Bonus: what happens if the bank tries to lie about the total? Show that the verification fails.

In [ ]:

# Exercise 3. Your code here

Summary

Concept	Key idea
Pedersen setup	Prime-order group with generators $g$ and $h$ where $\log_g(h)$ is unknown
Commitment	$C = g^m \cdot h^r$ , opened by revealing $(m, r)$
Perfectly hiding	For any $C$ and any $m'$ , there exists $r'$ with $C = g^{m'} h^{r'}$ , so the commitment leaks zero information, even to an unbounded adversary
Computationally binding	Finding two valid openings implies solving the DLP
Hiding-binding duality	You cannot have both perfect hiding and perfect binding. Pedersen trades perfect binding for perfect hiding (opposite of hash commitments)
Homomorphic	$\text{Commit}(m_1) \cdot \text{Commit}(m_2) = \text{Commit}(m_1 + m_2)$ , so commitments can be added without opening
Vector commitments	$C = g_1^{m_1} \cdots g_n^{m_n} \cdot h^r$ commits to an entire vector in one group element

Crypto foreshadowing: The homomorphic property of Pedersen commitments is the foundation of Bulletproofs range proofs and confidential transactions in Monero and Mimblewimble. When you send cryptocurrency in these systems, the transaction amounts are hidden inside Pedersen commitments, and miners can verify that inputs equal outputs without seeing any amounts, using exactly the homomorphic addition we demonstrated above. In 09c, we'll see how Pedersen commitments combine with sigma protocols to build zero-knowledge proofs.

Next: Sigma Protocols: Intuition