GitHub Repository: duyuefeng0708/Cryptography-From-First-Principle
Path: blob/main/frontier/09-commitments-sigma-protocols/sage/09a-commitment-schemes.ipynb
⁴⁸³ views

unlisted

Kernel: SageMath 10.5

Commitment Schemes

Module 09a | Commitments and Sigma Protocols

The cryptographic sealed envelope: bind yourself to a value without revealing it.

Motivating question: Alice and Bob want to play "odds and evens" over the internet. Each picks a number (say 0 or 1). If the sum is even, Alice wins; if odd, Bob wins. But here's the problem: if Alice announces first, Bob sees her choice and picks accordingly. If Bob goes first, Alice cheats the same way. Can they play fairly WITHOUT a trusted third party?

By the end of this notebook, you'll have a complete answer, and you'll see that the tool behind it, the commitment scheme, is one of the most fundamental primitives in modern cryptography.

Objectives

By the end of this notebook you will be able to:

Explain the sealed-envelope analogy and the two phases of a commitment (commit, reveal)
State the two security properties, hiding and binding, and explain why they matter
Implement a hash-based commitment scheme in SageMath
Understand the fundamental duality: why you cannot have both perfect hiding and perfect binding
Apply commitments to solve fair coin-flipping and other protocols

Prerequisites

You should have completed Modules 01–08 (modular arithmetic through pairings). This notebook requires:

Comfort with hash functions (conceptually)
Basic familiarity with SageMath / Python syntax

Bridge from earlier modules: In Modules 01–08, we built encryption (AES, RSA), key exchange (Diffie–Hellman, ECDH), and digital signatures (ECDSA). These primitives protect confidentiality and authenticity. Commitments are a different kind of primitive — they don't encrypt a message or prove who sent it. Instead, they let you bind yourself to a value without revealing it. This seemingly simple idea unlocks an entirely new world: zero-knowledge proofs, fair protocols, and verifiable computation.

The Sealed Envelope Analogy

A commitment scheme is the digital version of a sealed envelope:

Physical World	Cryptographic World
Write message on paper	Choose message $m$
Seal it in an opaque envelope	Compute commitment $C = \text{Commit}(m, r)$
Hand the sealed envelope to a friend	Send $C$ to the verifier
Later: open the envelope to reveal the message	Reveal $m$ and $r$ ; verifier checks $C \stackrel{?}{=} \text{Commit}(m, r)$

Two phases:

Commit phase: The committer locks in a value and sends the commitment. The value is hidden.
Reveal phase (opening): The committer reveals the original value and any randomness. The verifier checks consistency.

The scheme must satisfy two security properties:

Hiding: The sealed envelope is opaque — seeing $C$ reveals nothing about $m$
Binding: The sealed envelope is tamper-proof — the committer cannot open it to a different message $m' \neq m$

Hash-Based Commitment

The simplest commitment scheme uses a cryptographic hash function:

C = H(m \,\|\, r)

where $m$ is the message and $r$ is a random nonce (the "randomness" or "blinding factor").

Commit: Pick random $r$ , compute $C = H(m \,\|\, r)$ , send $C$
Reveal: Send $(m, r)$ . The verifier computes $H(m \,\|\, r)$ and checks it equals $C$

Why does this work?

Hiding comes from the random $r$ : even if $m$ is known to be one of a few values, the hash output looks uniformly random because $r$ adds entropy
Binding comes from collision resistance: the committer cannot find a different $(m', r')$ such that $H(m' \,\|\, r') = H(m \,\|\, r)$

Misconception alert: "A commitment scheme is just a hash." No! The hash is ONE implementation. The concept is more general — any scheme satisfying hiding + binding is a commitment. We'll see Pedersen commitments (next notebook) that use discrete logarithms instead of hashes, and they have very different security properties.

In [ ]:

import hashlib
import os

def hash_commit(message: bytes, randomness: bytes) -> str:
    """Commit to a message using SHA-256: C = H(m || r)"""
    return hashlib.sha256(message + randomness).hexdigest()

def hash_verify(commitment: str, message: bytes, randomness: bytes) -> bool:
    """Verify that (message, randomness) opens to the given commitment."""
    return hashlib.sha256(message + randomness).hexdigest() == commitment

# --- Commit phase ---
m = b'heads'                    # Alice's secret choice
r = os.urandom(32)              # 256 bits of randomness
C = hash_commit(m, r)
print(f'Message:    {m}')
print(f'Randomness: {r.hex()}')
print(f'Commitment: {C}')
print()

# --- Reveal phase ---
print(f'Verification: {hash_verify(C, m, r)}')
print(f'Tampered msg: {hash_verify(C, b"tails", r)}')

Checkpoint: Before running the next cell, predict: if Alice tries to open her commitment to b'tails' using a different random nonce $r'$ , will she succeed? Why or why not?

In [ ]:

# Can Alice cheat? She committed to 'heads' but wants to claim 'tails'
# She would need to find r' such that H('tails' || r') = C

# Let's try a brute-force search (with a small nonce space, for illustration)
cheating_attempts = 0
found_cheat = False

for trial in range(100000):
    r_fake = trial.to_bytes(32, 'big')
    if hash_commit(b'tails', r_fake) == C:
        print(f'Cheat found after {trial} attempts!')
        found_cheat = True
        break
    cheating_attempts += 1

if not found_cheat:
    print(f'No cheat found after {cheating_attempts} attempts.')
    print('Finding a collision in SHA-256 would take ~2^128 operations.')
    print('The commitment is BINDING: Alice cannot change her choice.')

Why Randomness Is Essential

What if we skip the randomness and just compute $C = H(m)$ ? This is a common beginner mistake. Let's see what goes wrong.

In [ ]:

# BAD commitment: no randomness
C_bad = hashlib.sha256(b'heads').hexdigest()
print(f'Commitment (no randomness): {C_bad}')
print()

# The verifier knows Alice chose either 'heads' or 'tails'.
# Can the verifier figure out which one?
candidates = [b'heads', b'tails']
for candidate in candidates:
    h = hashlib.sha256(candidate).hexdigest()
    match = '  <-- MATCH!' if h == C_bad else ''
    print(f'H({candidate.decode():5s}) = {h[:20]}...{match}')

print()
print('Without randomness, the verifier can test all possible messages.')
print('HIDING is completely broken when the message space is small!')

The Two Security Properties in Detail

Let's be precise about what hiding and binding mean:

Hiding

Given a commitment $C$ , an adversary cannot determine which message $m$ was committed to.

Computationally hiding: No efficient algorithm can distinguish commitments to different messages. (The hash commitment achieves this — as long as the adversary can't invert SHA-256.)
Perfectly hiding (information-theoretic): Even with unlimited computational power, the adversary learns nothing about $m$ from $C$ . (The hash commitment does NOT achieve this — there exists exactly one preimage.)

Binding

The committer cannot find two different messages that produce the same commitment.

Computationally binding: No efficient algorithm can find $m \neq m'$ with $\text{Commit}(m, r) = \text{Commit}(m', r')$ . (The hash commitment achieves this via collision resistance.)
Perfectly binding (information-theoretic): It is impossible (not just hard) to find a collision. (The hash commitment achieves this! For a fixed $C$ , there is at most one valid opening — well, technically the function is injective on its domain.)

So the hash commitment is: computationally hiding + perfectly binding.

The Fundamental Duality

Here is a deep and surprising fact:

Theorem (informal): No commitment scheme can be both perfectly hiding and perfectly binding.

Why? Suppose a scheme is perfectly hiding. Then for every commitment $C$ , every message $m$ in the message space could have produced $C$ (with some randomness $r$ ). But that means the committer can find an alternative opening — so the scheme cannot be perfectly binding!

Conversely, if a scheme is perfectly binding, then each $C$ has a unique opening, so an unbounded adversary could (in principle) find it — breaking perfect hiding.

This gives us two families:

Type	Hiding	Binding	Example
Hash commitment	Computational	Perfect (effectively)	$C = H(m \,\|\, r)$
Pedersen commitment	Perfect	Computational	$C = g^m h^r$ (next notebook)

The choice depends on your application. For zero-knowledge proofs, we usually want perfect hiding (the Pedersen style), because the prover's secret must remain information-theoretically hidden.

Crypto foreshadowing: Pedersen commitments (next notebook, 09b) achieve perfect hiding using the discrete log assumption. They are additively homomorphic — you can add committed values without opening them. This property is the foundation of Bulletproofs, confidential transactions in blockchains, and the commitment schemes inside SNARKs (Module 10).

In [ ]:

# Demonstrating the duality: a "toy" perfectly-hiding scheme
# In a perfectly hiding scheme, every commitment C can be opened to ANY message.
# Let's show this with a toy example using one-time pads.

# Toy scheme: Commit(m, r) = m XOR r, where r is uniform random
# This is perfectly hiding: C = m XOR r is uniform regardless of m
# But NOT binding: given C, the committer can claim ANY m' by setting r' = m' XOR C

import secrets

m_real = 42
r = secrets.randbelow(256)  # random byte
C = m_real ^^ r  # XOR commitment (^^ is XOR in SageMath)
print(f'Committed value: m = {m_real}')
print(f'Randomness:      r = {r}')
print(f'Commitment:      C = {C}')
print()

# Now the dishonest committer wants to open to m' = 99
m_fake = 99
r_fake = m_fake ^^ C  # compute r' so that m' XOR r' = C
print(f'Fake opening:    m\' = {m_fake}, r\' = {r_fake}')
print(f'Verification:    m\' XOR r\' = {m_fake ^^ r_fake} == C = {C}? {m_fake ^^ r_fake == C}')
print()
print('Perfectly hiding (C is uniform), but NOT binding (committer can cheat)!')
print('This illustrates the duality: you cannot have both.')

Commitment as a Protocol

Let's formalize the two-phase protocol with a complete interactive simulation. Alice commits, sends the commitment to Bob, and later reveals.

Phase 1 (Commit):    Alice  ---[ C ]---->  Bob     (Bob stores C)
Phase 2 (Reveal):    Alice  ---[ m, r ]--->  Bob     (Bob checks H(m||r) == C)

In [ ]:

import hashlib, os

class Committer:
    """Alice: commits to a value, later reveals it."""
    def commit(self, message: bytes):
        self._m = message
        self._r = os.urandom(32)
        self._C = hashlib.sha256(self._m + self._r).hexdigest()
        return self._C  # send only the commitment
    
    def reveal(self):
        return self._m, self._r  # open the envelope

class Verifier:
    """Bob: receives commitment, later verifies the opening."""
    def receive_commitment(self, C: str):
        self._C = C
        print(f'[Bob] Received commitment: {C[:32]}...')
    
    def verify(self, message: bytes, randomness: bytes) -> bool:
        C_check = hashlib.sha256(message + randomness).hexdigest()
        valid = (C_check == self._C)
        print(f'[Bob] Received opening: m = {message}')
        print(f'[Bob] Recomputed hash matches commitment? {valid}')
        return valid

# --- Run the protocol ---
alice = Committer()
bob = Verifier()

print('=== Phase 1: Commit ===')
C = alice.commit(b'I choose 7')
bob.receive_commitment(C)
print()

print('=== Phase 2: Reveal ===')
m, r = alice.reveal()
result = bob.verify(m, r)
print(f'\nProtocol outcome: {"ACCEPT" if result else "REJECT"}')

Application: Fair Coin Flipping

Now let's solve the motivating question! Alice and Bob can play odds-and-evens fairly using commitments:

Alice picks her bit $a \in \{0, 1\}$ and sends $C = \text{Commit}(a, r)$ to Bob
Bob picks his bit $b \in \{0, 1\}$ and sends $b$ to Alice (in the clear)
Alice reveals $(a, r)$ . Bob verifies the commitment.
The "coin flip" result is $a \oplus b$ (XOR)

Why is this fair?

Bob can't cheat: when he sends $b$ in step 2, he doesn't know $a$ (hiding)
Alice can't cheat: she committed to $a$ before seeing $b$ (binding)

In [ ]:

import hashlib, os, secrets

def commit(bit: int) -> tuple:
    """Returns (commitment, message_bytes, randomness)."""
    m = str(bit).encode()
    r = os.urandom(32)
    C = hashlib.sha256(m + r).hexdigest()
    return C, m, r

def verify(C: str, m: bytes, r: bytes) -> bool:
    return hashlib.sha256(m + r).hexdigest() == C

print('=== Fair Coin Flip Protocol ===\n')

# Step 1: Alice commits
alice_bit = secrets.randbelow(2)
C, m_alice, r_alice = commit(alice_bit)
print(f'Step 1: Alice commits (bit hidden). Sends C = {C[:24]}...')

# Step 2: Bob chooses his bit (he can't see Alice's)
bob_bit = secrets.randbelow(2)
print(f'Step 2: Bob chooses b = {bob_bit} and sends it to Alice.')

# Step 3: Alice reveals
print(f'Step 3: Alice reveals a = {alice_bit}, r = {r_alice.hex()[:16]}...')
valid = verify(C, m_alice, r_alice)
print(f'        Bob verifies: {"VALID" if valid else "INVALID (Alice cheated!)"}')

# Step 4: Result
result = alice_bit ^^ bob_bit
winner = 'Alice' if result == 0 else 'Bob'
print(f'\nResult: a XOR b = {alice_bit} XOR {bob_bit} = {result}')
print(f'Winner: {winner}!')
print(f'\nNeither party could cheat: hiding protected Bob, binding protected Alice.')

Checkpoint: Think about this: what if Alice refuses to reveal in step 3? The protocol stalls, but does anyone learn anything unfair? What about if the commitment were NOT hiding — could Bob cheat in step 2? Make sure you can answer both before proceeding.

Other Applications

Commitments appear everywhere in cryptography:

Application	How commitments help
Sealed-bid auctions	Each bidder commits to their bid. All commitments are collected, then all are opened simultaneously. No one can adjust their bid after seeing others'.
Zero-knowledge proofs	The prover commits to intermediate values. The verifier challenges. The commitment ensures the prover can't "backtrack."
Coin flipping	As we just saw — commitments enable fair randomness without trusted third parties.
Blockchain	Commit-reveal schemes prevent front-running in DeFi, enable private voting, and more.

The pattern is always the same: commit first, then interact, then reveal. The commitment creates a temporal barrier that prevents cheating.

Exercises

Exercise 1: Sealed-Bid Auction (Worked)

Three bidders each commit to a secret bid. After all commitments are collected, the bids are revealed and the highest bidder wins. Implement this protocol and verify all commitments.

In [ ]:

import hashlib, os

# --- Sealed-Bid Auction Protocol ---

# Phase 1: Each bidder commits to a secret bid
bidders = {
    'Alice': 150,
    'Bob':   220,
    'Carol': 180,
}

commitments = {}
secrets_store = {}  # In practice, each bidder keeps their own secrets

print('=== Phase 1: Commit ===\n')
for name, bid in bidders.items():
    m = str(bid).encode()
    r = os.urandom(32)
    C = hashlib.sha256(m + r).hexdigest()
    commitments[name] = C
    secrets_store[name] = (m, r)
    print(f'{name} commits: C = {C[:32]}...')

print('\nAll commitments collected. No one can change their bid now.\n')

# Phase 2: Reveal all bids and verify
print('=== Phase 2: Reveal ===\n')
verified_bids = {}
for name in bidders:
    m, r = secrets_store[name]
    C_check = hashlib.sha256(m + r).hexdigest()
    valid = (C_check == commitments[name])
    bid_value = int(m.decode())
    verified_bids[name] = bid_value
    status = 'VALID' if valid else 'INVALID'
    print(f'{name} reveals bid = {bid_value}  [{status}]')

# Determine winner
winner = max(verified_bids, key=verified_bids.get)
print(f'\nWinner: {winner} with bid {verified_bids[winner]}!')
print('\nKey insight: binding ensures no bidder could adjust after seeing others\' commitments.')

Exercise 2: Broken Commitment — What Goes Wrong? (Guided)

Alice uses a bad commitment scheme: she commits to her message WITHOUT randomness, i.e., $C = H(m)$ instead of $C = H(m \,\|\, r)$ .

Bob knows Alice's message is one of: 'yes', 'no', or 'maybe'.

Your task: Write code that lets Bob figure out Alice's message from the commitment alone.

Hints:

Bob can compute $H(m)$ for each candidate message
Compare each result to Alice's commitment
Which security property is broken? Hiding or binding?

In [ ]:

import hashlib

# Alice's (bad) commitment: no randomness!
alice_secret = b'maybe'
C_bad = hashlib.sha256(alice_secret).hexdigest()
print(f'Alice sends commitment: {C_bad[:32]}...')
print()

# Bob knows the message is one of these:
candidates = [b'yes', b'no', b'maybe']

# TODO: Bob's attack
# For each candidate message, compute its hash and compare to C_bad.
# Print which message Alice committed to.
#
# Your code here:
# for candidate in candidates:
#     h = hashlib.sha256(______).hexdigest()
#     if h == ______:
#         print(f'Bob discovered Alice\'s message: {______}')
#
# After finding the answer, write a comment explaining:
# Which property is broken? Why does randomness fix it?

Exercise 3: Commit-and-Reveal Voting (Independent)

Design and implement a simple yes/no vote among 5 voters using hash commitments. Your protocol should:

Have each voter commit to their vote (b'yes' or b'no') with proper randomness
Collect all commitments before any reveals
Verify each opening and tally the votes
Print the final result

Challenge questions (answer in comments):

What happens if a voter refuses to reveal? How would you handle this?
Does this scheme protect voter privacy after the reveal phase? Why or why not?
How might you modify this to keep votes private even after tallying? (Hint: think about homomorphic commitments — coming in 09b)

In [ ]:

import hashlib, os

# Exercise 3: Your implementation here
#
# Suggested structure:
# 1. Define voters and their secret votes
# 2. Commit phase: each voter commits
# 3. Reveal phase: each voter reveals, verify each commitment
# 4. Tally and announce result

Summary

Concept	Key idea
Two phases	Commit (seal the value), then Reveal (open and verify)
Hiding	The commitment reveals nothing about the message
Binding	The committer cannot change the message after committing
Hash commitment	$C = H(m \,\|\, r)$ is simple, computationally hiding, and perfectly binding
Randomness is essential	Without $r$ , an adversary can brute-force small message spaces and break hiding
Fundamental duality	You cannot have both perfect hiding and perfect binding, you must choose one to be computational
Applications	Fair coin flipping, sealed-bid auctions, zero-knowledge proofs, and more

Looking ahead: In the next notebook, we move from hash-based commitments to Pedersen commitments — an algebraic scheme that achieves perfect hiding using the discrete log assumption. Pedersen commitments are additively homomorphic: you can add two commitments and get a commitment to the sum, without opening either one. This property is the foundation of Bulletproofs, confidential transactions, and the commitment schemes inside modern zero-knowledge proof systems (SNARKs, STARKs).

Next: Pedersen Commitments