GitHub Repository: duyuefeng0708/Cryptography-From-First-Principle
Path: blob/main/frontier/10-snarks-starks/sage/10d-groth16-overview.ipynb
⁴⁸³ views

unlisted

Kernel: SageMath 10.5

Notebook 10d: Groth16 Overview

Module 10. SNARKs and STARKs

Motivating Question. We have the QAP equation $A(x) \cdot B(x) - C(x) = H(x) \cdot Z(x)$ . But the verifier can't check this directly without learning the witness. How do we evaluate polynomials "in the exponent", hiding the witness inside elliptic curve points while still allowing verification? Groth16 does this using pairings and a trusted setup, producing a proof that's just 3 curve points (< 200 bytes) no matter how big the circuit.

Prerequisites. You should be comfortable with:

QAP construction: polynomials, vanishing polynomial, $H(x)$ (Notebook 10c)
Bilinear pairings: $e(aP, bQ) = e(P, Q)^{ab}$ (Module 07)
Elliptic curve scalar multiplication (Module 06)

Learning objectives. By the end of this notebook you will be able to:

Understand the trusted setup ceremony and the "toxic waste" problem.
See how polynomial evaluations are encoded as elliptic curve points.
Walk through a simplified Groth16-style proof and verification.
Appreciate why the proof is constant-size and why the trusted setup is necessary.

1. The Idea: Evaluate in the Exponent

Bridge from Notebook 10c. The QAP check is: does $A(\tau) \cdot B(\tau) - C(\tau) = H(\tau) \cdot Z(\tau)$ for a secret $\tau$ ? Recall from 10c that the Schwartz-Zippel lemma guarantees this check at a single random point $\tau$ catches any cheating prover with overwhelming probability, so we only need one evaluation point. The problem: if the verifier learns $A(\tau)$ as a field element, the witness leaks. The solution: encode $A(\tau)$ as a curve point $[A(\tau)]_1 = A(\tau) \cdot G_1$ and use a pairing (Module 07) to check the multiplication without revealing the scalar:

e([A(\tau)]_1, [B(\tau)]_2) = e(G_1, G_2)^{A(\tau) \cdot B(\tau)}

This is exactly what Groth16 does. The secret $\tau$ is generated during a trusted setup and then destroyed (the "toxic waste").

Component	Plain QAP (10c)	Groth16
Evaluation point	Random $\tau$ known to verifier	$\tau$ hidden in CRS, then destroyed
Polynomial values	Field elements $A(\tau), B(\tau), \ldots$	Curve points $[A(\tau)]_1, [B(\tau)]_2$
Checking $A \cdot B$	Field multiplication	Pairing: $e([A]_1, [B]_2)$
Proof size	All polynomial values	3 curve points (constant!)
Zero-knowledge	None (witness is visible)	Yes (randomized proof)

In [ ]:

# Setup: we'll use a toy pairing-friendly curve
# (Same supersingular curve as Module 07)
p = 467
E = EllipticCurve(GF(p), [1, 0])  # y² = x³ + x
n = 13  # subgroup order
k = 2   # embedding degree
cofactor = E.cardinality() // n

F2 = GF(p^k, 'a')
E_ext = E.change_ring(F2)

# Generators
while True:
    G1 = cofactor * E.random_point()
    if G1 != E(0) and n * G1 == E(0):
        break

G1_ext = E_ext(G1)
cofactor_ext = E_ext.cardinality() // n
while True:
    G2 = cofactor_ext * E_ext.random_point()
    if G2 != E_ext(0) and n * G2 == E_ext(0):
        if G2.weil_pairing(G1_ext, n) != 1:
            break

print(f"Curve: y² = x³ + x over F_{p}")
print(f"Subgroup order: n = {n}")
print(f"G1 = {G1}")
print(f"G2 = {G2}")

# Also set up the QAP field (small for demonstration)
F_qap = GF(n)  # QAP computations mod n
R.<X> = PolynomialRing(F_qap)

2. The Trusted Setup ("Ceremony")

A trusted party (or a multi-party computation) generates a secret $\tau \in \mathbb{Z}/n\mathbb{Z}$ and publishes curve points encoding powers of $\tau$ :

\text{CRS} = \{[\tau^i]_1 = \tau^i \cdot G_1\}_{i=0}^{d}, \quad \{[\tau^i]_2 = \tau^i \cdot G_2\}_{i=0}^{d}

Then $\tau$ is destroyed, this is the toxic waste. If anyone learns $\tau$ , they can forge proofs.

Misconception alert. "The trusted setup means you have to trust one person." Not in practice, multi-party computation (MPC) ceremonies ensure that as long as any single participant destroys their share, the toxic waste is unrecoverable. Zcash's Powers of Tau ceremony had hundreds of participants.

In [ ]:

# Trusted setup: generate tau and compute CRS
tau = ZZ(F_qap.random_element())  # secret!
while tau == 0:
    tau = ZZ(F_qap.random_element())

max_degree = 4  # enough for our toy example

# Powers of tau as curve points
CRS_G1 = [ZZ(F_qap(tau^i)) * G1 for i in range(max_degree + 1)]  # [G1, τG1, τ²G1, ...]
CRS_G2 = [ZZ(F_qap(tau^i)) * G2 for i in range(max_degree + 1)]  # [G2, τG2, τ²G2, ...]

print(f"=== Trusted Setup ===")
print(f"Secret tau = {tau} (TOXIC WASTE, must be destroyed!)")
print(f"\nCRS (G1 points):")
for i, pt in enumerate(CRS_G1):
    print(f"  [τ^{i}]₁ = τ^{i}·G1 = {pt}")
print(f"\nCRS (G2 points):")
for i, pt in enumerate(CRS_G2):
    print(f"  [τ^{i}]₂ = τ^{i}·G2 = {pt}")

print(f"\nAnyone can USE these points but cannot recover τ from them.")
print(f"(Recovering τ from τ·G would require solving the discrete log problem.)")

Checkpoint 1. The CRS encodes $\tau$ "in the exponent", as curve points. Given $[\tau^i]_1$ , you can compute $[f(\tau)]_1$ for any polynomial $f$ by taking linear combinations: $[f(\tau)]_1 = f_0 [1]_1 + f_1 [\tau]_1 + f_2 [\tau^2]_1 + \cdots$ . But you can never learn $\tau$ itself.

3. Evaluating Polynomials on Curve Points

Given a polynomial $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots$ and the CRS, anyone can compute $[f(\tau)]_1$ without knowing $\tau$ :

[f(\tau)]_1 = a_0 [1]_1 + a_1 [\tau]_1 + a_2 [\tau^2]_1 + \cdots

In [ ]:

def eval_poly_on_curve(poly, CRS_points, E_zero):
    """
    Evaluate polynomial on CRS curve points.
    Returns [f(tau)]_G = sum(coeff_i * [tau^i]_G)
    """
    coeffs = poly.list()  # [a_0, a_1, a_2, ...]
    result = E_zero  # point at infinity
    for i, c in enumerate(coeffs):
        if c != 0:
            result = result + ZZ(c) * CRS_points[i]
    return result

# Example: f(X) = 3X² + 2X + 1
f_test = R(1 + 2*X + 3*X^2)
print(f"f(X) = {f_test}")
print(f"f(τ) = f({tau}) = {f_test(F_qap(tau))}")

# Evaluate on curve points (without knowing tau!)
f_tau_G1 = eval_poly_on_curve(f_test, CRS_G1, E(0))

# Compare with direct computation (using tau, cheating for verification)
f_tau_direct = ZZ(f_test(F_qap(tau))) * G1

print(f"\n[f(τ)]₁ via CRS:    {f_tau_G1}")
print(f"[f(τ)]₁ directly:   {f_tau_direct}")
print(f"Match? {f_tau_G1 == f_tau_direct}")
print(f"\nWe computed [f(τ)]₁ without knowing τ = {tau}!")

4. Simplified Groth16 Proof

Let's build a simplified SNARK for our running example $f(x) = x^3 + x + 5$ .

Setup: We have the QAP from Notebook 10c. Now the prover computes:

$[A(\tau)]_1$ , encoding of the "left" polynomial evaluation
$[B(\tau)]_2$ , encoding on $G_2$
$[H(\tau)]_1$ , encoding of the quotient polynomial

Verification: The verifier checks: $e([A(\tau)]_1, [B(\tau)]_2) = e([C(\tau)]_1, G_2) \cdot e([H(\tau)]_1, [Z(\tau)]_2)$

This is the pairing-based version of $A(\tau) \cdot B(\tau) = C(\tau) + H(\tau) \cdot Z(\tau)$ .

In [ ]:

# Build the QAP (reusing tools from 10c, simplified)
# f(x) = x³ + x + 5, two multiplication constraints
# Simplified: A(X)·B(X) - C(X) = H(X)·Z(X)

# We'll work with a concrete example where we know all values
# x_input = 3, so f(3) = 35
x_input = F_qap(3)

# For this demo, define the QAP polynomials directly
# (In practice these come from R1CS interpolation)
# Simplified: A(τ) = 3 (our x value), B(τ) = 3, result is 9 = w1

# Let's directly demonstrate the pairing check concept
# Suppose A(τ) = a, B(τ) = b, C(τ) = c, H(τ) = h, Z(τ) = z
# such that a * b = c + h * z

a_val = ZZ(F_qap(5))   # some QAP evaluation
b_val = ZZ(F_qap(7))   # some QAP evaluation
hz_val = ZZ(F_qap(a_val * b_val % n))  # h*z such that a*b = c + h*z
c_val = ZZ(F_qap(0))   # for simplicity
# So a*b mod n = 35 mod 13 = 9, and c + h*z = 0 + 9 = 9 ✓

# Encode as curve points
proof_A = a_val * G1      # [A(τ)]₁
proof_B = b_val * G2      # [B(τ)]₂
proof_C = c_val * G1      # [C(τ)]₁
proof_HZ = hz_val * G1    # [H(τ)·Z(τ)]₁

print(f"=== Proof Elements ===")
print(f"[A(τ)]₁ = {proof_A}")
print(f"[B(τ)]₂ = {proof_B}")
print(f"[C(τ)]₁ = {proof_C}")
print(f"[H·Z(τ)]₁ = {proof_HZ}")
print(f"\nProof = 3 curve points (in real Groth16)")
print(f"Size: constant regardless of circuit complexity!")

5. Pairing-Based Verification

The verifier checks: $e([A]_1, [B]_2) = e([C]_1 + [H \cdot Z]_1, G_2)$

Why does this work?

LHS: $e(a \cdot G_1, b \cdot G_2) = e(G_1, G_2)^{ab}$
RHS: $e((c + hz) \cdot G_1, G_2) = e(G_1, G_2)^{c + hz}$
Equal when $ab = c + hz$ , which is exactly the QAP equation!

In [ ]:

# Pairing-based verification
proof_A_ext = E_ext(proof_A)
proof_C_ext = E_ext(proof_C)
proof_HZ_ext = E_ext(proof_HZ)
G1_ext_pt = E_ext(G1)

# LHS: e([A]₁, [B]₂)
lhs = proof_A_ext.weil_pairing(proof_B, n)

# RHS: e([C]₁ + [HZ]₁, G₂)
rhs_point = proof_C_ext + proof_HZ_ext
rhs = rhs_point.weil_pairing(G2, n)

print(f"=== Pairing Verification ===")
print(f"LHS: e([A]₁, [B]₂) = {lhs}")
print(f"RHS: e([C+HZ]₁, G₂) = {rhs}")
print(f"Equal? {lhs == rhs}")
print(f"\nThe verifier checked a·b = c + h·z using pairings,")
print(f"without ever learning a={a_val}, b={b_val}, c={c_val}, hz={hz_val}!")

In [ ]:

# Demonstrate soundness: wrong proof fails
print("=== Wrong Proof ===")
fake_a = ZZ(F_qap(6))  # wrong value
fake_A = fake_a * G1
fake_A_ext = E_ext(fake_A)

lhs_fake = fake_A_ext.weil_pairing(proof_B, n)
print(f"Fake LHS: e([6·G1], [B]₂) = {lhs_fake}")
print(f"Real RHS: e([C+HZ]₁, G₂)  = {rhs}")
print(f"Equal? {lhs_fake == rhs}")
print(f"\nForged proof REJECTED!")

Checkpoint 2. The pairing translates the polynomial identity check into an elliptic curve equation. The verifier computes a constant number of pairings (3 in real Groth16), regardless of the circuit size. This is why Groth16 verification is so fast, it takes ~5ms even for circuits with millions of gates.

6. The Toxic Waste Problem

If an attacker knows $\tau$ , they can forge proofs for any statement.

In [ ]:

# With tau, attacker can forge proofs!
print("=== Toxic Waste Attack ===")
print(f"Attacker knows tau = {tau}")

# Attacker wants to prove a*b = c + hz for arbitrary values
# They can compute any [f(τ)]₁ directly
fake_statement_a = ZZ(F_qap(11))
fake_statement_b = ZZ(F_qap(3))
fake_ab = ZZ(F_qap(fake_statement_a * fake_statement_b))

forged_A = fake_statement_a * G1
forged_B = fake_statement_b * G2
forged_HZ = fake_ab * G1
forged_C = E(0)  # zero

# Verify the forged proof
forged_A_ext = E_ext(forged_A)
forged_HZ_ext = E_ext(forged_HZ)
lhs_forged = forged_A_ext.weil_pairing(forged_B, n)
rhs_forged = forged_HZ_ext.weil_pairing(G2, n)

print(f"Forged proof passes? {lhs_forged == rhs_forged}")
print(f"\n⚠ This is why tau MUST be destroyed!")
print(f"In practice, multi-party computation ensures no single party knows tau.")

7. Real Groth16: The Full Picture

The actual Groth16 protocol is more complex. Here's the high-level structure:

Phase	What Happens	Output
Setup	Generate CRS from $\tau, \alpha, \beta, \gamma, \delta$	Proving key, Verification key
Prove	Prover computes $\pi = ([A]_1, [B]_2, [C]_1)$ using witness	3 curve points
Verify	Verifier checks pairing equation	Accept/Reject

Groth16 Pairing Equation (Simplified)

e([A]_1, [B]_2) = e([\alpha]_1, [\beta]_2) \cdot e(\sum_{\text{public}} s_i [L_i]_1, [\gamma]_2) \cdot e([C]_1, [\delta]_2)

Where:

$\alpha, \beta, \gamma, \delta$ are additional toxic waste parameters
$L_i$ encode the QAP polynomials for public inputs
The proof randomization uses blinding factors for zero-knowledge

Proof Size and Verification Cost

Metric	Groth16
Proof size	2 $G_1$ points + 1 $G_2$ point = 192 bytes (BLS12-381)
Verification	3 pairings + 1 multi-scalar multiplication
Verification time	~5 ms
Proving time	$O(n \log n)$ where $n$ = circuit size

8. SNARK Properties

Groth16 is a zk-SNARK, let's unpack the acronym:

Letter	Property	Meaning
zk	Zero-Knowledge	Proof reveals nothing about the witness
S	Succinct	Proof is constant-size (192 bytes)
N	Non-interactive	Prover sends one message to verifier
AR	ARgument	Computationally sound (not information-theoretically)
K	Knowledge	Prover must "know" a valid witness (extractability)

In [ ]:

# Size comparison
print("=== Proof Size Comparison ===")
for gates in [100, 10000, 1000000, 100000000]:
    witness_bytes = gates * 32  # each wire value is 32 bytes
    proof_bytes = 192  # always 192 bytes!
    print(f"{gates:>15,} | {witness_bytes:>12,} B | {proof_bytes} B")

print(f"\nThe proof is ALWAYS 192 bytes, that's succinctness!")
print(f"A 100M-gate circuit (ZK-EVM scale) still has a tiny proof.")

Checkpoint 3. Groth16 achieves the smallest proof size of any known SNARK (192 bytes on BLS12-381). The trade-off is the trusted setup, if the toxic waste is compromised, soundness breaks. This trade-off motivates transparent proof systems like STARKs (next notebooks).

9. Groth16 in the Wild

Application	What's proved	Circuit size
Zcash	Transaction validity (spend authority, value balance)	~100K gates
Filecoin	Storage proofs (PoRep)	~150M gates
Tornado Cash	Merkle tree membership (deposit/withdrawal)	~30K gates
zkSync (v1)	Batch of transfer validity	~100K gates

Crypto foreshadowing. Groth16 needs a circuit-specific trusted setup, changing the circuit requires a new ceremony. Universal SNARKs (PLONK, Marlin) require only a single "universal" setup. STARKs (next notebooks) eliminate the trusted setup entirely.

10. Exercises

Exercise 1 (Worked): CRS Polynomial Evaluation

Problem. Given the CRS, compute $[f(\tau)]_1$ for $f(X) = X^2 + 2X + 3$ without knowing $\tau$ . Then verify by comparing with the direct computation.

Solution:

In [ ]:

# Exercise 1: Worked solution
f_ex1 = R(3 + 2*X + X^2)  # 3 + 2X + X²
print(f"f(X) = {f_ex1}")

# Using CRS (without knowing tau)
f_tau_crs = eval_poly_on_curve(f_ex1, CRS_G1, E(0))

# Direct (cheating, using tau for verification)
f_tau_val = ZZ(f_ex1(F_qap(tau)))
f_tau_direct = f_tau_val * G1

print(f"\nVia CRS:    [f(τ)]₁ = {f_tau_crs}")
print(f"Direct:     [f(τ)]₁ = {f_tau_direct}")
print(f"Match? {f_tau_crs == f_tau_direct}")
print(f"\nf(τ) = f({tau}) = {f_tau_val} (verifier doesn't see this!)")

Exercise 2 (Guided): Pairing Verification

Problem. Create curve points $[a]_1$ and $[b]_2$ for $a = 4, b = 9$ . Verify that $e([a]_1, [b]_2) = e([ab]_1, G_2)$ using the Weil pairing.

Fill in the TODOs:

In [ ]:

# Exercise 2: fill in the TODOs

# TODO 1: Create the curve points
# pt_a = ZZ(4) * G1   # [a]₁
# pt_b = ZZ(9) * G2   # [b]₂
# pt_ab = ZZ(F_qap(4 * 9)) * G1  # [ab]₁

# TODO 2: Compute pairings
# lhs_ex = E_ext(pt_a).weil_pairing(pt_b, n)   # e([a]₁, [b]₂)
# rhs_ex = E_ext(pt_ab).weil_pairing(G2, n)     # e([ab]₁, G₂)

# TODO 3: Compare
# print(f"e([a]₁, [b]₂) = {lhs_ex}")
# print(f"e([ab]₁, G₂)  = {rhs_ex}")
# print(f"Equal? {lhs_ex == rhs_ex}")

Exercise 3 (Independent): Forge with Toxic Waste

Problem.

Suppose you know $\tau$ . Create a valid-looking proof for the statement $7 \times 8 = 4$ (which is false mod 13, since $56 \bmod 13 = 4$ ... wait, that's actually true!).
Instead, prove $7 \times 8 = 3$ (which IS false mod 13). Can you make the pairing check pass if you know $\tau$ ?
Explain why not knowing $\tau$ prevents this forgery.

In [ ]:

# Exercise 3: write your solution here

Summary

Concept	Key Fact
Trusted setup	Generate CRS from toxic waste $\tau$ ; $\tau$ must be destroyed
CRS	Curve points $[\tau^i]_1, [\tau^i]_2$ , encode powers of $\tau$
Polynomial evaluation	$[f(\tau)]_1 = \sum f_i [\tau^i]_1$ , no need to know $\tau$
Pairing check	$e([A]_1, [B]_2) = e(G_1, G_2)^{A(\tau) \cdot B(\tau)}$ , multiplicative check
Proof size	3 curve points = 192 bytes (constant!)
Toxic waste risk	Knowing $\tau$ allows forging any proof

Groth16 is the gold standard for compact zk-SNARKs: tiny proofs, fast verification, but requires a trusted setup. In the next notebook, we'll explore FRI, the key ingredient of STARKs, which eliminate the trusted setup entirely.

Next: 10e: FRI Protocol