GitHub Repository: duyuefeng0708/Cryptography-From-First-Principle
Path: blob/main/frontier/11-homomorphic-encryption/sage/11a-what-is-fhe.ipynb
⁴⁸³ views

unlisted

Kernel: SageMath 10.5

Notebook 11a: What Is Fully Homomorphic Encryption?

Module 11: Homomorphic Encryption

Motivating Question. Can you compute on data you can't see? Imagine a hospital that wants a cloud server to run a machine-learning model on patient records, but the server should never see the records. Fully Homomorphic Encryption (FHE) makes this possible: you encrypt the data, the server computes on the ciphertexts, and the result decrypts to the correct answer. No one but the key holder ever sees the plaintext.

Prerequisites. You should be comfortable with:

Modular arithmetic (Module 01)
The Learning With Errors (LWE) problem (Module 08)

Learning objectives. By the end of this notebook you will be able to:

Explain the FHE dream: $\text{Enc}(a) \oplus \text{Enc}(b) = \text{Enc}(a + b)$ .
Build a toy symmetric encryption scheme with additive homomorphism.
Observe noise growth as operations accumulate and understand why it limits computation depth.
Describe bootstrapping as the key idea that makes FHE fully homomorphic.
Place FHE in context with a timeline of the four generations.

1. The FHE Dream

Bridge from Module 08. In Module 08 we saw that LWE gives us encryption that's hard to break even for quantum computers. Now we'll exploit the algebraic structure of LWE ciphertexts: adding two ciphertexts adds the underlying messages.

Homomorphic encryption lets you compute on ciphertexts:

\text{Enc}(a) \boxplus \text{Enc}(b) = \text{Enc}(a + b)

\text{Enc}(a) \boxtimes \text{Enc}(b) = \text{Enc}(a \cdot b)

If a scheme supports both addition and multiplication on ciphertexts, it's fully homomorphic, you can evaluate any function, because addition + multiplication = any arithmetic circuit.

In [ ]:

# The dream in pseudocode
print("=== The FHE Dream ===")
print()
print("Alice has secret data: a = 7, b = 3")
print("Alice encrypts: c_a = Enc(7), c_b = Enc(3)")
print("Alice sends c_a, c_b to Cloud (Cloud sees only ciphertext)")
print()
print("Cloud computes on ciphertexts:")
print("  c_sum  = c_a ⊕ c_b     (homomorphic addition)")
print("  c_prod = c_a ⊗ c_b     (homomorphic multiplication)")
print("Cloud sends c_sum, c_prod back to Alice")
print()
print("Alice decrypts:")
print("  Dec(c_sum)  = 10  (= 7 + 3  ✓)")
print("  Dec(c_prod) = 21  (= 7 × 3  ✓)")
print()
print("Cloud computed the CORRECT answers without ever seeing 7 or 3!")

2. A Toy Symmetric HE Scheme

Let's build the simplest possible homomorphic encryption scheme to see the idea in action.

Secret key: $s \in \mathbb{Z}_q$ (a random secret modulus- $q$ integer)

Encrypt message $m$ (small integer, $|m| \ll q$ ):

Choose random $a \in \mathbb{Z}_q$ and small noise $e \in \{-B, \ldots, B\}$
Output ciphertext $(a, \; b = a \cdot s + m + e \pmod{q})$

Decrypt ciphertext $(a, b)$ :

Compute $v = b - a \cdot s \pmod{q}$ (centered in $[-q/2, q/2)$ )
Return $v$ (this equals $m + e$ , and if $|e|$ is small, we recover $m$ approximately)

In [ ]:


q = 1000003  # a large prime modulus
B = 5        # noise bound

def centered_mod(x, q):
    """Reduce x modulo q into the range [-q/2, q/2)."""
    r = x % q
    if r > q // 2:
        r -= q
    return r

def keygen(q):
    """Generate a random secret key."""
    return randint(1, q - 1)

def encrypt(m, s, q, B):
    """Encrypt message m with secret key s."""
    a = randint(0, q - 1)
    e = randint(-B, B)
    b = (a * s + m + e) % q
    return (a, b, e)  # returning e for educational purposes

def decrypt(ct, s, q):
    """Decrypt ciphertext (a, b) with secret key s."""
    a, b = ct[0], ct[1]
    v = centered_mod(b - a * s, q)
    return v

# Generate key and encrypt
s = keygen(q)
print(f"Secret key: s = {s}")
print(f"Modulus: q = {q}")
print(f"Noise bound: B = {B}")

m = 42
ct = encrypt(m, s, q, B)
print(f"\nPlaintext: m = {m}")
print(f"Ciphertext: (a={ct[0]}, b={ct[1]})")
print(f"Noise: e = {ct[2]}")

decrypted = decrypt(ct, s, q)
print(f"\nDecrypted: {decrypted}")
print(f"Original + noise: {m} + {ct[2]} = {m + ct[2]}")
print(f"Match? {decrypted == m + ct[2]}")

Checkpoint 1. Decryption recovers $m + e$ (message plus noise). Since $e$ is tiny compared to $q$ , we can round to recover $m$ exactly, as long as the noise stays small. This "noise budget" is the central challenge of FHE.

3. Homomorphic Addition

The magic: if we add two ciphertexts component-wise, the result encrypts the sum of the messages!

\text{ct}_1 = (a_1, \; a_1 s + m_1 + e_1)

\text{ct}_2 = (a_2, \; a_2 s + m_2 + e_2)

\text{ct}_1 + \text{ct}_2 = (a_1 + a_2, \; (a_1 + a_2)s + (m_1 + m_2) + (e_1 + e_2))

This is a valid encryption of $m_1 + m_2$ with noise $e_1 + e_2$ .

In [ ]:

def he_add(ct1, ct2, q):
    """Homomorphic addition: add ciphertexts component-wise."""
    a_new = (ct1[0] + ct2[0]) % q
    b_new = (ct1[1] + ct2[1]) % q
    noise_new = ct1[2] + ct2[2]  # for tracking
    return (a_new, b_new, noise_new)

# Encrypt two messages
m1, m2 = 17, 25
ct1 = encrypt(m1, s, q, B)
ct2 = encrypt(m2, s, q, B)

print(f"m1 = {m1}, noise e1 = {ct1[2]}")
print(f"m2 = {m2}, noise e2 = {ct2[2]}")

# Homomorphic addition (no secret key needed!)
ct_sum = he_add(ct1, ct2, q)
decrypted_sum = decrypt(ct_sum, s, q)

print(f"\nHomomorphic addition (done WITHOUT the secret key):")
print(f"  Dec(ct1 + ct2) = {decrypted_sum}")
print(f"  Expected: m1 + m2 = {m1 + m2}")
print(f"  Accumulated noise: e1 + e2 = {ct_sum[2]}")
print(f"  Decrypted = message + noise = {m1 + m2} + {ct_sum[2]} = {m1 + m2 + ct_sum[2]}")
print(f"  Match? {decrypted_sum == m1 + m2 + ct_sum[2]}")

In [ ]:

# Chain many additions to see noise grow
messages = [randint(1, 50) for _ in range(20)]
cts = [encrypt(m, s, q, B) for m in messages]

# Sum all ciphertexts
ct_running = cts[0]
running_sum = messages[0]


for i in range(1, len(messages)):
    ct_running = he_add(ct_running, cts[i], q)
    running_sum += messages[i]
    dec = decrypt(ct_running, s, q)
    noise = ct_running[2]
    correct = (dec == running_sum + noise)

print(f"\nAfter {len(messages)-1} additions: noise = {ct_running[2]}, still tiny vs q = {q}")
print(f"Addition noise grows LINEARLY: each addition adds at most ±{B}.")

4. The Noise Problem: Why Multiplication Is Hard

Addition adds noise linearly ( $e_1 + e_2$ ). But multiplication makes noise grow multiplicatively. In a proper LWE-based scheme, multiplying two ciphertexts with noise $e_1, e_2$ produces noise roughly $e_1 \cdot e_2$ , which grows exponentially with the number of multiplications.

Let's simulate this with a simple noise-growth model.

In [ ]:

# Simulate noise growth for addition vs multiplication
initial_noise = 3  # |e| ≤ 3
noise_budget = q // 4  # decryption fails when noise exceeds this

print(f"Initial noise: ±{initial_noise}")
print(f"Noise budget (max tolerable noise): ±{noise_budget}")
print()

# Addition: noise grows linearly
print("=== Noise Growth: Addition ===")
noise_add = initial_noise
for i in range(1, 21):
    noise_add += initial_noise
    bar = "█" * min(50, int(50 * noise_add / noise_budget))
    status = "✓" if noise_add < noise_budget else "✗ OVERFLOW"
    if i in [1, 5, 10, 15, 20]:
        print(f"  {i} adds: noise ≤ {noise_add} |{bar}| {status}")

print()

# Multiplication: noise grows exponentially (simplified model)
print("=== Noise Growth: Multiplication ===")
noise_mul = initial_noise
for i in range(1, 21):
    noise_mul = noise_mul * initial_noise  # simplified: noise roughly multiplies
    bar = "█" * min(50, int(50 * min(noise_mul, noise_budget) / noise_budget))
    status = "✓" if noise_mul < noise_budget else "✗ OVERFLOW"
    if i in [1, 2, 3, 4, 5, 10]:
        print(f"  {i} muls: noise ≤ {noise_mul} |{bar}| {status}")
    if noise_mul >= noise_budget:
        print(f"  ... noise exceeded budget after {i} multiplications!")
        break

print(f"\nAddition: can do ~{noise_budget // initial_noise} operations")
print(f"Multiplication: can only do ~{int(log(noise_budget, initial_noise))} operations")
print(f"This is the fundamental challenge of FHE.")

Misconception alert. "FHE means unlimited computation on ciphertexts." Without bootstrapping, you can only do a limited number of operations before the noise overwhelms the message. A scheme that supports limited operations is called leveled FHE. True ("pure") FHE requires bootstrapping to reset the noise.

5. Bootstrapping: The Key to "Fully" Homomorphic

In 2009, Craig Gentry had a breakthrough idea: evaluate the decryption circuit homomorphically.

The trick:

You have a noisy ciphertext $\text{ct}$ that's about to overflow.
Encrypt the secret key $s$ under a new key $s'$ : publish $\text{Enc}_{s'}(s)$ .
Homomorphically evaluate $\text{Dec}(\text{ct}, s)$ using $\text{Enc}_{s'}(s)$ .
The result is $\text{Enc}_{s'}(m)$ , the same message encrypted under $s'$ , with fresh (low) noise!

This "refreshes" the ciphertext, resetting the noise budget. You can then continue computing.

  ct (noisy, key s)  ──→  Homomorphic Dec  ──→  ct' (fresh, key s')
       ↑                                              ↑
   noise ≈ budget                               noise ≈ small

In [ ]:

# Conceptual demonstration of bootstrapping
# (We can't implement real bootstrapping in a toy scheme, but we can show the idea)

print("=== Bootstrapping Concept ===")
print()
print("Without bootstrapping (leveled FHE):")
print("  Enc(m) → op → op → op → op → op → ✗ noise overflow!")
print()
print("With bootstrapping (fully homomorphic):")
print("  Enc(m) → op → op → [bootstrap] → op → op → [bootstrap] → op → ...")
print("                     ↑ noise reset                ↑ noise reset")
print()
print("Each bootstrap is expensive (~1 second per operation in early schemes),")
print("but it lets you compute INDEFINITELY.")
print()

# Simulate noise with bootstrapping
noise = initial_noise
bootstrap_threshold = noise_budget // 2  # bootstrap when noise gets high
bootstrap_cost = initial_noise * 2  # fresh noise after bootstrap
n_bootstraps = 0


for i in range(1, 15):
    noise = noise * initial_noise  # multiplication (exponential growth)
    if noise >= bootstrap_threshold:
        n_bootstraps += 1
        noise = bootstrap_cost  # reset noise!
    else:

print(f"\nCompleted 14 multiplications with {n_bootstraps} bootstraps.")
print(f"Without bootstrapping, we could only do ~{int(log(noise_budget, initial_noise))}.")

Checkpoint 2. Bootstrapping is what transforms a somewhat homomorphic scheme (limited operations) into a fully homomorphic one (unlimited operations). The requirement is that the scheme can homomorphically evaluate its own decryption circuit, Gentry called this "squashing" the decryption circuit.

6. Four Generations of FHE

FHE has evolved rapidly since Gentry's 2009 breakthrough:

Gen	Year	Schemes	Key Innovation	Performance
1st	2009	Gentry	Bootstrapping! First proof that FHE is possible	Minutes per gate
2nd	2011–12	BGV, BFV	Modulus switching replaces expensive bootstrapping	Seconds per circuit
3rd	2013	GSW	Approximate eigenvector technique, simpler design	Improved asymptotic
4th	2017	CKKS	Approximate arithmetic for real numbers	Practical for ML

In [ ]:

# Timeline visualization
timeline = [
    (1978, "RSA",      "Rivest asks: can we compute on encrypted data?"),
    (2009, "Gentry",   "First FHE construction! (ideal lattices + bootstrapping)"),
    (2011, "BGV",      "Brakerski-Gentry-Vaikuntanathan: modulus switching"),
    (2012, "BFV",      "Brakerski/Fan-Vercauteren: scale-invariant noise management"),
    (2013, "GSW",      "Gentry-Sahai-Waters: approximate eigenvector, no key-switching"),
    (2016, "TFHE",     "Chillotti et al.: gate-by-gate bootstrapping in <1ms"),
    (2017, "CKKS",     "Cheon-Kim-Kim-Song: approximate arithmetic for ML"),
    (2020, "Libraries", "Microsoft SEAL, IBM HElib, PALISADE go production"),
    (2024, "Hardware", "DARPA DPRIVE, Intel/DARPA FHE accelerators"),
]

print("=== FHE Timeline ===")
print()
for year, name, desc in timeline:
    bar_pos = (year - 1978) // 2
    marker = "●" if year >= 2009 else "○"
    print(f"  {year} {marker} {name}: {desc}")

print()
print("The field went from 'impossible?' (1978) to 'theoretically possible' (2009)")
print("to 'practically useful for some applications' (2020+).")

7. FHE vs Other Privacy Technologies

FHE is one of several approaches to privacy-preserving computation:

Technology	What it does	Trust model	Performance
FHE	Compute on encrypted data	Single key holder	Slow (10-1000x overhead)
MPC	Multiple parties compute together	Threshold trust	Moderate overhead, communication-heavy
ZK Proofs	Prove computation was correct	Prover knows witness	Fast verification
TEE (e.g., SGX)	Hardware-isolated computation	Trust hardware vendor	Near-native speed

These are complementary, not competing:

FHE + ZK: FHE keeps inputs encrypted while computing; ZK proves the computation was done correctly. Together they give verifiable computation on private data.
FHE + MPC: distributed key generation, threshold decryption
FHE vs ZK: ZK hides the witness but the computation is public; FHE hides the data but the computation is known to the server. They solve different privacy problems.
All three together: verifiable computation on private, distributed data

In [ ]:

# Performance comparison (approximate orders of magnitude)
print("=== Performance Overhead (approximate) ===")
print()
operations = [
    ("Integer addition",    "1 ns",     "1 μs",    "~1,000x"),
    ("Integer multiply",    "1 ns",     "10 μs",   "~10,000x"),
    ("AES evaluation",      "1 μs",     "1 s",     "~1,000,000x"),
    ("ML inference",        "10 ms",    "10 min",  "~60,000x"),
    ("DB query (simple)",   "1 ms",     "10 s",    "~10,000x"),
]
for op, plain, fhe, overhead in operations:

print()
print("FHE is MUCH slower than plaintext computation.")
print("But for privacy-critical applications, the overhead is worth it.")
print("And the gap is closing fast, 2024 FHE is >1000x faster than 2011 FHE.")

Checkpoint 3. FHE is not a silver bullet, it's orders of magnitude slower than plaintext computation. But for applications where the alternative is not computing at all (because the data is too sensitive), even 10,000x overhead is acceptable. The field is rapidly closing the performance gap.

Crypto foreshadowing. The next notebook explores partially homomorphic schemes (RSA, Paillier) that support only one operation. These are much faster and already widely deployed. We'll then build up to the full BGV, BFV, and CKKS schemes in subsequent notebooks.

8. Exercises

Exercise 1 (Worked): Maximum Additions

Problem. Using our toy scheme with $q = 1000003$ and $B = 5$ , how many homomorphic additions can we perform before the accumulated noise exceeds $q/4$ ?

Solution:

In [ ]:

# Exercise 1: Worked solution
q_ex = 1000003
B_ex = 5
budget = q_ex // 4

print(f"Modulus: q = {q_ex}")
print(f"Noise bound per ciphertext: B = {B_ex}")
print(f"Noise budget: q/4 = {budget}")
print()

# Each addition adds at most B noise
# After k additions: noise ≤ (k+1) * B
max_additions = budget // B_ex - 1
print(f"After k additions of fresh ciphertexts:")
print(f"  Total noise ≤ (k + 1) × B = (k + 1) × {B_ex}")
print(f"  Constraint: (k + 1) × {B_ex} < {budget}")
print(f"  k < {budget // B_ex} - 1 = {max_additions}")
print(f"\nWe can perform up to {max_additions:,} additions safely.")
print(f"That's a LOT of additions, addition is 'cheap' in noise budget.")

Exercise 2 (Guided): Noise Growth Experiment

Problem. Encrypt 100 random messages with our toy scheme. Sum them homomorphically and verify the result matches the plaintext sum.

Fill in the TODOs:

In [ ]:

# Exercise 2: fill in the TODOs

# TODO 1: Generate 100 random messages between 1 and 100
# msgs = [randint(1, 100) for _ in range(100)]

# TODO 2: Encrypt all messages
# encrypted = [encrypt(m, s, q, B) for m in msgs]

# TODO 3: Sum all ciphertexts homomorphically
# ct_total = encrypted[0]
# for i in range(1, len(encrypted)):
#     ct_total = he_add(ct_total, encrypted[i], q)

# TODO 4: Decrypt and compare with plaintext sum
# dec_total = decrypt(ct_total, s, q)
# true_sum = sum(msgs)
# noise = ct_total[2]
# print(f"Plaintext sum: {true_sum}")
# print(f"Decrypted (with noise): {dec_total}")
# print(f"Total noise: {noise}")
# print(f"Recovered message: {dec_total - noise}")
# print(f"Match? {dec_total - noise == true_sum}")

Exercise 3 (Independent): Noise Budget Analysis

Problem.

In a scheme where multiplication noise grows as $e_1 \cdot e_2$ , starting from initial noise $B = 2$ , how many sequential multiplications can you perform before exceeding a budget of $2^{40}$ ?
If each bootstrapping reduces noise back to $B = 2$ but costs the equivalent of 100 multiplications in wall-clock time, how much does bootstrapping increase the total computation time for a circuit of depth 20?
Why might CKKS (approximate arithmetic) tolerate more noise than BFV (exact arithmetic)?

In [ ]:

# Exercise 3: write your solution here

Summary

Concept	Key Fact
Homomorphic encryption	Compute on ciphertexts: $\text{Enc}(a) \oplus \text{Enc}(b) = \text{Enc}(a+b)$
Noise	Every operation adds noise; too much noise → decryption fails
Addition noise	Grows linearly: $e_1 + e_2$ (cheap)
Multiplication noise	Grows multiplicatively: $\sim e_1 \cdot e_2$ (expensive)
Leveled FHE	Supports a bounded number of operations (no bootstrapping)
Bootstrapping	Homomorphically decrypt to refresh noise → unlimited operations
Fully homomorphic	Leveled + bootstrapping = compute any circuit

FHE is the most ambitious goal in cryptography: compute anything on encrypted data. The field has gone from theoretical curiosity (2009) to practical deployment (2020s). In the next notebooks, we'll build concrete schemes step by step.

Next: 11b: Partially Homomorphic Schemes