Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
duyuefeng0708
GitHub Repository: duyuefeng0708/Cryptography-From-First-Principle
Path: blob/main/frontier/11-homomorphic-encryption/break/cpa-deterministic-encryption.ipynb
483 views
unlisted
Kernel: SageMath 10.0

Break: CPA Attack on Deterministic Homomorphic Encryption

Module 11 | Breaking Weak Parameters

Without randomized encryption, homomorphic schemes leak plaintext equality --- and a chosen-plaintext attacker can identify every encrypted value.

Why This Matters

Homomorphic encryption promises computation on encrypted data. But if the encryption is deterministic (same plaintext always produces the same ciphertext), an attacker can:

  1. Detect equality: two employees earn the same salary if their ciphertexts match.

  2. Dictionary attack: encrypt all candidate values, compare with target ciphertexts.

  3. Frequency analysis: the most common ciphertext reveals the most common value.

This is the CPA (Chosen-Plaintext Attack) model: the attacker can encrypt arbitrary messages and compare the results with target ciphertexts. Any deterministic scheme is trivially broken under CPA.

Proper FHE schemes (LWE-based) use randomized encryption, where the same plaintext produces different ciphertexts every time. This notebook demonstrates why randomness is essential.

The Scenario: A Deterministic "Homomorphic" Scheme

Consider a toy scheme that encrypts by a simple affine transformation:

E(m)=smmodqE(m) = s \cdot m \bmod q

where ss is the secret key. This is additively homomorphic:

E(m1)+E(m2)=sm1+sm2=s(m1+m2)=E(m1+m2)(modq)E(m_1) + E(m_2) = s \cdot m_1 + s \cdot m_2 = s \cdot (m_1 + m_2) = E(m_1 + m_2) \pmod{q}

But it is deterministic: E(5)E(5) always gives the same ciphertext.

# === Deterministic "homomorphic" scheme === q = 997 # prime modulus s_det = randint(1, q - 1) # secret key def det_encrypt(m, s, q): """Deterministic encryption: E(m) = s * m mod q.""" return (s * m) % q def det_decrypt(c, s, q): """Decryption: m = c * s^{-1} mod q.""" s_inv = inverse_mod(s, q) return (c * s_inv) % q # Verify it works m_test = 42 c_test = det_encrypt(m_test, s_det, q) d_test = det_decrypt(c_test, s_det, q) print(f'Secret key: s = {s_det}') print(f'Encrypt({m_test}) = {c_test}') print(f'Decrypt({c_test}) = {d_test}') print(f'Correct? {d_test == m_test}') print() # Verify additive homomorphism m1, m2 = 17, 25 c1, c2 = det_encrypt(m1, s_det, q), det_encrypt(m2, s_det, q) c_sum = (c1 + c2) % q d_sum = det_decrypt(c_sum, s_det, q) print(f'E({m1}) + E({m2}) mod q = {c_sum}') print(f'Decrypt(sum) = {d_sum}') print(f'm1 + m2 = {m1 + m2}') print(f'Additively homomorphic? {d_sum == m1 + m2}')

Step 1: Deterministic Encryption Leaks Equality

Encrypt the same value multiple times. With deterministic encryption, the ciphertext is always identical. An observer immediately knows when two plaintexts are equal.

# Same message encrypted multiple times print('=== Deterministic: Same message, multiple encryptions ===') m_same = 5 for i in range(5): c = det_encrypt(m_same, s_det, q) print(f' Enc({m_same}) = {c}') print(f'\nAll ciphertexts are IDENTICAL. Equality is leaked!') print() # Application: encrypted salary database print('=== Encrypted Salary Database ===') salaries = [50, 75, 50, 100, 75, 50, 120, 100, 75, 50] employee_names = ['Alice', 'Bob', 'Carol', 'Dave', 'Eve', 'Frank', 'Grace', 'Heidi', 'Ivan', 'Judy'] enc_salaries = [det_encrypt(s, s_det, q) for s in salaries] print('Employee | Encrypted Salary')for name, ec in zip(employee_names, enc_salaries): print(f'{name} | {ec}') print() print('An observer can immediately see:') # Group by ciphertext from collections import Counter ct_counts = Counter(enc_salaries) for ct_val, count in ct_counts.most_common(): who = [name for name, ec in zip(employee_names, enc_salaries) if ec == ct_val] print(f' Ciphertext {ct_val}: {count} employees ({who}) earn the SAME salary')

Step 2: CPA Dictionary Attack

In the CPA model, the attacker can encrypt any message they choose. With a deterministic scheme, the attacker simply encrypts all candidate plaintexts and looks for matches.

# CPA attack: attacker knows the encryption function (public key scenario) # or has access to an encryption oracle # Target: one encrypted salary target_ct = enc_salaries[0] # Alice's encrypted salary print(f'Target ciphertext (Alice): {target_ct}') print() # Attacker encrypts all plausible salary values print('=== CPA Dictionary Attack ===') print('Attacker tries all salary values from 0 to 200:') print() found = False attempts = 0 for candidate_salary in range(0, 201): attempts += 1 candidate_ct = det_encrypt(candidate_salary, s_det, q) if candidate_ct == target_ct: print(f' MATCH at salary = {candidate_salary}!') print(f' E({candidate_salary}) = {candidate_ct} = target') found = True break print(f'\nAttack completed in {attempts} encryption queries.') print(f'Alice\'s actual salary: {salaries[0]}') print(f'Recovered salary: {candidate_salary}') print(f'Success? {candidate_salary == salaries[0]}')
# Recover ALL salaries at once print('=== Full Database Recovery ===') print() # Build dictionary: ciphertext -> plaintext ct_to_pt = {} for candidate in range(0, 201): ct = det_encrypt(candidate, s_det, q) ct_to_pt[ct] = candidate all_correct = True for name, ec, actual in zip(employee_names, enc_salaries, salaries): recovered = ct_to_pt.get(ec, '???') correct = (recovered == actual) all_correct = all_correct and correct print(f'\nAll salaries recovered correctly: {all_correct}') print(f'Total encryption queries needed: 201 (one per candidate value)') print(f'The "encryption" provided ZERO privacy.')

Step 3: The CPA Distinguishing Game

The formal definition of CPA security uses a distinguishing game:

  1. Attacker chooses two messages m0,m1m_0, m_1.

  2. Challenger flips a coin b{0,1}b \in \{0, 1\} and encrypts mbm_b.

  3. Attacker sees the ciphertext and guesses bb.

The scheme is CPA-secure if no attacker can guess bb with probability significantly better than 1/21/2. With deterministic encryption, the attacker wins with probability 11.

# CPA distinguishing game against deterministic encryption def cpa_game_deterministic(n_rounds=1000): """Play the CPA game against deterministic encryption.""" wins = 0 for _ in range(n_rounds): # Attacker chooses two messages m0 = randint(0, q - 1) m1 = randint(0, q - 1) while m1 == m0: m1 = randint(0, q - 1) # Challenger picks random bit and encrypts b = randint(0, 1) challenge_ct = det_encrypt([m0, m1][b], s_det, q) # Attacker's strategy: encrypt m0, compare c0 = det_encrypt(m0, s_det, q) guess = 0 if (challenge_ct == c0) else 1 if guess == b: wins += 1 return wins / n_rounds advantage = cpa_game_deterministic() print(f'=== CPA Distinguishing Game (Deterministic Scheme) ===') print(f'Attacker win rate: {advantage:.3f}') print(f'Random guessing: 0.500') print(f'Advantage: {advantage - 0.5:.3f}') print() print(f'Advantage = {advantage - 0.5:.3f} means the attacker ALWAYS identifies the plaintext.') print(f'For CPA security we need advantage ~ 0.000.')

Step 4: Randomized Encryption Defeats the Attack

Proper LWE-based FHE schemes add fresh random noise to every encryption. The same plaintext produces a different ciphertext each time, making CPA attacks impossible.

We use the toy LWE scheme from Notebook 11a: E(m)=(a,as+m+e)E(m) = (a, a \cdot s + m + e) where aa is random and ee is small random noise.

# Randomized LWE-based encryption q_lwe = 1000003 B_lwe = 5 s_lwe = randint(1, q_lwe - 1) def centered_mod(x, q): r = x % q if r > q // 2: r -= q return r def lwe_encrypt(m, s, q, B): """Randomized LWE encryption.""" a = randint(0, q - 1) e = randint(-B, B) b = (a * s + m + e) % q return (a, b) def lwe_decrypt(ct, s, q): a, b = ct return centered_mod(b - a * s, q) # Same message encrypted multiple times: ALL different! print('=== Randomized LWE: Same message, different ciphertexts ===') m_same = 5 for i in range(5): ct = lwe_encrypt(m_same, s_lwe, q_lwe, B_lwe) dec = lwe_decrypt(ct, s_lwe, q_lwe) print(f' Enc({m_same}) = (a={ct[0]}, b={ct[1]}) --> Dec = {dec}') print(f'\nEvery ciphertext is DIFFERENT, even for the same plaintext!') print(f'An attacker cannot tell which ciphertexts encrypt the same value.') print(f'Dictionary attacks fail because there is no fixed ciphertext to match.')
# CPA game against randomized encryption def cpa_game_randomized(n_rounds=1000): """Play the CPA game against randomized LWE encryption.""" wins = 0 for _ in range(n_rounds): m0 = randint(0, 100) m1 = randint(0, 100) while m1 == m0: m1 = randint(0, 100) b = randint(0, 1) challenge_ct = lwe_encrypt([m0, m1][b], s_lwe, q_lwe, B_lwe) # Attacker's best strategy without the key: just guess randomly # (encrypting m0 won't help because the ciphertext is randomized) c0 = lwe_encrypt(m0, s_lwe, q_lwe, B_lwe) guess = 0 if (challenge_ct == c0) else 1 # will almost never match if guess == b: wins += 1 return wins / n_rounds advantage_rand = cpa_game_randomized() print(f'=== CPA Distinguishing Game (Randomized LWE Scheme) ===') print(f'Attacker win rate: {advantage_rand:.3f}') print(f'Random guessing: 0.500') print(f'Advantage: {advantage_rand - 0.5:.3f}') print() print(f'Advantage ~ 0.5 means the attacker is just guessing randomly.') print(f'The randomized scheme is CPA-secure: the attacker learns NOTHING.') print() print(f'=== Comparison ===') print(f' Deterministic advantage: {cpa_game_deterministic() - 0.5:.3f} (BROKEN)') print(f' Randomized advantage: {advantage_rand - 0.5:.3f} (SECURE)')

Exercises

Exercise 1 (Guided): Frequency Analysis Attack

Generate 100 random salaries from the set {40,50,60,75,100}\{40, 50, 60, 75, 100\}, encrypt them deterministically, and show that the frequency distribution of ciphertexts exactly reveals the frequency distribution of salaries.

# Exercise 1: frequency analysis # TODO: generate 100 salaries from {40, 50, 60, 75, 100} # salary_pool = [40, 50, 60, 75, 100] # salaries_ex = [random.choice(salary_pool) for _ in range(100)] # TODO: encrypt all salaries deterministically # enc_ex = [det_encrypt(s, s_det, q) for s in salaries_ex] # TODO: count frequencies of ciphertexts # from collections import Counter # pt_freq = Counter(salaries_ex) # ct_freq = Counter(enc_ex) # TODO: show that ciphertext frequencies match plaintext frequencies # print('Plaintext frequencies:', dict(pt_freq.most_common())) # print('Ciphertext frequencies:', dict(ct_freq.most_common())) # print('Frequency distributions are identical --- total privacy failure!')

Exercise 2 (Independent)

  1. Paillier encryption uses a random rr in each encryption: E(m)=gmrnmodn2E(m) = g^m \cdot r^n \bmod n^2. Verify that encrypting the same message twice with Paillier produces different ciphertexts (see Notebook 11b for the implementation).

  2. Why is textbook RSA (c=memodnc = m^e \bmod n) deterministic, and how does OAEP padding fix this?

  3. In a system where ciphertexts are deterministic but the attacker does NOT have access to an encryption oracle, is the scheme still insecure? Consider the case where the plaintext space is small (e.g., binary yes/no votes).

Summary

PropertyDeterministic SchemeRandomized (LWE) Scheme
Same plaintext, same ciphertext?Yes --- equality leakedNo --- different every time
CPA attacker advantage1.01.0 (total break)0.0\approx 0.0 (secure)
Dictionary attackTrivialImpossible
Frequency analysisDirectly reveals distributionCiphertexts uniformly distributed
Semantic securityNoYes

Key takeaways:

  • Deterministic encryption is never CPA-secure, regardless of the key size or algebraic structure.

  • Proper FHE schemes (BGV, BFV, CKKS) add fresh random noise in every encryption, achieving semantic security.

  • Paillier achieves randomization through a random rr; LWE-based schemes through random aa and noise ee.

  • The cost of randomization is that ciphertexts are larger and decryption requires error correction.


Back to Module 11: Homomorphic Encryption