Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
duyuefeng0708
GitHub Repository: duyuefeng0708/Cryptography-From-First-Principle
Path: blob/main/frontier/11-homomorphic-encryption/connect/encrypted-databases.ipynb
483 views
unlisted
Kernel: SageMath 10.0

Connect: Encrypted Database Queries

Module 11 | Real-World Connections

A cloud database stores encrypted records. Using Paillier, we compute sums and averages without the server ever seeing a single plaintext value.

Introduction

Consider a company that stores employee salary data in the cloud. The cloud provider should never see individual salaries, but HR needs to run queries like:

  • What is the total payroll?

  • What is the average salary?

  • How many employees earn above a threshold? (harder!)

With homomorphic encryption, the database stores only ciphertexts. The server computes query results on ciphertexts, and only the client (key holder) can decrypt the answer.

In this notebook, we'll build a toy encrypted database using Paillier and demonstrate which queries are easy (additive) and which are hard (comparison-based).

Step 1: Set Up the Encrypted Database

Each employee's salary is encrypted with Paillier before being stored. The server holds only ciphertexts.

# === Paillier setup (reused from Module 11b) === p_pail, q_pail = 17, 19 n = p_pail * q_pail # 323 n2 = n^2 # 104329 lam = lcm(p_pail - 1, q_pail - 1) g = n + 1 def L(x, n): return (x - 1) // n mu = inverse_mod(L(power_mod(g, lam, n2), n), n) def paillier_encrypt(m, n, g, n2): r = randint(1, n - 1) while gcd(r, n) != 1: r = randint(1, n - 1) return (power_mod(g, m % n, n2) * power_mod(r, n, n2)) % n2 def paillier_decrypt(c, lam, mu, n, n2): x = power_mod(c, lam, n2) return (L(x, n) * mu) % n def paillier_add(c1, c2, n2): return (c1 * c2) % n2 def paillier_scalar_mul(c, k, n2): return power_mod(c, k, n2) print(f'Paillier setup: n = {n}, plaintext space Z_{n}')
# === Employee salary database === employees = [ ('Alice', 50), ('Bob', 75), ('Carol', 60), ('Dave', 80), ('Eve', 55), ('Frank', 90), ('Grace', 65), ('Heidi', 70), ] # Client encrypts salaries before uploading encrypted_db = [] for name, salary in employees: enc_salary = paillier_encrypt(salary, n, g, n2) encrypted_db.append((name, enc_salary)) print('=== Plaintext Database (client only) ===') print('Name | Salary')for name, salary in employees: print(f'{name} | {salary}') print() print('=== Encrypted Database (server sees only this) ===') print('Name | Enc(Salary)')for name, enc_sal in encrypted_db: print(f'{name} | {enc_sal}') print() print('The server stores ciphertexts. It has no idea what any salary is.')

Step 2: Query 1 --- Total Payroll (Homomorphic Sum)

The server computes the total payroll by multiplying all Paillier ciphertexts (which adds the plaintexts). No decryption key is needed for this step.

# === Server side: compute total payroll homomorphically === # Start with Enc(0) = 1 (identity for multiplication mod n^2) # Actually: g^0 * r^n = r^n, but simpler to just start with the first ciphertext enc_total = paillier_encrypt(0, n, g, n2) # Enc(0) for name, enc_sal in encrypted_db: enc_total = paillier_add(enc_total, enc_sal, n2) print('Server computed the encrypted total payroll.') print(f'Enc(total) = {enc_total}') print(f'(Server has NO idea what this number means.)') print() # === Client side: decrypt the result === total_payroll = paillier_decrypt(enc_total, lam, mu, n, n2) true_total = sum(salary for _, salary in employees) print(f'Client decrypts: total payroll = {total_payroll}') print(f'True total payroll: {true_total}') print(f'Correct? {total_payroll == true_total}') print() print('The server computed the correct total without seeing any individual salary!')

Step 3: Query 2 --- Average Salary

The average is just the total divided by the count. The server can compute the encrypted sum, and the client divides after decryption. Alternatively, the server can use scalar multiplication to scale the sum if needed.

# Average = total / count # The server already computed enc_total. The client decrypts and divides. num_employees = len(employees) average_salary = total_payroll / num_employees true_average = true_total / num_employees print(f'Number of employees: {num_employees}') print(f'Encrypted total (from server): {enc_total}') print(f'Decrypted total: {total_payroll}') print(f'Average salary: {total_payroll}/{num_employees} = {average_salary:.1f}') print(f'True average: {true_average:.1f}') print(f'Correct? {average_salary == true_average}') print() print('For the average, the server computes the sum homomorphically,') print('and the client performs the final division after decryption.') print('The count (number of records) is typically public metadata.')

Step 4: Query 3 --- Weighted Sum (Bonus Allocation)

Suppose each employee gets a bonus that is a percentage of their salary. HR wants to know the total bonus cost: irisi\sum_i r_i \cdot s_i where rir_i is the bonus rate (public) and sis_i is the salary (encrypted).

Paillier supports this: Enc(si)ri=Enc(risi)\text{Enc}(s_i)^{r_i} = \text{Enc}(r_i \cdot s_i).

# Bonus rates (public) and encrypted salaries # Rate is a percentage; we use integer percentages for simplicity bonus_rates = [10, 15, 10, 20, 10, 25, 15, 15] # percent true_total_bonus = 0 for (name, salary), rate in zip(employees, bonus_rates): bonus = salary * rate // 100 true_total_bonus += salary * rate # keep in percentage-units for now print() # Server computes weighted sum homomorphically enc_bonus_total = paillier_encrypt(0, n, g, n2) for (name, enc_sal), rate in zip(encrypted_db, bonus_rates): # Enc(salary)^rate = Enc(salary * rate) enc_weighted = paillier_scalar_mul(enc_sal, rate, n2) enc_bonus_total = paillier_add(enc_bonus_total, enc_weighted, n2) # Client decrypts dec_bonus_total = paillier_decrypt(enc_bonus_total, lam, mu, n, n2) # The result is sum(salary * rate) in percentage-units # Divide by 100 to get actual bonus total print(f'Encrypted weighted sum (server): {enc_bonus_total}') print(f'Decrypted: {dec_bonus_total} (this is sum of salary*rate)') print(f'True total: {true_total_bonus}') print(f'Match? {dec_bonus_total == true_total_bonus % n}') print(f'Total bonus cost: {dec_bonus_total // 100}')

Step 5: Why Comparisons Are Hard

Now suppose HR asks: "How many employees earn more than 70?"

This requires comparing encrypted values, which Paillier cannot do. Comparison is fundamentally a nonlinear operation --- it requires bit decomposition and many multiplications, which demands full FHE (BGV/CKKS) with enough multiplicative depth.

# Why comparison is hard with Paillier threshold = 70 print(f'Query: How many employees earn more than {threshold}?') print() print('=== What we CAN do with cleartext ===') count_above = sum(1 for _, salary in employees if salary > threshold) for name, salary in employees: above = 'YES' if salary > threshold else 'no' print(f' {name}: {salary} > {threshold}? {above}') print(f' Count: {count_above}') print() print('=== What we CANNOT do with Paillier ===') print(f' Paillier supports: Enc(a) + Enc(b), k * Enc(a)') print(f' Comparison needs: Is Enc(salary) > Enc({threshold})?') print(f' This requires:') print(f' 1. Computing Enc(salary - {threshold}) homomorphically') print(f' 2. Determining the SIGN of the encrypted difference') print(f' 3. Sign extraction needs bit decomposition (many multiplications)') print(f' 4. Paillier cannot multiply two ciphertexts!') print() print(' For encrypted comparisons, you need full FHE (BGV/BFV/CKKS/TFHE).') print(' TFHE is especially efficient for Boolean circuits like comparisons.')

Real Deployments

Encrypted databases are not just theory. Several real systems exist:

SystemApproachWhat It Supports
CryptDB (MIT, 2011)Layered encryption: different schemes for different queriesEquality, order, sum
Microsoft SEALBFV/CKKS libraryArbitrary polynomial queries
Google Private Join and ComputePaillier + MPCAggregation on joined datasets
Enveil ZeroRevealCommercial FHE platformEncrypted search and analytics
TFHE (Zama)Fast bootstrappingGate-by-gate encrypted computation

CryptDB's insight: different query types need different levels of encryption. For equality checks, use deterministic encryption (fast but leaks equality). For sums, use Paillier. For arbitrary queries, use full FHE. CryptDB "peels off" encryption layers as needed --- a practical trade-off between privacy and functionality.

# Summary: what each HE scheme enables for databases print('=== Encrypted Database Query Capabilities ===') print() queries = [ ('SUM / COUNT', 'YES', 'YES', 'YES', 'YES'), ('AVERAGE', 'YES', 'YES', 'YES', 'YES'), ('Weighted sum', 'YES', 'YES', 'YES', 'YES'), ('Equality (WHERE x=5)', 'NO', 'YES', 'Approx', 'YES'), ('Comparison (WHERE x>5)','NO', 'YES', 'Approx', 'YES'), ('JOIN', 'NO', 'Expensive','Expensive','YES'), ('Arbitrary SQL', 'NO', 'YES*', 'YES*', 'YES'), ] for query, pail, bgv, ckks, tfhe in queries: print() print('* = requires sufficient multiplicative depth (large parameters)') print() print('Key insight: the more powerful the query, the more expensive the FHE.') print('Simple aggregation (SUM/AVG) is cheap with Paillier.') print('Full SQL requires deep circuits and either large parameters or bootstrapping.')

Concept Map

Module 11 ConceptDatabase Application
Additive HE (Paillier)SUM, AVERAGE, weighted aggregation
Full FHE (BGV/BFV)Arbitrary SQL on encrypted data
CKKSApproximate queries, ML on encrypted data
TFHEFast Boolean comparisons, encrypted WHERE clauses
Noise managementLimits the complexity of the SQL query
BootstrappingEnables arbitrarily complex queries
Scalar multiplicationWeighted queries (e.g., bonus calculations)

Summary

AspectDetail
ProblemCloud stores data it should never see
PaillierEnables SUM, AVG, weighted sums on encrypted columns
LimitationComparisons and joins need full FHE
CryptDBPractical system using layered encryption for different queries
Trade-offMore powerful queries require more expensive FHE schemes

The math from Module 11 --- Paillier's additive homomorphism, BGV's noise management, CKKS's approximate arithmetic --- directly determines which database queries are possible on encrypted data and how fast they run.


Back to Module 11: Homomorphic Encryption