GitHub Repository: duyuefeng0708/Cryptography-From-First-Principle
Path: blob/main/frontier/11-homomorphic-encryption/connect/fhe-private-ml.ipynb
Kernel: SageMath 10.0

Connect: FHE in Privacy-Preserving Machine Learning

Module 11 | Real-World Connections

Hospitals want cloud ML on patient data without exposing it. FHE makes this possible: encrypt the features, send ciphertexts to the cloud, run the model homomorphically, decrypt only the result.

Introduction

A hospital has patient health data (blood pressure, cholesterol, BMI, etc.) and wants a cloud provider to run a diagnostic ML model on this data. The problem: sending raw patient data to the cloud violates privacy regulations (HIPAA, GDPR).

FHE solution:

  1. Hospital encrypts each patient's features with FHE.

  2. Cloud receives only ciphertexts --- never sees the raw data.

  3. Cloud evaluates the ML model homomorphically on the ciphertexts.

  4. Cloud returns encrypted predictions.

  5. Hospital decrypts to get the prediction --- cloud never learned anything.

In this notebook, we'll implement this workflow using Paillier encryption (from Notebook 11b), which supports addition and scalar multiplication --- enough for linear models.

Step 1: Set Up Paillier Encryption

We reuse the Paillier implementation from Notebook 11b. Paillier gives us:

  • $\text{Enc}(m_1) \cdot \text{Enc}(m_2) = \text{Enc}(m_1 + m_2) \pmod{n^2}$ (homomorphic addition)

  • $\text{Enc}(m)^k = \text{Enc}(k \cdot m) \pmod{n^2}$ (scalar multiplication)

These two operations are exactly what we need for linear models: $\hat{y} = w_1 x_1 + w_2 x_2 + \cdots + w_d x_d + b$.

# === Paillier key generation ===
p_pail, q_pail = 17, 19
n = p_pail * q_pail                 # 323
n2 = n^2                            # 104329
lam = lcm(p_pail - 1, q_pail - 1)   # 144
g = n + 1

def L(x, n):
    return (x - 1) // n

mu = inverse_mod(L(power_mod(g, lam, n2), n), n)

def paillier_encrypt(m, n, g, n2):
    """Encrypt m (0 <= m < n) with random r."""
    r = randint(1, n - 1)
    while gcd(r, n) != 1:
        r = randint(1, n - 1)
    return (power_mod(g, m % n, n2) * power_mod(r, n, n2)) % n2

def paillier_decrypt(c, lam, mu, n, n2):
    """Decrypt ciphertext c."""
    x = power_mod(c, lam, n2)
    return (L(x, n) * mu) % n

def paillier_add(c1, c2, n2):
    """Homomorphic addition: Enc(m1) * Enc(m2) = Enc(m1 + m2)."""
    return (c1 * c2) % n2

def paillier_scalar_mul(c, k, n2):
    """Scalar multiplication: Enc(m)^k = Enc(k*m)."""
    return power_mod(c, k, n2)

# Verify
m_test = 42
c_test = paillier_encrypt(m_test, n, g, n2)
d_test = paillier_decrypt(c_test, lam, mu, n, n2)
print(f'Paillier setup: n = {n}, n^2 = {n2}')
print(f'Encrypt({m_test}) -> Decrypt = {d_test} (correct: {d_test == m_test})')

Step 2: Encrypt Patient Data

We have 5 patients, each with 3 features (blood pressure, cholesterol, BMI index). The hospital encrypts all features before sending them to the cloud.

Note: In real systems, features would be scaled to integers. We use small integers here for clarity.
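
To make that note concrete, here is a minimal sketch of how real-valued features could be mapped to integers before encryption. The `SCALE` constant and the two helper names are illustrative choices, not part of the notebook's protocol; in practice the plaintext modulus $n$ must be large enough to hold the scaled weighted sum.

```python
# Sketch: fixed-point scaling of real-valued features for an
# integer-only scheme like Paillier. SCALE is a hypothetical
# precision parameter (here: 2 decimal digits).
SCALE = 100

def encode_feature(x, scale=SCALE):
    """Map a real-valued feature to a non-negative integer."""
    return round(x * scale)

def decode_result(m, scale=SCALE):
    """Undo the scaling after decryption."""
    return m / scale

bp = 12.35                    # e.g. a real-valued measurement
m = encode_feature(bp)
print(m, decode_result(m))    # 1235 12.35
```

Note that after a weighted sum, the decrypted result carries the scale factor of the features (and of the weights, if those are scaled too), so the decoder must divide by the combined scale.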

# Patient dataset (plaintext, hospital side)
patients = [
    {'name': 'Patient A', 'bp': 12, 'chol': 20, 'bmi': 25},
    {'name': 'Patient B', 'bp': 14, 'chol': 22, 'bmi': 30},
    {'name': 'Patient C', 'bp': 11, 'chol': 18, 'bmi': 22},
    {'name': 'Patient D', 'bp': 16, 'chol': 25, 'bmi': 35},
    {'name': 'Patient E', 'bp': 13, 'chol': 19, 'bmi': 27},
]

print('=== Plaintext Patient Data (hospital only) ===')
for p in patients:
    print(f"  {p['name']}: BP={p['bp']}, Chol={p['chol']}, BMI={p['bmi']}")

# Encrypt all features
enc_patients = []
for p in patients:
    enc_p = {
        'name': p['name'],
        'bp': paillier_encrypt(p['bp'], n, g, n2),
        'chol': paillier_encrypt(p['chol'], n, g, n2),
        'bmi': paillier_encrypt(p['bmi'], n, g, n2),
    }
    enc_patients.append(enc_p)

print()
print('=== Encrypted Data (sent to cloud) ===')
for ep in enc_patients:
    print(f"  {ep['name']}: Enc(bp)={ep['bp']}, Enc(chol)={ep['chol']}, Enc(bmi)={ep['bmi']}")
print()
print('The cloud sees only ciphertext values. No patient data is exposed.')

Step 3: Cloud Computes the Linear Model Homomorphically

The cloud has the ML model weights (these are public --- only the data is private):

$$\text{risk\_score} = w_1 \cdot \text{BP} + w_2 \cdot \text{Chol} + w_3 \cdot \text{BMI} + b$$

Using Paillier:

  • $w_i \cdot \text{Enc}(x_i)$: compute $\text{Enc}(x_i)^{w_i}$ (scalar multiplication)

  • $\text{Enc}(w_1 x_1) + \text{Enc}(w_2 x_2)$: compute $\text{Enc}(w_1 x_1) \cdot \text{Enc}(w_2 x_2)$ (addition)

  • Bias $b$: compute $\text{Enc}(w_1 x_1 + w_2 x_2 + \cdots) \cdot \text{Enc}(b)$ (add encrypted bias)

# Model weights (public, known to the cloud)
w_bp = 3     # weight for blood pressure
w_chol = 2   # weight for cholesterol
w_bmi = 1    # weight for BMI
bias = 10    # intercept

print(f'Model: risk = {w_bp}*BP + {w_chol}*Chol + {w_bmi}*BMI + {bias}')
print()

# === Cloud side: homomorphic evaluation (no secret key!) ===
enc_predictions = []
for ep in enc_patients:
    # Scalar multiply each encrypted feature by its weight
    term_bp = paillier_scalar_mul(ep['bp'], w_bp, n2)
    term_chol = paillier_scalar_mul(ep['chol'], w_chol, n2)
    term_bmi = paillier_scalar_mul(ep['bmi'], w_bmi, n2)
    # Encrypt the bias (cloud can do this since bias is public)
    enc_bias = paillier_encrypt(bias, n, g, n2)
    # Sum all terms: Enc(w1*x1) * Enc(w2*x2) * Enc(w3*x3) * Enc(b)
    enc_score = paillier_add(term_bp, term_chol, n2)
    enc_score = paillier_add(enc_score, term_bmi, n2)
    enc_score = paillier_add(enc_score, enc_bias, n2)
    enc_predictions.append(enc_score)

print('Cloud computed encrypted predictions (never saw the plaintext data).')
print()
for ep, enc_pred in zip(enc_patients, enc_predictions):
    print(f' {ep["name"]}: Enc(risk_score) = {enc_pred}')

Step 4: Hospital Decrypts the Results

Only the hospital (key holder) can decrypt the predictions. Let's verify they match the cleartext computation.

# === Hospital side: decrypt predictions ===
all_correct = True
for p, enc_pred in zip(patients, enc_predictions):
    # Decrypt the homomorphic result
    fhe_result = paillier_decrypt(enc_pred, lam, mu, n, n2)
    # Cleartext computation for verification
    clear_result = w_bp * p['bp'] + w_chol * p['chol'] + w_bmi * p['bmi'] + bias
    match = (fhe_result == clear_result % n)
    all_correct = all_correct and match
    print(f"  {p['name']}: decrypted = {fhe_result}, cleartext = {clear_result} (match: {match})")

print(f'\nAll predictions correct: {all_correct}')
print()
print('The cloud computed the CORRECT ML predictions without ever seeing')
print('any patient data. This is the power of homomorphic encryption!')

Limitations: Paillier vs Full FHE

Paillier only supports addition and scalar multiplication (linear operations). This is sufficient for:

  • Linear regression

  • Weighted sums and averages

  • Simple statistics (mean, variance with a trick)
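
As a sanity check on the "weighted sums and averages" claim, the sketch below recomputes an encrypted mean with a self-contained pure-Python Paillier using the same toy parameters as above ($p=17$, $q=19$): the server multiplies ciphertexts to sum the hidden values, and only the key holder decrypts once and divides by the count.

```python
from math import gcd, lcm
import random

# Minimal pure-Python Paillier (toy parameters, matching the notebook)
p, q = 17, 19
n, n2 = p * q, (p * q) ** 2
lam = lcm(p - 1, q - 1)
g = n + 1

def L_(x):
    return (x - 1) // n

mu = pow(L_(pow(g, lam, n2)), -1, n)  # modular inverse of L(g^lam mod n^2)

def enc(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m % n, n2) * pow(r, n, n2)) % n2

def dec(c):
    return (L_(pow(c, lam, n2)) * mu) % n

# Server side: sum five encrypted readings without decrypting any of them
values = [12, 14, 11, 16, 13]
enc_sum = 1                          # ciphertext of 0
for v in values:
    enc_sum = (enc_sum * enc(v)) % n2   # Enc(a) * Enc(b) = Enc(a+b)

# Key-holder side: one decryption, then a plaintext division
total = dec(enc_sum)
print(total, total / len(values))    # 66 13.2
```

The division happens on the decrypted total, which is the standard trick: Paillier cannot divide ciphertexts, but the key holder can divide the single decrypted sum by the (public) count.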

But real ML models need nonlinear operations:

  • Neural networks need activation functions (ReLU, sigmoid)

  • Decision trees need comparisons

  • Polynomial regression needs multiplication of encrypted values

For these, you need full FHE (BGV, BFV, or CKKS).

# What Paillier CAN and CANNOT do
print('=== What Paillier Can Compute Homomorphically ===')
print()
operations = [
    ('Sum of encrypted values', 'Enc(a) * Enc(b) = Enc(a+b)', True),
    ('Weighted sum (linear model)', 'Enc(x)^w = Enc(w*x)', True),
    ('Average (sum / count)', 'Decrypt sum, divide by n', True),
    ('Product of encrypted values', 'Enc(a) * Enc(b) = Enc(a*b)?', False),
    ('Comparison (a > b?)', 'Requires multiplication depth', False),
    ('ReLU activation', 'max(0, x) needs comparison', False),
    ('Sigmoid activation', 'Polynomial approximation', False),
    ('Polynomial of degree > 1', 'x^2 needs Enc(x) * Enc(x)', False),
]
for op, detail, supported in operations:
    icon = 'YES' if supported else 'NO '
    print(f'  [{icon}] {op}')
    print(f'        {detail}')
print()
print('For neural networks and complex models, you need BGV/BFV/CKKS.')
print('CKKS is especially popular for ML because it supports approximate')
print('arithmetic on real numbers, which is what ML models naturally use.')

CKKS for ML: Approximate Arithmetic

The CKKS scheme (Cheon-Kim-Kim-Song, 2017) was designed specifically for approximate computation --- exactly what ML needs. Key features:

| Feature | CKKS | BFV/BGV |
|---|---|---|
| Message type | Real/complex numbers | Integers mod $t$ |
| Arithmetic | Approximate (small error tolerated) | Exact |
| Suited for | Neural networks, statistics | Counting, voting, exact queries |
| Noise handling | Noise becomes part of approximation | Noise must stay below threshold |

CKKS enables encrypted inference on neural networks with polynomial activation function approximations (e.g., approximate ReLU with a low-degree polynomial).
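
To illustrate what such a polynomial replacement looks like, here is a plaintext sketch (no encryption involved) comparing the sigmoid with its degree-3 Taylor expansion around 0, a classic low-degree substitute in encrypted logistic regression; CKKS can evaluate the polynomial because it uses only additions and multiplications. The accuracy figures below hold only for small $|x|$.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Degree-3 Taylor expansion of sigmoid around 0:
#   sigma(x) ~ 1/2 + x/4 - x^3/48
# Good near 0, degrades as |x| grows.
def sigmoid_poly3(x):
    return 0.5 + x / 4 - x ** 3 / 48

for x in [-2, -1, 0, 1, 2]:
    print(f'x={x:+d}  sigmoid={sigmoid(x):.4f}  poly={sigmoid_poly3(x):.4f}')
```

Higher-degree approximations (e.g. minimax polynomials over a fixed interval) trade more multiplicative depth, and hence more noise budget, for accuracy over a wider input range.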

Production deployments:

  • CryptoNets (Microsoft Research): first encrypted neural network inference (2016)

  • nGraph-HE: Intel's framework for encrypted deep learning

  • Concrete ML (Zama): compiles scikit-learn and PyTorch models to FHE

Concept Map

| Module 11 Concept | ML Application |
|---|---|
| Paillier (additive HE) | Linear regression, weighted sums, averages |
| BGV/BFV (integer FHE) | Decision trees, exact classification |
| CKKS (approximate FHE) | Neural networks, floating-point ML |
| Noise budget | Limits the depth of the ML model (number of layers) |
| Bootstrapping | Enables arbitrarily deep neural networks |
| Scalar multiplication | Applying model weights to encrypted features |
| Homomorphic addition | Summing weighted features (dot product) |
# Summary of the privacy-preserving ML pipeline
print('=== Privacy-Preserving ML Pipeline ===')
print()
print('Step 1: Hospital encrypts patient data with FHE')
print('        [BP=12, Chol=20, BMI=25] --> [Enc(12), Enc(20), Enc(25)]')
print()
print('Step 2: Cloud receives ONLY ciphertexts')
print('        Cloud sees: [82341, 19472, 63918] (meaningless numbers)')
print()
print('Step 3: Cloud evaluates ML model homomorphically')
print('        Enc(risk) = Enc(12)^3 * Enc(20)^2 * Enc(25)^1 * Enc(10)')
print('                  = Enc(3*12 + 2*20 + 1*25 + 10)')
print('                  = Enc(111)')
print()
print('Step 4: Hospital decrypts the prediction')
print('        Dec(Enc(111)) = 111')
print()
print('Result: Cloud computed the correct diagnosis (risk score = 111)')
print('        without ever seeing blood pressure, cholesterol, or BMI.')
print()
print('This is not science fiction --- production systems like Zama\'s Concrete ML')
print('and Microsoft SEAL make this possible TODAY, with BFV/CKKS for full models.')

Summary

| Aspect | Detail |
|---|---|
| Problem | Cloud ML on sensitive data violates privacy |
| Solution | Encrypt data with FHE, compute model homomorphically |
| Paillier | Supports linear models (addition + scalar multiply) |
| CKKS | Supports neural networks (approximate floating-point FHE) |
| Trade-off | 10,000x--1,000,000x slowdown vs. cleartext computation |
| Reality | Production systems exist (SEAL, Concrete ML, nGraph-HE) |

FHE for ML is the "holy grail" of privacy-preserving computation: the cloud provides compute power, the hospital keeps data private, and the patient gets a correct diagnosis. The math from Module 11 --- additive homomorphism, noise budgets, bootstrapping --- is what makes this possible.


Back to Module 11: Homomorphic Encryption