GitHub Repository: duyuefeng0708/Cryptography-From-First-Principle
Path: blob/main/frontier/11-homomorphic-encryption/connect/fhe-private-ml.ipynb
Kernel: SageMath 10.0

Connect: FHE in Privacy-Preserving Machine Learning

Module 11 | Real-World Connections

Hospitals want cloud ML on patient data without exposing it. FHE makes this possible: encrypt the features, send ciphertexts to the cloud, run the model homomorphically, decrypt only the result.

Introduction

A hospital has patient health data (blood pressure, cholesterol, BMI, etc.) and wants a cloud provider to run a diagnostic ML model on this data. The problem: sending raw patient data to the cloud violates privacy regulations (HIPAA, GDPR).

FHE solution:

  1. Hospital encrypts each patient's features with FHE.

  2. Cloud receives only ciphertexts --- never sees the raw data.

  3. Cloud evaluates the ML model homomorphically on the ciphertexts.

  4. Cloud returns encrypted predictions.

  5. Hospital decrypts to get the prediction --- cloud never learned anything.

In this notebook, we'll implement this workflow using Paillier encryption (from Notebook 11b), which supports addition and scalar multiplication --- enough for linear models.

Step 1: Set Up Paillier Encryption

We reuse the Paillier implementation from Notebook 11b. Paillier gives us:

  • $\text{Enc}(m_1) \cdot \text{Enc}(m_2) = \text{Enc}(m_1 + m_2) \pmod{n^2}$ (homomorphic addition)

  • $\text{Enc}(m)^k = \text{Enc}(k \cdot m) \pmod{n^2}$ (scalar multiplication)

These two operations are exactly what we need for linear models: $\hat{y} = w_1 x_1 + w_2 x_2 + \cdots + w_d x_d + b$.

# === Paillier key generation ===
p_pail, q_pail = 17, 19
n = p_pail * q_pail                 # 323
n2 = n^2                            # 104329
lam = lcm(p_pail - 1, q_pail - 1)   # 144
g = n + 1

def L(x, n):
    return (x - 1) // n

mu = inverse_mod(L(power_mod(g, lam, n2), n), n)

def paillier_encrypt(m, n, g, n2):
    """Encrypt m (0 <= m < n) with random r."""
    r = randint(1, n - 1)
    while gcd(r, n) != 1:
        r = randint(1, n - 1)
    return (power_mod(g, m % n, n2) * power_mod(r, n, n2)) % n2

def paillier_decrypt(c, lam, mu, n, n2):
    """Decrypt ciphertext c."""
    x = power_mod(c, lam, n2)
    return (L(x, n) * mu) % n

def paillier_add(c1, c2, n2):
    """Homomorphic addition: Enc(m1) * Enc(m2) = Enc(m1 + m2)."""
    return (c1 * c2) % n2

def paillier_scalar_mul(c, k, n2):
    """Scalar multiplication: Enc(m)^k = Enc(k*m)."""
    return power_mod(c, k, n2)

# Verify
m_test = 42
c_test = paillier_encrypt(m_test, n, g, n2)
d_test = paillier_decrypt(c_test, lam, mu, n, n2)
print(f'Paillier setup: n = {n}, n^2 = {n2}')
print(f'Encrypt({m_test}) -> Decrypt = {d_test} (correct: {d_test == m_test})')

Step 2: Encrypt Patient Data

We have 5 patients, each with 3 features (blood pressure, cholesterol, BMI index). The hospital encrypts all features before sending them to the cloud.

Note: In real systems, features would be scaled to integers. We use small integers here for clarity.
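
To make that note concrete, here is a minimal sketch of how real-valued features could be mapped to integers before encryption. The `SCALE` constant and the two helper names are illustrative choices, not part of the notebook's protocol; in practice the plaintext modulus $n$ must be large enough to hold the scaled weighted sum.

```python
# Sketch: fixed-point scaling of real-valued features for an
# integer-only scheme like Paillier. SCALE is a hypothetical
# precision parameter (here: 2 decimal digits).
SCALE = 100

def encode_feature(x, scale=SCALE):
    """Map a real-valued feature to a non-negative integer."""
    return round(x * scale)

def decode_result(m, scale=SCALE):
    """Undo the scaling after decryption."""
    return m / scale

bp = 12.35                    # e.g. a real-valued measurement
m = encode_feature(bp)
print(m, decode_result(m))    # 1235 12.35
```

Note that after a weighted sum, the decrypted result carries the scale factor of the features (and of the weights, if those are scaled too), so the decoder must divide by the combined scale.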

# Patient dataset (plaintext, hospital side)
patients = [
    {'name': 'Patient A', 'bp': 12, 'chol': 20, 'bmi': 25},
    {'name': 'Patient B', 'bp': 14, 'chol': 22, 'bmi': 30},
    {'name': 'Patient C', 'bp': 11, 'chol': 18, 'bmi': 22},
    {'name': 'Patient D', 'bp': 16, 'chol': 25, 'bmi': 35},
    {'name': 'Patient E', 'bp': 13, 'chol': 19, 'bmi': 27},
]

print('=== Plaintext Patient Data (hospital only) ===')
for p in patients:
    print(f"  {p['name']}: BP={p['bp']}, Chol={p['chol']}, BMI={p['bmi']}")

# Encrypt all features
enc_patients = []
for p in patients:
    enc_p = {
        'name': p['name'],
        'bp': paillier_encrypt(p['bp'], n, g, n2),
        'chol': paillier_encrypt(p['chol'], n, g, n2),
        'bmi': paillier_encrypt(p['bmi'], n, g, n2),
    }
    enc_patients.append(enc_p)

print()
print('=== Encrypted Data (sent to cloud) ===')
for ep in enc_patients:
    print(f"  {ep['name']}: Enc(bp)={ep['bp']}, Enc(chol)={ep['chol']}, Enc(bmi)={ep['bmi']}")
print()
print('The cloud sees only ciphertext values. No patient data is exposed.')

Step 3: Cloud Computes the Linear Model Homomorphically

The cloud has the ML model weights (these are public --- only the data is private):

$$\text{risk\_score} = w_1 \cdot \text{BP} + w_2 \cdot \text{Chol} + w_3 \cdot \text{BMI} + b$$

Using Paillier:

  • $w_i \cdot \text{Enc}(x_i)$: compute $\text{Enc}(x_i)^{w_i}$ (scalar multiplication)

  • $\text{Enc}(w_1 x_1) + \text{Enc}(w_2 x_2)$: compute $\text{Enc}(w_1 x_1) \cdot \text{Enc}(w_2 x_2)$ (addition)

  • Bias $b$: compute $\text{Enc}(w_1 x_1 + w_2 x_2 + \cdots) \cdot \text{Enc}(b)$ (add encrypted bias)

# Model weights (public, known to the cloud)
w_bp = 3     # weight for blood pressure
w_chol = 2   # weight for cholesterol
w_bmi = 1    # weight for BMI
bias = 10    # intercept

print(f'Model: risk = {w_bp}*BP + {w_chol}*Chol + {w_bmi}*BMI + {bias}')
print()

# === Cloud side: homomorphic evaluation (no secret key!) ===
enc_predictions = []
for ep in enc_patients:
    # Scalar multiply each encrypted feature by its weight
    term_bp = paillier_scalar_mul(ep['bp'], w_bp, n2)
    term_chol = paillier_scalar_mul(ep['chol'], w_chol, n2)
    term_bmi = paillier_scalar_mul(ep['bmi'], w_bmi, n2)
    # Encrypt the bias (cloud can do this since bias is public)
    enc_bias = paillier_encrypt(bias, n, g, n2)
    # Sum all terms: Enc(w1*x1) * Enc(w2*x2) * Enc(w3*x3) * Enc(b)
    enc_score = paillier_add(term_bp, term_chol, n2)
    enc_score = paillier_add(enc_score, term_bmi, n2)
    enc_score = paillier_add(enc_score, enc_bias, n2)
    enc_predictions.append(enc_score)

print('Cloud computed encrypted predictions (never saw the plaintext data).')
print()
for ep, enc_pred in zip(enc_patients, enc_predictions):
    print(f' {ep["name"]}: Enc(risk_score) = {enc_pred}')

Step 4: Hospital Decrypts the Results

Only the hospital (key holder) can decrypt the predictions. Let's verify they match the cleartext computation.

# === Hospital side: decrypt predictions ===
all_correct = True
for p, enc_pred in zip(patients, enc_predictions):
    # Decrypt the homomorphic result
    fhe_result = paillier_decrypt(enc_pred, lam, mu, n, n2)
    # Cleartext computation for verification
    clear_result = w_bp * p['bp'] + w_chol * p['chol'] + w_bmi * p['bmi'] + bias
    match = (fhe_result == clear_result % n)
    all_correct = all_correct and match
    print(f"  {p['name']}: decrypted = {fhe_result}, cleartext = {clear_result} (match: {match})")

print(f'\nAll predictions correct: {all_correct}')
print()
print('The cloud computed the CORRECT ML predictions without ever seeing')
print('any patient data. This is the power of homomorphic encryption!')

Limitations: Paillier vs Full FHE

Paillier only supports addition and scalar multiplication (linear operations). This is sufficient for:

  • Linear regression

  • Weighted sums and averages

  • Simple statistics (mean, variance with a trick)
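
As a sanity check on the "weighted sums and averages" claim, the sketch below recomputes an encrypted mean with a self-contained pure-Python Paillier using the same toy parameters as above ($p=17$, $q=19$): the server multiplies ciphertexts to sum the hidden values, and only the key holder decrypts once and divides by the count.

```python
from math import gcd, lcm
import random

# Minimal pure-Python Paillier (toy parameters, matching the notebook)
p, q = 17, 19
n, n2 = p * q, (p * q) ** 2
lam = lcm(p - 1, q - 1)
g = n + 1

def L_(x):
    return (x - 1) // n

mu = pow(L_(pow(g, lam, n2)), -1, n)  # modular inverse of L(g^lam mod n^2)

def enc(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m % n, n2) * pow(r, n, n2)) % n2

def dec(c):
    return (L_(pow(c, lam, n2)) * mu) % n

# Server side: sum five encrypted readings without decrypting any of them
values = [12, 14, 11, 16, 13]
enc_sum = 1                          # ciphertext of 0
for v in values:
    enc_sum = (enc_sum * enc(v)) % n2   # Enc(a) * Enc(b) = Enc(a+b)

# Key-holder side: one decryption, then a plaintext division
total = dec(enc_sum)
print(total, total / len(values))    # 66 13.2
```

The division happens on the decrypted total, which is the standard trick: Paillier cannot divide ciphertexts, but the key holder can divide the single decrypted sum by the (public) count.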

But real ML models need nonlinear operations:

  • Neural networks need activation functions (ReLU, sigmoid)

  • Decision trees need comparisons

  • Polynomial regression needs multiplication of encrypted values

For these, you need full FHE (BGV, BFV, or CKKS).

# What Paillier CAN and CANNOT do
print('=== What Paillier Can Compute Homomorphically ===')
print()
operations = [
    ('Sum of encrypted values', 'Enc(a) * Enc(b) = Enc(a+b)', True),
    ('Weighted sum (linear model)', 'Enc(x)^w = Enc(w*x)', True),
    ('Average (sum / count)', 'Decrypt sum, divide by n', True),
    ('Product of encrypted values', 'Enc(a) * Enc(b) = Enc(a*b)?', False),
    ('Comparison (a > b?)', 'Requires multiplication depth', False),
    ('ReLU activation', 'max(0, x) needs comparison', False),
    ('Sigmoid activation', 'Polynomial approximation', False),
    ('Polynomial of degree > 1', 'x^2 needs Enc(x) * Enc(x)', False),
]
for op, detail, supported in operations:
    icon = 'YES' if supported else 'NO '
    print(f'  [{icon}] {op}')
    print(f'        {detail}')
print()
print('For neural networks and complex models, you need BGV/BFV/CKKS.')
print('CKKS is especially popular for ML because it supports approximate')
print('arithmetic on real numbers, which is what ML models naturally use.')

CKKS for ML: Approximate Arithmetic

The CKKS scheme (Cheon-Kim-Kim-Song, 2017) was designed specifically for approximate computation --- exactly what ML needs. Key features:

| Feature | CKKS | BFV/BGV |
|---|---|---|
| Message type | Real/complex numbers | Integers mod $t$ |
| Arithmetic | Approximate (small error tolerated) | Exact |
| Suited for | Neural networks, statistics | Counting, voting, exact queries |
| Noise handling | Noise becomes part of approximation | Noise must stay below threshold |

CKKS enables encrypted inference on neural networks with polynomial activation function approximations (e.g., approximate ReLU with a low-degree polynomial).
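
To illustrate what such a polynomial replacement looks like, here is a plaintext sketch (no encryption involved) comparing the sigmoid with its degree-3 Taylor expansion around 0, a classic low-degree substitute in encrypted logistic regression; CKKS can evaluate the polynomial because it uses only additions and multiplications. The accuracy figures below hold only for small $|x|$.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Degree-3 Taylor expansion of sigmoid around 0:
#   sigma(x) ~ 1/2 + x/4 - x^3/48
# Good near 0, degrades as |x| grows.
def sigmoid_poly3(x):
    return 0.5 + x / 4 - x ** 3 / 48

for x in [-2, -1, 0, 1, 2]:
    print(f'x={x:+d}  sigmoid={sigmoid(x):.4f}  poly={sigmoid_poly3(x):.4f}')
```

Higher-degree approximations (e.g. minimax polynomials over a fixed interval) trade more multiplicative depth, and hence more noise budget, for accuracy over a wider input range.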

Production deployments:

  • CryptoNets (Microsoft Research): first encrypted neural network inference (2016)

  • nGraph-HE: Intel's framework for encrypted deep learning

  • Concrete ML (Zama): compiles scikit-learn and PyTorch models to FHE

Concept Map

| Module 11 Concept | ML Application |
|---|---|
| Paillier (additive HE) | Linear regression, weighted sums, averages |
| BGV/BFV (integer FHE) | Decision trees, exact classification |
| CKKS (approximate FHE) | Neural networks, floating-point ML |
| Noise budget | Limits the depth of the ML model (number of layers) |
| Bootstrapping | Enables arbitrarily deep neural networks |
| Scalar multiplication | Applying model weights to encrypted features |
| Homomorphic addition | Summing weighted features (dot product) |
# Summary of the privacy-preserving ML pipeline
print('=== Privacy-Preserving ML Pipeline ===')
print()
print('Step 1: Hospital encrypts patient data with FHE')
print('        [BP=12, Chol=20, BMI=25] --> [Enc(12), Enc(20), Enc(25)]')
print()
print('Step 2: Cloud receives ONLY ciphertexts')
print('        Cloud sees: [82341, 19472, 63918] (meaningless numbers)')
print()
print('Step 3: Cloud evaluates ML model homomorphically')
print('        Enc(risk) = Enc(12)^3 * Enc(20)^2 * Enc(25)^1 * Enc(10)')
print('                  = Enc(3*12 + 2*20 + 1*25 + 10)')
print('                  = Enc(111)')
print()
print('Step 4: Hospital decrypts the prediction')
print('        Dec(Enc(111)) = 111')
print()
print('Result: Cloud computed the correct diagnosis (risk score = 111)')
print('        without ever seeing blood pressure, cholesterol, or BMI.')
print()
print('This is not science fiction --- production systems like Zama\'s Concrete ML')
print('and Microsoft SEAL make this possible TODAY, with BFV/CKKS for full models.')

Summary

| Aspect | Detail |
|---|---|
| Problem | Cloud ML on sensitive data violates privacy |
| Solution | Encrypt data with FHE, compute model homomorphically |
| Paillier | Supports linear models (addition + scalar multiply) |
| CKKS | Supports neural networks (approximate floating-point FHE) |
| Trade-off | 10,000x--1,000,000x slowdown vs. cleartext computation |
| Reality | Production systems exist (SEAL, Concrete ML, nGraph-HE) |

FHE for ML is the "holy grail" of privacy-preserving computation: the cloud provides compute power, the hospital keeps data private, and the patient gets a correct diagnosis. The math from Module 11 --- additive homomorphism, noise budgets, bootstrapping --- is what makes this possible.


Back to Module 11: Homomorphic Encryption