GitHub Repository: duyuefeng0708/Cryptography-From-First-Principle
Path: blob/main/frontier/12-mpc/connect/private-set-intersection.ipynb
⁴⁸³ views

unlisted

Kernel: SageMath 10.0

Connect: Private Set Intersection

Module 12 | Real-World Connections

Two parties find shared contacts without revealing their full lists.

Introduction

Private Set Intersection (PSI) lets two parties, Alice and Bob, discover which elements they have in common without revealing their non-common elements.

Real-world uses:

Contact tracing: determine if two people visited the same location without revealing all their location data.
Ad conversion measurement: an advertiser and a publisher find matching users without sharing their full user lists.
Sanctions screening: a bank checks if any customer appears on a sanctions list without revealing its entire customer database.

The MPC approach encodes sets as polynomials and uses oblivious evaluation to test membership without leaking non-matching elements.

The Idea: Sets as Polynomials

Encode a set $S = \{s_1, s_2, \ldots, s_k\}$ as the polynomial whose roots are the set elements:

P_S(x) = (x - s_1)(x - s_2) \cdots (x - s_k)

Then $P_S(v) = 0$ if and only if $v \in S$ . Set membership reduces to polynomial evaluation.

In [ ]:

# === Setup: encode sets as polynomials over GF(p) ===

p = 1009
F = GF(p)
R.<x> = PolynomialRing(F)

def set_to_poly(elements, ring):
    """Encode a set as the polynomial with those elements as roots."""
    x_var = ring.gen()
    poly = ring(1)
    for elem in elements:
        poly *= (x_var - ring.base_ring()(elem))
    return poly

# Alice's and Bob's sets
alice_set = {1, 3, 5, 7}
bob_set = {2, 3, 6, 7}
true_intersection = alice_set & bob_set

print(f"Alice's set: {sorted(alice_set)}")
print(f"Bob's set:   {sorted(bob_set)}")
print(f"True intersection: {sorted(true_intersection)}")
print()

# Encode Alice's set as a polynomial
P_A = set_to_poly(alice_set, R)
print(f"Alice's polynomial: P_A(x) = {P_A}")
print(f"Degree: {P_A.degree()} (= |Alice's set|)")
print()

# Check: roots are exactly Alice's set elements
for v in range(1, 10):
    result = P_A(F(v))
    in_set = v in alice_set
    print(f"  P_A({v}) = {result}  {'= 0 (in Alice\'s set)' if result == 0 else ''}")

Step 1: Naive PSI (Insecure)

The simplest approach: Alice sends $P_A(x)$ to Bob. Bob evaluates it at each of his elements. If $P_A(b) = 0$ , then $b \in$ Alice's set.

Problem: Bob now has Alice's full polynomial, so he can evaluate it at any element and test membership arbitrarily. This reveals Alice's entire set.

In [ ]:

# === Step 1: Naive (insecure) PSI ===

print("=== Naive PSI (Bob receives P_A directly) ===")
print()
print("Bob evaluates P_A at each of his elements:")
naive_intersection = []
for b in sorted(bob_set):
    result = P_A(F(b))
    in_intersection = (result == F(0))
    if in_intersection:
        naive_intersection.append(b)
    print(f"  P_A({b}) = {result}  {'<-- in intersection!' if in_intersection else ''}")

print(f"\nIntersection found: {naive_intersection}")
print(f"Correct? {set(naive_intersection) == true_intersection}")
print()
print("PROBLEM: Bob can now test ANY element:")
for extra in [1, 5, 9, 100]:
    result = P_A(F(extra))
    print(f"  P_A({extra}) = {result}  {'(in Alice\'s set!)' if result == 0 else ''}")
print("\nBob learns Alice's ENTIRE set, not just the intersection!")

Step 2: Randomized PSI (Hides Non-Matches)

To protect Alice's polynomial, Bob evaluates $P_A$ at his elements but masks the result with a random nonzero value:

For each $b \in$ Bob's set, compute: $v_b = r_b \cdot P_A(b)$ where $r_b \xleftarrow{$} \mathbb{F}_p^*$ .

If $b \in$ Alice's set: $P_A(b) = 0$ , so $v_b = r_b \cdot 0 = 0$ . The zero is preserved.
If $b \notin$ Alice's set: $P_A(b) \neq 0$ , so $v_b = r_b \cdot P_A(b)$ is a uniformly random nonzero element. Bob learns nothing about $P_A(b)$ beyond "nonzero".

But we still need to prevent Bob from having Alice's polynomial directly. The full protocol uses oblivious polynomial evaluation (OPE).

In [ ]:

# === Step 2: Randomized evaluation ===

print("=== Randomized PSI ===")
print()
print("Bob masks each evaluation with a random nonzero scalar:")
rand_intersection = []
for b in sorted(bob_set):
    r_b = F(randint(1, p - 1))  # random nonzero
    v_b = r_b * P_A(F(b))
    in_intersection = (v_b == F(0))
    if in_intersection:
        rand_intersection.append(b)
    print(f"  b = {b}: r = {r_b}, r * P_A({b}) = {v_b}  {'<-- ZERO = match!' if in_intersection else '(random nonzero)'}")

print(f"\nIntersection found: {rand_intersection}")
print(f"Correct? {set(rand_intersection) == true_intersection}")
print()
print("For non-matching elements, the masked value is random.")
print("Bob cannot distinguish between different non-matching elements.")

The remaining problem: Bob should evaluate $P_A$ at his elements without seeing $P_A$ itself, and Alice should not learn which elements Bob is testing.

One approach using additive secret sharing:

Alice secret-shares each coefficient of $P_A(x)$ with Bob.
Bob evaluates the shared polynomial at his element $b$ using only share operations.
Alice adds her random mask and Bob adds his; neither learns the raw evaluation.

We simulate this with a simplified 2-party protocol.

In [ ]:

# === Step 3: Oblivious polynomial evaluation (simplified) ===

def oblivious_poly_eval(P_A_coeffs, bob_element, field):
    """Simplified 2-party oblivious polynomial evaluation.
    
    Alice has polynomial coefficients [a_0, a_1, ..., a_d].
    Bob has an element b.
    Output: Bob learns r * P_A(b) for a random r (zero iff b is a root).
    Alice learns nothing about b.
    """
    d = len(P_A_coeffs) - 1
    b = field(bob_element)
    
    # Step 1: Alice shares each coefficient additively
    # Alice keeps a_i^A, sends a_i^B = a_i - a_i^A to Bob
    alice_coeff_shares = [field.random_element() for _ in range(d + 1)]
    bob_coeff_shares = [P_A_coeffs[i] - alice_coeff_shares[i] for i in range(d + 1)]
    
    # Step 2: Bob evaluates his share of the polynomial at b
    # Bob computes sum(bob_coeff_shares[i] * b^i)
    bob_eval = sum(bob_coeff_shares[i] * b^i for i in range(d + 1))
    
    # Alice evaluates her share at b (but she must NOT learn b!)
    # In a real protocol, Bob would use OT to help Alice evaluate
    # without revealing b. Here we simulate the result.
    alice_eval = sum(alice_coeff_shares[i] * b^i for i in range(d + 1))
    
    # Step 3: Bob's random mask
    r = field(randint(1, field.order() - 1))
    
    # Each party masks their evaluation share
    # In the real protocol, they use OT/secret sharing for this
    raw_eval = alice_eval + bob_eval  # = P_A(b)
    masked_eval = r * raw_eval
    
    return masked_eval, raw_eval

# Get coefficients of P_A
P_A_coeffs = list(P_A)  # [a_0, a_1, ..., a_d]

print("=== Oblivious Polynomial Evaluation ===")
print(f"Alice's polynomial has {len(P_A_coeffs)} coefficients (degree {P_A.degree()})")
print()

ope_intersection = []
for b in sorted(bob_set):
    masked, raw = oblivious_poly_eval(P_A_coeffs, b, F)
    is_match = (masked == F(0))
    if is_match:
        ope_intersection.append(b)
    print(f"  Bob tests b = {b}: masked result = {masked}  {'MATCH (zero)' if is_match else '(random nonzero)'}")

print(f"\nIntersection: {ope_intersection}")
print(f"Correct? {set(ope_intersection) == true_intersection}")
print()
print("In the full protocol:")
print("  - Alice never sees Bob's elements (OT protects Bob's queries).")
print("  - Bob never sees Alice's polynomial (only masked evaluations).")

Step 4: PSI with Larger Sets

Let's try PSI with larger random sets to see it scale.

In [ ]:

# === Step 4: Larger sets ===

# Generate random sets of size ~20 with elements in [1, 100]
random.seed(42)
alice_large = set(sample(range(1, 101), 20))
bob_large = set(sample(range(1, 101), 20))
true_inter_large = alice_large & bob_large

print(f"Alice's set ({len(alice_large)} elements): {sorted(alice_large)}")
print(f"Bob's set ({len(bob_large)} elements):   {sorted(bob_large)}")
print(f"True intersection: {sorted(true_inter_large)}")
print()

# Encode and evaluate
P_large = set_to_poly(alice_large, R)
print(f"Alice's polynomial: degree {P_large.degree()}")

found = []
for b in sorted(bob_large):
    val = P_large(F(b))
    if val == F(0):
        found.append(b)

print(f"PSI result: {found}")
print(f"Correct? {set(found) == true_inter_large}")
print(f"\nComplexity: Alice's polynomial has {P_large.degree() + 1} coefficients.")
print(f"Bob makes {len(bob_large)} evaluations. Total work: O(|A| + |B|) field operations.")

Real-World PSI Deployments

Deployment	Protocol	Use Case
Apple CSAM detection	PSI variant	Compare device hashes against known CSAM database
Google ads	PSI-CA (cardinality)	Measure ad conversion without sharing user lists
COVID contact tracing	PSI	Match location histories without revealing all visits
Password breach checking	PSI	Check if a password hash appears in a breach database

Modern PSI protocols handle billions of elements using hashing-based techniques (e.g., cuckoo hashing + OT extension) rather than polynomial encoding for efficiency. The polynomial approach we showed captures the core idea.

Concept Map: Module 12 $\to$ Private Set Intersection

Module 12 Concept	PSI Application
Polynomial evaluation	Set membership test: $P_S(v) = 0 \iff v \in S$
Additive secret sharing	Sharing polynomial coefficients between parties
Oblivious transfer	Protecting Bob's query elements from Alice
Random masking	Hiding non-matching evaluations from Bob
MPC composition	Combining OT + sharing + masking into a full protocol

In [ ]:

# === Exercise: Asymmetric PSI ===

# In many real scenarios, one set is much larger than the other.
# A server has a database of 100 elements; a client has 5 elements.

random.seed(123)
server_set = set(sample(range(1, 501), 100))
client_set = set(sample(range(1, 501), 5))
true_inter_asym = server_set & client_set

print(f"Server database: {len(server_set)} elements")
print(f"Client query: {sorted(client_set)}")
print(f"True intersection: {sorted(true_inter_asym)}")
print()

# Encode the SERVER's set as a polynomial (larger set)
P_server = set_to_poly(server_set, R)
print(f"Server polynomial: degree {P_server.degree()}")

# Client evaluates at their (small) set
found_asym = []
for c in sorted(client_set):
    r = F(randint(1, p - 1))  # random mask
    masked_val = r * P_server(F(c))
    is_match = (masked_val == F(0))
    if is_match:
        found_asym.append(c)
    print(f"  Client tests {c}: {'MATCH' if is_match else 'no match'}")

print(f"\nPSI result: {found_asym}")
print(f"Correct? {set(found_asym) == true_inter_asym}")
print(f"\nThe client makes only {len(client_set)} evaluations, regardless of server size.")

Summary

Concept	Key idea
Sets as polynomials	Encode a set as a polynomial whose roots are the set elements
Membership test	Evaluate the polynomial at a candidate: zero means a match
Random masking	Multiply non-matching evaluations by random scalars to hide them
Oblivious evaluation	Use secret sharing and OT so neither party sees the other's raw input
Scalability	Production systems add cuckoo hashing and OT extension to handle billions of elements

PSI is one of the most widely deployed MPC primitives, used in ad measurement, contact tracing, and password breach checking. The polynomial approach captures the mathematical core; production systems add hashing and OT extension for scalability to billions of elements.

Back to Module 12: Multi-Party Computation

Connect: Private Set Intersection

Introduction

The Idea: Sets as Polynomials

Step 1: Naive PSI (Insecure)

Step 2: Randomized PSI (Hides Non-Matches)

Step 4: PSI with Larger Sets

Real-World PSI Deployments

Concept Map: Module 12 $\to$ Private Set Intersection

Summary

Product

Resources

Company

Connect: Private Set Intersection

Introduction

The Idea: Sets as Polynomials

Step 1: Naive PSI (Insecure)

Step 2: Randomized PSI (Hides Non-Matches)

Step 3: Oblivious Polynomial Evaluation via Secret Sharing

Step 4: PSI with Larger Sets

Real-World PSI Deployments

Concept Map: Module 12 →\to→ Private Set Intersection

Summary

Concept Map: Module 12 $\to$ Private Set Intersection