Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
duyuefeng0708
GitHub Repository: duyuefeng0708/Cryptography-From-First-Principle
Path: blob/main/frontier/12-mpc/connect/private-set-intersection.ipynb
483 views
unlisted
Kernel: SageMath 10.0

Connect: Private Set Intersection

Module 12 | Real-World Connections

Two parties find shared contacts without revealing their full lists.

Introduction

Private Set Intersection (PSI) lets two parties, Alice and Bob, discover which elements they have in common without revealing their non-common elements.

Real-world uses:

  • Contact tracing: determine if two people visited the same location without revealing all their location data.

  • Ad conversion measurement: an advertiser and a publisher find matching users without sharing their full user lists.

  • Sanctions screening: a bank checks if any customer appears on a sanctions list without revealing its entire customer database.

The MPC approach encodes sets as polynomials and uses oblivious evaluation to test membership without leaking non-matching elements.

The Idea: Sets as Polynomials

Encode a set S={s1,s2,,sk}S = \{s_1, s_2, \ldots, s_k\} as the polynomial whose roots are the set elements:

PS(x)=(xs1)(xs2)(xsk)P_S(x) = (x - s_1)(x - s_2) \cdots (x - s_k)

Then PS(v)=0P_S(v) = 0 if and only if vSv \in S. Set membership reduces to polynomial evaluation.

# === Setup: encode sets as polynomials over GF(p) === p = 1009 F = GF(p) R.<x> = PolynomialRing(F) def set_to_poly(elements, ring): """Encode a set as the polynomial with those elements as roots.""" x_var = ring.gen() poly = ring(1) for elem in elements: poly *= (x_var - ring.base_ring()(elem)) return poly # Alice's and Bob's sets alice_set = {1, 3, 5, 7} bob_set = {2, 3, 6, 7} true_intersection = alice_set & bob_set print(f"Alice's set: {sorted(alice_set)}") print(f"Bob's set: {sorted(bob_set)}") print(f"True intersection: {sorted(true_intersection)}") print() # Encode Alice's set as a polynomial P_A = set_to_poly(alice_set, R) print(f"Alice's polynomial: P_A(x) = {P_A}") print(f"Degree: {P_A.degree()} (= |Alice's set|)") print() # Check: roots are exactly Alice's set elements for v in range(1, 10): result = P_A(F(v)) in_set = v in alice_set print(f" P_A({v}) = {result} {'= 0 (in Alice\'s set)' if result == 0 else ''}")

Step 1: Naive PSI (Insecure)

The simplest approach: Alice sends PA(x)P_A(x) to Bob. Bob evaluates it at each of his elements. If PA(b)=0P_A(b) = 0, then bb \in Alice's set.

Problem: Bob now has Alice's full polynomial, so he can evaluate it at any element and test membership arbitrarily. This reveals Alice's entire set.

# === Step 1: Naive (insecure) PSI === print("=== Naive PSI (Bob receives P_A directly) ===") print() print("Bob evaluates P_A at each of his elements:") naive_intersection = [] for b in sorted(bob_set): result = P_A(F(b)) in_intersection = (result == F(0)) if in_intersection: naive_intersection.append(b) print(f" P_A({b}) = {result} {'<-- in intersection!' if in_intersection else ''}") print(f"\nIntersection found: {naive_intersection}") print(f"Correct? {set(naive_intersection) == true_intersection}") print() print("PROBLEM: Bob can now test ANY element:") for extra in [1, 5, 9, 100]: result = P_A(F(extra)) print(f" P_A({extra}) = {result} {'(in Alice\'s set!)' if result == 0 else ''}") print("\nBob learns Alice's ENTIRE set, not just the intersection!")

Step 2: Randomized PSI (Hides Non-Matches)

To protect Alice's polynomial, Bob evaluates PAP_A at his elements but masks the result with a random nonzero value:

For each bb \in Bob's set, compute: vb=rbPA(b)v_b = r_b \cdot P_A(b) where rb$Fpr_b \xleftarrow{$} \mathbb{F}_p^*.

  • If bb \in Alice's set: PA(b)=0P_A(b) = 0, so vb=rb0=0v_b = r_b \cdot 0 = 0. The zero is preserved.

  • If bb \notin Alice's set: PA(b)0P_A(b) \neq 0, so vb=rbPA(b)v_b = r_b \cdot P_A(b) is a uniformly random nonzero element. Bob learns nothing about PA(b)P_A(b) beyond "nonzero".

But we still need to prevent Bob from having Alice's polynomial directly. The full protocol uses oblivious polynomial evaluation (OPE).

# === Step 2: Randomized evaluation === print("=== Randomized PSI ===") print() print("Bob masks each evaluation with a random nonzero scalar:") rand_intersection = [] for b in sorted(bob_set): r_b = F(randint(1, p - 1)) # random nonzero v_b = r_b * P_A(F(b)) in_intersection = (v_b == F(0)) if in_intersection: rand_intersection.append(b) print(f" b = {b}: r = {r_b}, r * P_A({b}) = {v_b} {'<-- ZERO = match!' if in_intersection else '(random nonzero)'}") print(f"\nIntersection found: {rand_intersection}") print(f"Correct? {set(rand_intersection) == true_intersection}") print() print("For non-matching elements, the masked value is random.") print("Bob cannot distinguish between different non-matching elements.")

Step 3: Oblivious Polynomial Evaluation via Secret Sharing

The remaining problem: Bob should evaluate PAP_A at his elements without seeing PAP_A itself, and Alice should not learn which elements Bob is testing.

One approach using additive secret sharing:

  1. Alice secret-shares each coefficient of PA(x)P_A(x) with Bob.

  2. Bob evaluates the shared polynomial at his element bb using only share operations.

  3. Alice adds her random mask and Bob adds his; neither learns the raw evaluation.

We simulate this with a simplified 2-party protocol.

# === Step 3: Oblivious polynomial evaluation (simplified) === def oblivious_poly_eval(P_A_coeffs, bob_element, field): """Simplified 2-party oblivious polynomial evaluation. Alice has polynomial coefficients [a_0, a_1, ..., a_d]. Bob has an element b. Output: Bob learns r * P_A(b) for a random r (zero iff b is a root). Alice learns nothing about b. """ d = len(P_A_coeffs) - 1 b = field(bob_element) # Step 1: Alice shares each coefficient additively # Alice keeps a_i^A, sends a_i^B = a_i - a_i^A to Bob alice_coeff_shares = [field.random_element() for _ in range(d + 1)] bob_coeff_shares = [P_A_coeffs[i] - alice_coeff_shares[i] for i in range(d + 1)] # Step 2: Bob evaluates his share of the polynomial at b # Bob computes sum(bob_coeff_shares[i] * b^i) bob_eval = sum(bob_coeff_shares[i] * b^i for i in range(d + 1)) # Alice evaluates her share at b (but she must NOT learn b!) # In a real protocol, Bob would use OT to help Alice evaluate # without revealing b. Here we simulate the result. alice_eval = sum(alice_coeff_shares[i] * b^i for i in range(d + 1)) # Step 3: Bob's random mask r = field(randint(1, field.order() - 1)) # Each party masks their evaluation share # In the real protocol, they use OT/secret sharing for this raw_eval = alice_eval + bob_eval # = P_A(b) masked_eval = r * raw_eval return masked_eval, raw_eval # Get coefficients of P_A P_A_coeffs = list(P_A) # [a_0, a_1, ..., a_d] print("=== Oblivious Polynomial Evaluation ===") print(f"Alice's polynomial has {len(P_A_coeffs)} coefficients (degree {P_A.degree()})") print() ope_intersection = [] for b in sorted(bob_set): masked, raw = oblivious_poly_eval(P_A_coeffs, b, F) is_match = (masked == F(0)) if is_match: ope_intersection.append(b) print(f" Bob tests b = {b}: masked result = {masked} {'MATCH (zero)' if is_match else '(random nonzero)'}") print(f"\nIntersection: {ope_intersection}") print(f"Correct? {set(ope_intersection) == true_intersection}") print() print("In the full protocol:") print(" - Alice never sees Bob's elements (OT protects Bob's queries).") print(" - Bob never sees Alice's polynomial (only masked evaluations).")

Step 4: PSI with Larger Sets

Let's try PSI with larger random sets to see it scale.

# === Step 4: Larger sets === # Generate random sets of size ~20 with elements in [1, 100] random.seed(42) alice_large = set(sample(range(1, 101), 20)) bob_large = set(sample(range(1, 101), 20)) true_inter_large = alice_large & bob_large print(f"Alice's set ({len(alice_large)} elements): {sorted(alice_large)}") print(f"Bob's set ({len(bob_large)} elements): {sorted(bob_large)}") print(f"True intersection: {sorted(true_inter_large)}") print() # Encode and evaluate P_large = set_to_poly(alice_large, R) print(f"Alice's polynomial: degree {P_large.degree()}") found = [] for b in sorted(bob_large): val = P_large(F(b)) if val == F(0): found.append(b) print(f"PSI result: {found}") print(f"Correct? {set(found) == true_inter_large}") print(f"\nComplexity: Alice's polynomial has {P_large.degree() + 1} coefficients.") print(f"Bob makes {len(bob_large)} evaluations. Total work: O(|A| + |B|) field operations.")

Real-World PSI Deployments

DeploymentProtocolUse Case
Apple CSAM detectionPSI variantCompare device hashes against known CSAM database
Google adsPSI-CA (cardinality)Measure ad conversion without sharing user lists
COVID contact tracingPSIMatch location histories without revealing all visits
Password breach checkingPSICheck if a password hash appears in a breach database

Modern PSI protocols handle billions of elements using hashing-based techniques (e.g., cuckoo hashing + OT extension) rather than polynomial encoding for efficiency. The polynomial approach we showed captures the core idea.

Concept Map: Module 12 \to Private Set Intersection

Module 12 ConceptPSI Application
Polynomial evaluationSet membership test: PS(v)=0    vSP_S(v) = 0 \iff v \in S
Additive secret sharingSharing polynomial coefficients between parties
Oblivious transferProtecting Bob's query elements from Alice
Random maskingHiding non-matching evaluations from Bob
MPC compositionCombining OT + sharing + masking into a full protocol
# === Exercise: Asymmetric PSI === # In many real scenarios, one set is much larger than the other. # A server has a database of 100 elements; a client has 5 elements. random.seed(123) server_set = set(sample(range(1, 501), 100)) client_set = set(sample(range(1, 501), 5)) true_inter_asym = server_set & client_set print(f"Server database: {len(server_set)} elements") print(f"Client query: {sorted(client_set)}") print(f"True intersection: {sorted(true_inter_asym)}") print() # Encode the SERVER's set as a polynomial (larger set) P_server = set_to_poly(server_set, R) print(f"Server polynomial: degree {P_server.degree()}") # Client evaluates at their (small) set found_asym = [] for c in sorted(client_set): r = F(randint(1, p - 1)) # random mask masked_val = r * P_server(F(c)) is_match = (masked_val == F(0)) if is_match: found_asym.append(c) print(f" Client tests {c}: {'MATCH' if is_match else 'no match'}") print(f"\nPSI result: {found_asym}") print(f"Correct? {set(found_asym) == true_inter_asym}") print(f"\nThe client makes only {len(client_set)} evaluations, regardless of server size.")

Summary

ConceptKey idea
Sets as polynomialsEncode a set as a polynomial whose roots are the set elements
Membership testEvaluate the polynomial at a candidate: zero means a match
Random maskingMultiply non-matching evaluations by random scalars to hide them
Oblivious evaluationUse secret sharing and OT so neither party sees the other's raw input
ScalabilityProduction systems add cuckoo hashing and OT extension to handle billions of elements

PSI is one of the most widely deployed MPC primitives, used in ad measurement, contact tracing, and password breach checking. The polynomial approach captures the mathematical core; production systems add hashing and OT extension for scalability to billions of elements.


Back to Module 12: Multi-Party Computation