Path: blob/main/frontier/12-mpc/connect/private-set-intersection.ipynb
483 views
Connect: Private Set Intersection
Module 12 | Real-World Connections
Two parties find shared contacts without revealing their full lists.
Introduction
Private Set Intersection (PSI) lets two parties, Alice and Bob, discover which elements they have in common without revealing their non-common elements.
Real-world uses:
Contact tracing: determine if two people visited the same location without revealing all their location data.
Ad conversion measurement: an advertiser and a publisher find matching users without sharing their full user lists.
Sanctions screening: a bank checks if any customer appears on a sanctions list without revealing its entire customer database.
The MPC approach encodes sets as polynomials and uses oblivious evaluation to test membership without leaking non-matching elements.
The Idea: Sets as Polynomials
Encode a set as the polynomial whose roots are the set elements:
Then if and only if . Set membership reduces to polynomial evaluation.
Step 1: Naive PSI (Insecure)
The simplest approach: Alice sends to Bob. Bob evaluates it at each of his elements. If , then Alice's set.
Problem: Bob now has Alice's full polynomial, so he can evaluate it at any element and test membership arbitrarily. This reveals Alice's entire set.
Step 2: Randomized PSI (Hides Non-Matches)
To protect Alice's polynomial, Bob evaluates at his elements but masks the result with a random nonzero value:
For each Bob's set, compute: where .
If Alice's set: , so . The zero is preserved.
If Alice's set: , so is a uniformly random nonzero element. Bob learns nothing about beyond "nonzero".
But we still need to prevent Bob from having Alice's polynomial directly. The full protocol uses oblivious polynomial evaluation (OPE).
Step 3: Oblivious Polynomial Evaluation via Secret Sharing
The remaining problem: Bob should evaluate at his elements without seeing itself, and Alice should not learn which elements Bob is testing.
One approach using additive secret sharing:
Alice secret-shares each coefficient of with Bob.
Bob evaluates the shared polynomial at his element using only share operations.
Alice adds her random mask and Bob adds his; neither learns the raw evaluation.
We simulate this with a simplified 2-party protocol.
Step 4: PSI with Larger Sets
Let's try PSI with larger random sets to see it scale.
Real-World PSI Deployments
| Deployment | Protocol | Use Case |
|---|---|---|
| Apple CSAM detection | PSI variant | Compare device hashes against known CSAM database |
| Google ads | PSI-CA (cardinality) | Measure ad conversion without sharing user lists |
| COVID contact tracing | PSI | Match location histories without revealing all visits |
| Password breach checking | PSI | Check if a password hash appears in a breach database |
Modern PSI protocols handle billions of elements using hashing-based techniques (e.g., cuckoo hashing + OT extension) rather than polynomial encoding for efficiency. The polynomial approach we showed captures the core idea.
Concept Map: Module 12 Private Set Intersection
| Module 12 Concept | PSI Application |
|---|---|
| Polynomial evaluation | Set membership test: |
| Additive secret sharing | Sharing polynomial coefficients between parties |
| Oblivious transfer | Protecting Bob's query elements from Alice |
| Random masking | Hiding non-matching evaluations from Bob |
| MPC composition | Combining OT + sharing + masking into a full protocol |
Summary
| Concept | Key idea |
|---|---|
| Sets as polynomials | Encode a set as a polynomial whose roots are the set elements |
| Membership test | Evaluate the polynomial at a candidate: zero means a match |
| Random masking | Multiply non-matching evaluations by random scalars to hide them |
| Oblivious evaluation | Use secret sharing and OT so neither party sees the other's raw input |
| Scalability | Production systems add cuckoo hashing and OT extension to handle billions of elements |
PSI is one of the most widely deployed MPC primitives, used in ad measurement, contact tracing, and password breach checking. The polynomial approach captures the mathematical core; production systems add hashing and OT extension for scalability to billions of elements.