Notes from the 7/28/2022 meeting of TFF collaborators
New people
Let’s all be on the Discord server to facilitate conversations interactively
Ping Krzys to become a Contributor to be able to post
Discussion of free-riding and data poisoning in x-silo, discussion led by LinkedIn (context from use cases identified by LinkedIn unless specified otherwise):
Free riding - certain tenants not contributing to the group, so diluting benefit
Could be intentional or unintentional
Focus on the unintentional at this point - this is the case we’re interested in at LinkedIn primarily
Could be as simple as a participant not having enough data, or having data that is not useful in training
Currently thinking of modeling this as an anomaly detection problem
Comparing against the majority contribution works if free riding affects only a minority of the data
Another approach: multiple federated models, built with or without contributions from a given participant; observe which ones make progress, and exclude participants based on that
Some freeriders could be contributing garbage data
Harder to model as anomaly detection
Same approach as above
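To make the anomaly-detection framing above concrete, here is a minimal sketch in plain Python. It assumes each participant's contribution is available as a flattened list of floats, uses the coordinate-wise median as a stand-in for the "majority contribution", and flags participants whose distance from it is unusually large. The function name, the MAD-based score, and the threshold are illustrative assumptions, not something proposed in the meeting.

```python
from statistics import median


def flag_anomalous_updates(updates, threshold=3.0):
    """Return ids of participants whose update is far from the majority.

    updates: dict mapping a participant id to a flattened update (list of floats).
    threshold: hypothetical cutoff on the robust deviation score.
    """
    ids = list(updates)
    dim = len(updates[ids[0]])
    # The coordinate-wise median serves as the "majority" reference update.
    reference = [median(updates[i][k] for i in ids) for k in range(dim)]
    # Euclidean distance of each update from the reference.
    dist = {
        i: sum((u - r) ** 2 for u, r in zip(updates[i], reference)) ** 0.5
        for i in ids
    }
    med = median(dist.values())
    # Median absolute deviation: a robust estimate of the typical spread.
    mad = median(abs(d - med) for d in dist.values())
    scale = mad if mad > 0 else 1e-12  # avoid division by zero
    # Only unusually large distances are flagged (one-sided test).
    return [i for i in ids if (dist[i] - med) / scale > threshold]


# Toy example: participant "e" sends an all-zero update.
example = {
    "a": [1.0, 1.0],
    "b": [1.1, 0.9],
    "c": [0.9, 1.1],
    "d": [1.0, 1.05],
    "e": [0.0, 0.0],
}
flagged = flag_anomalous_updates(example)
```

As noted above, this kind of comparison only helps while the anomalous participants are a minority; it says nothing about participants whose garbage data happens to produce typical-looking updates.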
Poisoning
Likewise, could be intentional or not
Focus on the unintentional - larger tenants can overwhelm the group and bias the model towards their contributions
For scenarios of interest, this bears similarities to the freerider problem
Relevant techniques in distributed byzantine training
E.g., instead of average, could adopt a median to add some robustness against poisoning
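A small illustration of why a median can be more robust than an average, written in plain Python rather than TFF. The `aggregate` helper and the toy updates are assumptions made for this sketch: three silos send similar updates and one silo sends an extreme one.

```python
from statistics import mean, median


def aggregate(updates, reducer):
    """Apply `reducer` to each coordinate across all silo updates."""
    return [reducer(coord) for coord in zip(*updates)]


# Three well-behaved silos and one silo with an extreme contribution.
updates = [
    [1.0, 2.0],
    [1.2, 1.8],
    [0.8, 2.2],
    [100.0, -100.0],
]

mean_update = aggregate(updates, mean)      # pulled far toward the outlier
median_update = aggregate(updates, median)  # stays close to the majority
```

The coordinate-wise median is only one of several robust aggregation rules from the Byzantine-robust training literature, and it typically trades some statistical efficiency for that robustness.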
Do we see these problems occurring elsewhere? Is it worth contributing such logic to the ecosystem?
Yes! These are common problems in adversarial settings, where silos' interests may not be aligned (contributions incur computation cost and require resources)
How can we measure the impact of freeloading or poisoning?
Per contribution vs. in aggregate - ideas above point to the latter
Observation: one of the features of TFF is parameterizable and stateful aggregations that can maintain their own internal state and update that state as they aggregate.
E.g. federated_aggregate
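The idea of an aggregation that carries state from round to round can be sketched in plain Python. This is not TFF code and does not use any TFF API; the class name, the clipping rule, and the `momentum` parameter are hypothetical and chosen only to show state being read and updated during aggregation.

```python
from statistics import median


class RunningNormClipAggregator:
    """Hypothetical stateful aggregator: clips each update to a norm bound
    that is itself adjusted from round to round."""

    def __init__(self, initial_bound=1.0, momentum=0.9):
        self.bound = initial_bound  # internal state kept across rounds
        self.momentum = momentum

    def next(self, updates):
        """Aggregate one round of updates and update the internal state."""
        norms = [sum(x * x for x in u) ** 0.5 for u in updates]
        clipped = []
        for u, n in zip(updates, norms):
            # Scale down any update whose norm exceeds the current bound.
            factor = min(1.0, self.bound / n) if n > 0 else 1.0
            clipped.append([x * factor for x in u])
        average = [sum(c) / len(clipped) for c in zip(*clipped)]
        # Move the bound toward the median norm observed this round.
        self.bound = (
            self.momentum * self.bound
            + (1 - self.momentum) * median(norms)
        )
        return average


agg = RunningNormClipAggregator(initial_bound=1.0, momentum=0.5)
round_result = agg.next([[3.0, 4.0], [0.6, 0.8]])
```

Because the bound limits how much any single participant can move the result, a design like this could in principle host the free-riding or poisoning checks discussed above inside the aggregation itself.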
Thoughts on the tradeoffs and synergies with other goals (e.g., DP)
DP can definitely help with poisoning
How DP interacts with freeloading is still an open question
We found data poisoning attacks could have negligible impact
E.g., see https://arxiv.org/pdf/2108.10241.pdf
Important to provide such a feature as part of a cross-silo FL platform regardless of the magnitude of impact
Upcoming from LinkedIn: a write-up with more details on the ideas above and proposals for components to add to the TFF ecosystem
See more discussion on Discord
Next meeting in 2 weeks