GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/en-snapshot/federated/collaborations/notes/2022-07-28.md

Notes from the 7/28/2022 meeting of TFF collaborators

  • New people

  • Let’s all be on the Discord server to facilitate conversations interactively

    • Ping Krzys to become a Contributor to be able to post

  • SIG Federated

  • Discussion of free-riding and data poisoning in cross-silo settings, led by LinkedIn (context from use cases identified by LinkedIn unless specified otherwise):

    • Free riding - certain tenants not contributing to the group, so diluting benefit

      • Could be intentional or unintentional

      • Focus on the unintentional at this point - this is the case we’re interested in at LinkedIn primarily

      • Could be as simple as a participant not having enough data, or having data that is not useful in training

        • Currently thinking of modeling this as an anomaly detection problem

        • Comparing against the majority contribution works if this is the case for only a minority of the data

        • Another approach: multiple federated models, built with or without contributions from a given participant; observe which ones make progress, and exclude participants based on that

      • Some free riders could be contributing garbage data

        • Harder to model as anomaly detection

        • Same approach as above

    • Poisoning

      • Likewise, could be intentional or not

      • Focus on the unintentional - larger tenants can overwhelm the group and bias the model towards their contributions

      • For scenarios of interest, this bears similarities to the free-rider problem

      • Relevant techniques in distributed Byzantine training

        • E.g., instead of average, could adopt a median to add some robustness against poisoning

    • Do we see these problems occurring elsewhere? Is it worth contributing such logic to the ecosystem?

      • Yes! Common problems to see in adversarial settings, where silos’ interests may not be aligned (contributions incur computation cost and require resources)

    • How can we measure the impact of freeloading or poisoning?

      • Per contribution vs. in aggregate - ideas above point to the latter

    • Observation: one of the features of TFF is parameterizable and stateful aggregations that can maintain their own internal state and update that state as they aggregate.

    • Thoughts on the tradeoffs and synergies with other goals (e.g., DP)

      • DP can definitely help with poisoning

      • Question about DP in the context of freeloading - still an open question

    • We found data poisoning attacks could have negligible impact

  • Upcoming from LinkedIn: a write-up with more details on the ideas above and proposals for components to add to the TFF ecosystem

  • See more discussion on Discord

  • Next meeting in 2 weeks
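
Illustration of the "median instead of average" and "compare against the majority contribution" ideas from the discussion above. This is a minimal NumPy sketch, not TFF code and not LinkedIn's implementation: the function names and the five example updates are hypothetical, and the approach assumes the unusual contributions are a minority.

```python
import numpy as np

def mean_aggregate(updates):
    """Plain average of client updates (one row per client)."""
    return np.mean(updates, axis=0)

def median_aggregate(updates):
    """Coordinate-wise median of client updates."""
    return np.median(updates, axis=0)

def deviation_scores(updates):
    """Euclidean distance of each client's update from the coordinate-wise median."""
    center = np.median(updates, axis=0)
    return np.linalg.norm(updates - center, axis=1)

# Hypothetical model updates from five silos; the last one is far from the rest.
updates = np.array([
    [1.0, 2.0],
    [1.1, 1.9],
    [0.9, 2.1],
    [1.0, 2.0],
    [50.0, -40.0],
])

mean_update = mean_aggregate(updates)      # pulled toward the outlier
median_update = median_aggregate(updates)  # stays with the majority
scores = deviation_scores(updates)
flagged = int(np.argmax(scores))           # index of the most unusual client

print("mean:  ", mean_update)
print("median:", median_update)
print("most unusual client:", flagged)
```

With these numbers the average is dragged toward the single outlying update, while the median stays near the values reported by the other four silos. The deviation score is one simple way to phrase the anomaly detection view; choosing a threshold for exclusion is left open here.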
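
Illustration of the "multiple federated models, built with or without contributions from a given participant" idea from the free-riding discussion above. This is a toy sketch under stated assumptions: a one-parameter linear model, unweighted federated averaging written directly in NumPy, synthetic data, and a shared validation set. All names and values are hypothetical and none of this comes from TFF or from the meeting.

```python
import numpy as np

def local_update(w, x, y, lr=0.1, steps=5):
    """A few gradient steps of 1-D least squares (y ~ w * x) on one silo's data."""
    for _ in range(steps):
        grad = 2.0 * np.mean((w * x - y) * x)
        w = w - lr * grad
    return w

def train_federated(clients, rounds=20):
    """Unweighted federated averaging over the given list of (x, y) clients."""
    w = 0.0
    for _ in range(rounds):
        w = float(np.mean([local_update(w, x, y) for x, y in clients]))
    return w

def validation_loss(w, x, y):
    """Mean squared error of the model w on held-out data."""
    return float(np.mean((w * x - y) ** 2))

rng = np.random.default_rng(0)

def make_client(slope, n=50):
    """Synthetic silo whose labels follow y = slope * x plus small noise."""
    x = rng.normal(size=n)
    y = slope * x + 0.1 * rng.normal(size=n)
    return x, y

# Four silos agree on the relationship; the fifth holds unhelpful data.
clients = [make_client(2.0) for _ in range(4)] + [make_client(-2.0)]
x_val, y_val = make_client(2.0, n=200)

baseline = validation_loss(train_federated(clients), x_val, y_val)

# Train one model per left-out participant and compare validation losses.
loo_losses = []
for i in range(len(clients)):
    others = clients[:i] + clients[i + 1:]
    loo_losses.append(validation_loss(train_federated(others), x_val, y_val))

suspect = int(np.argmin(loo_losses))  # leaving this silo out helps the most
print("baseline loss:", baseline)
print("leave-one-out losses:", loo_losses)
print("candidate for exclusion:", suspect)
```

The cost is one extra training run per participant, which may be acceptable in cross-silo settings with few tenants. It also relies on having validation data that the group trusts, which is an assumption of this sketch.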