Federated Program
This documentation is for anyone who is interested in a high-level overview of federated program concepts. It assumes knowledge of TensorFlow Federated, especially its type system.
For more information about federated programs, see the tff.program API documentation.
What is a federated program?
A federated program is a program that executes computations and other processing logic in a federated environment.
More specifically, a federated program:
executes computations
using program logic
given parameters set by the program
and parameters set by the customer
and may materialize data in [platform storage](#platform-storage) to:
use in Python logic
implement [fault tolerance](#fault-tolerance)
and may release data to [customer storage](#customer-storage)
Defining these concepts and abstractions makes it possible to describe the relationships between the components of a federated program and allows these components to be owned and authored by different roles. This decoupling allows developers to compose federated programs using components that are shared with other federated programs; typically, this means executing the same program logic on many different platforms.
TFF's federated program library (tff.program) defines the abstractions required to create a federated program and provides platform-agnostic components.
Components
The components of TFF's federated program library are designed so they can be owned and authored by different roles.
Note: This is a high-level overview of the components; see tff.program for documentation of specific APIs.
Program
The program is a Python binary that:
defines parameters (e.g. flags)
constructs platform-specific components and platform-agnostic components
executes computations using program logic in a federated context
For example:
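A minimal sketch of such a program follows. It uses only the Python standard library, and every component in it (`FileReleaseManager`, `train`, `train_federated_model`) is a hypothetical stand-in defined locally, not a real tff.program API. The `--output_dir` flag is the parameter set by the customer; `total_rounds` and `num_clients` are set by the program.

```python
import argparse
import asyncio
import json
import os


class FileReleaseManager:
    """Hypothetical platform-agnostic component that writes values to disk."""

    def __init__(self, root_dir):
        self._root_dir = root_dir
        os.makedirs(root_dir, exist_ok=True)

    async def release(self, value, key):
        # One JSON file per released value, named after its key.
        path = os.path.join(self._root_dir, f"round_{key}.json")
        with open(path, "w") as f:
            json.dump(value, f)


def train(state, num_clients):
    """Toy stand-in for a computation: one round of 'training'."""
    new_state = state + num_clients
    return new_state, {"total_updates": new_state}


async def train_federated_model(train, total_rounds, num_clients, metrics_manager):
    """Toy program logic: run each round and release its metrics."""
    state = 0
    for round_num in range(1, total_rounds + 1):
        state, metrics = train(state, num_clients)
        await metrics_manager.release(metrics, key=round_num)
    return state


def main(argv):
    # Parameters set by the customer (exposed as a flag).
    parser = argparse.ArgumentParser()
    parser.add_argument("--output_dir", required=True, help="The output path.")
    args = parser.parse_args(argv)

    # Parameters set by the program.
    total_rounds = 3
    num_clients = 2

    # Construct the components (all hypothetical in this sketch).
    metrics_manager = FileReleaseManager(os.path.join(args.output_dir, "metrics"))

    # Execute the computations using program logic.
    return asyncio.run(
        train_federated_model(train, total_rounds, num_clients, metrics_manager)
    )
```

Calling `main(["--output_dir", "/tmp/out"])` would run three rounds and write one JSON file per round under `/tmp/out/metrics`. In a real program the components would come from a platform and from a library such as TFF rather than being defined inline.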
Parameters
The parameters are the inputs to the program. These inputs may be set by the customer, if they are exposed as flags, or they may be set by the program. In the example above, output_dir is a parameter that is set by the customer, and total_rounds and num_clients are parameters set by the program.
Platform-Specific Components
The platform-specific components are the components provided by a platform implementing the abstract interfaces defined by TFF's federated program library.
Platform-Agnostic Components
The platform-agnostic components are the components provided by a library (e.g. TFF) implementing the abstract interfaces defined by TFF's federated program library.
Computations
The computations are implementations of the abstract interface tff.Computation.
For example, in the TFF platform you can use the tff.tf_computation or tff.federated_computation decorators to create a tff.framework.ConcreteComputation.
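The decorator pattern described here can be pictured with a plain-Python analogy. The sketch below does not use TFF: `Computation`, `ConcreteComputation`, and the `computation` decorator are hypothetical stand-ins defined locally. They show only how a decorator can turn a Python function into an object that implements an abstract interface; real TFF computations also carry TFF type signatures, which this analogy omits.

```python
import abc


class Computation(abc.ABC):
    """Hypothetical abstract interface: something that can be invoked."""

    @abc.abstractmethod
    def __call__(self, *args):
        raise NotImplementedError


class ConcreteComputation(Computation):
    """One concrete implementation that wraps a plain Python function."""

    def __init__(self, fn):
        self._fn = fn
        self.name = fn.__name__

    def __call__(self, *args):
        return self._fn(*args)


def computation(fn):
    """Decorator that wraps a Python function in a ConcreteComputation."""
    return ConcreteComputation(fn)


@computation
def add_one(x):
    return x + 1
```

After decoration, `add_one` is no longer a bare function but a `ConcreteComputation`, so program logic can depend on the abstract `Computation` interface without knowing how the computation was built.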
See life of a computation for more information.
Program Logic
The program logic is a Python function that takes as an input:
parameters set by the customer and the program
and performs some operations, which typically include:
executing computations
executing Python logic
materializing data in [platform storage](#platform-storage) to:
use in Python logic
implement [fault tolerance](#fault-tolerance)
and may yield some output, which typically includes:
releasing data to [customer storage](#customer-storage)
For example:
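The shape of such a function can be sketched in plain Python. Everything below is an assumption made for illustration: `ValueReference`, `InMemoryReleaseManager`, `train`, and `program_logic` are hypothetical names defined locally, and a dictionary stands in for customer storage. The sketch shows the order of operations (execute a computation, materialize its result, run Python logic, release output), not a real tff.program implementation.

```python
import asyncio


class ValueReference:
    """Hypothetical handle to a value that has not been fetched yet."""

    def __init__(self, value):
        self._value = value

    async def get_value(self):
        # Materializing makes the referenced value available to the program.
        return self._value


class InMemoryReleaseManager:
    """Hypothetical release component; a dict stands in for customer storage."""

    def __init__(self):
        self.released = {}

    async def release(self, value, key):
        self.released[key] = value


def train(state, num_clients):
    """Toy computation returning new state and a reference to its metrics."""
    new_state = state + num_clients
    return new_state, ValueReference({"clients_seen": new_state})


async def program_logic(train, total_rounds, num_clients, metrics_manager):
    state = 0
    for round_num in range(1, total_rounds + 1):
        # Execute the computation.
        state, metrics_ref = train(state, num_clients)
        # Materialize the metrics so Python logic can use them.
        metrics = await metrics_ref.get_value()
        # Python logic: annotate the metrics with the round number.
        metrics["round_num"] = round_num
        # Release the metrics to (stand-in) customer storage.
        await metrics_manager.release(metrics, key=round_num)
    return state
```

Because the function receives its computation and release manager as arguments, the same program logic could in principle be reused with components supplied by different platforms.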
Roles
There are three roles that are useful to define when discussing federated programs: the customer, the platform, and the library. Each of these roles owns and authors some of the components used to create a federated program. However, it is possible for a single entity or group to fulfill multiple roles.
Customer
The customer typically:
owns customer storage
launches the program
but may:
Platform
The platform typically:
owns platform storage
authors platform-specific components
but may:
Library
A library typically:
authors platform-agnostic components
authors computations
authors program logic
Concepts
There are a few concepts that are useful to define when discussing federated programs.
Customer Storage
Customer storage is storage that the customer has read and write access to and that the platform has write access to.
Platform Storage
Platform storage is storage that only the platform has read and write access to.
Release
Releasing a value makes the value available to customer storage (e.g. publishing the value to a dashboard, logging the value, or writing the value to disk).
Materialize
Materializing a value reference makes the referenced value available to the program. Often materializing a value reference is required to release the value or to make program logic fault tolerant.
Fault Tolerance
Fault tolerance is the capability of the program logic to recover from a failure when executing computations. For example, if you successfully train the first 90 rounds out of 100 and then experience a failure, is the program logic capable of resuming training from round 91 or does training need to be restarted at round 1?
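One common way to get this behavior is to save program state after every round and load the latest saved state on startup. The sketch below assumes that approach and uses only the standard library; `FileProgramState`, `train_with_recovery`, and the `fail_after` parameter (which simulates a crash) are hypothetical, and a local directory stands in for platform storage.

```python
import json
import os


class FileProgramState:
    """Hypothetical checkpoint store backed by a single JSON file."""

    def __init__(self, root_dir):
        self._path = os.path.join(root_dir, "program_state.json")

    def load_latest(self):
        """Returns (state, round_num) for the last saved round, or None."""
        if not os.path.exists(self._path):
            return None
        with open(self._path) as f:
            saved = json.load(f)
        return saved["state"], saved["round_num"]

    def save(self, state, round_num):
        # Write to a temporary file, then rename, so a crash mid-write
        # does not leave a partially written checkpoint behind.
        tmp_path = self._path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"state": state, "round_num": round_num}, f)
        os.replace(tmp_path, self._path)


def train_with_recovery(total_rounds, store, fail_after=None):
    """Runs the remaining rounds, resuming after the last saved round."""
    loaded = store.load_latest()
    if loaded is None:
        state, start_round = 0, 1
    else:
        state, start_round = loaded[0], loaded[1] + 1

    rounds_run = []
    for round_num in range(start_round, total_rounds + 1):
        if fail_after is not None and round_num > fail_after:
            raise RuntimeError(f"simulated failure before round {round_num}")
        state += 1  # Toy stand-in for one round of training.
        store.save(state, round_num)
        rounds_run.append(round_num)
    return state, rounds_run
```

If a first call with `fail_after=90` raises after round 90 has been saved, a second call with the same store starts at round 91 instead of round 1.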