GitHub Repository: freebsd/freebsd-src
Path: blob/main/crypto/openssl/doc/designs/quic-design/record-layer.md
³⁴⁸⁷⁷ views

Design Problem: Abstract Record Layer

This document covers the design of an abstract record layer for use in (D)TLS. The QUIC record layer is handled separately.

A record within this document refers to a packet of data. It will typically contain some header data and some payload data, and will often be cryptographically protected. A record may or may not have a one-to-one correspondence with network packets, depending on the implementation details of an individual record layer.

The term record comes directly from the TLS and DTLS specifications.

Libssl supports a number of different types of record layer, and record layer variants:

Standard TLS record layer
Standard DTLS record layer
Kernel TLS record layer

Within the TLS record layer there are options to handle "multiblock" and "pipelining" which are different approaches for supporting the reading or writing of multiple records at the same time. All record layer variants also have to be able to handle different protocol versions.

These different record layer implementations, variants and protocol versions have each been added at different times and over many years. The result is that each took slightly different approaches for achieving the goals that were appropriate at the time and the integration points where they were added were spread throughout the code.

The introduction of QUIC support will see the implementation of a new record layer, i.e. the QUIC-TLS record layer. This refers to the "inner" TLS implementation used by QUIC. Records here will be in the form of QUIC CRYPTO frames.

Requirements

The technical requirements document lists these requirements that are relevant to the record layer:

The current libssl record layer includes support for TLS, DTLS and KTLS. QUIC will introduce another variant and there may be more over time. The OMC requires a pluggable record layer interface to be implemented to enable this to be less intrusive, more maintainable, and to harmonize the existing record layer interactions between TLS, DTLS, KTLS and the planned QUIC protocols. The pluggable record layer interface will be internal only for MVP and be public in a future release.
The minimum viable product (MVP) for the next release is a pluggable record layer interface and a single stream QUIC client in the form of s_client that does not require significant API changes. In the MVP, interoperability should be prioritized over strict standards compliance.
Once we have a fully functional QUIC implementation (in a subsequent release), it should be possible for external libraries to be able to use the pluggable record layer interface and it should offer a stable ABI (via a provider).

The MVP requirements are:

a pluggable record layer (not public for MVP)

Candidate Solutions that were considered

This section outlines two different solution approaches that were considered for the abstract record layer

Use a METHOD based approach

A METHOD based approach is simply a structure containing function pointers. It is a common pattern in the OpenSSL codebase. Different strategies for implementing a METHOD can be employed, but these differences are hidden from the caller of the METHOD.

In this solution we would seek to implement a different METHOD for each of the types of record layer that we support, i.e. there would be one for the standard TLS record layer, one for the standard DTLS record layer, one for kernel TLS and one for QUIC-TLS.

In the MVP the METHOD approach would be private. However, once it has stabilised, it would be straight forward to supply public functions to enable end user applications to construct their own METHODs.

This option is simpler to implement than the alternative of having a provider based approach. However it could be used as a "stepping stone" for that, i.e. the MVP could implement a METHOD based approach, and subsequent releases could convert the METHODs into fully fetchable algorithms.

Pros:

Simple approach that has been used historically in OpenSSL
Could be used as the basis for the final public solution
Could also be used as the basis for a fetchable solution in a subsequent release
If this option is later converted to a fetchable solution then much of the effort involved in making the record layer fetchable can be deferred to a later release

Cons:

Not consistent with the provider based approach we used for extensibility in 3.0
If this option is implemented and later converted to a fetchable solution then some rework might be required

Use a provider based approach

This approach is very similar to the alternative METHOD based approach. The main difference is that the record layer implementations would be held in providers and "fetched" in much the same way that cryptographic algorithms are fetched in OpenSSL 3.0.

This approach is more consistent with the approach adopted for extensibility in 3.0. METHODS are being deprecated with providers being used extensively.

Complex objects (e.g. an SSL object) cannot be passed across the libssl/provider boundary. This imposes some restrictions on the design of the functions that can be implemented. Additionally implementing the infrastructure for a new fetchable operation is more involved than a METHOD based approach.

Pros:

Consistent with the extensibility solution used in 3.0
If this option is implemented immediately in the MVP then it would avoid later rework if adopted in a subsequent release

Cons:

More complicated to implement than the simple METHOD based approach
Cannot pass complex objects across the provider boundary

Selected solution

The METHOD based approach has been selected for MVP, with the expectation that subsequent releases will convert it to a full provider based solution accessible to third party applications.

Solution Description: The METHOD based approach

This section focuses on the selected approach of using METHODs and further elaborates on how the design works.

A proposed internal record method API is given in Appendix A.

An OSSL_RECORD_METHOD represents the implementation of a particular type of record layer. It contains a set of function pointers to represent the various actions that can be performed by a record layer.

An OSSL_RECORD_LAYER object represents a specific instantiation of a particular OSSL_RECORD_METHOD. It contains the state used by that OSSL_RECORD_METHOD for a specific connection (i.e. SSL object). Any SSL object will have at least 2 OSSL_RECORD_LAYER objects associated with it - one for reading and one for writing. In some cases there may be more than 2 - for example in DTLS it may be necessary to retransmit records from a previous epoch. There will be different OSSL_RECORD_LAYER objects for different protection levels or epochs. It may be that different OSSL_RECORD_METHODs are used for different protection levels. For example a connection might start using the standard TLS record layer during the handshake, and later transition to using the kernel TLS record layer once the handshake is complete.

A new OSSL_RECORD_LAYER is created by calling the new function of the associated OSSL_RECORD_METHOD, and freed by calling the free function. The parameters to the new function also supply all of the cryptographic state (e.g. keys, ivs, symmetric encryption algorithms, hash algorithm etc) used by the record layer. The internal structure details of an OSSL_RECORD_LAYER are entirely hidden to the rest of libssl and can be specific to the given OSSL_RECORD_METHOD. In practice the standard internal TLS, DTLS and KTLS OSSL_RECORD_METHODs all use a common OSSL_RECORD_LAYER structure. However the QUIC-TLS implementation is likely to use a different structure layout.

All of the header and payload data for a single record will be represented by an OSSL_RECORD_TEMPLATE structure when writing. Libssl will construct a set of templates for records to be written out and pass them to the "write" record layer. In most cases only a single record is ever written out at one time, however there are some cases (such as when using the "pipelining" or "multibuffer" optimisations) that multiple records can be written in one go.

It is the record layer's responsibility to know whether it can support multiple records in one go or not. It is libssl's responsibility to split the payload data into OSSL_RECORD_TEMPLATE objects. Libssl will call the record layer's get_max_records() function to determine how many records a given payload should be split into. If that value is more than one, then libssl will construct (up to) that number of OSSL_RECORD_TEMPLATEs and pass the whole set to the record layer's write_records() function.

The implementation of the write_records function must construct the appropriate number of records, apply protection to them as required and then write them out to the underlying transport layer BIO. In the event that not all the data can be transmitted at the current time (e.g. because the underlying transport has indicated a retry), then the write_records function will return a "retry" response. It is permissible for the data to be partially sent, but this is still considered a "retry" until all of the data is sent.

On a success or retry response libssl may free its buffers immediately. The OSSL_RECORD_LAYER object will have to buffer any untransmitted data until it is eventually sent.

If a "retry" occurs, then libssl will subsequently call retry_write_records and continue to do so until a success return value is received. Libssl will never call write_records a second time until a previous call to write_records or retry_write_records has indicated success.

Libssl will read records by calling the read_record function. The OSSL_RECORD_LAYER may read multiple records in one go and buffer them, but the read_record function only ever returns one record at a time. The OSSL_RECORD_LAYER object owns the buffers for the record that has been read and supplies a pointer into that buffer back to libssl for the payload data, as well as other information about the record such as its length and the type of data contained in it. Each record has an associated opaque handle rechandle. The record data must remain buffered by the OSSL_RECORD_LAYER until it has been released via a call to release_record().

A record layer implementation supplies various functions to enable libssl to query the current state. In particular:

unprocessed_read_pending(): to query whether there is data buffered that has already been read from the underlying BIO, but not yet processed.

processed_read_pending(): to query whether there is data buffered that has been read from the underlying BIO and has been processed. The data is not necessarily application data.

app_data_pending(): to query the amount of processed application data that is buffered and available for immediate read.

get_alert_code(): to query the alert code that should be used in the event that a previous attempt to read or write records failed.

get_state(): to obtain a printable string to describe the current state of the record layer.

get_compression(): to obtain information about the compression method currently being used by the record layer.

get_max_record_overhead(): to obtain the maximum amount of bytes the record layer will add to the payload bytes before transmission. This does not include any expansion that might occur during compression. Currently this is only implemented for DTLS.

In addition, libssl will tell the record layer about various events that might occur that are relevant to the record layer's operation:

set1_bio(): called if the underlying BIO being used by the record layer has been changed.

set_protocol_version(): called during protocol version negotiation when a specific protocol version has been selected.

set_plain_alerts(): to indicate that receiving unencrypted alerts is allowed in the current context, even if normally we would expect to receive encrypted data. This is only relevant for TLSv1.3.

set_first_handshake(): called at the beginning and end of the first handshake for any given (D)TLS connection.

set_max_pipelines(): called to configure the maximum number of pipelines of data that the record layer should process in one go. By default this is 1.

set_in_init(): called by libssl to tell the record layer whether we are currently in_init or not. Defaults to "true".

set_options(): called by libssl in the event that the current set of options to use has been updated.

set_max_frag_len(): called by libssl to set the maximum allowed fragment length that is in force at the moment. This might be the result of user configuration, or it may be negotiated during the handshake.

increment_sequence_ctr(): force the record layer to increment its sequence counter. In most cases the record layer will entirely manage its own sequence counters. However in the DTLSv1_listen() corner case, libssl needs to initialise the record layer with an incremented sequence counter.

alloc_buffers(): called by libssl to request that the record layer allocate its buffers. This is a hint only and the record layer is expected to manage its own buffer allocation and freeing.

free_buffers(): called by libssl to request that the record layer free its buffers. This is a hint only and the record layer is expected to manage its own buffer allocation and freeing.

Appendix A: The internal record method API

The internal recordmethod.h header file for the record method API:

/*
 * We use the term "record" here to refer to a packet of data. Records are
 * typically protected via a cipher and MAC, or an AEAD cipher (although not
 * always). This usage of the term record is consistent with the TLS concept.
 * In QUIC the term "record" is not used but it is analogous to the QUIC term
 * "packet". The interface in this file applies to all protocols that protect
 * records/packets of data, i.e. (D)TLS and QUIC. The term record is used to
 * refer to both contexts.
 */

/*
 * An OSSL_RECORD_METHOD is a protocol specific method which provides the
 * functions for reading and writing records for that protocol. Which
 * OSSL_RECORD_METHOD to use for a given protocol is defined by the SSL_METHOD.
 */
typedef struct ossl_record_method_st OSSL_RECORD_METHOD;

/*
 * An OSSL_RECORD_LAYER is just an externally defined opaque pointer created by
 * the method
 */
typedef struct ossl_record_layer_st OSSL_RECORD_LAYER;


# define OSSL_RECORD_ROLE_CLIENT 0
# define OSSL_RECORD_ROLE_SERVER 1

# define OSSL_RECORD_DIRECTION_READ  0
# define OSSL_RECORD_DIRECTION_WRITE 1

/*
 * Protection level. For <= TLSv1.2 only "NONE" and "APPLICATION" are used.
 */
# define OSSL_RECORD_PROTECTION_LEVEL_NONE        0
# define OSSL_RECORD_PROTECTION_LEVEL_EARLY       1
# define OSSL_RECORD_PROTECTION_LEVEL_HANDSHAKE   2
# define OSSL_RECORD_PROTECTION_LEVEL_APPLICATION 3

# define OSSL_RECORD_RETURN_SUCCESS           1
# define OSSL_RECORD_RETURN_RETRY             0
# define OSSL_RECORD_RETURN_NON_FATAL_ERR    -1
# define OSSL_RECORD_RETURN_FATAL            -2
# define OSSL_RECORD_RETURN_EOF              -3

/*
 * Template for creating a record. A record consists of the |type| of data it
 * will contain (e.g. alert, handshake, application data, etc) along with a
 * buffer of payload data in |buf| of length |buflen|.
 */
struct ossl_record_template_st {
    int type;
    unsigned int version;
    const unsigned char *buf;
    size_t buflen;
};

typedef struct ossl_record_template_st OSSL_RECORD_TEMPLATE;

/*
 * Rather than a "method" approach, we could make this fetchable - Should we?
 * There could be some complexity in finding suitable record layer implementations
 * e.g. we need to find one that matches the negotiated protocol, cipher,
 * extensions, etc. The selection_cb approach given above doesn't work so well
 * if unknown third party providers with OSSL_RECORD_METHOD implementations are
 * loaded.
 */

/*
 * If this becomes public API then we will need functions to create and
 * free an OSSL_RECORD_METHOD, as well as functions to get/set the various
 * function pointers....unless we make it fetchable.
 */
struct ossl_record_method_st {
    /*
     * Create a new OSSL_RECORD_LAYER object for handling the protocol version
     * set by |vers|. |role| is 0 for client and 1 for server. |direction|
     * indicates either read or write. |level| is the protection level as
     * described above. |settings| are mandatory settings that will cause the
     * new() call to fail if they are not understood (for example to require
     * Encrypt-Then-Mac support). |options| are optional settings that will not
     * cause the new() call to fail if they are not understood (for example
     * whether to use "read ahead" or not).
     *
     * The BIO in |transport| is the BIO for the underlying transport layer.
     * Where the direction is "read", then this BIO will only ever be used for
     * reading data. Where the direction is "write", then this BIO will only
     * every be used for writing data.
     *
     * An SSL object will always have at least 2 OSSL_RECORD_LAYER objects in
     * force at any one time (one for reading and one for writing). In some
     * protocols more than 2 might be used (e.g. in DTLS for retransmitting
     * messages from an earlier epoch).
     *
     * The created OSSL_RECORD_LAYER object is stored in *ret on success (or
     * NULL otherwise). The return value will be one of
     * OSSL_RECORD_RETURN_SUCCESS, OSSL_RECORD_RETURN_FATAL or
     * OSSL_RECORD_RETURN_NON_FATAL. A non-fatal return means that creation of
     * the record layer has failed because it is unsuitable, but an alternative
     * record layer can be tried instead.
     */

    /*
     * If we eventually make this fetchable then we will need to use something
     * other than EVP_CIPHER. Also mactype would not be a NID, but a string. For
     * now though, this works.
     */
    int (*new_record_layer)(OSSL_LIB_CTX *libctx,
                            const char *propq, int vers,
                            int role, int direction,
                            int level,
                            uint16_t epoch,
                            unsigned char *key,
                            size_t keylen,
                            unsigned char *iv,
                            size_t ivlen,
                            unsigned char *mackey,
                            size_t mackeylen,
                            const EVP_CIPHER *ciph,
                            size_t taglen,
                            int mactype,
                            const EVP_MD *md,
                            COMP_METHOD *comp,
                            BIO *prev,
                            BIO *transport,
                            BIO *next,
                            BIO_ADDR *local,
                            BIO_ADDR *peer,
                            const OSSL_PARAM *settings,
                            const OSSL_PARAM *options,
                            const OSSL_DISPATCH *fns,
                            void *cbarg,
                            OSSL_RECORD_LAYER **ret);
    int (*free)(OSSL_RECORD_LAYER *rl);

    int (*reset)(OSSL_RECORD_LAYER *rl); /* Is this needed? */

    /* Returns 1 if we have unprocessed data buffered or 0 otherwise */
    int (*unprocessed_read_pending)(OSSL_RECORD_LAYER *rl);

    /*
     * Returns 1 if we have processed data buffered that can be read or 0 otherwise
     * - not necessarily app data
     */
    int (*processed_read_pending)(OSSL_RECORD_LAYER *rl);

    /*
     * The amount of processed app data that is internally buffered and
     * available to read
     */
    size_t (*app_data_pending)(OSSL_RECORD_LAYER *rl);

    /*
     * Find out the maximum number of records that the record layer is prepared
     * to process in a single call to write_records. It is the caller's
     * responsibility to ensure that no call to write_records exceeds this
     * number of records. |type| is the type of the records that the caller
     * wants to write, and |len| is the total amount of data that it wants
     * to send. |maxfrag| is the maximum allowed fragment size based on user
     * configuration, or TLS parameter negotiation. |*preffrag| contains on
     * entry the default fragment size that will actually be used based on user
     * configuration. This will always be less than or equal to |maxfrag|. On
     * exit the record layer may update this to an alternative fragment size to
     * be used. This must always be less than or equal to |maxfrag|.
     */
    size_t (*get_max_records)(OSSL_RECORD_LAYER *rl, uint8_t type, size_t len,
                              size_t maxfrag, size_t *preffrag);

    /*
     * Write |numtempl| records from the array of record templates pointed to
     * by |templates|. Each record should be no longer than the value returned
     * by get_max_record_len(), and there should be no more records than the
     * value returned by get_max_records().
     * Where possible the caller will attempt to ensure that all records are the
     * same length, except the last record. This may not always be possible so
     * the record method implementation should not rely on this being the case.
     * In the event of a retry the caller should call retry_write_records()
     * to try again. No more calls to write_records() should be attempted until
     * retry_write_records() returns success.
     * Buffers allocated for the record templates can be freed immediately after
     * write_records() returns - even in the case a retry.
     * The record templates represent the plaintext payload. The encrypted
     * output is written to the |transport| BIO.
     * Returns:
     *  1 on success
     *  0 on retry
     * -1 on failure
     */
    int (*write_records)(OSSL_RECORD_LAYER *rl, OSSL_RECORD_TEMPLATE *templates,
                         size_t numtempl);

    /*
     * Retry a previous call to write_records. The caller should continue to
     * call this until the function returns with success or failure. After
     * each retry more of the data may have been incrementally sent.
     * Returns:
     *  1 on success
     *  0 on retry
     * -1 on failure
     */
    int (*retry_write_records)(OSSL_RECORD_LAYER *rl);

    /*
     * Read a record and return the record layer version and record type in
     * the |rversion| and |type| parameters. |*data| is set to point to a
     * record layer buffer containing the record payload data and |*datalen|
     * is filled in with the length of that data. The |epoch| and |seq_num|
     * values are only used if DTLS has been negotiated. In that case they are
     * filled in with the epoch and sequence number from the record.
     * An opaque record layer handle for the record is returned in |*rechandle|
     * which is used in a subsequent call to |release_record|. The buffer must
     * remain available until release_record is called.
     *
     * Internally the OSSL_RECORD_METHOD the implementation may read/process
     * multiple records in one go and buffer them.
     */
    int (*read_record)(OSSL_RECORD_LAYER *rl, void **rechandle, int *rversion,
                      uint8_t *type, unsigned char **data, size_t *datalen,
                      uint16_t *epoch, unsigned char *seq_num);
    /*
     * Release a buffer associated with a record previously read with
     * read_record. Records are guaranteed to be released in the order that they
     * are read.
     */
    int (*release_record)(OSSL_RECORD_LAYER *rl, void *rechandle);

    /*
     * In the event that a fatal error is returned from the functions above then
     * get_alert_code() can be called to obtain a more details identifier for
     * the error. In (D)TLS this is the alert description code.
     */
    int (*get_alert_code)(OSSL_RECORD_LAYER *rl);

    /*
     * Update the transport BIO from the one originally set in the
     * new_record_layer call
     */
    int (*set1_bio)(OSSL_RECORD_LAYER *rl, BIO *bio);

    /* Called when protocol negotiation selects a protocol version to use */
    int (*set_protocol_version)(OSSL_RECORD_LAYER *rl, int version);

    /*
     * Whether we are allowed to receive unencrypted alerts, even if we might
     * otherwise expect encrypted records. Ignored by protocol versions where
     * this isn't relevant
     */
    void (*set_plain_alerts)(OSSL_RECORD_LAYER *rl, int allow);

    /*
     * Called immediately after creation of the record layer if we are in a
     * first handshake. Also called at the end of the first handshake
     */
    void (*set_first_handshake)(OSSL_RECORD_LAYER *rl, int first);

    /*
     * Set the maximum number of pipelines that the record layer should process.
     * The default is 1.
     */
    void (*set_max_pipelines)(OSSL_RECORD_LAYER *rl, size_t max_pipelines);

    /*
     * Called to tell the record layer whether we are currently "in init" or
     * not. Default at creation of the record layer is "yes".
     */
    void (*set_in_init)(OSSL_RECORD_LAYER *rl, int in_init);

    /*
     * Get a short or long human readable description of the record layer state
     */
    void (*get_state)(OSSL_RECORD_LAYER *rl, const char **shortstr,
                      const char **longstr);

    /*
     * Set new options or modify ones that were originally specified in the
     * new_record_layer call.
     */
    int (*set_options)(OSSL_RECORD_LAYER *rl, const OSSL_PARAM *options);

    const COMP_METHOD *(*get_compression)(OSSL_RECORD_LAYER *rl);

    /*
     * Set the maximum fragment length to be used for the record layer. This
     * will override any previous value supplied for the "max_frag_len"
     * setting during construction of the record layer.
     */
    void (*set_max_frag_len)(OSSL_RECORD_LAYER *rl, size_t max_frag_len);

    /*
     * The maximum expansion in bytes that the record layer might add while
     * writing a record
     */
    size_t (*get_max_record_overhead)(OSSL_RECORD_LAYER *rl);

    /*
     * Increment the record sequence number
     */
    int (*increment_sequence_ctr)(OSSL_RECORD_LAYER *rl);

    /*
     * Allocate read or write buffers. Does nothing if already allocated.
     * Assumes default buffer length and 1 pipeline.
     */
    int (*alloc_buffers)(OSSL_RECORD_LAYER *rl);

    /*
     * Free read or write buffers. Fails if there is pending read or write
     * data. Buffers are automatically reallocated on next read/write.
     */
    int (*free_buffers)(OSSL_RECORD_LAYER *rl);
};