Path: blob/main/course-contents/2024-02-22--notes-ECC--shannon+block.md
$$
\newcommand{\F}{\mathbb{F}}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\NN}{\mathbb{N}}
\newcommand{\e}{\mathbf{e}}
\newcommand{\h}{\mathbf{h}}
\newcommand{\bb}{\mathbf{b}}
\newcommand{\Null}{\operatorname{Null}}
\newcommand{\PP}{\mathbb{P}}
\newcommand{\vv}{\mathbf{v}}
\newcommand{\dist}{\operatorname{dist}}
$$
## Overview
So far, we have chosen to use the term *code* for a vector subspace $C \subseteq \F_q^n$. The idea is that we are interested in encoding some data by identifying it with vectors in $\F_q^k$.
If $G$ is a generator matrix for our code $C$ in standard form, then we encode our data as follows: given a vector $v \in \F_q^k$, the encoded version is $v \cdot G \in \F_q^n$.
Note that -- since $G$ is in standard form -- if $w = v \cdot G$ then $w = (v_1, \dots, v_k, w_{k+1}, \dots, w_n)$ for some scalars $w_j \in \F_q$; i.e. the first $k$ coordinates of the codeword reproduce the message itself.
Our intent is to "transmit" this encoded data $w = v \cdot G$, possibly through some noisy channel that may result in transmission errors. At the other end, some vector $w'$ in $\F_q^n$ is received, and the hope is to recover the vector $v$ from the received vector $w'$.
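For concreteness, the encoding step $v \mapsto v \cdot G$ can be sketched as follows. The $[7,4]$ generator matrix below is a hypothetical example chosen for illustration, not one fixed by the text:

```python
# Sketch of encoding with a generator matrix in standard form over F_2.
# The matrix G below is a made-up [7,4] example.

def encode(v, G, q=2):
    """Encode the message v (length k) as v * G (length n), entries mod q."""
    k, n = len(G), len(G[0])
    assert len(v) == k
    return [sum(v[i] * G[i][j] for i in range(k)) % q for j in range(n)]

# G = [I_4 | P]: standard form, so the first 4 coordinates of the
# codeword reproduce the message itself.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

w = encode([1, 0, 1, 1], G)
print(w)  # -> [1, 0, 1, 1, 0, 1, 0]; first four entries are the message
```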
The field of Information Theory formulates a way to reason about such systems; we are going to sketch a rudimentary description.
## Noisy Channels and Shannon's Theorem
We consider finite sets $S$ and $T$.
We consider random variables $X$ on $S$ and $Y$ on $T$. In particular, we may consider the probabilities:
$$P(X=s) = p_s, \quad \text{the probability that the random variable } X \text{ takes the value } s \in S,$$
$$P(Y=t) = q_t, \quad \text{the probability that the random variable } Y \text{ takes the value } t \in T.$$
We view the elements of the set $S$ as the data we send through some transmission channel, and the elements of $T$ as the data we receive.
In the case of our linear codes $C \subseteq \F_q^n$ described above, $S$ would be $\F_q^k$ and $T$ would be $\F_q^n$.
### Block codes
More generally, we consider *block codes*. Here $A$ is a finite alphabet, and the code words in $C$ are just $n$-tuples of elements from $A$. We write $C \subseteq A^n$. We say that the length of the code $C$ is $n$.
In this setting, for our transmission channel we take $S = C$ and $T = A^n$.
### Channel
A channel for transmission amounts to a matrix $(p_{st})$ with rows indexed by the set $S$ and columns indexed by the set $T$: $$p_{st} = P(Y = t \mid X = s),$$ i.e. the conditional probability that $Y = t$ given that $X = s$.
**Example:** An example is the *binary symmetric channel*, where $S = T = \{0,1\}$ and the channel matrix is given by $$(p_{st}) = \begin{bmatrix} \phi & 1 - \phi \\ 1 - \phi & \phi \end{bmatrix} \quad \text{for some } \phi \in [0,1].$$
Note for example that if we know the channel matrix and if we know the probabilities $p_s$ for the random variable $X$, we find the probabilities for $Y$ via $$q_t = P(Y = t) = \sum_{s \in S} P(X = s) \cdot P(Y = t \mid X = s) = \sum_{s \in S} p_s \, p_{st}.$$
**Example:** For the binary symmetric channel, if we know that $P(X = 0) = p_0$ and $P(X = 1) = p_1 = 1 - p_0$, then $$P(Y = 0) = \phi \, p_0 + (1 - \phi) \, p_1 \quad \text{and} \quad P(Y = 1) = (1 - \phi) \, p_0 + \phi \, p_1.$$
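The computation of the output distribution can be sketched as follows; the value of $\phi$ and the input distributions are made-up examples:

```python
# Output distribution q_t = sum_s p_s * p_{st} for a given channel matrix.

def output_distribution(p_input, channel):
    """p_input[s] = P(X=s); channel[s][t] = P(Y=t | X=s)."""
    n_out = len(channel[0])
    return [sum(p_input[s] * channel[s][t] for s in range(len(p_input)))
            for t in range(n_out)]

phi = 0.9                      # hypothetical channel parameter
bsc = [[phi, 1 - phi],
       [1 - phi, phi]]

# With a uniform input distribution, the output is uniform as well.
print(output_distribution([0.5, 0.5], bsc))
```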
### Decoding
With notation as above, a decoding is a function $\sigma: T \to S$. Think of it this way: given that $t$ was received, the decoding function "guesses" that $\sigma(t)$ was sent.
The question is: how do we identify a good decoding function?
Well, we can consider the probabilities that reflect how often a decoding is correct:
$P(X = \sigma(t) \mid Y = t)$ represents the probability that $\sigma(t)$ was sent given that $t$ was received.
The average probability of a correct decoding is given by $$\sum_{t \in T} P(X = \sigma(t) \mid Y = t) \, q_t$$ (remember that $q_t = P(Y = t)$).
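As a sketch, the average probability of a correct decoding can be computed from the channel matrix, an input distribution, and a decoding function. All numerical values below are illustrative; the posterior $P(X = s \mid Y = t)$ is obtained via Bayes' rule:

```python
# Average probability of a correct decoding:
#   sum_t P(X = sigma(t) | Y = t) * P(Y = t),
# where Bayes' rule gives P(X = s | Y = t) = p_s * p_{st} / q_t.

def average_correct(p_input, channel, sigma):
    n_out = len(channel[0])
    total = 0.0
    for t in range(n_out):
        q_t = sum(p_input[s] * channel[s][t] for s in range(len(p_input)))
        s = sigma(t)
        post = p_input[s] * channel[s][t] / q_t   # P(X = sigma(t) | Y = t)
        total += post * q_t                       # weight by P(Y = t)
    return total

phi = 0.9
bsc = [[phi, 1 - phi], [1 - phi, phi]]
identity = lambda t: t          # decode the received bit as itself

print(average_correct([0.5, 0.5], bsc, identity))  # -> 0.9
```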
### Maximum likelihood decoding
A decoding $\sigma$ is said to be a *maximum likelihood decoding* if $$P(X = \sigma(t) \mid Y = t) \ge P(X = s \mid Y = t)$$ for every $t \in T$ and every $s \in S$.
Under some circumstances, one achieves maximum likelihood decoding through *minimum distance decoding*.
For example, we have:
**Lemma:** For the binary symmetric channel with $1/2 < \phi \le 1$ and a code $C \subseteq \F_2^n$, consider the assignment $t \mapsto \sigma(t)$, where $\sigma(t)$ is a closest neighbor to $t$ in $C$ with respect to the Hamming distance. Then $\sigma$ is a maximum likelihood decoding.
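Nearest-neighbor (minimum distance) decoding can be sketched as follows, using a hypothetical toy code in $\F_2^3$:

```python
# Nearest-neighbor decoding with respect to the Hamming distance.

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def nearest_neighbor_decode(w, code):
    """Return a codeword of minimal Hamming distance to the received word w."""
    return min(code, key=lambda c: hamming(c, w))

# Hypothetical toy code: the length-3 repetition code in F_2^3.
code = [(0, 0, 0), (1, 1, 1)]

print(nearest_neighbor_decode((1, 0, 1), code))  # -> (1, 1, 1)
```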
### Transmission rate and capacity
For a block code $C \subseteq A^n$ with $\#A = q$, the *transmission rate* of $C$ is defined to be $$R(C) = \frac{\log_q(\#C)}{n}.$$
If $A = \F_q$ and $C \subseteq \F_q^n$ is a linear code with $k = \dim_{\F_q} C$, then $$R(C) = \frac{\log_q(q^k)}{n} = \frac{k}{n}.$$
There is a notion of the *capacity* $c$ of a channel; it is a number which in some sense encodes the theoretical maximum rate at which information can be reliably transmitted over the channel. We omit the definition here.
### Shannon's Theorem
**Theorem (Shannon):** Let $\varepsilon > 0$ be given and let $R$ be a rate with $R$ less than the channel capacity $c$. Then for all sufficiently large $n$ there is a block code $C$ of length $n$ with transmission rate at least $R$, together with a decoding function, for which the probability of an incorrect decoding is less than $\varepsilon$.
This shows that, given a channel with non-zero capacity, there are codes which allow us to communicate over the channel and decode with a probability of a correct decoding arbitrarily close to $1$.
The theorem is not constructive, of course -- in some sense, this motivates the subject: how do we find codes that work well?
## Bounds for block codes
Here we consider an alphabet $A$ -- thus $A$ is just a finite set, of order $q = \#A$ -- and a set of codewords $C \subseteq A^n$; we call $n$ the length of the codewords.
We consider the Hamming distance on $A^n$: for $u = (u_1, \dots, u_n),\ v = (v_1, \dots, v_n) \in A^n$, $$\dist(u,v) = \#\{i \mid u_i \neq v_i\}.$$ In words, the distance between $u$ and $v$ is the number of coordinates in which the tuples differ.
It is straightforward to check that $\dist$ is a metric on the finite set $A^n$; in particular, the triangle inequality holds: for every $u, v, w \in A^n$ we have $$\dist(u,v) \le \dist(u,w) + \dist(w,v).$$
The minimal distance of $C$ is given by $$d = \min \{ \dist(u,v) \mid u, v \in C,\ u \neq v \}.$$
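A brute-force computation of the minimal distance, using a hypothetical toy code:

```python
# Hamming distance and the minimal distance d of a code, computed naively
# by comparing all pairs of codewords.
from itertools import combinations

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def minimal_distance(code):
    return min(hamming(u, v) for u, v in combinations(code, 2))

# Hypothetical toy code over A = {0, 1} with n = 4.
code = [(0, 0, 0, 0), (1, 1, 1, 0), (0, 1, 1, 1)]
print(minimal_distance(code))  # -> 2
```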
**Lemma:** Using nearest neighbor decoding, a block code of minimal distance $d$ can correct up to $\left\lfloor \frac{d-1}{2} \right\rfloor$ errors.
**Proof:** For every pair of distinct codewords $u, v \in C$ and every $w \in A^n$ we have $$(*) \quad d \le \dist(u,v) \le \dist(u,w) + \dist(w,v).$$ Thus if $\dist(u,w) \le \lfloor (d-1)/2 \rfloor$, then $(*)$ gives $\dist(w,v) \ge d - \lfloor (d-1)/2 \rfloor > \lfloor (d-1)/2 \rfloor$, so $u$ is the unique codeword closest to $w$; nearest neighbor decoding therefore recovers $u$ whenever at most $\lfloor (d-1)/2 \rfloor$ coordinates were corrupted. $\blacksquare$
**Example (Repetition code):** Consider a finite alphabet $A$ and the codewords $C = \{(a, a, \dots, a) \mid a \in A\} \subseteq A^n$. Thus the data $a \in A$ is encoded by the redundant codeword $(a, a, \dots, a)$.
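A minimal sketch of the repetition code, decoding by majority vote (which coincides with nearest-neighbor decoding for this code, since the closest repeated-symbol codeword is the symbol occurring most often):

```python
# Repetition code: encode a symbol a as (a, a, ..., a); decode a received
# word by taking the most frequent symbol.
from collections import Counter

def rep_encode(a, n):
    return (a,) * n

def rep_decode(word):
    return Counter(word).most_common(1)[0][0]

n = 5                               # corrects up to (n - 1) // 2 = 2 errors
w = list(rep_encode(1, n))
w[0] = 0
w[3] = 0                            # introduce two transmission errors
print(rep_decode(w))                # -> 1
```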
### Counting codes with given parameters
Write $A_q(n,d)$ for the maximum size of a block code $C \subseteq A^n$ over an alphabet of size $q$ having minimal distance $d$.
We are going to prove some results about the quantity $A_q(n,d)$.
We first compute the size of a "ball" in the metric space $(A^n, \dist)$.
**Lemma:** For $u \in A^n$ and a natural number $m$, write $$B_{\le m}(u) = \{v \in A^n \mid \dist(u,v) \le m\}.$$ Then $$\# B_{\le m}(u) = \sum_{i=0}^{m} \binom{n}{i} (q-1)^i.$$
**Remark:** Note that if $i > n$, then we insist that $\binom{n}{i} = 0$; in this case "there are $0$ ways of choosing precisely $i$ elements from a set of size $n$."
**Proof of Lemma:** For each $0 \le i \le m$ there are $\binom{n}{i}(q-1)^i$ elements of $A^n$ at distance precisely $i$ from $u$: choose the $i$ positions in which $v$ differs from $u$, and in each chosen position choose one of the $q - 1$ symbols different from the corresponding entry of $u$. Summing over $i$ gives the result. $\blacksquare$
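The ball-size formula can be checked against a brute-force count for small parameters:

```python
# Check #B_{<= m}(u) = sum_{i=0}^m C(n, i) * (q-1)^i against a direct
# enumeration of A^n for small n and q.
from itertools import product
from math import comb

def ball_size_formula(n, q, m):
    return sum(comb(n, i) * (q - 1) ** i for i in range(m + 1))

def ball_size_bruteforce(n, q, m, u=None):
    u = u or (0,) * n
    return sum(1 for v in product(range(q), repeat=n)
               if sum(a != b for a, b in zip(u, v)) <= m)

n, q, m = 5, 2, 2
print(ball_size_formula(n, q, m), ball_size_bruteforce(n, q, m))  # -> 16 16
```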
We may now state and prove the following:
**Theorem (Gilbert-Varshamov Bound):** $$A_q(n,d) \ \ge\ \frac{q^n}{\displaystyle\sum_{i=0}^{d-1} \binom{n}{i}(q-1)^i}.$$
**Proof:** Suppose that $C \subseteq A^n$ is a code with minimal distance $d$ for which $\#C = A_q(n,d)$ is maximal.
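The covering idea behind the proof can be illustrated by a greedy construction: a maximal code has balls of radius $d-1$ around its codewords covering all of $A^n$, so its size is at least $q^n$ divided by the ball size. A small illustrative sketch (the parameters are arbitrary):

```python
# Greedy construction of a code with minimal distance d: repeatedly add
# any word at distance >= d from all codewords chosen so far.  The result
# is maximal, hence satisfies the Gilbert-Varshamov lower bound.
from itertools import product
from math import comb

def greedy_code(n, q, d):
    code = []
    for w in product(range(q), repeat=n):
        if all(sum(a != b for a, b in zip(w, c)) >= d for c in code):
            code.append(w)
    return code

n, q, d = 6, 2, 3
code = greedy_code(n, q, d)
gv_lower = q ** n / sum(comb(n, i) * (q - 1) ** i for i in range(d))
print(len(code), gv_lower)   # the greedy code meets the GV lower bound
```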