GitHub Repository: gmcninch-tufts/2024-Sp-Math190
Path: blob/main/course-contents/2024-02-22--notes-ECC--shannon+block.md
---
title: Shannon's Theorem; block codes
date: 2024-02-22
---

\newcommand{\F}{\mathbb{F}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\NN}{\mathbb{N}} \newcommand{\e}{\mathbf{e}} \newcommand{\h}{\mathbf{h}} \newcommand{\bb}{\mathbf{b}} \newcommand{\Null}{\operatorname{Null}}

\newcommand{\PP}{\mathbb{P}} \newcommand{\vv}{\mathbf{v}} \newcommand{\dist}{\operatorname{dist}}

# Overview

So far, we have chosen to use the term *code* for a vector subspace $C$ of $\F_q^n$. The idea is that we are interested in encoding some data by identifying it with vectors in $\F_q^k$.

If $G$ is a generator matrix for our code in standard form, then we encode our data: given a vector $v \in \F_q^k$, the encoded version is $v \cdot G \in \F_q^n$.

Note that -- since $G$ is in standard form -- if $v = (v_1,\cdots,v_k)$ then $v \cdot G = (v_1,\cdots,v_k,w_{k+1},\cdots,w_n)$ for some scalars $w_j \in \F_q$.
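As a concrete illustration, here is a small Python sketch of this encoding over $\F_2$. The parity part $P$ of the generator matrix below is a hypothetical choice for a $[7,4]$ code, used only to exercise the definitions; it is not taken from the text.

```python
# Encoding with a generator matrix in standard form G = [I_k | P] over F_2.
# The parity block P is a hypothetical choice for a [7,4] code.
I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
P = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1],
     [1, 1, 1]]
G = [row_I + row_P for row_I, row_P in zip(I4, P)]  # a k x n = 4 x 7 matrix

def encode(v, G):
    """Return v . G over F_2 (each coordinate is a mod-2 dot product)."""
    n = len(G[0])
    return [sum(v[i] * G[i][j] for i in range(len(v))) % 2 for j in range(n)]

c = encode([1, 0, 1, 1], G)
# Because G is in standard form, c begins with the message itself:
# c == [1, 0, 1, 1, w5, w6, w7] for some parity bits w_j.
```

Since $G = [I_k \mid P]$, the first $k$ coordinates of $v \cdot G$ reproduce the message $v$; only the trailing parity symbols depend on $P$.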

Our intent is to "transmit" this encoded data $v \cdot G$, possibly through some noisy channel that may result in transmission errors. At the other end, some vector $w \in \F_q^n$ is received, and the hope is to recover the vector $v$ from the received vector $w$.

The field of Information Theory formulates a way to reason about such systems; we are going to sketch a rudimentary description.

# Noisy Channels and Shannon's Theorem

We consider finite sets $S$ and $T$ (say $S = \{s_1,\cdots,s_n\}$ and $T = \{t_1,\cdots,t_m\}$).

We consider random variables $X$ on $S$ and $Y$ on $T$. In particular, we may consider probabilities:

$$P(X=s) = p_s, \quad \text{the probability that the random variable $X$ takes value $s \in S$}$$

$$P(Y=t) = q_t, \quad \text{the probability that the random variable $Y$ takes value $t \in T$}$$

We view the elements of the set $S$ as the data we send through some transmission channel, and $T$ as the data we receive.

In the case of our linear codes $C \subset \F_q^n$ described above, $S$ would be $\F_q^k$ and $T$ would be $\F_q^n$.

## Block codes

More generally, we consider *block codes* $C \subset A^n$. Here $A$ is an *alphabet*, and the codewords in $C$ are just $n$-tuples of elements from $A$. We write $q = |A|$. We say that the *length* of the code $C$ is $n$.

In this setting, for our transmission channel we take $S = C$ and $T = A^n$.

## Channel

A *channel* $\Gamma$ for transmission amounts to the following matrix, with rows indexed by the set $S$ and columns indexed by the set $T$: $$p_{st} = P(Y = t \mid X = s),$$ i.e. the conditional probability that $Y = t$ given that $X = s$.

**Example:** An example is the *binary symmetric channel*, where $S = T = \{0,1\}$ and the channel matrix $\Gamma$ is given by $$(p_{st}) = \begin{bmatrix} \phi & 1 - \phi \\ 1 - \phi & \phi \end{bmatrix} \quad \text{for some } \phi \in [0,1].$$

Here $\phi$ represents the probability that $0$ was received given that $0$ was sent, and also the probability that $1$ was received given that $1$ was sent.

Note for example that if we know the channel matrix and if we know the probabilities for the random variable $X$, we find the probabilities for $Y$ via $$P(Y=t) = \sum_{s \in S} P(Y=t \mid X = s) \cdot P(X=s).$$

**Example:** For the binary symmetric channel, if we know that $P(X=0) = p$, then $$P(Y=0) = \phi \cdot p + (1-\phi)(1-p).$$
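The sum over $s$ in the formula above is short, so it can be checked directly; a minimal sketch:

```python
# For the binary symmetric channel, P(Y=0) is obtained by summing
# P(Y=0 | X=s) * P(X=s) over s in {0, 1}:
#   P(Y=0) = phi * p + (1 - phi) * (1 - p).
def p_receive_zero(phi, p):
    """phi: P(0 received | 0 sent); p: P(X = 0)."""
    return phi * p + (1 - phi) * (1 - p)

# Sanity checks: a perfect channel (phi = 1) just reports P(X = 0),
# and a uniformly random input (p = 1/2) yields P(Y = 0) = 1/2 for any phi.
```

The two sanity checks in the comments follow immediately from the formula by substituting $\phi = 1$ and $p = 1/2$ respectively.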

## Decoding

With notation as above, a *decoding* is a function $\Delta:T \to S$. Think of it this way: given that $t \in T$ was received, the decoding function "guesses" that $\Delta(t) \in S$ was sent.

The question is: how do we identify a good decoding function?

Well, we can consider the probabilities that reflect how often a decoding $\Delta$ is correct:

$$q_{\Delta(t),t} = P(X = \Delta(t) \mid Y = t)$$

represents the probability that $\Delta(t)$ was sent given that $t$ was received.

The average probability of a correct decoding is given by $$P_{\operatorname{COR}} = \sum_{t \in T} q_t \cdot q_{\Delta(t),t}$$ (remember that $q_t = P(Y = t)$).
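For a channel as small as the binary symmetric channel, $P_{\operatorname{COR}}$ can be computed directly from the definitions. The sketch below takes $S = T = \{0,1\}$ and the identity decoding $\Delta(t) = t$, and uses Bayes' rule to obtain $q_{\Delta(t),t} = P(X = \Delta(t) \mid Y = t)$ from the channel matrix:

```python
# Average probability of correct decoding for the binary symmetric channel,
# with S = T = {0, 1} and the identity decoding Delta(t) = t.
def p_cor_bsc(phi, p):
    px = {0: p, 1: 1 - p}                    # P(X = s)
    chan = {(0, 0): phi, (0, 1): 1 - phi,    # p_{st} = P(Y = t | X = s)
            (1, 0): 1 - phi, (1, 1): phi}
    total = 0.0
    for t in (0, 1):
        q_t = sum(chan[(s, t)] * px[s] for s in (0, 1))  # P(Y = t)
        if q_t == 0:
            continue
        # Bayes' rule: P(X = Delta(t) | Y = t), with Delta(t) = t.
        q_dt = chan[(t, t)] * px[t] / q_t
        total += q_t * q_dt
    return total
```

Writing the sum out, the factors $q_t$ cancel and $P_{\operatorname{COR}} = \sum_t P(Y=t \mid X=t)\,P(X=t) = \phi p + \phi(1-p) = \phi$: with this decoding one is correct exactly as often as the channel transmits faithfully.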

## Maximum likelihood decoding

A decoding $\Delta:T \to S$ is said to be a *maximum likelihood decoding* if $p_{\Delta(t),t} \ge p_{s,t}$ for every $s \in S$ and every $t \in T$.

Under some circumstances, one achieves maximum likelihood decoding through *minimum distance decoding*.

For example, we have:

**Lemma:** For the binary symmetric channel with $\phi > 1/2$ and $C \subset \F_2^n$, consider the assignment $\Delta(v) = u$ where $u$ is the closest neighbor to $v$ in $C$ with respect to the *Hamming distance*.

Then $\Delta$ is a *maximum likelihood decoding*.

## Transmission rate and capacity

For a block code $C \subset A^n$ with $|A| = q$, the *transmission rate* of $C$ is defined to be $$R = \dfrac{\log_q(|C|)}{n}.$$

If $A = \F_q$ and $C$ is a linear code with $k = \dim_{\F_q} C$, then $R = k/n$.
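The rate formula is a one-liner; a minimal sketch, with the linear case as a sanity check:

```python
import math

# Transmission rate R = log_q(|C|) / n for a block code C in A^n, |A| = q.
def rate(code_size, q, n):
    return math.log(code_size, q) / n

# For a linear code over F_q of dimension k we have |C| = q**k,
# so the formula collapses to R = k/n; e.g. a [7, 4] binary code
# has rate 4/7.
```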

There is a notion of the capacity of a channel Γ\Gamma; it is a number which in some sense encodes the theoretical maximum rate at which information can be reliably transmitted over the channel; we omit the definition here.

## Shannon's Theorem

**Theorem (Shannon):** Let $\delta > 0$ be given and let $R > 0$ be less than the channel capacity.

For every sufficiently large natural number $n$, there is a code of length $n$ and transmission rate $\ge R$ such that, using maximum likelihood decoding, the probability $P_{\operatorname{COR}}$ of a correct decoding satisfies $$P_{\operatorname{COR}} > 1 - \delta.$$

This shows that, given a channel with non-zero capacity, there are codes which allow us to communicate using the channel and decode with a probability of a correct decoding arbitrarily close to 1.

The theorem is not constructive, of course -- in some sense, this motivates the subject: how do we find codes that work well?


# Bounds for block codes

Here we consider an alphabet $A$ -- thus $A$ is just a finite set, of order $|A| = q$ -- and a set of codewords $C \subset A^n$; we call $n$ the *length* of the codewords.

We consider the *Hamming distance* on $A^n$: for $u,v \in A^n$, $$\dist(u,v) = \#\{i \mid u_i \ne v_i\}$$ where e.g. $u = (u_1,\cdots,u_n) \in A^n$. In words, the distance between $u$ and $v$ is the number of coordinates in which the tuples differ.

It is straightforward to check that $\dist$ is a metric on the finite set $A^n$; in particular, the triangle inequality holds: for every $u,v,w \in A^n$ we have $$\dist(u,v) \le \dist(u,w) + \dist(w,v).$$

The *minimal distance* of $C$ is given by $$d = \min \{ \dist(u,v) \mid u,v \in C,\ u \ne v \}.$$
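These two definitions translate directly into code; a minimal sketch, with a hypothetical toy code in $\{0,1\}^5$ just to exercise them:

```python
from itertools import combinations

def dist(u, v):
    """Hamming distance: the number of coordinates where u and v differ."""
    return sum(1 for a, b in zip(u, v) if a != b)

def minimal_distance(C):
    """Minimal distance d of a code C, given as a list of tuples."""
    return min(dist(u, v) for u, v in combinations(C, 2))

# A hypothetical toy code, used only to illustrate the definitions:
C = [(0, 0, 0, 0, 0), (1, 1, 1, 0, 0), (0, 0, 1, 1, 1)]
```

For this toy code the pairwise distances are $3$, $3$, and $4$, so its minimal distance is $3$.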

**Lemma:** Using nearest neighbor decoding, a block code of minimal distance $d$ can correct up to $(d-1)/2$ errors.

**Proof:** For every $w \in A^n$ and every pair of distinct $u,v \in C$ we have $$(*) \quad d \le \dist(u,v) \le \dist(u,w) + \dist(w,v).$$

Now, if $\dist(u,w) \le (d-1)/2$ and $\dist(w,v) \le (d-1)/2$ then $\dist(u,v) \le d-1$, contrary to $(*)$. Thus for any $w$ there is at most one codeword $u \in C$ for which $\dist(u,w) \le (d-1)/2$. From the point of view of code transmission, if $w \in A^n$ is *received* and no more than $(d-1)/2$ of the components of $w = (w_1,w_2,\cdots,w_n)$ are erroneous, then nearest neighbor decoding will find the codeword in $C$ that was transmitted.

**Example (Repetition code):** Consider a finite alphabet $A$ and the codewords $C=\{(a,a,\cdots,a) \in A^r \mid a \in A\}$. Thus the data $a \in A$ is encoded by the redundant codeword $(a,a,\cdots,a)$.

The minimal distance between distinct codewords in $C$ is $r$, so the Lemma shows that using nearest neighbor decoding, this code can correct up to $(r-1)/2$ errors. (Note that in this case nearest neighbor decoding amounts to decoding $$(a_1,a_2,\cdots,a_r)$$ by "majority vote"; in other words, view $\{a_1,a_2,\cdots,a_r\}$ as a *multi-set* and choose an element with maximal multiplicity.)
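The majority-vote decoding described above can be sketched in a few lines (choosing a symbol of maximal multiplicity among the received coordinates):

```python
from collections import Counter

# Nearest neighbor decoding for the length-r repetition code amounts to a
# majority vote among the received symbols: pick a symbol of maximal
# multiplicity in the received tuple.
def decode_repetition(w):
    return Counter(w).most_common(1)[0][0]

# With r = 5 the repetition code corrects up to (5 - 1) / 2 = 2 errors:
received = ('a', 'b', 'a', 'a', 'b')  # 'a' sent, two coordinates corrupted
```

When at most $(r-1)/2$ coordinates are erroneous, the transmitted symbol has strict majority, so the vote is unambiguous; with more errors `Counter.most_common` still returns some maximal-multiplicity symbol, but it may not be the one sent.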

## Counting codes with given parameters

Write $A_q(n,d)$ for the maximum size $|C|$ of a block code $C \subset A^n$ having minimal distance $d$.

We are going to prove some results about the quantity $A_q(n,d)$.

We first compute the size of a "ball" in the metric space $(A^n,\dist)$.

**Lemma:** For $u \in A^n$ and a natural number $m$ write: $$B_m(u) = \{v \in A^n \mid \dist(u,v) \le m\}.$$

Let $$\delta(m) = 1 + \dbinom{n}{1}(q-1) + \dbinom{n}{2}(q-1)^2 + \cdots + \dbinom{n}{m}(q-1)^m = \sum_{j=0}^m \dbinom{n}{j}(q-1)^j.$$ Then $|B_m(u)| = \delta(m)$.

**Remark:** Note that if $k \in \NN$, $k > n$ then we insist that $\dbinom{n}{k} = 0$; in this case "there are 0 ways of choosing precisely $k$ elements from a set of size $n$."

This is consistent e.g. with the formula $$\dbinom{n}{k} = \dfrac{n(n-1)\cdots(n-k+1)}{k!}$$ since the factor $(n-n) = 0$ appears in the numerator.

**Proof of Lemma:** For each $j = 0,1,\cdots,m$ there are $\dbinom{n}{j} \cdot (q-1)^j$ elements of $A^n$ at distance precisely $j$ from $u$: choose which $j$ coordinates differ from $u$, then choose one of the $q-1$ other symbols in each such coordinate.
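The counting lemma is easy to confirm by brute force for small parameters; a minimal sketch (taking $A = \{0,\dots,q-1\}$, which is only a convenient labeling of the alphabet):

```python
from itertools import product
from math import comb

def delta(m, n, q):
    """delta(m) = sum_{j=0}^{m} C(n, j) * (q - 1)**j, the size of B_m(u)."""
    return sum(comb(n, j) * (q - 1) ** j for j in range(m + 1))

def ball_size_brute_force(u, m, q):
    """Count points of {0, ..., q-1}^n within Hamming distance m of u."""
    n = len(u)
    return sum(1 for v in product(range(q), repeat=n)
               if sum(a != b for a, b in zip(u, v)) <= m)
```

For instance, with $n = 4$, $q = 3$, $m = 2$ one gets $\delta(2) = 1 + 4 \cdot 2 + 6 \cdot 4 = 33$, and the brute-force count agrees. (By symmetry the ball size is independent of the center $u$.)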

We may now state and prove the following:

**Theorem (Gilbert-Varshamov Bound):** $$A_q(n,d) \cdot \delta(d-1) \ge q^n.$$

**Proof:** Suppose that $C \subset A^n$ is a code with minimal distance $d$ for which $|C| = A_q(n,d)$.

Notice that $|C| \cdot \delta(d-1)$ is an upper bound for the size of the union $$\bigcup_{u \in C} B_{d-1}(u).$$ Thus, if $$|C|\cdot \delta(d-1) < q^n = |A^n|$$ then $$\bigcup_{u \in C} B_{d-1}(u) \subsetneq A^n;$$ thus, there is some element $v \in A^n$ for which $$v \not \in \bigcup_{u \in C} B_{d-1}(u).$$ We then have $\dist(u,v) \ge d$ for every $u \in C$. This shows that $C \cup \{v\} \subset A^n$ is a code having minimal distance $d$, contradicting the assumption that $|C| = A_q(n,d)$; i.e. contradicting the assumption that $C$ has maximal size among codes $C' \subset A^n$ with minimal distance $d$.
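The proof suggests a greedy construction: repeatedly adjoin any word at distance $\ge d$ from everything chosen so far, stopping only when no such word remains. A minimal sketch (feasible only for small $n$ and $q$, since it scans all of $A^n$):

```python
from itertools import product
from math import comb

def dist(u, v):
    return sum(a != b for a, b in zip(u, v))

def delta(m, n, q):
    return sum(comb(n, j) * (q - 1) ** j for j in range(m + 1))

# Greedy construction mirroring the proof: scan A^n and adjoin every word
# lying at distance >= d from all words chosen so far.  The resulting code
# cannot be enlarged, so its size satisfies |C| * delta(d-1) >= q**n.
def greedy_code(n, q, d):
    C = []
    for v in product(range(q), repeat=n):
        if all(dist(u, v) >= d for u in C):
            C.append(v)
    return C
```

The greedy code need not attain $A_q(n,d)$, but it already witnesses the Gilbert-Varshamov inequality: were $|C| \cdot \delta(d-1) < q^n$, the scan would have found another word to adjoin.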