Connect: Reed-Solomon Error Correcting Codes

Module 02 | Real-World Connections

Polynomial evaluation and interpolation over finite fields. Module 02 concepts, build the error correction behind QR codes, CDs, and deep-space communication.

Introduction

Reed-Solomon codes (1960) are one of the most widely deployed error-correcting codes:

QR codes: the black-and-white squares on everything from boarding passes to restaurant menus
CDs and DVDs: withstand scratches by encoding data with redundancy
Deep-space communication: NASA's Voyager probes use Reed-Solomon to transmit data across billions of miles
Blockchain: data availability sampling in Ethereum uses Reed-Solomon

The core idea is surprisingly simple: encode a message as a polynomial, evaluate it at extra points for redundancy, and use Lagrange interpolation to recover the original polynomial even if some evaluations are lost or corrupted.

Every concept here comes directly from Module 02.

The Idea: Polynomials as Messages

A message with $k$ symbols becomes the $k$ coefficients of a polynomial of degree $k - 1$ over a finite field. We evaluate this polynomial at $n > k$ points. The $n$ evaluation results are the codeword.

Since a degree- $(k-1)$ polynomial is uniquely determined by any $k$ of its evaluations (Lagrange interpolation), we can lose up to $n - k$ evaluation points and still recover the original polynomial (and hence the message).

We work over GF(7) to keep numbers small and visible.

In [ ]:

# === Step 1: Encode a message as a polynomial ===
F = GF(7)
R.<x> = PolynomialRing(F)

# Message: [1, 2, 3], three symbols from GF(7)
message = [F(1), F(2), F(3)]
k = len(message)  # message length = polynomial degree + 1

# Encode as polynomial: m(x) = 1 + 2x + 3x^2
m_poly = sum(c * x^i for i, c in enumerate(message))
print(f'Message: {[Integer(c) for c in message]}')
print(f'Message polynomial: m(x) = {m_poly}')
print(f'Degree: {m_poly.degree()} (= k - 1 = {k} - 1)')
print()

# Evaluate at n = 6 points (all nonzero elements of GF(7))
n = 6  # codeword length
eval_points = [F(i) for i in range(1, n + 1)]
codeword = [m_poly(pt) for pt in eval_points]

print(f'Evaluation points: {[Integer(p) for p in eval_points]}')
print(f'Codeword (evaluations):')
for pt, val in zip(eval_points, codeword):
    print(f'  m({Integer(pt)}) = {Integer(val)}')

print(f'\nCodeword: {[Integer(v) for v in codeword]}')
print(f'\nWe encoded {k} message symbols into {n} codeword symbols.')
print(f'Redundancy: {n - k} extra symbols. Can tolerate {n - k} erasures.')

Step 2: Corrupt the Codeword

Suppose one of the evaluation points gets corrupted during transmission (a scratch on a CD, a bit flip in a noisy channel). We simulate this by erasing one value.

In [ ]:

# === Step 2: Corrupt one evaluation ===
F = GF(7)
R.<x> = PolynomialRing(F)

message = [F(1), F(2), F(3)]
m_poly = sum(c * x^i for i, c in enumerate(message))
eval_points = [F(i) for i in range(1, 7)]
codeword = [m_poly(pt) for pt in eval_points]

# Simulate erasure: lose the value at evaluation point 3
erased_index = 2  # index of the lost point (0-based)
erased_point = eval_points[erased_index]

print('Original codeword:')
for i, (pt, val) in enumerate(zip(eval_points, codeword)):
    marker = '  <-- ERASED' if i == erased_index else ''
    print(f'  m({Integer(pt)}) = {Integer(val)}{marker}')

# Surviving points
surviving_points = [pt for i, pt in enumerate(eval_points) if i != erased_index]
surviving_values = [val for i, val in enumerate(codeword) if i != erased_index]

print(f'\nSurviving points: {[Integer(p) for p in surviving_points]}')
print(f'Surviving values: {[Integer(v) for v in surviving_values]}')
print(f'Number of surviving points: {len(surviving_points)} (need at least k = {len(message)})')

Step 3: Recover via Lagrange Interpolation

From Module 02, we know that a polynomial of degree $k - 1$ is uniquely determined by $k$ evaluation points. We have $n - 1 = 5$ surviving points, which is more than enough to recover the degree-2 polynomial ( $k = 3$ ).

Lagrange interpolation reconstructs the polynomial from its evaluations:

p(x) = \sum_{i=1}^{k} y_i \prod_{j \neq i} \frac{x - x_j}{x_i - x_j}

In [ ]:

# === Step 3: Lagrange interpolation to recover the polynomial ===
F = GF(7)
R.<x> = PolynomialRing(F)

message = [F(1), F(2), F(3)]
m_poly = sum(c * x^i for i, c in enumerate(message))
eval_points = [F(i) for i in range(1, 7)]
codeword = [m_poly(pt) for pt in eval_points]

# Remove erased point
erased_index = 2
surviving_points = [pt for i, pt in enumerate(eval_points) if i != erased_index]
surviving_values = [val for i, val in enumerate(codeword) if i != erased_index]

# We only need k = 3 points to recover a degree-2 polynomial.
# Use the first 3 surviving points.
k = len(message)
interp_points = surviving_points[:k]
interp_values = surviving_values[:k]

print(f'Using {k} points for interpolation:')
for pt, val in zip(interp_points, interp_values):
    print(f'  ({Integer(pt)}, {Integer(val)})')
print()

# Lagrange interpolation
recovered = R.lagrange_polynomial(list(zip(interp_points, interp_values)))

print(f'Recovered polynomial: {recovered}')
print(f'Original polynomial:  {m_poly}')
print(f'Match: {recovered == m_poly}')
print()

# Extract the message
recovered_msg = [Integer(recovered[i]) for i in range(k)]
print(f'Recovered message: {recovered_msg}')
print(f'Original message:  {[Integer(c) for c in message]}')
print(f'\nMessage recovered perfectly despite losing one codeword symbol!')

In [ ]:

# === Can we survive more erasures? ===
F = GF(7)
R.<x> = PolynomialRing(F)

message = [F(1), F(2), F(3)]
k = len(message)
m_poly = sum(c * x^i for i, c in enumerate(message))
eval_points = [F(i) for i in range(1, 7)]
codeword = [m_poly(pt) for pt in eval_points]

# Erase 3 out of 6 points (the maximum we can handle)
erased_indices = {0, 2, 4}  # Erase points at positions 0, 2, 4

print('Codeword with 3 erasures (half the symbols lost!):')
for i, (pt, val) in enumerate(zip(eval_points, codeword)):
    status = 'ERASED' if i in erased_indices else f'{Integer(val)}'
    print(f'  m({Integer(pt)}) = {status}')

surviving = [(pt, val) for i, (pt, val) in enumerate(zip(eval_points, codeword))
             if i not in erased_indices]

print(f'\nSurviving: {len(surviving)} points (exactly k = {k})')

# Interpolate from exactly k points
recovered = R.lagrange_polynomial(surviving)
print(f'Recovered polynomial: {recovered}')
print(f'Original polynomial:  {m_poly}')
print(f'Match: {recovered == m_poly}')
print()
print(f'With n = 6 codeword symbols and k = 3 message symbols,')
print(f'we can tolerate up to n - k = {6 - k} erasures.')
print(f'That is {6 - k} out of 6 = {(6-k)/6*100:.0f}% data loss tolerance!')

In [ ]:

# === Lagrange interpolation by hand (one basis polynomial) ===
F = GF(7)
R.<x> = PolynomialRing(F)

# The points we use: (1, 6), (2, 0), (4, 4)
points = [(F(1), F(6)), (F(2), F(0)), (F(4), F(4))]

print('Lagrange interpolation step by step over GF(7):')
print(f'Points: {[(Integer(p), Integer(v)) for p, v in points]}')
print()

# Build each Lagrange basis polynomial
total = R(0)
for i, (xi, yi) in enumerate(points):
    # L_i(x) = product of (x - xj) / (xi - xj) for j != i
    Li = R(1)
    for j, (xj, yj) in enumerate(points):
        if j != i:
            Li *= (x - xj) / (xi - xj)
    print(f'  L_{i}(x) = {Li}')
    print(f'  y_{i} * L_{i}(x) = {yi} * ({Li}) = {yi * Li}')
    total += yi * Li
    print()

print(f'Sum: p(x) = {total}')
print(f'\nAll the division (computing 1/(xi - xj)) happens in GF(7).')
print(f'This requires GF(7) to be a FIELD, every nonzero element needs an inverse.')
print(f'Over Z/12Z (not a field), Lagrange interpolation would fail.')

Concept Map: Module 02 Concepts in Reed-Solomon

Module 02 Concept	Reed-Solomon Application
Polynomial ring $\mathbb{F}_p[x]$	Messages encoded as polynomials
Polynomial evaluation	Encoding: evaluate message polynomial at $n$ points
Lagrange interpolation	Decoding: recover polynomial from $k$ evaluations
Finite field (not just ring)	Interpolation requires division, needs a field
Irreducible polynomial	Larger Reed-Solomon codes use GF( $2^8$ ) built via quotient ring
Degree of polynomial	Degree $k-1$ polynomial needs exactly $k$ points to reconstruct

Reed-Solomon codes are Module 02 applied to reliability engineering.

Where Reed-Solomon Uses Larger Fields

In practice, Reed-Solomon codes work over GF( $2^8$ ) = GF(256) so that each field element is exactly one byte. The field is built as:

\text{GF}(256) = \mathbb{F}_2[x] / (\text{irreducible of degree } 8)

This is the same quotient ring construction from Module 02. QR codes, CDs, and space probes all rely on polynomial arithmetic in GF( $2^8$ ), the identical algebraic structure that powers AES.

In [ ]:

# === Quick demo: Reed-Solomon over GF(2^8) ===
K.<a> = GF(2^8)
R.<x> = PolynomialRing(K)

# Encode a 3-byte message
msg_bytes = [K.fetch_int(0x48), K.fetch_int(0x65), K.fetch_int(0x6C)]  # 'H', 'e', 'l'
m_poly = sum(c * x^i for i, c in enumerate(msg_bytes))
print(f'Message bytes: {[hex(c.integer_representation()) for c in msg_bytes]}')
print(f'Message polynomial: {m_poly}')
print()

# Evaluate at 6 points
eval_pts = [K.fetch_int(i) for i in range(1, 7)]
codeword = [m_poly(pt) for pt in eval_pts]
print('Codeword (6 evaluations in GF(256)):')
for pt, val in zip(eval_pts, codeword):
    print(f'  m(0x{pt.integer_representation():02X}) = 0x{val.integer_representation():02X}')

# Erase 3 points and recover
surviving = list(zip(eval_pts, codeword))[:3]
recovered = R.lagrange_polynomial(surviving)
recovered_bytes = [hex(recovered[i].integer_representation()) for i in range(3)]
print(f'\nRecovered from 3 of 6 points: {recovered_bytes}')
print(f'Original:                     {[hex(c.integer_representation()) for c in msg_bytes]}')
print(f'Match: {recovered == m_poly}')

Summary

Concept	Key idea
Encoding	A $k$ -symbol message becomes the coefficients of a degree- $(k-1)$ polynomial, evaluated at $n > k$ field points
Redundancy	The $n - k$ extra evaluations let us lose up to $n - k$ symbols and still recover the message
Decoding	Lagrange interpolation reconstructs the polynomial from any $k$ surviving evaluations
Field arithmetic is essential	Interpolation requires division, which requires a field (not just a ring). A composite modulus would break the scheme
Real-world scale	Production codes use GF( $2^8$ ), built as a polynomial quotient ring, the same construction from Module 02

From QR codes on your coffee cup to data from the edge of the solar system, Reed-Solomon codes rely on the polynomial algebra you learned in Module 02.

Back to Module 02: Rings, Fields, and Polynomials