Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
duyuefeng0708
GitHub Repository: duyuefeng0708/Cryptography-From-First-Principle
Path: blob/main/foundations/02-rings-fields-polynomials/connect/reed-solomon-codes.ipynb
483 views
unlisted
Kernel: SageMath 10.0

Connect: Reed-Solomon Error Correcting Codes

Module 02 | Real-World Connections

Polynomial evaluation and interpolation over finite fields. Module 02 concepts, build the error correction behind QR codes, CDs, and deep-space communication.

Introduction

Reed-Solomon codes (1960) are one of the most widely deployed error-correcting codes:

  • QR codes: the black-and-white squares on everything from boarding passes to restaurant menus

  • CDs and DVDs: withstand scratches by encoding data with redundancy

  • Deep-space communication: NASA's Voyager probes use Reed-Solomon to transmit data across billions of miles

  • Blockchain: data availability sampling in Ethereum uses Reed-Solomon

The core idea is surprisingly simple: encode a message as a polynomial, evaluate it at extra points for redundancy, and use Lagrange interpolation to recover the original polynomial even if some evaluations are lost or corrupted.

Every concept here comes directly from Module 02.

The Idea: Polynomials as Messages

A message with kk symbols becomes the kk coefficients of a polynomial of degree k1k - 1 over a finite field. We evaluate this polynomial at n>kn > k points. The nn evaluation results are the codeword.

Since a degree-(k1)(k-1) polynomial is uniquely determined by any kk of its evaluations (Lagrange interpolation), we can lose up to nkn - k evaluation points and still recover the original polynomial (and hence the message).

We work over GF(7) to keep numbers small and visible.

# === Step 1: Encode a message as a polynomial === F = GF(7) R.<x> = PolynomialRing(F) # Message: [1, 2, 3], three symbols from GF(7) message = [F(1), F(2), F(3)] k = len(message) # message length = polynomial degree + 1 # Encode as polynomial: m(x) = 1 + 2x + 3x^2 m_poly = sum(c * x^i for i, c in enumerate(message)) print(f'Message: {[Integer(c) for c in message]}') print(f'Message polynomial: m(x) = {m_poly}') print(f'Degree: {m_poly.degree()} (= k - 1 = {k} - 1)') print() # Evaluate at n = 6 points (all nonzero elements of GF(7)) n = 6 # codeword length eval_points = [F(i) for i in range(1, n + 1)] codeword = [m_poly(pt) for pt in eval_points] print(f'Evaluation points: {[Integer(p) for p in eval_points]}') print(f'Codeword (evaluations):') for pt, val in zip(eval_points, codeword): print(f' m({Integer(pt)}) = {Integer(val)}') print(f'\nCodeword: {[Integer(v) for v in codeword]}') print(f'\nWe encoded {k} message symbols into {n} codeword symbols.') print(f'Redundancy: {n - k} extra symbols. Can tolerate {n - k} erasures.')

Step 2: Corrupt the Codeword

Suppose one of the evaluation points gets corrupted during transmission (a scratch on a CD, a bit flip in a noisy channel). We simulate this by erasing one value.

# === Step 2: Corrupt one evaluation === F = GF(7) R.<x> = PolynomialRing(F) message = [F(1), F(2), F(3)] m_poly = sum(c * x^i for i, c in enumerate(message)) eval_points = [F(i) for i in range(1, 7)] codeword = [m_poly(pt) for pt in eval_points] # Simulate erasure: lose the value at evaluation point 3 erased_index = 2 # index of the lost point (0-based) erased_point = eval_points[erased_index] print('Original codeword:') for i, (pt, val) in enumerate(zip(eval_points, codeword)): marker = ' <-- ERASED' if i == erased_index else '' print(f' m({Integer(pt)}) = {Integer(val)}{marker}') # Surviving points surviving_points = [pt for i, pt in enumerate(eval_points) if i != erased_index] surviving_values = [val for i, val in enumerate(codeword) if i != erased_index] print(f'\nSurviving points: {[Integer(p) for p in surviving_points]}') print(f'Surviving values: {[Integer(v) for v in surviving_values]}') print(f'Number of surviving points: {len(surviving_points)} (need at least k = {len(message)})')

Step 3: Recover via Lagrange Interpolation

From Module 02, we know that a polynomial of degree k1k - 1 is uniquely determined by kk evaluation points. We have n1=5n - 1 = 5 surviving points, which is more than enough to recover the degree-2 polynomial (k=3k = 3).

Lagrange interpolation reconstructs the polynomial from its evaluations:

p(x)=i=1kyijixxjxixjp(x) = \sum_{i=1}^{k} y_i \prod_{j \neq i} \frac{x - x_j}{x_i - x_j}
# === Step 3: Lagrange interpolation to recover the polynomial === F = GF(7) R.<x> = PolynomialRing(F) message = [F(1), F(2), F(3)] m_poly = sum(c * x^i for i, c in enumerate(message)) eval_points = [F(i) for i in range(1, 7)] codeword = [m_poly(pt) for pt in eval_points] # Remove erased point erased_index = 2 surviving_points = [pt for i, pt in enumerate(eval_points) if i != erased_index] surviving_values = [val for i, val in enumerate(codeword) if i != erased_index] # We only need k = 3 points to recover a degree-2 polynomial. # Use the first 3 surviving points. k = len(message) interp_points = surviving_points[:k] interp_values = surviving_values[:k] print(f'Using {k} points for interpolation:') for pt, val in zip(interp_points, interp_values): print(f' ({Integer(pt)}, {Integer(val)})') print() # Lagrange interpolation recovered = R.lagrange_polynomial(list(zip(interp_points, interp_values))) print(f'Recovered polynomial: {recovered}') print(f'Original polynomial: {m_poly}') print(f'Match: {recovered == m_poly}') print() # Extract the message recovered_msg = [Integer(recovered[i]) for i in range(k)] print(f'Recovered message: {recovered_msg}') print(f'Original message: {[Integer(c) for c in message]}') print(f'\nMessage recovered perfectly despite losing one codeword symbol!')
# === Can we survive more erasures? === F = GF(7) R.<x> = PolynomialRing(F) message = [F(1), F(2), F(3)] k = len(message) m_poly = sum(c * x^i for i, c in enumerate(message)) eval_points = [F(i) for i in range(1, 7)] codeword = [m_poly(pt) for pt in eval_points] # Erase 3 out of 6 points (the maximum we can handle) erased_indices = {0, 2, 4} # Erase points at positions 0, 2, 4 print('Codeword with 3 erasures (half the symbols lost!):') for i, (pt, val) in enumerate(zip(eval_points, codeword)): status = 'ERASED' if i in erased_indices else f'{Integer(val)}' print(f' m({Integer(pt)}) = {status}') surviving = [(pt, val) for i, (pt, val) in enumerate(zip(eval_points, codeword)) if i not in erased_indices] print(f'\nSurviving: {len(surviving)} points (exactly k = {k})') # Interpolate from exactly k points recovered = R.lagrange_polynomial(surviving) print(f'Recovered polynomial: {recovered}') print(f'Original polynomial: {m_poly}') print(f'Match: {recovered == m_poly}') print() print(f'With n = 6 codeword symbols and k = 3 message symbols,') print(f'we can tolerate up to n - k = {6 - k} erasures.') print(f'That is {6 - k} out of 6 = {(6-k)/6*100:.0f}% data loss tolerance!')
# === Lagrange interpolation by hand (one basis polynomial) === F = GF(7) R.<x> = PolynomialRing(F) # The points we use: (1, 6), (2, 0), (4, 4) points = [(F(1), F(6)), (F(2), F(0)), (F(4), F(4))] print('Lagrange interpolation step by step over GF(7):') print(f'Points: {[(Integer(p), Integer(v)) for p, v in points]}') print() # Build each Lagrange basis polynomial total = R(0) for i, (xi, yi) in enumerate(points): # L_i(x) = product of (x - xj) / (xi - xj) for j != i Li = R(1) for j, (xj, yj) in enumerate(points): if j != i: Li *= (x - xj) / (xi - xj) print(f' L_{i}(x) = {Li}') print(f' y_{i} * L_{i}(x) = {yi} * ({Li}) = {yi * Li}') total += yi * Li print() print(f'Sum: p(x) = {total}') print(f'\nAll the division (computing 1/(xi - xj)) happens in GF(7).') print(f'This requires GF(7) to be a FIELD, every nonzero element needs an inverse.') print(f'Over Z/12Z (not a field), Lagrange interpolation would fail.')

Concept Map: Module 02 Concepts in Reed-Solomon

Module 02 ConceptReed-Solomon Application
Polynomial ring Fp[x]\mathbb{F}_p[x]Messages encoded as polynomials
Polynomial evaluationEncoding: evaluate message polynomial at nn points
Lagrange interpolationDecoding: recover polynomial from kk evaluations
Finite field (not just ring)Interpolation requires division, needs a field
Irreducible polynomialLarger Reed-Solomon codes use GF(282^8) built via quotient ring
Degree of polynomialDegree k1k-1 polynomial needs exactly kk points to reconstruct

Reed-Solomon codes are Module 02 applied to reliability engineering.

Where Reed-Solomon Uses Larger Fields

In practice, Reed-Solomon codes work over GF(282^8) = GF(256) so that each field element is exactly one byte. The field is built as:

GF(256)=F2[x]/(irreducible of degree 8)\text{GF}(256) = \mathbb{F}_2[x] / (\text{irreducible of degree } 8)

This is the same quotient ring construction from Module 02. QR codes, CDs, and space probes all rely on polynomial arithmetic in GF(282^8), the identical algebraic structure that powers AES.

# === Quick demo: Reed-Solomon over GF(2^8) === K.<a> = GF(2^8) R.<x> = PolynomialRing(K) # Encode a 3-byte message msg_bytes = [K.fetch_int(0x48), K.fetch_int(0x65), K.fetch_int(0x6C)] # 'H', 'e', 'l' m_poly = sum(c * x^i for i, c in enumerate(msg_bytes)) print(f'Message bytes: {[hex(c.integer_representation()) for c in msg_bytes]}') print(f'Message polynomial: {m_poly}') print() # Evaluate at 6 points eval_pts = [K.fetch_int(i) for i in range(1, 7)] codeword = [m_poly(pt) for pt in eval_pts] print('Codeword (6 evaluations in GF(256)):') for pt, val in zip(eval_pts, codeword): print(f' m(0x{pt.integer_representation():02X}) = 0x{val.integer_representation():02X}') # Erase 3 points and recover surviving = list(zip(eval_pts, codeword))[:3] recovered = R.lagrange_polynomial(surviving) recovered_bytes = [hex(recovered[i].integer_representation()) for i in range(3)] print(f'\nRecovered from 3 of 6 points: {recovered_bytes}') print(f'Original: {[hex(c.integer_representation()) for c in msg_bytes]}') print(f'Match: {recovered == m_poly}')

Summary

ConceptKey idea
EncodingA kk-symbol message becomes the coefficients of a degree-(k1)(k-1) polynomial, evaluated at n>kn > k field points
RedundancyThe nkn - k extra evaluations let us lose up to nkn - k symbols and still recover the message
DecodingLagrange interpolation reconstructs the polynomial from any kk surviving evaluations
Field arithmetic is essentialInterpolation requires division, which requires a field (not just a ring). A composite modulus would break the scheme
Real-world scaleProduction codes use GF(282^8), built as a polynomial quotient ring, the same construction from Module 02

From QR codes on your coffee cup to data from the edge of the solar system, Reed-Solomon codes rely on the polynomial algebra you learned in Module 02.


Back to Module 02: Rings, Fields, and Polynomials