GitHub Repository: duyuefeng0708/Cryptography-From-First-Principle
Path: blob/main/foundations/03-galois-fields-aes/sage/03e-aes-mixcolumns-as-field-ops.ipynb
⁴⁸³ views

unlisted

Kernel: SageMath 10.5

AES MixColumns as Field Operations

Module 03e | Galois Fields and AES

Matrix multiplication over GF(256), the diffusion layer of AES.

Question: SubBytes provides confusion (nonlinearity). But confusion alone isn't enough, an attacker could analyze each byte independently. AES needs diffusion: changing one input byte should affect many output bytes. How?
MixColumns achieves this by treating each 4-byte column as a vector over GF(256) and multiplying by a carefully chosen matrix. One changed input byte cascades through the entire column.

Objectives

By the end of this notebook you will be able to:

Describe the AES state as a 4×4 matrix of GF(256) elements
Perform the MixColumns matrix-vector multiplication over GF(256)
Verify the MDS (Maximum Distance Separable) property of the MixColumns matrix
Implement InvMixColumns using the inverse matrix
Explain why MixColumns provides optimal diffusion

Bridge from 03d

In 03d you built the S-box, AES's nonlinear byte substitution. Now we add the linear mixing step. Together, SubBytes (nonlinear, per-byte) and MixColumns (linear, per-column) create the confusion-diffusion architecture that Shannon identified as essential for strong ciphers.

MixColumns operates on 4-byte columns using matrix multiplication over GF(256), the same field you've been working with since 03c.

Setup

In [ ]:

# Setup: GF(256) with AES polynomial
R.<x> = GF(2)[]
F.<a> = GF(2^8, modulus=x^8 + x^4 + x^3 + x + 1)

def byte_to_gf(b):
    return sum(GF(2)((b >> i) & 1) * a^i for i in range(8))

def gf_to_byte(elem):
    p = elem.polynomial()
    return sum(int(p[i]) << i for i in range(8))

def xtime(b):
    result = b << 1
    if result & 0x100:
        result ^^= 0x11B
    return result & 0xFF

def gf256_mul(a, b):
    result = 0
    temp = a
    for i in range(8):
        if b & (1 << i):
            result ^^= temp
        temp = xtime(temp)
    return result

print('GF(256) utilities loaded.')

The AES State Matrix

AES processes 16 bytes (128 bits) at a time, arranged as a 4×4 matrix of GF(256) elements. Bytes fill column-by-column:

\begin{pmatrix} b_0 & b_4 & b_8 & b_{12} \\ b_1 & b_5 & b_9 & b_{13} \\ b_2 & b_6 & b_{10} & b_{14} \\ b_3 & b_7 & b_{11} & b_{15} \end{pmatrix}

MixColumns operates on each column independently.

The MixColumns Matrix

Each column $[c_0, c_1, c_2, c_3]^T$ is left-multiplied by the fixed matrix:

M = \begin{pmatrix} \texttt{02} & \texttt{03} & \texttt{01} & \texttt{01} \\ \texttt{01} & \texttt{02} & \texttt{03} & \texttt{01} \\ \texttt{01} & \texttt{01} & \texttt{02} & \texttt{03} \\ \texttt{03} & \texttt{01} & \texttt{01} & \texttt{02} \end{pmatrix}

where the entries are GF(256) elements. This is a circulant matrix, each row is a cyclic shift of the first.

In [ ]:

# The MixColumns matrix over GF(256)
M_bytes = [
    [0x02, 0x03, 0x01, 0x01],
    [0x01, 0x02, 0x03, 0x01],
    [0x01, 0x01, 0x02, 0x03],
    [0x03, 0x01, 0x01, 0x02]
]

# As a SageMath matrix over GF(256)
M = matrix(F, [[byte_to_gf(b) for b in row] for row in M_bytes])
print('MixColumns matrix M (over GF(256)):')
print(M)
print()
print(f'Is invertible? {M.is_invertible()}')
print(f'Determinant: {M.det()} (= 0x{gf_to_byte(M.det()):02X})')
print()

# The matrix uses only 0x01, 0x02, 0x03:
print('Only three GF(256) constants appear:')
print('  0x01 = identity (multiply by 1 = no-op)')
print('  0x02 = xtime (shift left, conditional XOR with 0x1B)')
print('  0x03 = 0x02 + 0x01 = xtime(b) XOR b')

MixColumns in Action

Checkpoint: Before running the next cell, compute the first output byte by hand. Given column $[\texttt{0xDB}, \texttt{0x13}, \texttt{0x53}, \texttt{0x45}]^T$ , the first output byte is: $\texttt{02} \cdot \texttt{DB} \oplus \texttt{03} \cdot \texttt{13} \oplus \texttt{01} \cdot \texttt{53} \oplus \texttt{01} \cdot \texttt{45}$

In [ ]:

# MixColumns on a single column
def mix_column(col_bytes):
    """Apply MixColumns to a 4-byte column. Returns 4 bytes."""
    result = [0] * 4
    for i in range(4):
        for j in range(4):
            result[i] ^^= gf256_mul(M_bytes[i][j], col_bytes[j])
    return result

# AES spec test vector (from FIPS 197, Appendix B)
col = [0xDB, 0x13, 0x53, 0x45]
out = mix_column(col)

print(f'Input column:  [{" ".join(f"0x{b:02X}" for b in col)}]')
print(f'Output column: [{" ".join(f"0x{b:02X}" for b in out)}]')
print()

# Show the computation step by step for the first output byte
print('First output byte breakdown:')
terms = []
for j in range(4):
    product = gf256_mul(M_bytes[0][j], col[j])
    terms.append(product)
    print(f'  0x{M_bytes[0][j]:02X} * 0x{col[j]:02X} = 0x{product:02X}')
print(f'  XOR all: 0x{terms[0]:02X} ⊕ 0x{terms[1]:02X} ⊕ 0x{terms[2]:02X} ⊕ 0x{terms[3]:02X} = 0x{out[0]:02X}')
print()

# Verify using SageMath matrix multiplication
col_vec = vector(F, [byte_to_gf(b) for b in col])
sage_out = M * col_vec
sage_bytes = [gf_to_byte(e) for e in sage_out]
print(f'SageMath verification: [{" ".join(f"0x{b:02X}" for b in sage_bytes)}]')
print(f'Match: {out == sage_bytes}')

The Diffusion Property

Why this specific matrix? Because it is MDS (Maximum Distance Separable). This means: if you change $k$ input bytes, at least $5 - k$ output bytes change. The minimum number of active bytes (input + output) is always $\geq 5$ .

In [ ]:

# Demonstrating diffusion: change one byte, all four outputs change
col_a = [0xDB, 0x13, 0x53, 0x45]
col_b = [0xDB, 0x13, 0x53, 0x46]  # changed last byte by 1 bit

out_a = mix_column(col_a)
out_b = mix_column(col_b)

print('Diffusion: change one input byte, watch the output')
print(f'Input A: [{" ".join(f"0x{b:02X}" for b in col_a)}]')
print(f'Input B: [{" ".join(f"0x{b:02X}" for b in col_b)}]')
print(f'  (only byte 3 changed: 0x45 → 0x46)')
print()
print(f'Output A: [{" ".join(f"0x{b:02X}" for b in out_a)}]')
print(f'Output B: [{" ".join(f"0x{b:02X}" for b in out_b)}]')

diffs = sum(1 for i in range(4) if out_a[i] != out_b[i])
print(f'  {diffs} out of 4 output bytes changed!')
print()

# MDS property: check all submatrices are invertible
print('=== MDS verification ===')
is_mds = True
for size in range(1, 5):
    for rows in Combinations(range(4), size):
        for cols in Combinations(range(4), size):
            sub = M[list(rows), list(cols)]
            if sub.det() == 0:
                print(f'  Singular submatrix! rows={rows}, cols={cols}')
                is_mds = False

print(f'All square submatrices are invertible? {is_mds}')
print('This confirms M is MDS, maximum diffusion from any input change.')

Common mistake: "MixColumns is just matrix multiplication, so it's linear and therefore weak." MixColumns IS linear over GF(256), that's by design. The nonlinearity comes from SubBytes. AES's security comes from alternating nonlinear (SubBytes) and linear (MixColumns) layers. Neither alone is secure; together they create an avalanche effect.

InvMixColumns

In [ ]:

# Inverse MixColumns matrix
M_inv = M.inverse()
M_inv_bytes = [[gf_to_byte(M_inv[i,j]) for j in range(4)] for i in range(4)]

print('InvMixColumns matrix (over GF(256)):')
for row in M_inv_bytes:
    print(f'  [{" ".join(f"0x{b:02X}" for b in row)}]')
print()
print('Notice: the inverse uses 0x0E, 0x0B, 0x0D, 0x09')
print('These are larger constants, InvMixColumns is more expensive.')
print()

# Verify: M_inv * M = identity
product = M_inv * M
print(f'M_inv * M = identity? {product == matrix.identity(F, 4)}')
print()

# Verify roundtrip
def inv_mix_column(col_bytes):
    result = [0] * 4
    for i in range(4):
        for j in range(4):
            result[i] ^^= gf256_mul(M_inv_bytes[i][j], col_bytes[j])
    return result

col = [0xDB, 0x13, 0x53, 0x45]
mixed = mix_column(col)
recovered = inv_mix_column(mixed)
print(f'Original:  [{" ".join(f"0x{b:02X}" for b in col)}]')
print(f'Mixed:     [{" ".join(f"0x{b:02X}" for b in mixed)}]')
print(f'Recovered: [{" ".join(f"0x{b:02X}" for b in recovered)}]')
print(f'Roundtrip correct: {col == recovered}')

Checkpoint: The inverse MixColumns matrix uses the constants 0x0E, 0x0B, 0x0D, 0x09. These are larger because $M^{-1}$ has more complex GF(256) entries. This is why AES decryption is slightly slower than encryption, InvMixColumns requires more multiplications.

MixColumns as Polynomial Multiplication

There's an elegant alternative view: MixColumns is polynomial multiplication modulo $x^4 + 1$ in the ring GF(256) $[x]/(x^4 + 1)$ .

In [ ]:

# MixColumns as polynomial multiplication
S.<y> = PolynomialRing(F)
Q = S.quotient(y^4 + 1, 'z')
z = Q.gen()

# The MixColumns polynomial c(x) = 03*x^3 + 01*x^2 + 01*x + 02
c_poly = byte_to_gf(0x03)*z^3 + byte_to_gf(0x01)*z^2 + byte_to_gf(0x01)*z + byte_to_gf(0x02)
print(f'MixColumns polynomial c(x) = {c_poly}')
print()

# Column as polynomial: b(x) = b3*x^3 + b2*x^2 + b1*x + b0
col = [0xDB, 0x13, 0x53, 0x45]
b_poly = sum(byte_to_gf(col[i]) * z^i for i in range(4))
print(f'Column polynomial b(x) = {b_poly}')
print()

# Multiply mod (x^4 + 1)
result_poly = c_poly * b_poly
print(f'c(x) * b(x) mod (x^4 + 1) = {result_poly}')
print()

# Extract bytes
lift = result_poly.lift()
result_bytes = [gf_to_byte(lift[i]) if i <= lift.degree() else 0 for i in range(4)]
print(f'Result bytes: [{" ".join(f"0x{b:02X}" for b in result_bytes)}]')
print(f'Matrix method: [{" ".join(f"0x{b:02X}" for b in mix_column(col))}]')
print(f'Match: {result_bytes == mix_column(col)}')

Exercises

Exercise 1 (Worked)

Apply MixColumns to the column $[\texttt{0x63}, \texttt{0x53}, \texttt{0xE0}, \texttt{0x8C}]^T$ , showing the GF(256) multiplications for each output byte.

In [ ]:

# Exercise 1 (Worked), MixColumns step by step
col = [0x63, 0x53, 0xE0, 0x8C]
print(f'Input: [{" ".join(f"0x{b:02X}" for b in col)}]')
print()

for i in range(4):
    print(f'Output byte {i}:')
    terms = []
    for j in range(4):
        product = gf256_mul(M_bytes[i][j], col[j])
        terms.append(product)
        # Show how multiply-by-02 and multiply-by-03 work
        if M_bytes[i][j] == 0x02:
            detail = f'xtime(0x{col[j]:02X})'
        elif M_bytes[i][j] == 0x03:
            detail = f'xtime(0x{col[j]:02X}) ⊕ 0x{col[j]:02X}'
        else:
            detail = f'0x{col[j]:02X}'
        print(f'  0x{M_bytes[i][j]:02X} × 0x{col[j]:02X} = 0x{product:02X}  ({detail})')
    
    result = 0
    for t in terms:
        result ^^= t
    print(f'  XOR: 0x{result:02X}')
    print()

out = mix_column(col)
print(f'Result: [{" ".join(f"0x{b:02X}" for b in out)}]')

Exercise 2 (Guided)

Verify the MDS property: show that every square submatrix of $M$ (of any size 1×1, 2×2, 3×3, 4×4) has a nonzero determinant over GF(256).

In [ ]:

# Exercise 2 (Guided), Verify MDS property

print('Checking all square submatrices of M:')
for size in range(1, 5):
    count = 0
    invertible = 0
    for rows in Combinations(range(4), size):
        for cols in Combinations(range(4), size):
            count += 1
            sub = M[list(rows), list(cols)]
            # TODO: check if determinant is nonzero
            # Hint: sub.det() != 0 in GF(256)
            pass  # TODO: increment invertible if det != 0
    
    # TODO: print results for this size
    # Expected: ALL submatrices are invertible
    print(f'  {size}×{size}: {count} submatrices, TODO invertible')

# Question: why does MDS imply that changing k input bytes
# forces at least (5-k) output bytes to change?

Exercise 3 (Independent)

Apply MixColumns to the column $[\texttt{0x01}, \texttt{0x00}, \texttt{0x00}, \texttt{0x00}]^T$ . What do you get? (Hint: this extracts a column of $M$ , which one?)
Apply MixColumns twice to any column. Is $M^2$ also MDS? Compute $M^2$ over GF(256) and check.
The polynomial $c(x) = \texttt{03}x^3 + \texttt{01}x^2 + \texttt{01}x + \texttt{02}$ must be invertible modulo $x^4 + 1$ for InvMixColumns to exist. Verify this and find the inverse polynomial.

In [ ]:

# Exercise 3 (Independent), Your code here

Summary

Concept	Key idea
MixColumns	Multiplies each 4-byte column by a fixed $4 \times 4$ matrix over GF(256)
Efficient constants	The matrix uses only 0x01, 0x02, and 0x03, so multiplication needs just XOR and `xtime`
MDS property	Changing $k$ input bytes forces at least $5 - k$ output bytes to change, giving optimal diffusion
InvMixColumns	Uses the inverse matrix with constants 0x09, 0x0B, 0x0D, 0x0E, making decryption slightly more expensive
Linear by design	MixColumns is linear over GF(256). The nonlinearity comes from SubBytes, and the combination of both provides security
Polynomial view	Equivalently, MixColumns is multiplication by a fixed polynomial mod $x^4 + 1$ in GF(256) $[x]$

Crypto foreshadowing: In notebook 03f, we assemble all four AES round operations, SubBytes, ShiftRows, MixColumns, and AddRoundKey, into a complete AES round. You'll see how confusion (SubBytes) and diffusion (ShiftRows + MixColumns) combine with key mixing (AddRoundKey) to create a cipher that has resisted attack for over 25 years.

Next: Full AES Round, putting SubBytes, ShiftRows, MixColumns, and AddRoundKey together.