GitHub Repository: duyuefeng0708/Cryptography-From-First-Principle
Path: blob/main/foundations/03-galois-fields-aes/break/ecb-mode-pattern-leak.ipynb

Kernel: SageMath 10.0

Break: ECB Mode Pattern Leakage

Module 03 | Breaking Weak Parameters

Identical plaintext blocks produce identical ciphertext blocks. This is catastrophic.

Why This Matters

A block cipher like AES encrypts fixed-size blocks (16 bytes each for AES). Real messages are usually longer than one block, so a mode of operation defines how to apply the block cipher to multi-block messages.

The simplest mode is ECB (Electronic Codebook): encrypt each block independently with the same key. This sounds reasonable, but it has a fatal flaw:

\text{If } P_i = P_j \text{, then } C_i = C_j

Identical plaintext blocks produce identical ciphertext blocks. Any repetition of whole blocks in the plaintext is preserved in the ciphertext, even though the individual block values are scrambled. An attacker learns the structure of your message without decrypting a single byte.
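The rest of this notebook uses 1-byte blocks, but the same effect appears at AES's real 16-byte block size. The following is a minimal plain-Python sketch that assumes a hypothetical stand-in for AES: a 4-round Feistel network whose round function is HMAC-SHA256. It is not AES and not meant for real use; the ECB argument only needs some keyed permutation on 16-byte blocks. The names `toy_encrypt_block`, `ecb_encrypt_bytes`, and `block_pattern` are illustrative, not from any library. It uses `operator.xor` instead of `^` so the cell also runs under Sage's preparser, which rewrites `^` as exponentiation.

```python
import hashlib
import hmac
from operator import xor  # portable XOR: Sage's preparser turns ^ into **

BLOCK = 16  # bytes; same block size as AES

def round_fn(key, rnd, half):
    """Feistel round function: HMAC-SHA256(key, round || half), truncated to 8 bytes."""
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:8]

def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(xor(x, y) for x, y in zip(a, b))

def toy_encrypt_block(key, block, rounds=4):
    """Hypothetical toy 128-bit block cipher: a Feistel network (NOT AES)."""
    left, right = block[:8], block[8:]
    for rnd in range(rounds):
        left, right = right, xor_bytes(left, round_fn(key, rnd, right))
    return left + right

def toy_decrypt_block(key, block, rounds=4):
    """Undo the Feistel rounds in reverse order."""
    left, right = block[:8], block[8:]
    for rnd in reversed(range(rounds)):
        left, right = xor_bytes(right, round_fn(key, rnd, left)), left
    return left + right

def ecb_encrypt_bytes(key, data):
    """ECB: encrypt every 16-byte block independently with the same key."""
    assert len(data) % BLOCK == 0, 'pad the message to a multiple of 16 bytes first'
    return b''.join(toy_encrypt_block(key, data[i:i + BLOCK])
                    for i in range(0, len(data), BLOCK))

def block_pattern(data):
    """Label each 16-byte block by the order in which its value first appears."""
    labels, out = {}, []
    for i in range(0, len(data), BLOCK):
        blk = data[i:i + BLOCK]
        labels.setdefault(blk, len(labels))
        out.append(labels[blk])
    return out

key = b'hypothetical key'
pt = b'YELLOW SUBMARINE' * 2 + b'ATTACK AT DAWN!!' + b'YELLOW SUBMARINE'
ct = ecb_encrypt_bytes(key, pt)
print(block_pattern(pt))  # [0, 0, 1, 0]
print(block_pattern(ct))  # [0, 0, 1, 0]: same structure, visible without the key
```

Because the Feistel network is a permutation for each fixed key, equal plaintext blocks map to equal ciphertext blocks and unequal ones stay unequal, so the label sequence is identical.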

The Scenario

We'll use a toy block cipher on 8-bit (1-byte) blocks to make the patterns visible. The cipher is the AES S-box itself: a bijection on single bytes that provides good confusion. It has no key, so it plays the role of $E_K$ for one fixed key, and in ECB mode nothing diffuses from one block into the next.

We'll encrypt structured messages and see how the structure leaks through.

In [ ]:

# === Setup: Build the AES S-box as our toy block cipher ===
R.<x> = GF(2)[]
F.<a> = GF(2^8, modulus=x^8 + x^4 + x^3 + x + 1)

def byte_to_gf(b):
    return sum(GF(2)((b >> i) & 1) * a^i for i in range(8))

def gf_to_byte(elem):
    p = elem.polynomial()
    return sum(int(p[i]) << i for i in range(8))

# Build S-box
A_mat = matrix(GF(2), [
    [1,0,0,0,1,1,1,1],[1,1,0,0,0,1,1,1],[1,1,1,0,0,0,1,1],[1,1,1,1,0,0,0,1],
    [1,1,1,1,1,0,0,0],[0,1,1,1,1,1,0,0],[0,0,1,1,1,1,1,0],[0,0,0,1,1,1,1,1]
])
c_vec = vector(GF(2), [(0x63 >> i) & 1 for i in range(8)])

SBOX = [0] * 256
INV_SBOX = [0] * 256
for b in range(256):
    if b == 0:
        inv_bits = vector(GF(2), [0]*8)
    else:
        inv_byte = gf_to_byte(byte_to_gf(b)^(-1))
        inv_bits = vector(GF(2), [(inv_byte >> i) & 1 for i in range(8)])
    result_bits = A_mat * inv_bits + c_vec
    SBOX[b] = sum(int(result_bits[i]) << i for i in range(8))
    INV_SBOX[SBOX[b]] = b

def encrypt_block(b):
    """Toy block cipher: encrypt one byte using the AES S-box."""
    return SBOX[b]

def decrypt_block(b):
    """Toy block cipher: decrypt one byte."""
    return INV_SBOX[b]

print('Toy block cipher ready (AES S-box on 8-bit blocks).')
print(f'Example: encrypt(0x41) = 0x{encrypt_block(0x41):02X}')
print(f'         decrypt(0x{encrypt_block(0x41):02X}) = 0x{decrypt_block(encrypt_block(0x41)):02X}')
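As a cross-check on the Sage construction above, here is a plain-Python sketch of the same S-box, assuming the standard AES definition: invert in GF(2^8) modulo $x^8 + x^4 + x^3 + x + 1$, then apply the affine map. The affine map is written as XORs of bit rotations plus the constant 0x63, which is equivalent to `A_mat * inv_bits + c_vec`. The helper names are illustrative. The sketch uses `operator.xor` instead of `^` so the cell also runs under Sage's preparser.

```python
from operator import xor  # portable XOR: Sage's preparser turns ^ into **

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result = xor(result, a)
        a <<= 1
        if a & 0x100:          # degree-8 term appeared: reduce by the modulus
            a = xor(a, 0x11B)
        b >>= 1
    return result

def gf_inv(a):
    """Inverse via a^254 (since a^255 = 1 for nonzero a); the loop also maps 0 to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def rotl8(b, n):
    """Rotate an 8-bit value left by n positions."""
    return ((b << n) | (b >> (8 - n))) & 0xFF

def sbox_entry(b):
    """AES S-box: s = inv XOR rotl(inv,1) XOR rotl(inv,2) XOR rotl(inv,3) XOR rotl(inv,4) XOR 0x63."""
    inv = gf_inv(b)
    s = 0x63
    for n in (0, 1, 2, 3, 4):
        s = xor(s, rotl8(inv, n))
    return s

PY_SBOX = [sbox_entry(b) for b in range(256)]
print(f'0x{PY_SBOX[0x00]:02X} 0x{PY_SBOX[0x41]:02X} 0x{PY_SBOX[0x53]:02X}')  # 0x63 0x83 0xED
```

If you run it after the setup cell, `PY_SBOX == SBOX` should evaluate to `True`.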

Step 1: ECB Mode Encryption

In ECB mode, we encrypt each block independently:

C_i = E_K(P_i)

No chaining, no IV, no interaction between blocks. Each block is a standalone encryption.
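Since the S-box has no key, you might wonder whether adding a key would help. The following minimal sketch shows that it does not. It assumes a hypothetical keyed toy cipher: a random permutation of 0..255 seeded by the key through Python's `random.Random`. That is fine for a demo but not cryptographically secure. Two different keys relabel the blocks differently, yet ECB produces the same equality pattern under both.

```python
import random

def keyed_toy_cipher(key):
    """Hypothetical keyed 8-bit block cipher: a key-seeded random permutation of 0..255.
    Returns (encryption table, decryption table)."""
    enc = list(range(256))
    random.Random(int(key)).shuffle(enc)
    dec = [0] * 256
    for p, c in enumerate(enc):
        dec[c] = p
    return enc, dec

def ecb_with_table(table, data):
    """ECB: apply the same per-block map to every block, independently."""
    return [table[b] for b in data]

def pattern(data):
    """Replace each block by the index of its first distinct occurrence."""
    first = {}
    return [first.setdefault(b, len(first)) for b in data]

msg = list(b'AAAA BBBB AAAA CCCC')
enc1, dec1 = keyed_toy_cipher(1)
enc2, dec2 = keyed_toy_cipher(2)
c1 = ecb_with_table(enc1, msg)
c2 = ecb_with_table(enc2, msg)

assert ecb_with_table(dec1, c1) == msg              # decryption inverts encryption
print(pattern(c1) == pattern(c2) == pattern(msg))   # True: the key changes labels, not the pattern
```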

In [ ]:

def ecb_encrypt(plaintext_bytes):
    """Encrypt a list of bytes in ECB mode."""
    return [encrypt_block(b) for b in plaintext_bytes]

def ecb_decrypt(ciphertext_bytes):
    """Decrypt a list of bytes in ECB mode."""
    return [decrypt_block(b) for b in ciphertext_bytes]

# Encrypt a message with repeating structure
message = 'AAAA BBBB AAAA CCCC AAAA BBBB'
plaintext = [ord(c) for c in message]
ciphertext = ecb_encrypt(plaintext)

print(f'Plaintext:  {message}')
print(f'PT bytes:   {" ".join(f"{b:02X}" for b in plaintext)}')
print(f'CT bytes:   {" ".join(f"{b:02X}" for b in ciphertext)}')
print()

# Highlight the pattern preservation
print('Pattern analysis:')
print(f'  PT "A" (0x41) always encrypts to 0x{encrypt_block(0x41):02X}')
print(f'  PT "B" (0x42) always encrypts to 0x{encrypt_block(0x42):02X}')
print(f'  PT " " (0x20) always encrypts to 0x{encrypt_block(0x20):02X}')
print()
print('The ciphertext has the SAME repetition structure as the plaintext!')

In [ ]:

# Visualize with a larger structured message: a toy 32x32 grayscale "image"
# (one byte per pixel, so each pixel is one 8-bit block)

# Build the image from four horizontal bands: stripes, solid gray, checkerboard, gradient
width, height = 32, 32
image = []
for row in range(height):
    for col in range(width):
        if row < 8:
            # Top band: alternating light/dark columns
            image.append(0x20 if col % 4 < 2 else 0xE0)
        elif row < 16:
            # Middle-upper band: solid medium gray
            image.append(0x80)
        elif row < 24:
            # Middle-lower band: checkerboard
            image.append(0x40 if (row + col) % 2 == 0 else 0xC0)
        else:
            # Bottom band: gradient
            image.append((col * 8) % 256)

# Encrypt in ECB mode
ecb_image = ecb_encrypt(image)

print(f'Image size: {width}x{height} = {len(image)} bytes')
print(f'Unique plaintext values:  {len(set(image))}')
print(f'Unique ciphertext values: {len(set(ecb_image))}')
print()
print('Plaintext image (hex, first 8 rows, first 16 columns):')
for row in range(8):
    print(' '.join(f'{image[row*width+col]:02X}' for col in range(min(16, width))))
print()
print('ECB-encrypted image (hex, first 8 rows, first 16 columns):')
for row in range(8):
    print(' '.join(f'{ecb_image[row*width+col]:02X}' for col in range(min(16, width))))

Step 2: Visualize the Pattern Leakage

Even though individual byte values are different (the S-box scrambled them), the pattern structure is perfectly preserved. Let's visualize this with a histogram and a structural comparison.

In [ ]:

# Block frequency analysis: does the ciphertext reveal structure?
from collections import Counter

pt_counts = Counter(image)
ct_counts = Counter(ecb_image)

print('=== Block Frequency Analysis ===')
print()
print('Plaintext byte frequencies (top 10):')
for val, count in pt_counts.most_common(10):
    bar = '#' * (count // 4)
    print(f'  0x{val:02X}: {count:3d}  {bar}')
print()

print('ECB ciphertext byte frequencies (top 10):')
for val, count in ct_counts.most_common(10):
    bar = '#' * (count // 4)
    print(f'  0x{val:02X}: {count:3d}  {bar}')
print()

print('Observation: the FREQUENCY DISTRIBUTION is identical!')
print('The S-box just relabels the bars --- their heights don\'t change.')
print()

# Verify: sorted frequency lists should match
pt_freqs = sorted(pt_counts.values(), reverse=True)
ct_freqs = sorted(ct_counts.values(), reverse=True)
print(f'Sorted frequency lists match: {pt_freqs == ct_freqs}')

In [ ]:

# Structural leakage: detect repeating blocks
print('=== Detecting Repeating Blocks ===')
print()

# An attacker doesn't know what the bytes mean, but can detect repetitions
def detect_patterns(data, block_size=1):
    """Detect positions of repeated blocks."""
    seen = {}
    repeats = 0
    for i in range(0, len(data), block_size):
        block = tuple(data[i:i+block_size])
        if block in seen:
            repeats += 1
        else:
            seen[block] = i
    return repeats, len(seen)

pt_reps, pt_unique = detect_patterns(image)
ct_reps, ct_unique = detect_patterns(ecb_image)

print(f'Plaintext:  {pt_unique} unique blocks, {pt_reps} repeated positions')
print(f'Ciphertext: {ct_unique} unique blocks, {ct_reps} repeated positions')
print()
print(f'The ciphertext has exactly the same repetition count as the plaintext.')
print(f'An attacker can recover the STRUCTURE of the plaintext from ECB ciphertext.')
print()

# Demonstrate: attacker can tell which blocks are equal
print('Attacker\'s view (block equality map):')
print('Encoding each unique ciphertext block as a letter...')
block_to_label = {}
label_idx = 0
labels = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnop'
for b in ecb_image:
    if b not in block_to_label:
        block_to_label[b] = labels[label_idx % len(labels)]
        label_idx += 1

print(f'First 4 rows of the {width}x{height} image ({width} blocks per row), labeled by block identity:')
for row in range(4):
    row_data = ecb_image[row*width:(row+1)*width]
    print('  ' + ''.join(block_to_label[b] for b in row_data))
print()
print('Clear repeating patterns visible, even without knowing the key!')

Step 3: Compare with CBC Mode

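In CBC (Cipher Block Chaining) mode, each plaintext block is XORed with the previous ciphertext block before encryption, and the first plaintext block is XORed with an initialization vector (IV):

C_i = E_K(P_i \oplus C_{i-1}) \quad (i \ge 1), \qquad C_0 = \text{IV}

The chaining means identical plaintext blocks produce different ciphertext blocks unless the preceding ciphertext blocks also coincide. With 128-bit AES blocks that is astronomically unlikely. In this notebook's 8-bit toy there are only 256 possible block values, so some coincidental repeats will still appear.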
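The CBC equation can be checked with a minimal plain-Python sketch. It assumes a hypothetical keyed toy cipher (a key-seeded random permutation of the 256 byte values, not the AES S-box) and illustrative helper names `cbc_encrypt_table` / `cbc_decrypt_table`. The IV plays the role of the ciphertext block before the first one, and decryption inverts each step as $P_i = D_K(C_i) \oplus C_{i-1}$.

```python
import random
from operator import xor  # portable XOR: Sage's preparser turns ^ into **

def keyed_toy_cipher(key):
    """Hypothetical keyed 8-bit block cipher: a key-seeded random permutation of 0..255."""
    enc = list(range(256))
    random.Random(int(key)).shuffle(enc)
    dec = [0] * 256
    for p, c in enumerate(enc):
        dec[c] = p
    return enc, dec

def cbc_encrypt_table(enc, plaintext, iv):
    """CBC: each block is XORed with the previous ciphertext block (the IV at first), then encrypted."""
    out, prev = [], iv
    for p in plaintext:
        prev = enc[xor(p, prev)]
        out.append(prev)
    return out

def cbc_decrypt_table(dec, ciphertext, iv):
    """CBC decryption: decrypt each block, then XOR with the previous ciphertext block."""
    out, prev = [], iv
    for c in ciphertext:
        out.append(xor(dec[c], prev))
        prev = c
    return out

enc, dec = keyed_toy_cipher(7)
msg = list(b'AAAAAAAA')
iv = 0x5A  # hypothetical IV; in practice it must be fresh and unpredictable for every message
ct = cbc_encrypt_table(enc, msg, iv)
assert cbc_decrypt_table(dec, ct, iv) == msg
print(ct)  # compare with ECB, where all eight blocks would be the same value
```

The eight identical 'A' blocks now encrypt to a chained sequence rather than one value repeated eight times, although with only 256 possible block values a coincidental repeat is still possible.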

In [ ]:

set_random_seed(42)  # reproducible: seeds Sage's generator, which randint() below uses

def cbc_encrypt(plaintext_bytes, iv):
    """Encrypt a list of bytes in CBC mode."""
    ciphertext = []
    prev = iv
    for b in plaintext_bytes:
        # XOR with previous ciphertext block, then encrypt
        encrypted = encrypt_block(b ^^ prev)
        ciphertext.append(encrypted)
        prev = encrypted
    return ciphertext

# Encrypt the same structured image with CBC
iv = randint(0, 255)
cbc_image = cbc_encrypt(image, iv)

print('=== ECB vs CBC Comparison ===')
print()
print(f'Same plaintext image ({width}x{height}), two modes:')
print()

# Frequency analysis
cbc_counts = Counter(cbc_image)

print(f'ECB unique ciphertext values: {len(ct_counts)}')
print(f'CBC unique ciphertext values: {len(cbc_counts)}')
print()

print('ECB ciphertext (first 4 rows):')
for row in range(4):
    print(' '.join(f'{ecb_image[row*width+col]:02X}' for col in range(min(16, width))))
print()

print('CBC ciphertext (first 4 rows):')
for row in range(4):
    print(' '.join(f'{cbc_image[row*width+col]:02X}' for col in range(min(16, width))))
print()

# Repetition comparison
cbc_reps, cbc_unique = detect_patterns(cbc_image)
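print(f'Repeated block positions: ECB = {ct_reps}, CBC = {cbc_reps}')
print('CBC still repeats values: 1024 blocks must reuse some of only 256 possible 8-bit values.')
print('But those repeats no longer track the plaintext: chaining makes identical plaintexts produce different ciphertexts.')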

In [ ]:

# Quantify the information leak: for pairs of positions (i, j), compare
# "are the plaintext blocks equal?" with "are the ciphertext blocks equal?"

def block_equality_vector(data, limit=200):
    """For each pair (i, j) with i < j among the first `limit` positions, record whether data[i] == data[j]."""
    n = min(len(data), limit)  # sample to keep the O(n^2) pair count tractable
    equalities = []
    for i in range(n):
        for j in range(i+1, n):
            equalities.append(1 if data[i] == data[j] else 0)
    return equalities

pt_eq = block_equality_vector(image)
ecb_eq = block_equality_vector(ecb_image)
cbc_eq = block_equality_vector(cbc_image)

# Among pairs whose PLAINTEXT blocks are equal, how often are the CIPHERTEXT blocks equal too?
equal_pt_pairs = [k for k, e in enumerate(pt_eq) if e == 1]
ecb_hits = sum(ecb_eq[k] for k in equal_pt_pairs)
cbc_hits = sum(cbc_eq[k] for k in equal_pt_pairs)
total = len(equal_pt_pairs)

print('=== Pattern Correlation ===')
print()
print('Among pairs of positions with equal plaintext blocks, how many have equal ciphertext blocks?')
print(f'  ECB: {ecb_hits}/{total} pairs ({float(100*ecb_hits/total):.1f}%)')
print(f'  CBC: {cbc_hits}/{total} pairs ({float(100*cbc_hits/total):.1f}%)')
print()
print('ECB: 100% = complete pattern leakage.')
print('CBC: equal plaintext blocks no longer imply equal ciphertext blocks; any matches that remain')
print('     are coincidences in the tiny 8-bit block space (negligible with 128-bit blocks).')

The Fix: Chained Modes of Operation

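Never use ECB for multi-block messages. Use a mode in which each ciphertext block depends on more than its own plaintext block, either by chaining (CBC) or through a per-block counter (CTR, GCM):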

Mode	How it works	Advantage
CBC	$C_i = E_K(P_i \oplus C_{i-1})$	Hides patterns, widely deployed
CTR	$C_i = P_i \oplus E_K(\text{nonce} \| i)$	Parallelizable, random access
GCM	CTR + GHASH authentication	Authenticated encryption (confidentiality + integrity)

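With a fresh random IV (CBC) or a never-repeated nonce (CTR, GCM), all of these make identical plaintext blocks produce different ciphertext blocks, except with negligible probability for 128-bit blocks.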

AES-GCM is the mandatory-to-implement cipher in TLS 1.3 (the TLS_AES_128_GCM_SHA256 cipher suite), and we'll explore it in the Connect notebook on AES-GCM authenticated encryption.

Exercises

Exercise 1

Encrypt the string 'HELLO HELLO HELLO HELLO HELLO' in both ECB and CBC mode. How many repeated ciphertext blocks does each mode produce?

Exercise 2

Implement CTR (Counter) mode for the toy cipher: $C_i = P_i \oplus E_K((\text{nonce} + i) \bmod 256)$, since each counter block must fit in one byte here. Encrypt the same structured image. Does it hide patterns like CBC?

Exercise 3

In CBC mode, what happens if you reuse the same IV for two different messages? Let $C_1 = E_K(P_1 \oplus \text{IV})$ and $C_1' = E_K(P_1' \oplus \text{IV})$ be their first ciphertext blocks. What does the attacker learn from $C_1 \oplus C_1'$ when the two messages share the same first block ($P_1 = P_1'$), and when they do not?

In [ ]:

# Exercise space

# Exercise 1: Encrypt and compare
msg = [ord(c) for c in 'HELLO HELLO HELLO HELLO HELLO']
ecb_msg = ecb_encrypt(msg)
cbc_msg = cbc_encrypt(msg, randint(0, 255))

ecb_reps_msg, _ = detect_patterns(ecb_msg)
cbc_reps_msg, _ = detect_patterns(cbc_msg)
print(f'ECB repeated blocks: {ecb_reps_msg}')
print(f'CBC repeated blocks: {cbc_reps_msg}')

# Exercise 2: Implement CTR mode
# TODO: def ctr_encrypt(plaintext_bytes, nonce): ...

# Exercise 3: CBC IV reuse analysis
# TODO

Summary

Property	ECB	CBC / CTR / GCM
Equal plaintext blocks → equal ciphertext?	Yes (fatal)	No
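Pattern leakage	Complete	None (given a fresh IV or unique nonce)
How blocks are processed	Each block isolated	Each block depends on the previous ciphertext block (CBC) or on its counter (CTR/GCM)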
Safe for multi-block messages?	No	Yes

Key takeaways:

ECB mode encrypts blocks independently, so patterns in the plaintext are perfectly preserved in the ciphertext.
An attacker can detect which plaintext blocks are equal without knowing the key.
CBC, CTR, and GCM break this by making each ciphertext block depend on more than its own plaintext block (the previous ciphertext block in CBC, the counter value in CTR and GCM).
This is why ECB should never be used for messages longer than one block.
The underlying block cipher (AES) is not the problem: the weakness is entirely in the mode.
