GitHub Repository: duyuefeng0708/Cryptography-From-First-Principle
Path: blob/main/foundations/03-galois-fields-aes/break/ecb-mode-pattern-leak.ipynb
Kernel: SageMath 10.0

Break: ECB Mode Pattern Leakage

Module 03 | Breaking Weak Parameters

Identical plaintext blocks produce identical ciphertext blocks. This is catastrophic.

Why This Matters

A block cipher like AES encrypts fixed-size blocks (16 bytes each). But real messages are longer than one block. A mode of operation defines how to apply the block cipher to multi-block messages.

The simplest mode is ECB (Electronic Codebook): encrypt each block independently with the same key. This sounds reasonable, but it has a fatal flaw:

$$\text{If } P_i = P_j \text{, then } C_i = C_j$$

Identical plaintext blocks produce identical ciphertext blocks. Any pattern in the plaintext is preserved in the ciphertext, even though the individual block values are scrambled. An attacker learns the structure of your message without decrypting a single byte.
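A keyed cipher does not avoid this, because the key is the same for every block. The following is a minimal sketch in plain Python, assuming a hypothetical toy cipher (`make_keyed_permutation`, a key-seeded shuffle of the 256 byte values, which is not secure and is not part of this notebook). It relabels blocks by order of first appearance and checks that this "equality pattern" is identical for plaintext and ciphertext under every key tried.

```python
import random

def make_keyed_permutation(key):
    """Hypothetical toy cipher: a key-selected permutation of the 256 byte values."""
    table = list(range(256))
    random.Random(int(key)).shuffle(table)  # the key only seeds a shuffle; not secure
    return table

def ecb_encrypt_with(table, blocks):
    """ECB: every 1-byte block goes through the same permutation independently."""
    return [table[b] for b in blocks]

def equality_pattern(blocks):
    """Relabel blocks by order of first appearance, e.g. [7, 7, 3, 7] -> [0, 0, 1, 0]."""
    first_seen = {}
    return [first_seen.setdefault(b, len(first_seen)) for b in blocks]

plaintext = list(b"AABAABCC")
for key in (1, 2, 3):
    ciphertext = ecb_encrypt_with(make_keyed_permutation(key), plaintext)
    # The byte values depend on the key; the equality pattern does not.
    assert equality_pattern(ciphertext) == equality_pattern(plaintext)

print(equality_pattern(plaintext))  # [0, 0, 1, 0, 0, 1, 2, 2]
```

The equality pattern is exactly what an eavesdropper can compute from the ciphertext alone, which is why changing the key does nothing to hide it.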

The Scenario

We'll use a toy block cipher on 8-bit (1-byte) blocks to make the patterns visible. The cipher is the AES S-box itself: a fixed, unkeyed bijection on single bytes that provides good confusion but (in ECB mode) zero diffusion across blocks.

We'll encrypt structured messages and see how the structure leaks through.

# === Setup: Build the AES S-box as our toy block cipher ===
R.<x> = GF(2)[]
F.<a> = GF(2^8, modulus=x^8 + x^4 + x^3 + x + 1)

def byte_to_gf(b):
    return sum(GF(2)((b >> i) & 1) * a^i for i in range(8))

def gf_to_byte(elem):
    p = elem.polynomial()
    return sum(int(p[i]) << i for i in range(8))

# Build S-box
A_mat = matrix(GF(2), [
    [1,0,0,0,1,1,1,1],[1,1,0,0,0,1,1,1],[1,1,1,0,0,0,1,1],[1,1,1,1,0,0,0,1],
    [1,1,1,1,1,0,0,0],[0,1,1,1,1,1,0,0],[0,0,1,1,1,1,1,0],[0,0,0,1,1,1,1,1]
])
c_vec = vector(GF(2), [(0x63 >> i) & 1 for i in range(8)])

SBOX = [0] * 256
INV_SBOX = [0] * 256
for b in range(256):
    if b == 0:
        inv_bits = vector(GF(2), [0]*8)
    else:
        inv_byte = gf_to_byte(byte_to_gf(b)^(-1))
        inv_bits = vector(GF(2), [(inv_byte >> i) & 1 for i in range(8)])
    result_bits = A_mat * inv_bits + c_vec
    SBOX[b] = sum(int(result_bits[i]) << i for i in range(8))
    INV_SBOX[SBOX[b]] = b

def encrypt_block(b):
    """Toy block cipher: encrypt one byte using the AES S-box."""
    return SBOX[b]

def decrypt_block(b):
    """Toy block cipher: decrypt one byte."""
    return INV_SBOX[b]

print('Toy block cipher ready (AES S-box on 8-bit blocks).')
print(f'Example: encrypt(0x41) = 0x{encrypt_block(0x41):02X}')
print(f' decrypt(0x{encrypt_block(0x41):02X}) = 0x{decrypt_block(encrypt_block(0x41)):02X}')

Step 1: ECB Mode Encryption

In ECB mode, we encrypt each block independently:

$$C_i = E_K(P_i)$$

No chaining, no IV, no interaction between blocks. Each block is a standalone encryption.

def ecb_encrypt(plaintext_bytes):
    """Encrypt a list of bytes in ECB mode."""
    return [encrypt_block(b) for b in plaintext_bytes]

def ecb_decrypt(ciphertext_bytes):
    """Decrypt a list of bytes in ECB mode."""
    return [decrypt_block(b) for b in ciphertext_bytes]

# Encrypt a message with repeating structure
message = 'AAAA BBBB AAAA CCCC AAAA BBBB'
plaintext = [ord(c) for c in message]
ciphertext = ecb_encrypt(plaintext)

print(f'Plaintext: {message}')
print(f'PT bytes: {" ".join(f"{b:02X}" for b in plaintext)}')
print(f'CT bytes: {" ".join(f"{b:02X}" for b in ciphertext)}')
print()

# Highlight the pattern preservation
print('Pattern analysis:')
print(f' PT "A" (0x41) always encrypts to 0x{encrypt_block(0x41):02X}')
print(f' PT "B" (0x42) always encrypts to 0x{encrypt_block(0x42):02X}')
print(f' PT " " (0x20) always encrypts to 0x{encrypt_block(0x20):02X}')
print()
print('The ciphertext has the SAME repetition structure as the plaintext!')

# Visualize with a larger structured message: a simple 32x32 "image"
# Build a toy grayscale image made of four horizontal bands with clear structure
width, height = 32, 32
image = []
for row in range(height):
    for col in range(width):
        if row < 8:
            # Top band: alternating light/dark columns
            image.append(0x20 if col % 4 < 2 else 0xE0)
        elif row < 16:
            # Middle-upper band: solid medium gray
            image.append(0x80)
        elif row < 24:
            # Middle-lower band: checkerboard
            image.append(0x40 if (row + col) % 2 == 0 else 0xC0)
        else:
            # Bottom band: gradient
            image.append((col * 8) % 256)

# Encrypt in ECB mode
ecb_image = ecb_encrypt(image)

print(f'Image size: {width}x{height} = {len(image)} bytes')
print(f'Unique plaintext values: {len(set(image))}')
print(f'Unique ciphertext values: {len(set(ecb_image))}')
print()
print('Plaintext image (hex, first 8 rows):')
for row in range(8):
    print(' '.join(f'{image[row*width+col]:02X}' for col in range(min(16, width))))
print()
print('ECB-encrypted image (hex, first 8 rows):')
for row in range(8):
    print(' '.join(f'{ecb_image[row*width+col]:02X}' for col in range(min(16, width))))

Step 2: Visualize the Pattern Leakage

Even though individual byte values are different (the S-box scrambled them), the pattern structure is perfectly preserved. Let's visualize this with a histogram and a structural comparison.

# Block frequency analysis: does the ciphertext reveal structure?
from collections import Counter

pt_counts = Counter(image)
ct_counts = Counter(ecb_image)

print('=== Block Frequency Analysis ===')
print()
print('Plaintext byte frequencies (top 10):')
for val, count in pt_counts.most_common(10):
    bar = '#' * (count // 4)
    print(f' 0x{val:02X}: {count:3d} {bar}')
print()
print('ECB ciphertext byte frequencies (top 10):')
for val, count in ct_counts.most_common(10):
    bar = '#' * (count // 4)
    print(f' 0x{val:02X}: {count:3d} {bar}')
print()
print('Observation: the FREQUENCY DISTRIBUTION is identical!')
print('The S-box just relabels the bars; their heights don\'t change.')
print()

# Verify: sorted frequency lists should match
pt_freqs = sorted(pt_counts.values(), reverse=True)
ct_freqs = sorted(ct_counts.values(), reverse=True)
print(f'Sorted frequency lists match: {pt_freqs == ct_freqs}')

# Structural leakage: detect repeating blocks
print('=== Detecting Repeating Blocks ===')
print()

# An attacker doesn't know what the bytes mean, but can detect repetitions
def detect_patterns(data, block_size=1):
    """Detect positions of repeated blocks."""
    seen = {}
    repeats = 0
    for i in range(0, len(data), block_size):
        block = tuple(data[i:i+block_size])
        if block in seen:
            repeats += 1
        else:
            seen[block] = i
    return repeats, len(seen)

pt_reps, pt_unique = detect_patterns(image)
ct_reps, ct_unique = detect_patterns(ecb_image)

print(f'Plaintext: {pt_unique} unique blocks, {pt_reps} repeated positions')
print(f'Ciphertext: {ct_unique} unique blocks, {ct_reps} repeated positions')
print()
print(f'The ciphertext has exactly the same repetition count as the plaintext.')
print(f'An attacker can recover the STRUCTURE of the plaintext from ECB ciphertext.')
print()

# Demonstrate: attacker can tell which blocks are equal
print('Attacker\'s view (block equality map):')
print('Encoding each unique ciphertext block as a letter...')
block_to_label = {}
label_idx = 0
labels = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnop'
for b in ecb_image:
    if b not in block_to_label:
        block_to_label[b] = labels[label_idx % len(labels)]
        label_idx += 1

print('First 4 rows of 32-byte image, labeled by block identity:')
for row in range(4):
    row_data = ecb_image[row*width:(row+1)*width]
    print(' ' + ''.join(block_to_label[b] for b in row_data))
print()
print('Clear repeating patterns visible, even without knowing the key!')
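
The 1-byte toy blocks exaggerate how often repeats occur. With realistic 16-byte blocks, only repeats that line up on block boundaries leak. The sketch below is plain Python and rests on a loudly stated assumption: `toy_block_encrypt` is a hypothetical stand-in that uses HMAC-SHA256 truncated to 16 bytes. It is deterministic per block, which is all the ECB leak depends on, but it is not invertible and it is not AES.

```python
import hashlib
import hmac

BLOCK = 16  # AES block size in bytes

def toy_block_encrypt(key, block):
    """Stand-in for a 16-byte block cipher (assumption: truncated HMAC-SHA256).
    Deterministic like AES under a fixed key, but NOT invertible and NOT AES."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]

def ecb_like_encrypt(key, data):
    """Process each aligned 16-byte block independently, as ECB would."""
    assert len(data) % BLOCK == 0, "sketch assumes block-aligned input (no padding)"
    return b"".join(toy_block_encrypt(key, data[i:i + BLOCK])
                    for i in range(0, len(data), BLOCK))

def count_repeated_blocks(data, block_size=BLOCK):
    """Number of block positions whose content already appeared earlier."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return len(blocks) - len(set(blocks))

key = b"demo key, not secret"
aligned = b"ATTACK AT DAWN!!" * 4   # four identical 16-byte blocks
short_period = b"HELLO " * 8        # 48 bytes, repeats every 6 bytes

print(count_repeated_blocks(ecb_like_encrypt(key, aligned)))       # 3
print(count_repeated_blocks(ecb_like_encrypt(key, short_period)))  # 0
```

The second message is highly repetitive, yet its three 16-byte blocks start at different phases of the 6-byte pattern, so no two blocks are equal and nothing repeats in the output. Structured data such as bitmaps, zero-filled regions, or fixed-width records tends to repeat at aligned offsets, which is where ECB leakage shows up in practice.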

Step 3: Compare with CBC Mode

In CBC (Cipher Block Chaining) mode, each block is XORed with the previous ciphertext block before encryption:

$$C_i = E_K(P_i \oplus C_{i-1}), \quad C_0 = E_K(P_0 \oplus \text{IV})$$

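The chaining means identical plaintext blocks generally produce different ciphertext blocks. The exception is when the preceding ciphertext blocks $C_{i-1}$ also happen to be equal. For a real 128-bit block cipher that is astronomically unlikely; with our 8-bit toy blocks it happens for roughly one pair in 256, so a few accidental repeats are expected.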
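The condition in that caveat can be made explicit. For a fixed key, $E_K$ is a bijection, so two CBC ciphertext blocks are equal exactly when the inputs to the cipher are equal (with the IV standing in for the block before $C_0$):

$$
\begin{aligned}
C_i = C_j &\iff P_i \oplus C_{i-1} = P_j \oplus C_{j-1} \\
          &\iff P_i \oplus P_j = C_{i-1} \oplus C_{j-1}
\end{aligned}
$$

When $P_i = P_j$ the left-hand side of the last line is zero, so the ciphertext blocks collide only if $C_{i-1} = C_{j-1}$. In ECB there is no $C_{i-1}$ term at all, which is why $P_i = P_j$ alone forces $C_i = C_j$.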

set_random_seed(42)  # reproducible: seeds Sage's random state, which randint uses

def cbc_encrypt(plaintext_bytes, iv):
    """Encrypt a list of bytes in CBC mode."""
    ciphertext = []
    prev = iv
    for b in plaintext_bytes:
        # XOR with previous ciphertext block, then encrypt
        encrypted = encrypt_block(b ^^ prev)
        ciphertext.append(encrypted)
        prev = encrypted
    return ciphertext

# Encrypt the same structured image with CBC
iv = randint(0, 255)
cbc_image = cbc_encrypt(image, iv)

print('=== ECB vs CBC Comparison ===')
print()
print(f'Same plaintext image ({width}x{height}), two modes:')
print()

# Frequency analysis
cbc_counts = Counter(cbc_image)
print(f'ECB unique ciphertext values: {len(ct_counts)}')
print(f'CBC unique ciphertext values: {len(cbc_counts)}')
print()
print('ECB ciphertext (first 4 rows):')
for row in range(4):
    print(' '.join(f'{ecb_image[row*width+col]:02X}' for col in range(min(16, width))))
print()
print('CBC ciphertext (first 4 rows):')
for row in range(4):
    print(' '.join(f'{cbc_image[row*width+col]:02X}' for col in range(min(16, width))))
print()

# Repetition comparison
# Note: with 8-bit blocks there are only 256 possible ciphertext values, so a
# 1024-byte ciphertext must contain repeats in any mode. What matters is whether
# the repeats line up with plaintext repeats, which the next cell measures.
cbc_reps, cbc_unique = detect_patterns(cbc_image)
print(f'Repeated block positions: ECB = {ct_reps}, CBC = {cbc_reps}')
print(f'CBC breaks the pattern: chaining makes identical plaintexts produce different ciphertexts.')

# Quantify the leak: how often does "plaintext blocks equal?" agree with
# "ciphertext blocks equal?" (a simple pairwise proxy, not true mutual information)
def block_equality_vector(data):
    """For each pair (i,j), record whether data[i] == data[j]."""
    n = len(data)
    equalities = []
    for i in range(min(n, 200)):  # sample to keep tractable
        for j in range(i+1, min(n, 200)):
            equalities.append(1 if data[i] == data[j] else 0)
    return equalities

pt_eq = block_equality_vector(image)
ecb_eq = block_equality_vector(ecb_image)
cbc_eq = block_equality_vector(cbc_image)

# Agreement: do equal-plaintext pairs correspond to equal-ciphertext pairs?
ecb_match = sum(1 for p, c in zip(pt_eq, ecb_eq) if p == c)
cbc_match = sum(1 for p, c in zip(pt_eq, cbc_eq) if p == c)
total = len(pt_eq)

print('=== Pattern Correlation ===')
print()
print(f'Do equal plaintext blocks produce equal ciphertext blocks?')
print(f' ECB: {ecb_match}/{total} pairs match ({100*ecb_match/total:.1f}%)')
print(f' CBC: {cbc_match}/{total} pairs match ({100*cbc_match/total:.1f}%)')
print()
# In the sampled top band about half of the plaintext pairs are unequal; those
# agree trivially with unequal ciphertext pairs and account for the CBC figure.
print(f'ECB: 100% agreement = complete pattern leakage.')
print(f'CBC: ~50% agreement = no meaningful leakage (mostly the unequal plaintext pairs).')
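
Only CBC encryption appears in this step, so here is a sketch of the matching decryption, $P_i = D_K(C_i) \oplus C_{i-1}$, with a round-trip check. It is plain Python and self-contained. `ENC` and `DEC` are a hypothetical random byte permutation standing in for the notebook's S-box cipher, and XOR is written with `operator.xor` (a Sage cell would use `^^`).

```python
import random
from operator import xor  # plain-Python XOR; a Sage cell would write a ^^ b

# Hypothetical stand-in for the toy cipher: a fixed random byte permutation.
_rng = random.Random(2024)
ENC = list(range(256))
_rng.shuffle(ENC)
DEC = [0] * 256
for plain_byte, cipher_byte in enumerate(ENC):
    DEC[cipher_byte] = plain_byte

def cbc_encrypt_bytes(plaintext, iv):
    """C_i = E(P_i xor C_{i-1}), with the IV used in place of the missing C_{-1}."""
    out, prev = [], iv
    for p in plaintext:
        prev = ENC[xor(p, prev)]
        out.append(prev)
    return out

def cbc_decrypt_bytes(ciphertext, iv):
    """P_i = D(C_i) xor C_{i-1}; each block needs only the previous ciphertext block."""
    out, prev = [], iv
    for c in ciphertext:
        out.append(xor(DEC[c], prev))
        prev = c
    return out

message = list(b"AAAA BBBB AAAA")
iv = 0x5A
ciphertext = cbc_encrypt_bytes(message, iv)
assert cbc_decrypt_bytes(ciphertext, iv) == message
print(bytes(cbc_decrypt_bytes(ciphertext, iv)).decode())  # AAAA BBBB AAAA
```

Decryption reads `prev` from the ciphertext rather than from its own output, so unlike encryption it can be parallelized, and decrypting with a wrong IV corrupts only the first block.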

The Fix: Chained Modes of Operation

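Never use ECB for multi-block messages. Use a mode that makes each ciphertext block depend on more than its own plaintext block, either by chaining blocks together (CBC) or by mixing in a per-block counter (CTR, GCM):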

| Mode | How it works | Advantage |
|------|--------------|-----------|
| CBC | $C_i = E_K(P_i \oplus C_{i-1})$ | Hides patterns, widely deployed |
| CTR | $C_i = P_i \oplus E_K(\text{nonce} \,\Vert\, i)$ | Parallelizable, random access |
| GCM | CTR + GHASH authentication | Encryption + integrity (gold standard) |

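Used with a fresh IV or nonce for every message, all of these make identical plaintext blocks encrypt to different ciphertext blocks, either always (CTR, GCM) or with overwhelming probability at real block sizes (CBC).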

AES-GCM is the standard choice in TLS 1.3, and we'll explore it in the Connect notebook on AES-GCM authenticated encryption.

Exercises

Exercise 1

Encrypt the string 'HELLO HELLO HELLO HELLO HELLO' in both ECB and CBC mode. How many repeated ciphertext blocks does each mode produce?

Exercise 2

Implement CTR (Counter) mode: $C_i = P_i \oplus E_K(\text{nonce} + i)$. Encrypt the same structured image. Does it hide patterns like CBC?

Exercise 3

In CBC mode, what happens if you reuse the same IV for two different messages that share the same first block? What does the attacker learn from $C_1 \oplus C_1'$ where $C_1 = E_K(P_1 \oplus \text{IV})$ and $C_1' = E_K(P_1' \oplus \text{IV})$?

# Exercise space

# Exercise 1: Encrypt and compare
msg = [ord(c) for c in 'HELLO HELLO HELLO HELLO HELLO']
ecb_msg = ecb_encrypt(msg)
cbc_msg = cbc_encrypt(msg, randint(0, 255))
ecb_reps_msg, _ = detect_patterns(ecb_msg)
cbc_reps_msg, _ = detect_patterns(cbc_msg)
print(f'ECB repeated blocks: {ecb_reps_msg}')
print(f'CBC repeated blocks: {cbc_reps_msg}')

# Exercise 2: Implement CTR mode
# TODO: def ctr_encrypt(plaintext_bytes, nonce): ...

# Exercise 3: CBC IV reuse analysis
# TODO

Summary

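| Property | ECB | CBC / CTR / GCM |
|----------|-----|-----------------|
| Equal plaintext blocks → equal ciphertext? | Yes (fatal) | No |
| Pattern leakage | Complete | None |
| Block independence | Each block isolated | Blocks chained (CBC) or tied to a counter (CTR, GCM) |
| Safe for multi-block messages? | No | Yes |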

Key takeaways:

  • ECB mode encrypts blocks independently, so patterns in the plaintext are perfectly preserved in the ciphertext.

  • An attacker can detect which plaintext blocks are equal without knowing the key.

  • Chained modes (CBC, CTR, GCM) break this by making each ciphertext block depend on more than just its plaintext block.

  • This is why ECB should never be used for messages longer than one block.

  • The underlying block cipher (AES) is perfectly fine; the weakness is entirely in the mode.


Back to Module 03: Galois Fields and AES