Connect: RSA-OAEP Padding

Module 04 | Real-World Connections

Why textbook RSA is never used in practice and how OAEP fixes it.

Introduction

Textbook RSA, $c = m^e \bmod n$, is mathematically elegant but unsuitable for real-world encryption. It has two fatal flaws:

Deterministic: the same message always encrypts to the same ciphertext
Malleable: an attacker can manipulate ciphertexts to produce related plaintexts

Every real RSA deployment uses padding to fix these problems. The modern standard is OAEP (Optimal Asymmetric Encryption Padding), specified in PKCS#1 v2.2.

This notebook demonstrates why textbook RSA fails and how OAEP fixes each flaw.

Problem 1: Deterministic Encryption

Textbook RSA is a deterministic function: given the same $m$, $e$, and $n$, it always produces the same $c$.

This means an attacker can:

Detect when the same message is sent twice
Build a dictionary mapping known plaintexts to ciphertexts
Distinguish between two candidate messages (breaks IND-CPA security)

In [ ]:

# === Setup: generate RSA key pair ===

set_random_seed(2024)

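# 256-bit primes give a 64-byte modulus: still a toy key, far too small for
# real use, but large enough to hold the 16-byte seed that the toy OAEP
# padding later in this notebook needs (a 13-byte modulus cannot).
p = random_prime(2^256, lbound=2^255)
q = random_prime(2^256, lbound=2^255)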
n = p * q
phi_n = (p - 1) * (q - 1)
e = 65537
d = inverse_mod(e, phi_n)

print(f'RSA key pair (n ~ {n.nbits()} bits):')
print(f'  Public:  (n={n}, e={e})')
print(f'  Private: d={d}')

In [ ]:

# === Demonstration: deterministic encryption leaks equality ===

# Suppose the plaintext is a vote, encoded as the ASCII bytes of "Yes" or "No".
# (Encoding the votes as 1 and 0 would be even worse: 1^e = 1 and 0^e = 0,
# so each "ciphertext" would simply equal its plaintext.)
m_yes = Integer(int.from_bytes(b'Yes', 'big'))
m_no = Integer(int.from_bytes(b'No', 'big'))

c_yes = power_mod(m_yes, e, n)
c_no = power_mod(m_no, e, n)

print('Encrypted votes (textbook RSA):')
print(f'  "Yes" (m={m_yes}): c = {c_yes}')
print(f'  "No"  (m={m_no}): c = {c_no}')
print()

# Alice and Bob vote "Yes"; Carol votes "No"
alice_vote = power_mod(m_yes, e, n)
bob_vote = power_mod(m_yes, e, n)
carol_vote = power_mod(m_no, e, n)

print('Intercepted encrypted votes:')
print(f'  Alice: {alice_vote}')
print(f'  Bob:   {bob_vote}')
print(f'  Carol: {carol_vote}')
print()

# Attacker can tell who voted the same way!
print('Attacker observes:')
print(f'  Alice and Bob have the SAME ciphertext: {alice_vote == bob_vote}')
print(f'  Carol has a DIFFERENT ciphertext: {alice_vote != carol_vote}')
print()
print('Without decrypting, the attacker knows Alice and Bob voted the same way.')
print('With only 2 possible messages, the attacker can encrypt both and compare:')
print(f'  Encrypt("Yes") matches Alice: {c_yes == alice_vote} -> Alice voted Yes')
print(f'  Encrypt("No")  matches Carol: {c_no == carol_vote} -> Carol voted No')
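The same idea scales from two candidate messages to any small message space. The following is a minimal sketch in plain Python 3.8+ (not Sage) of the dictionary attack listed above. The 4-digit PIN scenario, the helper name `textbook_encrypt`, and the modulus built from two Mersenne primes are assumptions chosen for illustration only.

```python
# Toy public key (assumption for illustration): two Mersenne primes give a
# small, insecure modulus. Real keys use large random primes.
p = 2**61 - 1
q = 2**89 - 1
n = p * q
e = 65537

def textbook_encrypt(m):
    """Textbook RSA: deterministic, no padding."""
    return pow(m, e, n)

# The victim encrypts a 4-digit PIN; the attacker sees only the ciphertext.
secret_pin = 4071
intercepted = textbook_encrypt(secret_pin)

# The attacker encrypts every possible PIN under the public key and builds
# a reverse lookup table from ciphertext to plaintext.
dictionary = {textbook_encrypt(pin): pin for pin in range(10_000)}

# Encryption is deterministic, so the intercepted value must be in the table.
recovered_pin = dictionary[intercepted]
print(f'Recovered PIN: {recovered_pin:04d}')
```

The attacker never touches the private key: 10,000 public-key encryptions are enough, and the table can be reused against every PIN encrypted under the same key.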

Problem 2: Malleability (Chosen Ciphertext Attack)

Textbook RSA is multiplicatively homomorphic:

$$\text{Enc}(m_1) \cdot \text{Enc}(m_2) \equiv m_1^e \cdot m_2^e \equiv (m_1 \cdot m_2)^e \equiv \text{Enc}(m_1 \cdot m_2) \pmod{n}$$

An attacker who sees $c = m^e \bmod n$ can compute:

$c' = 2^e \cdot c \bmod n = (2m)^e \bmod n$, which is a valid encryption of $2m$.

Without knowing $m$, the attacker can manipulate the ciphertext to produce a related plaintext. This enables chosen-ciphertext attacks: if the attacker can get $c'$ decrypted, dividing the result by 2 modulo $n$ reveals $m$.

In [ ]:

# === Demonstration: multiplicative malleability ===

m = 42  # Secret message
c = power_mod(m, e, n)

print(f'Original: m = {m}, c = Enc({m}) = {c}')
print()

# Attacker computes c' = 2^e * c mod n
c_prime = (power_mod(2, e, n) * c) % n

# Decrypt c' to see what we get
m_prime = power_mod(c_prime, d, n)

print(f'Attacker computes: c\' = 2^e * c mod n = {c_prime}')
print(f'Decryption of c\':  m\' = {m_prime}')
print(f'Is m\' = 2 * m?     {m_prime} = 2 * {m} = {2*m}? {m_prime == 2*m}')
print()

# More generally: attacker can multiply by any known factor
factor_val = 7
c_scaled = (power_mod(factor_val, e, n) * c) % n
m_scaled = power_mod(c_scaled, d, n)

print(f'Multiply by {factor_val}: Enc({factor_val}) * Enc({m}) mod n')
print(f'Decrypts to: {m_scaled} = {factor_val} * {m} = {factor_val * m}? {m_scaled == factor_val * m}')
print()
print('The attacker can manipulate ciphertexts without knowing the plaintext.')
print('This breaks any protocol that assumes ciphertexts are tamper-proof.')
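Malleability becomes a full plaintext recovery as soon as the attacker can have other ciphertexts decrypted. The sketch below, in plain Python 3.8+, assumes a hypothetical `decryption_oracle` that refuses only the exact target ciphertext; the key (two Mersenne primes) and the plaintext value are likewise illustrative assumptions.

```python
import secrets
from math import gcd

# Toy key pair (assumption for illustration; not a secure key).
p = 2**61 - 1
q = 2**89 - 1
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent, held by the server

secret_m = 123456789
target_c = pow(secret_m, e, n)      # the ciphertext the attacker wants to read

def decryption_oracle(c):
    """Hypothetical server: decrypts anything except the target ciphertext."""
    if c == target_c:
        raise PermissionError('refusing to decrypt the target ciphertext')
    return pow(c, d, n)

# Attacker: pick a random blinding factor s that is invertible modulo n.
s = secrets.randbelow(n - 3) + 2    # s in [2, n - 2]
while gcd(s, n) != 1:
    s = secrets.randbelow(n - 3) + 2

# s^e * c is an encryption of s*m, and it looks unrelated to target_c.
blinded_c = (pow(s, e, n) * target_c) % n
blinded_m = decryption_oracle(blinded_c)

# Remove the blinding factor to recover the original plaintext.
recovered_m = (blinded_m * pow(s, -1, n)) % n
print('Recovered plaintext matches:', recovered_m == secret_m)
```

Blocking the one forbidden ciphertext does not help the server, because every choice of `s` produces a different query that decrypts to a value the attacker can unblind.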

The Bleichenbacher Attack (1998)

Daniel Bleichenbacher demonstrated a practical attack against RSA PKCS#1 v1.5 encryption padding. The attacker sends millions of modified ciphertexts to a server and observes whether the server reports a padding error. By exploiting the multiplicative malleability, each response leaks a small amount of information about the plaintext, eventually recovering it entirely.

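This attack affected real SSL/TLS implementations. OAEP itself predates the attack (it was proposed in 1994), but Bleichenbacher's result was the primary motivation for adopting OAEP in PKCS#1 v2.0.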
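A minimal sketch of the first step of such an attack, in plain Python 3.8+, may make the leak concrete. Everything here is a simplifying assumption: the `padding_oracle` checks only the `0x00 0x02` prefix (a real PKCS#1 v1.5 check also inspects the padding bytes and separator), the padding bytes are fixed rather than random, and the key is a small toy modulus. The full attack repeats this search and narrows an interval around the plaintext, which is not shown.

```python
# Toy key built from two Mersenne primes (assumption for illustration).
p = 2**89 - 1
q = 2**107 - 1
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))

k = (n.bit_length() + 7) // 8   # modulus length in bytes
B = 2 ** (8 * (k - 2))          # blocks starting 0x00 0x02 lie in [2B, 3B)

def padding_oracle(c):
    """Hypothetical server leak: does the decrypted block start 0x00 0x02?"""
    return pow(c, d, n).to_bytes(k, 'big')[:2] == b'\x00\x02'

# Victim block: 0x00 0x02 || padding || 0x00 || message. Real padding bytes
# are random and nonzero; fixed 0xAA bytes keep the sketch reproducible.
message = b'session-key'
block = b'\x00\x02' + b'\xaa' * (k - 3 - len(message)) + b'\x00' + message
m = int.from_bytes(block, 'big')
c = pow(m, e, n)

# Attacker: try multipliers s until s^e * c decrypts to a conforming block.
# The search range is capped so the sketch always terminates.
s = next(s for s in range(2, 1_000_000)
         if padding_oracle((pow(s, e, n) * c) % n))

# A single "yes" from the oracle pins s*m mod n inside [2B, 3B).
print(f'conforming multiplier found: s = {s}')
print('2B <= s*m mod n < 3B:', 2 * B <= (s * m) % n < 3 * B)
```

Each conforming multiplier gives the attacker one such range constraint on the unknown $m$; combining many of them is what eventually isolates the plaintext.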

OAEP: Optimal Asymmetric Encryption Padding

OAEP (Bellare and Rogaway, 1994) pads the message with randomness using a Feistel-like structure before applying RSA:

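Input:  message m, random seed r

Step 1: X = (m || 0...0)  XOR  G(r)    (G is a hash/mask-generation function)
Step 2: Y = r  XOR  H(X)               (H is another hash function)
Step 3: padded = X || Y
Step 4: c = padded^e mod n             (standard RSA)

PKCS#1 lays the block out as 0x00 || Y || X and separates the zero padding from the message with a 0x01 byte; the toy implementation in this notebook follows that layout.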

To decrypt:

Compute $\text{padded} = c^d \bmod n$
Split into $X \| Y$
Recover $r = Y \oplus H(X)$
Recover $m \| 0\ldots0 = X \oplus G(r)$
Check the zero padding; reject if invalid
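The decryption steps work because XOR cancels itself. A short check, using the same symbols as the encryption steps:

```latex
\begin{align*}
Y \oplus H(X) &= \bigl(r \oplus H(X)\bigr) \oplus H(X) = r, \\
X \oplus G(r) &= \bigl((m \,\|\, 0\ldots0) \oplus G(r)\bigr) \oplus G(r) = m \,\|\, 0\ldots0.
\end{align*}
```

The order matters: $r$ has to be recovered first, because $G(r)$ is needed to unmask $X$.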

In [ ]:

# === Toy OAEP implementation ===

import hashlib
import os

def mgf(seed_bytes, length):
    """Simplified mask generation function (MGF1 with SHA-256)."""
    output = b''
    counter = 0
    while len(output) < length:
        c_bytes = counter.to_bytes(4, 'big')
        output += hashlib.sha256(seed_bytes + c_bytes).digest()
        counter += 1
    return output[:length]

def xor_bytes(a, b):
    """XOR two byte strings of equal length."""
    return bytes(x ^^ y for x, y in zip(a, b))

def oaep_pad(message_bytes, key_size_bytes, seed=None):
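    """Simplified OAEP padding.

    Toy version: unlike RFC 8017, the data block carries no label hash and
    the seed is only 16 bytes. Requires
    key_size_bytes >= 18 + len(message_bytes).
    """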
    hash_len = 16  # Use 16 bytes for our toy version
    
    # Pad message with zeros to fill the data block
    db_len = key_size_bytes - hash_len - 1
    padding_len = db_len - len(message_bytes) - 1
    assert padding_len >= 0, 'Message too long'
    
    # Data block: [zero padding] [0x01] [message]
    db = bytes(padding_len) + bytes([0x01]) + message_bytes
    
    # Random seed
    if seed is None:
        seed = os.urandom(hash_len)
    
    # Feistel-like structure
    db_mask = mgf(seed, db_len)      # G(r)
    masked_db = xor_bytes(db, db_mask)  # X = DB xor G(r)
    
    seed_mask = mgf(masked_db, hash_len)  # H(X)
    masked_seed = xor_bytes(seed, seed_mask)  # Y = r xor H(X)
    
    # Final padded message: 0x00 || masked_seed || masked_db
    padded = bytes([0x00]) + masked_seed + masked_db
    return padded, seed

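def oaep_unpad(padded_bytes, key_size_bytes):
    """Simplified OAEP unpadding; raises ValueError if the padding is invalid."""
    hash_len = 16
    db_len = key_size_bytes - hash_len - 1
    
    # Split: 0x00 || masked_seed || masked_db
    if len(padded_bytes) != key_size_bytes or padded_bytes[0] != 0x00:
        raise ValueError('leading byte is not 0x00')
    masked_seed = padded_bytes[1:1+hash_len]
    masked_db = padded_bytes[1+hash_len:]
    
    # Reverse Feistel
    seed_mask = mgf(masked_db, hash_len)      # H(X)
    seed = xor_bytes(masked_seed, seed_mask)  # r = Y xor H(X)
    
    db_mask = mgf(seed, db_len)         # G(r)
    db = xor_bytes(masked_db, db_mask)  # DB = X xor G(r)
    
    # Check the padding: DB must be zero bytes, then 0x01, then the message.
    # (Searching for the first 0x01 alone would accept corrupted blocks.)
    unpadded = db.lstrip(b'\x00')
    if unpadded[:1] != b'\x01':
        raise ValueError('invalid padding structure')
    return unpadded[1:]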

# Test the OAEP padding
key_bytes = (n.nbits() + 7) // 8
msg = b'Hello!'
padded, seed_used = oaep_pad(msg, key_bytes)

print(f'Message:        {msg}')
print(f'Key size:       {key_bytes} bytes')
print(f'Random seed:    {seed_used.hex()}')
print(f'Padded (hex):   {padded.hex()}')
print(f'Padded length:  {len(padded)} bytes')
print()

# Verify unpadding
recovered_msg = oaep_unpad(padded, key_bytes)
print(f'Unpadded:       {recovered_msg}')
print(f'Match: {recovered_msg == msg}')

In [ ]:

# === OAEP defeats determinism ===

def rsa_oaep_encrypt(message_bytes, e, n):
    """Encrypt with RSA-OAEP."""
    key_bytes = (n.nbits() + 7) // 8
    padded, _ = oaep_pad(message_bytes, key_bytes)
    m_int = Integer(int.from_bytes(padded, 'big'))
    return power_mod(m_int, e, n)

# Encrypt the same message multiple times
msg = b'Vote: Yes'
c1 = rsa_oaep_encrypt(msg, e, n)
c2 = rsa_oaep_encrypt(msg, e, n)
c3 = rsa_oaep_encrypt(msg, e, n)

print(f'Same message encrypted 3 times with RSA-OAEP:')
print(f'  c1 = {c1}')
print(f'  c2 = {c2}')
print(f'  c3 = {c3}')
print()
print(f'All different? c1 != c2: {c1 != c2}, c1 != c3: {c1 != c3}, c2 != c3: {c2 != c3}')
print()
print('Each encryption uses fresh randomness, so the same message')
print('produces a DIFFERENT ciphertext every time.')
print('The attacker can no longer detect when the same message is sent twice.')

In [ ]:

# === OAEP defeats malleability ===

# Encrypt a message with OAEP
msg = b'secret'
key_bytes = (n.nbits() + 7) // 8
padded_bytes, _ = oaep_pad(msg, key_bytes)
m_int = Integer(int.from_bytes(padded_bytes, 'big'))
c = power_mod(m_int, e, n)

print(f'Original ciphertext: c = {c}')
print()

# Attacker tries to multiply: c' = 2^e * c mod n
c_modified = (power_mod(2, e, n) * c) % n
m_modified_int = power_mod(c_modified, d, n)
m_modified_bytes = int(m_modified_int).to_bytes(key_bytes, 'big')

print(f'Attacker computes: c\' = 2^e * c mod n = {c_modified}')
print(f'Decrypted integer: {m_modified_int}')
print(f'Decrypted bytes (hex): {m_modified_bytes.hex()}')
print()

# Try to unpad - the Feistel structure will produce garbage
try:
    result = oaep_unpad(m_modified_bytes, key_bytes)
    print(f'Unpadded: {result}')
    print('WARNING: Unpadding unexpectedly succeeded (unlikely with toy params)')
except Exception as ex:
    print(f'OAEP unpadding FAILED: {ex}')
    print()
    print('The Feistel structure in OAEP means that multiplying the ciphertext')
    print('by 2^e does NOT multiply the underlying message by 2.')
    print('Instead, it corrupts the padding structure, and the recipient')
    print('rejects the ciphertext as invalid.')
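The toy padding above has very little redundancy to check on decryption. RFC 8017 additionally places a hash of an (often empty) label at the start of the data block, so a corrupted block is rejected except with negligible probability. The following plain Python 3.8+ sketch follows that layout under stated assumptions: SHA-256 for both the hash and MGF1, a toy key built from two Mersenne primes, and hypothetical helper names (`oaep_encode`, `oaep_decode`, `encrypt`, `decrypt`). It is not constant-time and is not a substitute for a vetted library.

```python
import hashlib
import os
from operator import xor

H_LEN = hashlib.sha256().digest_size  # 32 bytes

def mgf1(seed, length):
    """MGF1 with SHA-256: hash seed || 4-byte counter until enough bytes."""
    out, counter = b'', 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(4, 'big')).digest()
        counter += 1
    return out[:length]

def xor_bytes(a, b):
    return bytes(map(xor, a, b))

def oaep_encode(message, k, label=b''):
    """EM = 0x00 || maskedSeed || maskedDB, with DB = lHash || PS || 0x01 || M."""
    if len(message) > k - 2 * H_LEN - 2:
        raise ValueError('message too long')
    l_hash = hashlib.sha256(label).digest()
    ps = bytes(k - len(message) - 2 * H_LEN - 2)   # zero padding string
    db = l_hash + ps + b'\x01' + message
    seed = os.urandom(H_LEN)                        # fresh randomness per call
    masked_db = xor_bytes(db, mgf1(seed, len(db)))
    masked_seed = xor_bytes(seed, mgf1(masked_db, H_LEN))
    return b'\x00' + masked_seed + masked_db

def oaep_decode(em, k, label=b''):
    """Invert oaep_encode; raise ValueError on any structural mismatch."""
    masked_seed, masked_db = em[1:1 + H_LEN], em[1 + H_LEN:]
    seed = xor_bytes(masked_seed, mgf1(masked_db, H_LEN))
    db = xor_bytes(masked_db, mgf1(seed, len(masked_db)))
    l_hash, rest = db[:H_LEN], db[H_LEN:].lstrip(b'\x00')
    ok = (len(em) == k and em[0] == 0
          and l_hash == hashlib.sha256(label).digest()
          and rest[:1] == b'\x01')
    if not ok:
        raise ValueError('decryption error')
    return rest[1:]

# Toy key (assumption for illustration; Mersenne primes are not secure choices).
p, q = 2**521 - 1, 2**607 - 1
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))
k = (n.bit_length() + 7) // 8

def encrypt(message):
    return pow(int.from_bytes(oaep_encode(message, k), 'big'), e, n)

def decrypt(c):
    return oaep_decode(pow(c, d, n).to_bytes(k, 'big'), k)

c = encrypt(b'secret')
print('Round trip:', decrypt(c))

# Malleability attempt: multiply the ciphertext by 2^e, as in the textbook case.
tampered = (pow(2, e, n) * c) % n
try:
    decrypt(tampered)
    tamper_rejected = False
except ValueError:
    tamper_rejected = True
print('Tampered ciphertext rejected:', tamper_rejected)
```

Production decoders also take care not to reveal which of the checks failed, since that distinction can itself act as an oracle.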

Concept Map: Module 04 and OAEP

Module 04 Concept	RSA-OAEP Application
RSA encryption ($m^e \bmod n$)	Applied to the OAEP-padded message, not raw plaintext
RSA decryption ($c^d \bmod n$)	Recovers padded message; OAEP unpadding extracts plaintext
CRT (Notebook 04d)	RSA-CRT optimization: compute $c^{d \bmod (p-1)} \bmod p$ and $c^{d \bmod (q-1)} \bmod q$ separately, then recombine
Euler's theorem (Notebook 04c)	Still guarantees $(m^e)^d \equiv m \pmod{n}$; OAEP adds security on top

OAEP transforms textbook RSA (which is a trapdoor permutation) into a proper encryption scheme with provable security against chosen-ciphertext attacks in the random oracle model.
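The RSA-CRT row of the concept map can be made concrete with a short plain Python 3.8+ sketch. The variable names (`d_p`, `d_q`, `q_inv`), the toy Mersenne-prime key, and the sample plaintext are assumptions for illustration; with OAEP, the integer recovered here would be the padded block that is then unpadded.

```python
# Toy key pair (assumption for illustration; not a secure key).
p = 2**89 - 1
q = 2**107 - 1
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))

# Values precomputed once per private key.
d_p = d % (p - 1)        # exponent used modulo p
d_q = d % (q - 1)        # exponent used modulo q
q_inv = pow(q, -1, p)    # q^(-1) mod p, for recombination

def decrypt_crt(c):
    """RSA decryption via two half-size exponentiations and recombination."""
    m_p = pow(c, d_p, p)                 # equals c^d mod p
    m_q = pow(c, d_q, q)                 # equals c^d mod q
    h = (q_inv * (m_p - m_q)) % p        # correction so the result is m_p mod p
    return m_q + h * q                   # unique value in [0, n)

m = int.from_bytes(b'padded block', 'big')
c = pow(m, e, n)
print('CRT result matches direct decryption:', decrypt_crt(c) == pow(c, d, n))
```

The result is congruent to `m_q` modulo $q$ by construction and to `m_p` modulo $p$ because `q * q_inv` is 1 modulo $p$, which is exactly the CRT recombination.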

Summary

Property	Textbook RSA	RSA-OAEP
Deterministic?	Yes (same $m$ always gives same $c$)	No (fresh randomness each time)
Malleable?	Yes ($c' = s^e \cdot c$ encrypts $s \cdot m$)	No (a modified ciphertext fails the padding check with overwhelming probability)
IND-CPA secure?	No	Yes
IND-CCA2 secure?	No	Yes (in the random oracle model)
Used in practice?	Never	Yes (PKCS#1 v2.2, RFC 8017)

Key takeaways:

Textbook RSA is a mathematical building block, not a cryptosystem.
Determinism lets an attacker detect equal plaintexts and perform dictionary attacks.
Malleability lets an attacker manipulate ciphertexts without knowing the plaintext.
OAEP fixes both problems by padding the message with randomness through a Feistel-like structure.
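New RSA encryption deployments should use OAEP; the older PKCS#1 v1.5 encryption padding still appears in legacy protocols and remains exposed to Bleichenbacher-style attacks.
The underlying RSA math from Module 04 is unchanged: OAEP adds a layer of security on top.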
