Padding Blocks with PyCrypto in Google App Engine

I was thinking of a different kind of Zero.

Photo credit: Tora! Tora! Tora!.

Assume (for the sake of argument; no need to tell us why) that one day you find yourself working with Python in Google App Engine, using PyCrypto to encrypt secrets. Unless your plaintexts are always a multiple of 16 bytes in length, you are likely to run into this error:

ValueError: Input strings must be a multiple of 16 in length

The answer is to pad out your plaintext to an appropriate length, but the version of PyCrypto available in App Engine can’t do this for you.

Background

PyCrypto is one of the libraries that Google helpfully provides for you in App Engine (though you do need to specify in your app.yaml file that you want it loaded). They do this not merely out of the kindness of their hearts, but because PyCrypto relies on a C extension, and Google does not want you compiling arbitrary C extensions on their servers. This is quite wise of them. It does mean that you are locked in to the version of PyCrypto that Google has chosen to include in App Engine, though, and that version (2.6) happens to be outdated.

A newer version of PyCrypto was released in 2013, and it includes some significant improvements, such as additional block cipher modes and helper functions for handling block padding. If you are using, say, AES with CBC, you must add correct padding to the final block of plaintext in some situations. Sadly, the newer version (2.7a1) is marked as “experimental”, and the project is apparently dead. PyCrypto has been forked as PyCryptoDome, which is currently maintained, but which isn’t usable in standard App Engine environments.

While it is unfortunate that Google App Engine remains locked to the older version of PyCrypto that lacks the new features, the padding helpers are written cleanly enough that they can easily be included in your project.

Disclaimers, warnings, and/or provisos:

We are not professional cryptographers or cryptanalysts. All blog posts dealing with cryptography that are not written by professional cryptographers or cryptanalysts should warn you against:

If you have no other choice, pay an expert instead of relying on blog posts.

The padding problem

Since you are still reading, you clearly have no choice but to work with AES in CBC mode.

Your encryption code might look like:

from Crypto.Cipher import AES
from Crypto import Random

secret_key = Random.new().read(16)
iv = Random.new().read(16)
cipher = AES.new(secret_key, AES.MODE_CBC, iv)

plaintext = "Don't forget to save the secret key and the iv."
ciphertext = cipher.encrypt(plaintext)

When you try to encrypt a plaintext with a length that is not an exact multiple of the block size, you get that error:

ValueError: Input strings must be a multiple of 16 in length

The call to action seems clear: you must add additional bytes to your plaintext. But what bytes should you add? How can you ensure that these “padding” bytes are distinguishable from the actual message? Filling out the plaintext with zeros, for instance, won’t work: what if the last byte of your message is a zero?

Fortunately, you don’t need to devise an unambiguous way to pad a plaintext block, because there are several standard schemes for doing so. The padding helpers in the experimental version of PyCrypto and in PyCryptoDome support three of the most common schemes. If you can use one of these versions in your project, then you already have access to the padding helpers. Otherwise, as when writing for Google App Engine, you can copy the source of the padding module into your own project.

With padding.py copied to the appropriate location, you can use the helper functions thusly:

from Crypto.Cipher import AES
from Crypto import Random
from padding import pad, unpad

secret_key = Random.new().read(16)
# initialization vectors should always be random, and never reused.
iv = Random.new().read(16)
cipher = AES.new(secret_key, AES.MODE_CBC, iv)

plaintext = "Don't forget to save the secret key and the iv."
padded = pad(plaintext, AES.block_size, style='pkcs7')
ciphertext = cipher.encrypt(padded)

cipher = AES.new(secret_key, AES.MODE_CBC, iv)
result = cipher.decrypt(ciphertext)
assert result == padded
assert unpad(result, AES.block_size, style='pkcs7') == plaintext

It’s usually better to wrap such behavior in functions with simple interfaces. Here’s a possible approach:

from Crypto.Cipher import AES
from Crypto import Random
from padding import pad, unpad

padding_style = 'pkcs7'

def encrypt(plaintext):
    iv = create_iv()
    cipher = create_cipher(iv)
    padded = pad(plaintext, AES.block_size, style=padding_style)
    ciphertext = cipher.encrypt(padded)
    return iv, ciphertext

def decrypt(iv, ciphertext):
    cipher = create_cipher(iv)
    padded = cipher.decrypt(ciphertext)
    return unpad(padded, AES.block_size, style=padding_style)

def create_cipher(iv):
    return AES.new(secret_key, AES.MODE_CBC, iv)

def create_iv():
    return Random.new().read(16)


plaintext = "Don't forget to save the secret key and the iv."
iv, ciphertext = encrypt(plaintext)
assert decrypt(iv, ciphertext) == plaintext

Technical notes

Block ciphers like AES operate on plaintext chunks of fixed length, meaning that the cipher algorithm can only encrypt one such chunk of data. To encrypt a plaintext longer than the block length (which is 128 bits for AES), you must employ some scheme for encrypting each successive block.

The obvious and straightforward way is to simply divide the plaintext into correctly-sized blocks and encrypt each separately. This is called ECB mode (for Electronic Codebook). ECB is a poor choice for most applications, because for any block of plaintext, it always produces identical ciphertext. It amounts to a giant substitution cipher, causing any structure and repetition in your plaintext to become evident.

Other modes prevent this problem by combining each block of plaintext with a value that has an extremely high chance of being different every time, so that the ciphertext block is always different. These additional values must be reproducible or predictable, of course, or there would be no way to decrypt the message. Several of the modes involve feeding some form of block 1 into the encryption of block 2, and so forth. Another uses a counter to produce the successive values for combination with the plaintext blocks. Some modes require padding for the last block, some don’t.

CBC (for Cipher Block Chaining), a very commonly used mode, is one of those that requires padding the last block. It also has some known problems, and cryptanalysts recommend avoiding it unless required to interact with existing systems. But when you have to use it, you have to use padding.

PyCrypto’s padding module implements three padding schemes:

If you’re using CBC for interoperability reasons, the system you’re integrating with is likely to be using one of these schemes, making the choice obvious. If you’re in a situation where you must choose a padding scheme, you’re probably also in a situation where you can choose not to use CBC, opting instead for a better mode. Read Colin Percival’s “Cryptographic Right Answers” post for recommendations from a professional.

PKCS #7 is specified quite reasonably clearly in what amounts to a footnote in an RFC. You may find Wikipedia’s description a bit easier to read, though.

ANSI X.923 was presumably defined in an ANSI publication, but I cannot find the original document anywhere. This page has a simple explanation.

As this Stack Overflow answer explains,

ANSI x.923 padding and the padding used within PKCS#7 are functionally equivalent. The last byte of the padding stream is the length of the padding stream. The difference is the value of the other bytes, which are all 0x00 in ANSI, and all the identical to the last byte in PKCS#7.

ISO 7816-4 was defined as part of a smart card standard. It’s value, again from that Stack Overflow answer, is that

it is more flexible and can technically support an unlimited length of padding bytes, however removal of the padding can be more computationally intensive… The padding scheme is also known as bit padding, as it simply places a single 1 bit after the plaintext, followed by 0 valued bits up to the block size; it therefore works for bit oriented protocols as well