Jupyter notebook text-to-integer-conversion/text-conversion.ipynb
In this Jupyter notebook we solve the problem of converting integers to base-$b$ arrays, and vice versa. By a base-$b$ array, we mean the array of base-$b$ digits that defines the integer when expressed in base-$b$ notation.
Converting an integer to base-$b$
Here is a Python function that converts a given nonnegative integer $n$ into a base-$b$ array $[a_0, a_1, \ldots, a_{k-1}]$ such that $n = \sum_{i=0}^{k-1} a_i b^i$, where $0 \le a_i < b$ for each $i$. For example, if $b = 2$ then the array gives the binary representation of $n$.
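A minimal sketch of such a conversion, using repeated division by the base. The function name `int2base` and the least-significant-digit-first ordering are assumptions made for illustration.

```python
def int2base(n, b):
    """Return the base-b digits of the nonnegative integer n.

    Assumption: digits are listed least significant first, so that
    n == sum(d * b**i for i, d in enumerate(digits)).
    """
    if n < 0 or b < 2:
        raise ValueError("need n >= 0 and b >= 2")
    digits = []
    while n > 0:
        n, r = divmod(n, b)  # r is the next digit, n the remaining quotient
        digits.append(r)
    return digits or [0]     # represent zero as the single digit 0


print(int2base(11, 2))    # [1, 1, 0, 1], since 11 = 1 + 2 + 0*4 + 8
print(int2base(255, 16))  # [15, 15]
```

Listing the least significant digit first keeps the loop simple, because each division step yields the next digit in exactly that order.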
Let's test the code.
Okay, our code seems to be working.
Converting from base-$b$ to an integer
Now we want to code a function that reverses the above. Given an array of base-$b$ digits, we want to convert back to the original integer $n$.
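One way to sketch the reverse direction is Horner's rule, which folds the digits back into an integer one at a time. The name `base2int` and the least-significant-digit-first ordering are assumptions for illustration.

```python
def base2int(digits, b):
    """Rebuild the integer from its base-b digits.

    Assumption: digits are given least significant first.
    """
    n = 0
    for d in reversed(digits):  # start from the most significant digit
        if not 0 <= d < b:
            raise ValueError(f"digit {d} is out of range for base {b}")
        n = n * b + d           # Horner's rule: shift by one place, add digit
    return n


print(base2int([1, 1, 0, 1], 2))  # 11
print(base2int([15, 15], 16))     # 255
```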
As usual, we had better test it.
Everything checks. This seems to be working.
Converting a text block to an integer (part one)
Next, I want to apply the functions defined above to convert a block of text to an integer, and from the integer back into text again. Such encodings are of crucial importance in cryptography. First we look at a childish solution, based on stripping all punctuation and whitespace from the text, and assuming only lowercase letters of the English alphabet. Since there are 26 letters in the alphabet, we can work in base-26.
For example, let's convert the string `helloworld` to an integer using base-26 encoding.
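A minimal sketch of the base-26 encoding, assuming the digit mapping a = 0, b = 1, ..., z = 25 and that the first letter of the text is the least significant digit. The helper name `letters2int` is hypothetical.

```python
def letters2int(text):
    """Encode a string of lowercase letters a-z as a base-26 integer.

    Assumptions: 'a' maps to digit 0, ..., 'z' to digit 25, and the first
    letter is the least significant digit.
    """
    n = 0
    for c in reversed(text):
        if not "a" <= c <= "z":
            raise ValueError(f"unsupported character: {c!r}")
        n = n * 26 + (ord(c) - ord("a"))
    return n


print(letters2int("ab"))          # 26, since a = 0 and b = 1 give 0 + 1*26
print(letters2int("helloworld"))  # a 15-digit integer
```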
Next we define the inverse function to go backwards again.
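A sketch of one possible inverse, again assuming a = 0, ..., z = 25 with the first letter as the least significant digit. Both helper names are hypothetical, and the forward encoder is repeated so that the round trip can be checked in one place.

```python
def letters2int(text):
    """Encode lowercase letters as a base-26 integer (first letter least significant)."""
    n = 0
    for c in reversed(text):
        n = n * 26 + (ord(c) - ord("a"))
    return n


def int2letters(n):
    """Decode a base-26 integer back into lowercase letters."""
    letters = []
    while n > 0:
        n, r = divmod(n, 26)               # r is the next letter's digit
        letters.append(chr(ord("a") + r))
    return "".join(letters)


print(int2letters(letters2int("helloworld")))  # helloworld
```

Under these assumptions the letter `a` plays the role of the digit zero, so any `a` in the most significant position (here, at the end of the text) is lost on the round trip, just as leading zeros are dropped from ordinary numerals.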
This should convert the integer back into text.
Converting a text block to an integer (part two)
But the previous solution is childish, since there is no good reason to avoid punctuation and whitespace. Also, we ought to be able to distinguish between uppercase and lowercase letters, and we should be able to handle special characters. In other words, we are now looking for a robust, industrial-strength solution. There are many good ways to solve this problem. Here I will use the ASCII encodings of characters used in modern digital computers. If you don't know what ASCII means, then look it up!
All we really need to know about ASCII is that it assigns a numerical code to every keyboard character, and that each code fits in a single byte. Strictly speaking, ASCII defines only 128 characters, with codes 0 to 127, but a byte can hold 256 different values, so we work in base-256. The number 256 appears here precisely because our computers are designed to work with bytes, which are bit strings of 8 bits. There are $2^8 = 256$ possible bit strings of length 8. The second thing we need to know is that Python has a built-in `bytearray` type whose constructor converts text to an array of its numerical ASCII codes, character by character.
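As a small illustration of `bytearray`, the following converts a short string (chosen here only as an example) to its ASCII codes and back again.

```python
text = "Hi!"

# Encode the text as ASCII; each character becomes one byte (an int 0..255).
codes = list(bytearray(text, "ascii"))
print(codes)  # [72, 105, 33]

# Going the other way: build a bytearray from the codes and decode it.
restored = bytearray(codes).decode("ascii")
print(restored)  # Hi!
```

Note that a string requires an explicit encoding argument; `bytearray(text)` without one raises a `TypeError`.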
We can understand how the function definitions work by running them, one step at a time, on some test data. For example, let's suppose we have the text string `"6 = half Dozen!"`. Let's run the `text2int` definition on this text string, one step at a time, showing all the intermediate steps.
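A sketch of what such a step-by-step run might look like, assuming `text2int` treats the ASCII codes as base-256 digits with the first character as the least significant digit. That ordering is an assumption; a definition that orders the digits the other way would produce a different integer.

```python
text = "6 = half Dozen!"

# Step 1: convert the text to its ASCII codes, one per character.
digits = list(bytearray(text, "ascii"))
print(digits)  # 15 codes, beginning 54, 32, 61 for '6', ' ', '='

# Step 2: treat the codes as base-256 digits (least significant first,
# by assumption) and fold them into a single integer with Horner's rule.
n = 0
for d in reversed(digits):
    n = n * 256 + d
print(n)

# Cross-check against Python's built-in little-endian conversion.
assert n == int.from_bytes(bytes(digits), "little")

# Step 3: reverse the process by peeling off base-256 digits again.
m, recovered = n, []
while m > 0:
    m, r = divmod(m, 256)
    recovered.append(r)
restored = bytearray(recovered).decode("ascii")
print(restored)  # 6 = half Dozen!
```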