
Jupyter notebook text-to-integer-conversion/text-conversion.ipynb

Kernel: Python 2

In this Jupyter notebook we solve the problem of converting integers to base-$b$ arrays, and vice versa. By a base-$b$ array, we mean the array of base-$b$ digits that defines the integer when expressed in base-$b$ notation.

Converting integer to base-$b$

Here is a Python function that converts a given nonnegative integer $x$ into a base-$b$ array $[d_N, d_{N-1}, \dots, d_2, d_1, d_0]$ such that $x = \sum_{k=0}^N d_k b^k$, where $0 \le d_k < b$ for each $k = 0, 1, \dots, N$. For example, if $b = 2$ then the array gives the binary representation of $x$.

def int_to_base_array(b, x):
    ans = []
    while x > 0:
        ans = [x % b] + ans  # prepend digit to ans
        x = x // b
    return ans

Let's test the code.

base_array = int_to_base_array  # shorter alias for typing convenience
base_array(2, 7)
[1, 1, 1]
base_array(10,357)
[3, 5, 7]
base_array(5,357)
[2, 4, 1, 2]
# better CHECK the last result
2*5**3 + 4*5**2 + 1*5**1 + 2*5**0
357

Okay, our code seems to be working.
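
One case the tests above do not cover is $x = 0$: the loop body never runs, so the function returns an empty list rather than a list containing a zero digit. The sketch below is one possible variant and is not part of the original notebook. The name `int_to_base_array_checked`, the input checks, and the choice to return `[0]` for zero are all assumptions made for illustration.

```python
def int_to_base_array_checked(b, x):
    """Hypothetical variant of int_to_base_array: validates input, maps 0 to [0]."""
    if b < 2:
        raise ValueError("base must be at least 2")
    if x < 0:
        raise ValueError("x must be nonnegative")
    if x == 0:
        return [0]            # assumption: represent zero by a single zero digit
    digits = []
    while x > 0:
        x, r = divmod(x, b)   # quotient and remainder in one step
        digits.append(r)      # collected least significant digit first
    digits.reverse()          # most significant digit first, as in the notebook
    return digits

print(int_to_base_array_checked(5, 357))  # [2, 4, 1, 2]
print(int_to_base_array_checked(2, 0))    # [0]
```

Appending and reversing once also avoids building a new list on every iteration, which is what the prepend in the original version does.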

Converting from base-$b$ to integer

Now we want to code a function that reverses the above. Given an array of base-$b$ digits, we want to convert back to the original integer $x$.

def base_array_to_int(b, digits):
    ans = 0
    for r in digits:
        ans = ans*b + r
    return ans
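
The loop evaluates the digit polynomial in nested form (often called Horner's rule), so no explicit powers of $b$ are needed. As a worked illustration, take $b = 5$ and the digits $[2, 4, 1, 2]$:

$$
\begin{aligned}
x &= 2 \cdot 5^3 + 4 \cdot 5^2 + 1 \cdot 5^1 + 2 \cdot 5^0 \\
  &= ((2 \cdot 5 + 4) \cdot 5 + 1) \cdot 5 + 2 \\
  &= (14 \cdot 5 + 1) \cdot 5 + 2 \\
  &= 71 \cdot 5 + 2 = 357.
\end{aligned}
$$

The successive values of `ans` in the loop are therefore 2, 14, 71, 357.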

As usual, we had better test it.

to_int = base_array_to_int  # shorter alias for typing convenience
to_int(5, [2, 4, 1, 2])
357
to_int(10,[3, 5, 7])
357
to_int(2, [1, 1, 1])
7

Everything checks. This seems to be working.
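
Three hand-picked cases are reassuring, but a randomized round-trip check gives broader coverage. The following is a minimal sketch, not part of the original notebook: it restates the two functions so that it runs on its own, and the seed, the ranges, and the number of trials are arbitrary choices.

```python
import random

def int_to_base_array(b, x):
    ans = []
    while x > 0:
        ans = [x % b] + ans  # prepend digit to ans
        x = x // b
    return ans

def base_array_to_int(b, digits):
    ans = 0
    for r in digits:
        ans = ans * b + r
    return ans

random.seed(0)  # fixed seed so the check is repeatable
for _ in range(1000):
    b = random.randint(2, 300)
    x = random.randint(0, 10**30)
    # integer -> digits -> integer must return the starting value
    assert base_array_to_int(b, int_to_base_array(b, x)) == x
print("round trip ok")
```

Note that the check only goes in one direction. Going from digits to integer and back does not always reproduce the digits, because leading zeros are dropped: `[0, 1, 1]` in base 2 becomes 3, which converts back to `[1, 1]`.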

Converting a text block to an integer (part one)

Next, I want to apply the functions defined above to convert a block of text to an integer, and from the integer back into text again. Such encodings are of crucial importance in cryptography. First we look at a childish solution, based on stripping all punctuation and white space from the text, and assuming only lower case letters of the English alphabet. Since there are 26 letters in the alphabet, we can work in base 26.

def text2int(text):
    alphabet = 'abcdefghijklmnopqrstuvwxyz'
    base26array = [alphabet.index(char) for char in text]
    return base_array_to_int(26, base26array)

For example, let's convert the string helloworld to an integer using base-26 encoding.

text2int('helloworld')
38933758647189

Next we define the inverse function to go backwards again.

def int2text(x):
    alphabet = 'abcdefghijklmnopqrstuvwxyz'
    base26array = int_to_base_array(26, x)
    text = ''
    for digit in base26array:
        text = text + alphabet[digit]
    return text

This should convert the integer back into text.

int2text(38933758647189)
'helloworld'
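
One limitation of this base-26 scheme is worth seeing in action: the letter a is the digit 0, so any leading a's vanish on the way back, just as leading zeros vanish from a decimal number. The sketch below is not from the original notebook. It restates the two conversions in a compact, self-contained form (the constant `ALPHABET` is introduced only for this illustration) and shows the effect.

```python
ALPHABET = 'abcdefghijklmnopqrstuvwxyz'

def text2int(text):
    # Horner-style accumulation of base-26 digits
    x = 0
    for char in text:
        x = x * 26 + ALPHABET.index(char)
    return x

def int2text(x):
    # peel off base-26 digits, least significant first, prepending each letter
    text = ''
    while x > 0:
        text = ALPHABET[x % 26] + text
        x = x // 26
    return text

print(int2text(text2int('helloworld')))  # helloworld
print(int2text(text2int('apple')))       # pple
```

One possible remedy, which the notebook does not use, is to store the length of the text alongside the integer, or to prepend a fixed letter other than a before encoding and strip it after decoding.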

Converting a text block to an integer (part two)

But the previous solution is childish, since there is no good reason to avoid punctuation and white space. Also, we ought to be able to distinguish between upper and lower case letters, and we should be able to handle special characters. In other words, we are now looking for a robust, industrial-strength solution. There are many good ways to solve this problem. Here I will use the ASCII encoding of characters used in modern digital computers. If you don't know what ASCII means, then look it up!

All we really need to know about ASCII is that it assigns a numerical code to each character, and that all the keyboard characters appear in it. Strictly speaking, ASCII defines only 128 characters (codes 0 to 127), but each code is stored in one byte, and a byte can take $256 = 2^8$ values, so we treat the text as a sequence of base-256 digits. The number 256 appears here precisely because our computers are designed to work with bytes, which are bit strings of 8 bits, and there are 256 possible bit strings of length 8. The second thing we need to know is that Python has a built-in bytearray function that converts text to an array of its numerical ASCII representations, character by character.

def text2int(text):
    barray = list(bytearray(text))
    return base_array_to_int(256, barray)
text2int("Now is the time to worry! Indeed, 'tis!")
2556412079982006913653956174919156864365338629148352603563830428006298034728105396100932399905L
def int2text(x):
    base256array = int_to_base_array(256, x)
    barray = bytearray(base256array)  # coerce to a bytearray
    return str(barray)
int2text(2556412079982006913653956174919156864365338629148352603563830428006298034728105396100932399905L)
"Now is the time to worry! Indeed, 'tis!"

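The two definitions above rely on the Python 2 kernel: there, `bytearray(text)` accepts a plain string and `str(barray)` returns the text, and long integers print with a trailing `L`. In Python 3, `bytearray` needs an explicit encoding when given a string, and `str` of a bytearray gives its printed representation rather than the text. The following is a minimal sketch of a Python 3 counterpart, not the notebook's own code. The names `text2int_py3` and `int2text_py3` and the default UTF-8 encoding are assumptions made for illustration.

```python
def text2int_py3(text, encoding='utf-8'):
    """Hypothetical Python 3 counterpart of text2int."""
    data = text.encode(encoding)         # str -> bytes
    return int.from_bytes(data, 'big')   # bytes read as base-256 digits, most significant first

def int2text_py3(x, encoding='utf-8'):
    """Hypothetical Python 3 counterpart of int2text."""
    length = (x.bit_length() + 7) // 8   # number of base-256 digits needed
    return x.to_bytes(length, 'big').decode(encoding)

message = "Now is the time to worry! Indeed, 'tis!"
n = text2int_py3(message)
print(int2text_py3(n) == message)  # True
```

For plain ASCII text, UTF-8 produces the same byte values as ASCII, so the base-256 digits are the same as those used by the notebook's version.
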
We can understand how the function definitions work by running them, one step at a time, on some test data. For example, let's suppose we have the text string "6 = half Dozen!". Let's run the text2int definition on this text string, one step at a time, showing all the intermediate steps.

# text2int simulation
text = "6 = half Dozen!"
b = bytearray(text); b
bytearray(b'6 = half Dozen!')
barray = list(b); barray
[54, 32, 61, 32, 104, 97, 108, 102, 32, 68, 111, 122, 101, 110, 33]