Strings I - working with text
In biology, many types of data are predominantly text, for example, DNA and protein sequences, survey replies, species names or gene identifiers.
Python refers to text as strings - as in a string of characters. A string can be any length and made up of any letters, numbers and punctuation.
We've already used a string: "Hello, world!", in Notebook 2.
A string must start and end with quotes, either single or double, as long as the opening and closing quotes are the same. So 'Hello, world!' and "Hello, world!" are identical strings.
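A minimal sketch of the point about quotes, using the "Hello, world!" string from Notebook 2. The comparison with == is included only to show that Python treats the two spellings as the same string.

```python
# The same text written with single and with double quotes
print('Hello, world!')
print("Hello, world!")

# Comparing the two forms shows they are the same string
print('Hello, world!' == "Hello, world!")  # True
```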
Length of a string
The number of characters in a string is called its length. This includes all letters and any spaces and punctuation.
The function len() returns the length of a string.
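As a short illustration, the sketch below applies len() to the "Hello, world!" string used earlier. The count includes the comma, the space and the exclamation mark.

```python
# 12 letters and punctuation marks plus 1 space
print(len("Hello, world!"))  # 13
```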
String variables
Just like numbers, we can assign strings to variables so that we can store them in memory, recall them and modify them.
A string is assigned to a variable, here called sentence, in the same way as a number. As with a number variable, we can print out the value and type of a string variable. A string variable has type str.
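The assignment described above might look like the following sketch. The variable name sentence comes from the text; the value 'Hello, world!' is the one the notebook uses for it.

```python
# Assign a string to a variable
sentence = 'Hello, world!'

# Print its value and its type
print(sentence)        # Hello, world!
print(type(sentence))  # <class 'str'>
```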
Notice that when you print a string the quotes are not included.
Also notice that the variable sentence is not in quotes inside the print() function. This is because sentence is a string variable and not a string. In other words, sentence is a variable with value 'Hello, world!' and type str, whereas 'sentence' is just a string.
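The difference between the variable and the string can be seen by printing both, as in this small sketch:

```python
sentence = 'Hello, world!'

print(sentence)    # the variable's value: Hello, world!
print('sentence')  # the literal text: sentence
```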
The empty string
The empty string is simply a pair of quotes, either single or double, with nothing in between them.
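For illustration, the sketch below creates an empty string with each kind of quote. The variable names empty_single and empty_double are made up for this example.

```python
empty_single = ''
empty_double = ""

# An empty string contains no characters
print(len(empty_single))             # 0
print(empty_single == empty_double)  # True
```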
Adding to the end of a string: Concatenation
Why is an empty string useful? Sometimes we want to build up a string one letter or word at a time.
There are two ways of doing this: with the + operator or the += operator.
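A possible sketch of both operators is shown here. It starts from an empty string and builds up a name in two steps; the first name 'Harry' is an assumption chosen to go with ' Potter'.

```python
name = ''              # start with the empty string

name = name + 'Harry'  # + joins two strings into a new one
name += ' Potter'      # += adds to the end of the existing string

print(name)  # Harry Potter
```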
Notice the explicit space character in ' Potter' so that the two names are separated by a space.
Input
To ask the user for input we use the input() function.
Python asks for input with the prompt "Enter a name:". It assigns your input to the string variable called name, which it then prints in an f-string.
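The behaviour described above can be sketched as follows. So that the example runs without waiting for typing, the call to input() is shown as a comment and name is set by hand; the value "Rosalind" and the greeting text are assumptions for illustration.

```python
# In a notebook you would write:
#     name = input("Enter a name:")
# Here we use a stand-in for what the user typed.
name = "Rosalind"

# An f-string inserts the value of name into the text
greeting = f"Hello, {name}!"
print(greeting)  # Hello, Rosalind!
```

Replacing the stand-in line with the commented input() call makes Python wait for you to type a name.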
Casting: Converting a string to a number
Say you wanted to input two numbers and print their product.
2. Run it and enter two numbers when prompted; any numbers will do.
The code produced an error, in this case a TypeError. A TypeError means you are trying to do something with a value of the wrong type. Remember that, as well as a name and a value, a variable also has a type.
So far we've seen integer (int), decimal (float) and string (str) variables.
The code doesn't work because the input() function always returns a string. Even though you entered a number, it is treated as a string, just as "3.1" is a string because it is enclosed in double quotes.
This means the line that calculates the product is trying to multiply two strings together rather than two numbers, and that is something Python cannot do.
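The error can be reproduced without typing anything, as in the sketch below. The strings "3.1" and "2" are stand-ins for what input() would return, and the try/except is used here only so that the cell keeps running after the error; error_name is a made-up helper variable.

```python
# Stand-ins for the strings that input() would return
number1 = "3.1"
number2 = "2"

error_name = None
try:
    product = number1 * number2  # multiplying two strings is not allowed
except TypeError as error:
    error_name = type(error).__name__
    print(error_name, "-", error)
```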
To fix this we need to convert the strings assigned to number1 and number2 to floats. This is called casting: the conversion of one data type to another. To do this we use the float() function.
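A minimal sketch of the fix follows. As before, the strings "3.1" and "2" stand in for what input() would return; in a notebook you could write float(input("Enter a number:")) instead.

```python
# Cast each string to a float before multiplying
number1 = float("3.1")
number2 = float("2")

product = number1 * number2
print(type(number1))  # <class 'float'>
print(product)        # 6.2
```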
Newlines and tabs: Escape characters
Sometimes text includes invisible characters that indicate how it should be formatted, such as tabs or newline characters. Since these characters would be difficult to see in our code, Python strings use escape characters instead. An escape character is a character prefixed with a backslash. For example, newlines and tabs are represented by the following escape characters:
| format type | escape character |
|---|---|
| new line | \n |
| tab | \t |
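The two escape characters in the table can be tried out as in this sketch. The text being printed is made up for illustration.

```python
# \n starts a new line in the printed output
two_lines = "line one\nline two"
print(two_lines)

# \t inserts a tab between the two words
tabbed = "first\tsecond"
print(tabbed)

# Each escape character counts as a single character
print(len("\n"))  # 1
print(len("\t"))  # 1
```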