Strings II - indices and substrings
Accessing a character in a string
Individual characters in a string can be accessed by specifying the character's index in square brackets after the string's name. The first character has index 0, the second character has index 1, and so on. This can be a bit confusing to start with: just remember that to get a character's index (0, 1, etc.) you subtract 1 from its position (first, second, etc.).
For instance, the table below shows the indices of the characters in the string "Hello, world!". The letter "H" is at index 0, the comma is at index 5, the letter "d" is at index 11 and the exclamation mark is at index 12.
string | H | e | l | l | o | , | (space) | w | o | r | l | d | ! |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
position | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th | 10th | 11th | 12th | 13th |
index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
index from end | -13 | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 |
It is also possible to specify characters based on their position relative to the end of the string: the last character can be accessed using index -1, the second to last at index -2, etc., as shown above.
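As a minimal sketch of the indexing described above, the snippet below looks up a few characters of "Hello, world!" by positive and negative index. The variable name greeting is an assumption made for illustration.

```python
# The example string from the table; the name `greeting` is an assumption.
greeting = "Hello, world!"

first = greeting[0]         # index 0 is the first character
comma = greeting[5]         # index 5 is the sixth character
last = greeting[-1]         # index -1 is the last character
second_last = greeting[-2]  # index -2 is the second to last character

print(first, comma, last, second_last)  # prints: H , ! d
print(len(greeting))                    # prints: 13 (the space counts as a character)
```

Note that the largest valid index is one less than the length of the string: here greeting[12] is the last character, and asking for greeting[13] raises an IndexError.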
Accessing several characters: Substrings
Parts of strings (known as substrings) can be accessed using the slice operator. Instead of a single index in the square brackets, we use two numbers to refer to the start and stop indices, like so: [start index : stop index].
The character at the stop index is not included, so [3:7] will access the characters at indices 3 to 6 (the fourth to seventh characters).
Where no number is specified before the colon, e.g. [:5], the slice automatically starts from the first character of the string.
Where no number is specified after the colon, e.g. [5:], the slice automatically runs to the end of the string, including the last character.
If no numbers are specified, i.e. [:], then all characters are used.
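The slicing rules above can be sketched as follows. The variable name sentence matches the one used in this section, but the value assigned to it here is an assumption chosen for illustration.

```python
# Assumed example value; the original contents of `sentence` are not known.
sentence = "Hello, world!"

middle = sentence[3:7]   # indices 3, 4, 5 and 6; index 7 is not included
start = sentence[:5]     # from the first character up to (not including) index 5
rest = sentence[5:]      # from index 5 to the end of the string
everything = sentence[:] # all characters

print(repr(middle))      # prints: 'lo, '
print(repr(start))       # prints: 'Hello'
print(repr(rest))        # prints: ', world!'
print(repr(everything))  # prints: 'Hello, world!'
```

repr is used only so that the trailing space in the first result is visible when printed.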
Stepping over characters
Using [:] on its own might seem pointless - why not just do print(sentence) instead? The reason why this is useful will become clear in a moment.
Say we wanted every third character in a string. Then we can use what's called the extended slice notation: [start index : stop index : step size].
To access every third character starting from index 2 and stopping before index 9 we would write [2:9:3], where the step size is 3.
To access every second character starting from the first character and stopping at the last character we would write [::2], where the step size is 2.
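A short sketch of both stepped slices, again assuming an illustrative value for sentence:

```python
# Assumed example value; the original contents of `sentence` are not known.
sentence = "Hello, world!"

every_third = sentence[2:9:3]  # indices 2, 5 and 8 (stop index 9 is not included)
every_second = sentence[::2]   # indices 0, 2, 4, ... through the end of the string

print(repr(every_third))   # prints: 'l,o'
print(repr(every_second))  # prints: 'Hlo ol!'
```

Leaving out the start and stop indices while giving a step is what makes the bare colons useful: [::2] steps over the whole string without needing to know its length.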
Reversing a string
Now here's the clever bit. If the step size is negative we go through the string backwards. This means that the notation [::-1] reverses the string because the step size is -1: we step backwards, accessing every character. This is a quick and easy way of reversing strings, something we often want to do with DNA and RNA sequences.
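As a small illustration of reversing with a negative step, the sketch below reverses a sentence and a short DNA sequence. Both values, and the variable name dna, are made up for this example.

```python
# Assumed example values chosen for illustration.
sentence = "Hello, world!"
dna = "ATGCGT"

reversed_sentence = sentence[::-1]  # step of -1 walks from the last character to the first
reversed_dna = dna[::-1]

print(reversed_sentence)  # prints: !dlrow ,olleH
print(reversed_dna)       # prints: TGCGTA
```

Slicing never changes the original string: sentence and dna still hold their original values, and the reversed copies are new strings.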