In Golang string is a sequence of bytes. A string literal actually represents a UTF-8 sequence of bytes. In UTF-8, ASCII characters are single-byte corresponding to the first 128 Unicode characters. All other characters are between 1 -4 bytes. Due to this, it is not possible to index a character in a string.
For example, see below the program and its output.
package main
import "fmt"
func main() {
sample := "ab£c"
for i := 0; i < 4; i++ {
fmt.Printf("%c\n", sample[i])
}
fmt.Printf("Length is %d\n", len(sample))
}
Output:
a
b
Â
£
Length is 5
As you might have noticed, it prints different characters than expected and length is also 5 instead of 4. Why is that? To answer please remember we said that a string is essentially a slice of bytes. Let's print that slice of bytes using
sample := "ab£c"
fmt.Println([]byte(sample))
The output will be
[97 98 194 163 99]
This is the mapping of each of character to its byte sequence. As you can notice a, b, c take each 1 byte but £ takes two bytes. That is why the length of the string is 5 and not 4
a | 97 |
b | 98 |
£ | 194, 163 |
c | 99 |
Then how we can index into a string. This is where rune data type comes into picture In GO, rune data type represents a Unicode point. You can learn more about rune here - https://golangbyexample.com/understanding-rune-in-golang
Once a string is converted to an array of rune then it is possible to index a character in that array of rune. See below code
package main
import "fmt"
func main() {
sample := "ab£c"
sampleRune := []rune(sample)
fmt.Printf("%c\n", sampleRune[0])
fmt.Printf("%c\n", sampleRune[1])
fmt.Printf("%c\n", sampleRune[2])
fmt.Printf("%c\n", sampleRune[3])
}
Output:
a
b
£
c
Also to mention you can use range operator to iterate over all Unicode characters in the string, but to index character in a string, you can convert it to an array of rune.