Index character in a string in Go (Golang)

In Go, a string is a read-only sequence of bytes. A string literal normally holds UTF-8 encoded text. In UTF-8, the first 128 Unicode characters (the ASCII characters) take a single byte each. All other characters take between 2 and 4 bytes. Because of this, indexing a string with s[i] returns the byte at position i, not the i-th character, so it is not possible to index a character in a string directly.

For example, see the program below and its output.

package main

import "fmt"

func main() {
    sample := "ab£c"
    for i := 0; i < 4; i++ {
        fmt.Printf("%c\n", sample[i])
    }
    fmt.Printf("Length is %d\n", len(sample))
}

Output:

a
b
Â
£
Length is 5

As you might have noticed, it prints different characters than expected, and the length is 5 instead of 4. Why is that? Remember that a string is essentially a read-only slice of bytes: sample[i] returns a single byte, and len(sample) counts bytes, not characters. Let's print those bytes using:

sample := "ab£c"
fmt.Println([]byte(sample))

The output will be:

[97 98 194 163 99]

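Below is the mapping of each character to its byte sequence. As you can see, a, b, and c take 1 byte each, but £ takes two bytes. That is why the length of the string is 5 and not 4. It also explains the output of the loop: it printed the two bytes of £ separately, because %c treats 194 as the code point U+00C2 (Â) and 163 as U+00A3 (£), and it stopped before reaching c.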

a    97
b    98
£    194, 163
c    99
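
The mapping above can also be produced by the program itself. The following is a minimal sketch, not part of the original example: runeBytes is a hypothetical helper name, and it relies on the fact that converting a rune to a string yields its UTF-8 encoding.

```go
package main

import "fmt"

// runeBytes is a hypothetical helper: it returns the UTF-8 bytes
// that encode the single character r.
func runeBytes(r rune) []byte {
	return []byte(string(r))
}

func main() {
	sample := "ab£c"
	// range decodes the string one character (rune) at a time.
	for _, r := range sample {
		fmt.Printf("%c %v\n", r, runeBytes(r))
	}
}
```

Running it should print one line per character, with £ showing the two bytes 194 and 163.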

So how can we index into a string? This is where the rune data type comes into the picture. In Go, the rune data type represents a Unicode code point. You can learn more about rune here - https://golangbyexample.com/understanding-rune-in-golang

Once a string is converted to a slice of runes, it is possible to index a character in that slice. See the code below.

package main

import "fmt"

func main() {
    sample := "ab£c"
    sampleRune := []rune(sample)

    fmt.Printf("%c\n", sampleRune[0])
    fmt.Printf("%c\n", sampleRune[1])
    fmt.Printf("%c\n", sampleRune[2])
    fmt.Printf("%c\n", sampleRune[3])
}

Output:

a
b
£
c
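
The same idea applies to the length of the string. As a rough sketch, assuming the standard unicode/utf8 package is acceptable here, the number of characters can be counted without building a rune slice. The helper name charCount is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// charCount is a hypothetical helper: it returns the number of
// Unicode code points in s rather than the number of bytes.
func charCount(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	sample := "ab£c"
	fmt.Println(len(sample))         // number of bytes: 5
	fmt.Println(charCount(sample))   // number of characters: 4
	fmt.Println(len([]rune(sample))) // same count via a rune slice: 4
}
```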

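Also note that you can use the range keyword to iterate over all the Unicode characters in a string, where the index it yields is the byte offset of each character. To index a character by its position, convert the string to a slice of runes. Keep in mind that this conversion copies the string into a new slice.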
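
To illustrate the range approach, here is a small sketch. The helper runeAt is hypothetical and not from the original text: it walks the string with range and returns the n-th character, which is one possible way to avoid allocating a rune slice when only a single lookup is needed.

```go
package main

import "fmt"

// runeAt is a hypothetical helper: it returns the n-th character
// (0-based) of s, or false if n is out of range.
func runeAt(s string, n int) (rune, bool) {
	if n < 0 {
		return 0, false
	}
	for _, r := range s {
		if n == 0 {
			return r, true
		}
		n--
	}
	return 0, false
}

func main() {
	sample := "ab£c"

	// i is the byte offset where each character starts, so it
	// jumps from 2 to 4 after the two-byte £.
	for i, r := range sample {
		fmt.Printf("byte offset %d: %c\n", i, r)
	}

	if r, ok := runeAt(sample, 2); ok {
		fmt.Printf("%c\n", r) // £
	}
}
```

For repeated lookups on the same string, converting once to a slice of runes is likely the simpler choice, since each runeAt call scans from the start.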