Golang Regex: Understanding dot ‘.’ character

Overview

The dot ‘.’ character is one of the most commonly used metacharacters in regular expressions. It matches any single character. By default, it doesn’t match a new line, but it can do so if a specific flag is added to the regular expression, which we are going to discuss later.

Before looking into the regex itself and usage of Dot ‘.’ character, let’s look at some basic functions or methods provided by Go to do a regex match.

MustCompile Function

https://golang.org/pkg/regexp/#MustCompile

Below is the signature of the function

func MustCompile(str string) *Regexp

We first compile the given regex string using the MustCompile function. This function panics if the given regex is not valid. If the regex compiles successfully, it returns a pointer to a Regexp struct.

sampleRegexp := regexp.MustCompile("some_regular_expression")
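
Because MustCompile panics on an invalid pattern, it is best suited to patterns that are fixed in the source code. As a minimal sketch, a pattern that comes from user input could instead be checked with regexp.Compile, which returns an error rather than panicking. The helper name isValidPattern is hypothetical and used only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// isValidPattern is a hypothetical helper. It reports whether pattern
// compiles as a Go regular expression, using regexp.Compile so that an
// invalid pattern yields an error instead of a panic.
func isValidPattern(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println(isValidPattern("a.b")) // a valid pattern
	fmt.Println(isValidPattern("a(b")) // unbalanced parenthesis, so invalid
}
```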

Match Method

https://golang.org/pkg/regexp/#Regexp.Match

Below is the signature of the method

func (re *Regexp) Match(b []byte) bool

We can call the Match method on the Regexp instance to check the input against the regex. It returns true if the regex matches anywhere in the input, otherwise false. We need to pass in the bytes of the input string to this method.

match := sampleRegexp.Match([]byte("some_string"))
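
When the input is already a string, the regexp package also provides the MatchString method, which avoids the conversion to a byte slice. The sketch below runs the same pattern through both methods so the results can be compared. The helper name matchesBoth is an assumption made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesBoth is a hypothetical helper. It runs the same pattern through
// Match (byte slice input) and MatchString (string input) and returns
// both results.
func matchesBoth(pattern, input string) (bool, bool) {
	re := regexp.MustCompile(pattern)
	return re.Match([]byte(input)), re.MatchString(input)
}

func main() {
	byBytes, byString := matchesBoth(".", "a")
	fmt.Println(byBytes, byString)
}
```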

We will see these two functions in action later in the examples.

Now let’s see a simple program for Dot ‘.’ character

package main

import (
	"fmt"
	"regexp"
)

func main() {
	sampleRegexp := regexp.MustCompile(".")

	match := sampleRegexp.Match([]byte("a"))
	fmt.Printf("For a: %t\n", match)

	match = sampleRegexp.Match([]byte("b"))
	fmt.Printf("For b: %t\n", match)

	match = sampleRegexp.Match([]byte("ab"))
	fmt.Printf("For ab: %t\n", match)

	match = sampleRegexp.Match([]byte(""))
	fmt.Printf("For empty string: %t\n", match)
}

Output

For a: true
For b: true
For ab: true
For empty string: false

In the above program, we have a simple regex containing only one dot character.

sampleRegexp := regexp.MustCompile(".")

It matches below characters and string.

a
b
ab

It matches ab because, by default, a regex doesn’t have to match the full string unless we use the anchor characters (caret and dollar). That is why it matches the first character ‘a’ in ‘ab’ and reports a match.

It doesn’t match an empty string because the dot needs one character to match.
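
To illustrate the point about anchors, here is a small sketch in which the dot is wrapped in a caret and a dollar, so the whole input has to be exactly one character. The names singleChar and isSingleChar are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// singleChar anchors the dot at both ends, so the entire input must
// consist of exactly one character (other than a new line).
var singleChar = regexp.MustCompile("^.$")

// isSingleChar is a hypothetical helper wrapping the anchored regex.
func isSingleChar(input string) bool {
	return singleChar.MatchString(input)
}

func main() {
	fmt.Printf("For a: %t\n", isSingleChar("a"))
	fmt.Printf("For ab: %t\n", isSingleChar("ab"))
	fmt.Printf("For empty string: %t\n", isSingleChar(""))
}
```

With the anchors in place, ab should no longer be reported as a match, since no single character spans the full string.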

Let’s see another example where we have two dots in the regex.

package main

import (
	"fmt"
	"regexp"
)

func main() {
	sampleRegexp := regexp.MustCompile("..")
	match := sampleRegexp.Match([]byte("ab"))
	fmt.Printf("For ab: %t\n", match)

	match = sampleRegexp.Match([]byte("ba"))
	fmt.Printf("For ba: %t\n", match)

	match = sampleRegexp.Match([]byte("abc"))
	fmt.Printf("For abc: %t\n", match)

	match = sampleRegexp.Match([]byte("a"))
	fmt.Printf("For a: %t\n", match)
}

Output

For ab: true
For ba: true
For abc: true
For a: false

In the above program, we have a simple regex containing two dots.

sampleRegexp := regexp.MustCompile("..")

It will match any given string that contains at least two consecutive characters (other than a new line).

That is why it gives a match for

ab
ba
abc

and doesn’t give a match for

a
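
To see which part of the input the two dots actually consume, we can use the FindString method, which returns the text of the leftmost match, or an empty string when there is no match. This is only a sketch, and the names twoChars and firstTwo are assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// twoChars matches any two consecutive characters (other than a new line).
var twoChars = regexp.MustCompile("..")

// firstTwo is a hypothetical helper. It returns the leftmost text matched
// by the two-dot regex, or an empty string if nothing matches.
func firstTwo(input string) string {
	return twoChars.FindString(input)
}

func main() {
	fmt.Printf("For abc: %q\n", firstTwo("abc"))
	fmt.Printf("For a: %q\n", firstTwo("a"))
}
```

For abc the match covers only the first two characters, which is why the unanchored regex still reports a match for a three-character string.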

As we mentioned before, the dot ‘.’ doesn’t match a new line by default. This default behavior can be changed by adding a flag to the beginning of the regular expression. The flag we need to add is:

(?s)

Let’s see a program for the same

package main

import (
	"fmt"
	"regexp"
)

func main() {
	sampleRegexp := regexp.MustCompile(".")

	match := sampleRegexp.Match([]byte("\n"))
	fmt.Printf("For \\n: %t\n", match)

	sampleRegexp = regexp.MustCompile("(?s).")

	match = sampleRegexp.Match([]byte("\n"))
	fmt.Printf("For \\n: %t\n", match)
}

Output

For \n: false
For \n: true

In the above program, we have two regexes

sampleRegexp := regexp.MustCompile(".")

and

sampleRegexp = regexp.MustCompile("(?s).")

In the second regex, we have added the (?s) flag. That is why it gives a match for a new line, while the first regex without the flag doesn’t.
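
The same flag can be observed in a slightly larger pattern. The sketch below tests three variants of a.b against an input that has a new line between a and b. The third variant uses the scoped form (?s:.), which applies the flag only to the group it wraps. The helper name matchesAcrossNewline is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesAcrossNewline is a hypothetical helper. It reports whether the
// given pattern matches the fixed input "a\nb".
func matchesAcrossNewline(pattern string) bool {
	return regexp.MustCompile(pattern).MatchString("a\nb")
}

func main() {
	// Without the flag, the dot cannot consume the new line.
	fmt.Printf("a.b: %t\n", matchesAcrossNewline("a.b"))

	// With the flag at the beginning, every dot can match a new line.
	fmt.Printf("(?s)a.b: %t\n", matchesAcrossNewline("(?s)a.b"))

	// The scoped form enables the flag only inside the group.
	fmt.Printf("a(?s:.)b: %t\n", matchesAcrossNewline("a(?s:.)b"))
}
```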

Using Dot as a literal character

If we want to use the dot ‘.’ as a literal character, we need to escape it with a backslash. Once escaped, it will match only a literal dot character. For example, if we want to match the below string or text literally

a.b

Then the regex for the same will be

a\.b

Here is the program for the same

package main

import (
	"fmt"
	"regexp"
)

func main() {
	sampleRegexp := regexp.MustCompile("a\\.b")

	match := sampleRegexp.Match([]byte("a.b"))

	fmt.Printf("For a.b string: %t\n", match)
}

Output

For a.b string: true

Dot character inside a character class

The dot ‘.’ is treated as a literal character inside square brackets, that is, inside a character class. It doesn’t need to be escaped there. Let’s see a working program for the same as well

package main

import (
	"fmt"
	"regexp"
)

func main() {
	sampleRegexp := regexp.MustCompile("[.]")
	match := sampleRegexp.Match([]byte("."))

	fmt.Println(match)
}

Output

true
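
To confirm that the dot inside the brackets is really literal and not a wildcard, we can also test inputs that contain no dot. This is a minimal sketch, and the names dotClass and hasLiteralDotBetween are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// dotClass matches an a, then a literal dot, then a b. Inside the
// character class the dot loses its special meaning.
var dotClass = regexp.MustCompile("a[.]b")

// hasLiteralDotBetween is a hypothetical helper wrapping dotClass.
func hasLiteralDotBetween(input string) bool {
	return dotClass.MatchString(input)
}

func main() {
	fmt.Printf("For a.b: %t\n", hasLiteralDotBetween("a.b"))
	fmt.Printf("For axb: %t\n", hasLiteralDotBetween("axb"))
}
```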