Overview
The dot ‘.’ is one of the most commonly used metacharacters in regular expressions. It matches any single character. By default it doesn’t match a newline, but it can be made to do so by adding a specific flag to the regular expression, which we are going to discuss later.
Before looking into the regex itself and usage of Dot ‘.’ character, let’s look at some basic functions or methods provided by Go to do a regex match.
MustCompile Function
https://golang.org/pkg/regexp/#MustCompile. Below is the signature of the function
func MustCompile(str string) *Regexp
We first compile the given regex string using the MustCompile function. This function panics if the given regex is not valid. If the regex compiles successfully, it returns a pointer to a Regexp struct.
sampleRegexp := regexp.MustCompile("some_regular_expression")
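Since MustCompile panics on an invalid pattern, it is best suited to patterns that are fixed in the source code. For patterns that arrive at run time, the same package also provides regexp.Compile, which returns an error instead. Below is a minimal sketch; the helper name compilePattern is made up for this illustration and is not part of the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// compilePattern is a hypothetical helper for this example. It wraps
// regexp.Compile so that an invalid pattern is reported as an error
// instead of causing a panic.
func compilePattern(pattern string) (*regexp.Regexp, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	return re, nil
}

func main() {
	// A valid pattern compiles without an error.
	if _, err := compilePattern("a.b"); err == nil {
		fmt.Println("a.b compiled")
	}
	// An unbalanced parenthesis makes the pattern invalid.
	if _, err := compilePattern("a(b"); err != nil {
		fmt.Println("a(b failed to compile:", err)
	}
}
```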
Match Method
https://golang.org/pkg/regexp/#Regexp.Match
Below is the signature of the method
func (re *Regexp) Match(b []byte) bool
We can call the Match method on the Regexp struct instance to test the input against the regex. It returns true if the regex matches the input, otherwise false. The method takes a byte slice, so an input string has to be converted to bytes first.
match := sampleRegexp.Match([]byte("some_string"))
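When the input is already a string, the Regexp type also has a MatchString method that skips the byte conversion. The sketch below compares the two; the helper name matchesBoth is only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesBoth is a hypothetical helper that runs the same pattern through
// Match (byte slice input) and MatchString (string input).
func matchesBoth(pattern, input string) (bool, bool) {
	re := regexp.MustCompile(pattern)
	return re.Match([]byte(input)), re.MatchString(input)
}

func main() {
	byBytes, byString := matchesBoth(".", "a")
	fmt.Printf("Match: %t, MatchString: %t\n", byBytes, byString)
}
```

Both calls report the same result; which one to use depends only on the type of the input you already have.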
We will see these two functions in action later in the examples.
Now let’s see a simple program for Dot ‘.’ character
package main

import (
	"fmt"
	"regexp"
)

func main() {
	sampleRegexp := regexp.MustCompile(".")
	match := sampleRegexp.Match([]byte("a"))
	fmt.Printf("For a: %t\n", match)
	match = sampleRegexp.Match([]byte("b"))
	fmt.Printf("For b: %t\n", match)
	match = sampleRegexp.Match([]byte("ab"))
	fmt.Printf("For ab: %t\n", match)
	match = sampleRegexp.Match([]byte(""))
	fmt.Printf("For empty string: %t\n", match)
}
Output
For a: true
For b: true
For ab: true
For empty string: false
In the above program, we have a simple regex containing only one dot character.
sampleRegexp := regexp.MustCompile(".")
It matches the following strings:
a
b
ab
It matches ab because, by default, the regex doesn’t have to match the full string unless we use the anchor characters (caret ‘^’ and dollar ‘$’). That is why it matches the first character ‘a’ in ‘ab’ and reports a match.
It doesn’t match an empty string.
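To require that the whole input is exactly one character, the dot can be wrapped in the anchors mentioned above. The following is a small sketch of that idea; the names exactlyOne and isSingleChar are invented for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// exactlyOne anchors the dot at both ends, so the whole input must be
// a single character (other than a newline).
var exactlyOne = regexp.MustCompile("^.$")

// isSingleChar is a hypothetical helper that reports whether s consists
// of exactly one character.
func isSingleChar(s string) bool {
	return exactlyOne.MatchString(s)
}

func main() {
	fmt.Printf("For a: %t\n", isSingleChar("a"))
	fmt.Printf("For ab: %t\n", isSingleChar("ab"))
	// The dot matches one character, not one byte, so a multi-byte
	// UTF-8 character still counts as a single match.
	fmt.Printf("For é: %t\n", isSingleChar("é"))
}
```

With the anchors in place, ab no longer matches, while a and é do.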
Let’s see another example where we have two dots in the regex.
package main

import (
	"fmt"
	"regexp"
)

func main() {
	sampleRegexp := regexp.MustCompile("..")
	match := sampleRegexp.Match([]byte("ab"))
	fmt.Printf("For ab: %t\n", match)
	match = sampleRegexp.Match([]byte("ba"))
	fmt.Printf("For ba: %t\n", match)
	match = sampleRegexp.Match([]byte("abc"))
	fmt.Printf("For abc: %t\n", match)
	match = sampleRegexp.Match([]byte("a"))
	fmt.Printf("For a: %t\n", match)
}
Output
For ab: true
For ba: true
For abc: true
For a: false
In the above program, we have a simple regex containing two dots.
sampleRegexp := regexp.MustCompile("..")
It will match any given string that contains at least two consecutive characters (neither of them a newline) as a substring.
That is why it gives a match for
ab
ba
abc
and doesn’t give a match for
a
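Match only tells us whether a match exists. To see which substring the two dots actually consumed, we can use FindString and FindAllString from the same package. This is a sketch for illustration; the helper names firstPair and allPairs are not from the original program.

```go
package main

import (
	"fmt"
	"regexp"
)

var twoDots = regexp.MustCompile("..")

// firstPair is a hypothetical helper that returns the leftmost two-character
// match, or an empty string if there is none.
func firstPair(s string) string {
	return twoDots.FindString(s)
}

// allPairs is a hypothetical helper that returns every non-overlapping
// two-character match, scanning from left to right.
func allPairs(s string) []string {
	return twoDots.FindAllString(s, -1)
}

func main() {
	fmt.Printf("First match in abc: %q\n", firstPair("abc"))
	fmt.Printf("All matches in abcd: %q\n", allPairs("abcd"))
	fmt.Printf("First match in a: %q\n", firstPair("a"))
}
```

For abc the match is ab, which shows that the trailing c is simply left unmatched rather than being required or rejected.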
As mentioned before, the dot ‘.’ doesn’t match a newline by default. This behavior can be changed by adding a flag group to the beginning of the regular expression. The flag we need to add to the beginning of the regex is:
(?s)
Let’s see a program for the same
package main

import (
	"fmt"
	"regexp"
)

func main() {
	sampleRegexp := regexp.MustCompile(".")
	match := sampleRegexp.Match([]byte("\n"))
	fmt.Printf("For \\n: %t\n", match)
	sampleRegexp = regexp.MustCompile("(?s).")
	match = sampleRegexp.Match([]byte("\n"))
	fmt.Printf("For \\n: %t\n", match)
}
Output
For \n: false
For \n: true
In the above program, we have two regexes
sampleRegexp := regexp.MustCompile(".")
and
sampleRegexp = regexp.MustCompile("(?s).")
In the second regex, we have added the (?s) flag. That is why it gives a match for a newline, while the first regex without the flag doesn’t.
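The effect of the flag is easier to see on multi-line input. In the sketch below, the pattern .* stops at the first newline without the flag and runs through it with the flag. The helper name firstMatch and the sample text are made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch is a hypothetical helper that returns the leftmost match of
// pattern in input.
func firstMatch(pattern, input string) string {
	return regexp.MustCompile(pattern).FindString(input)
}

func main() {
	input := "line1\nline2"
	// Without (?s) the dot stops at the newline.
	fmt.Printf("Without flag: %q\n", firstMatch(".*", input))
	// With (?s) the dot also consumes the newline.
	fmt.Printf("With flag: %q\n", firstMatch("(?s).*", input))
}
```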
Using Dot as a literal character
If we want to use the dot ‘.’ as a literal character, we need to escape it with a backslash. Once escaped, it matches only a literal dot character. For example, if we want to match the literal string or text below
a.b
Then the regex for the same will be
a\.b
Here is the program for the same
package main

import (
	"fmt"
	"regexp"
)

func main() {
	sampleRegexp := regexp.MustCompile("a\\.b")
	match := sampleRegexp.Match([]byte("a.b"))
	fmt.Printf("For a.b string: %t\n", match)
}
Output
For a.b string: true
Dot character inside a character class
The dot ‘.’ is treated as a literal character inside square brackets, that is, inside a character class. It doesn’t need to be escaped there. Let’s see a working program for the same as well
package main

import (
	"fmt"
	"regexp"
)

func main() {
	sampleRegexp := regexp.MustCompile("[.]")
	match := sampleRegexp.Match([]byte("."))
	fmt.Println(match)
}
Output
true
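The program above only checks the positive case. To confirm that the dot inside the brackets is really literal, we can also test inputs that contain no dot. Here is a minimal sketch that uses [.] as a decimal separator; the pattern and the helper name isDecimal are assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// decimal expects digits, a literal dot, then digits. The dot is literal
// because it sits inside a character class.
var decimal = regexp.MustCompile("^[0-9]+[.][0-9]+$")

// isDecimal is a hypothetical helper that reports whether s looks like
// a simple decimal number such as 3.14.
func isDecimal(s string) bool {
	return decimal.MatchString(s)
}

func main() {
	fmt.Printf("For 3.14: %t\n", isDecimal("3.14"))
	// If the dot were a metacharacter here, 3x14 would match too.
	fmt.Printf("For 3x14: %t\n", isDecimal("3x14"))
}
```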