Welcome To Golang By Example

Golang Regex: Optional Operator or question mark (?) in regular expression

Overview

Question Mark is the optional operator in regex. This means that it optionally matches the preceding character before the question mark

Eg. 

abcd?

This will match both “abc” and “abcd”.

Program

Let’s see an example for the same.

package main

import (
	"fmt"
	"regexp"
)

func main() {
	sampleRegexp := regexp.MustCompile("abcd?")

	match := sampleRegexp.Match([]byte("abc"))
	fmt.Printf("For abc: %t\n", match)

	match = sampleRegexp.Match([]byte("abcd"))
	fmt.Printf("For abcd: %t\n", match)

}

Output

For abc: true
For abcd: true

Several characters can also be made optional by closing them with parentheses and then putting the question mark behind them. Eg

abc(de)?
package main

import (
	"fmt"
	"regexp"
)

func main() {
	sampleRegexp := regexp.MustCompile("abc(de)?")

	match := sampleRegexp.Match([]byte("abc"))
	fmt.Printf("For abc: %t\n", match)

	match = sampleRegexp.Match([]byte("abcde"))
	fmt.Printf("For abcde: %t\n", match)

	match = sampleRegexp.Match([]byte("abcd"))
	fmt.Printf("For abcd: %t\n", match)
}

Output

For abc: true
For abcde: true
For abcd: true

It matches “abc”  and “abcde”

It also matched “abcd”. You must be wondering why it matched “abcd”.

If the given string or text contains the regex as a substring then also it gives a match. That is why it gave a match as “abcd” contains “abc” as a substring which is a match for regex. If we want to do full string matches then we need to use the anchor characters at the start and end of the regex.  Caret anchor character will be used at the start and Dollar anchor character will be used at the end.

Let’s see an example for the same.

package main

import (
	"fmt"
	"regexp"
)

func main() {
	sampleRegexp := regexp.MustCompile("^abc(de)?$")

	match := sampleRegexp.Match([]byte("abc"))
	fmt.Printf("For abc: %t\n", match)

	match = sampleRegexp.Match([]byte("abcde"))
	fmt.Printf("For abcde: %t\n", match)

	match = sampleRegexp.Match([]byte("abcd"))
	fmt.Printf("For abcd: %t\n", match)
}

Output

For abc: true
For abcde: true
For abcd: false

Question Mark Operator is non-lazy

Question Mark Operator is non-lazy or greedy. This means that it will match the optional pattern first. 

In the regular expression world, non-lazy(or sometimes also called greedy)  means trying to match as much as possible. While lazy(or sometimes also called non-greedy) means matching only as much as needed. 

For example for the given regex

https? 

If you try to match the below input string

Better is https

Then there are two options

Then it will always match https and never http. The reason is that it is non-lazy.  Even when it matches http it doesn’t stop and tries to match the optional character as well. If the optional character matches then it returns https otherwise it returns http

Let’s see an example for the same

package main

import (
	"fmt"
	"regexp"
)

func main() {
	sampleRegexp := regexp.MustCompile("https?")

	match := sampleRegexp.Find([]byte("Better is https"))
	fmt.Printf("Match: %s\n", match)
}

Output

Match: https

In the above program, we have used the Find function which returns the actual substring that matched with a regex. As you can notice in the output it matched “https” and not “http” because the question mark operator is non-lazy

About the double question mark operator

It is lazy.  As soon it finds the first match, it does not try to match further. So for the above text, it will always give result as “http” and never “https”.

Let’s see an example

package main

import (
	"fmt"
	"regexp"
)

func main() {
	sampleRegexp := regexp.MustCompile("https??")

	match := sampleRegexp.Find([]byte("Better is https"))
	fmt.Printf("Match: %s\n", match)
}

Output

Match: http

Question Mark after quantifiers

The question mark ‘?’ after quantifiers is lazy or non-greedy. The quantifiers could be

See below example

package main

import (
	"fmt"
	"regexp"
)

func main() {
	sampleRegexp := regexp.MustCompile("http(s+?)")

	match := sampleRegexp.Find([]byte("Better is httpsss"))
	fmt.Printf("Match: %s\n", match)

	sampleRegexp = regexp.MustCompile("http(s*?)")

	match = sampleRegexp.Find([]byte("Better is httpsss"))
	fmt.Printf("Match: %s\n", match)
}

Output

Match: https
Match: http

In the above program, we have two cases

In both the case the input string is

Better is httpsss

In the first case, we used a question mark after the plus operator in the regex

"http(s+?)"

It gives below match as “https” instead of “httpsss” as question mark when used after the Plus operator is non-greedy

In the second case, we used a question mark after the asterisk sign in the regex

"http(s*?)"

It gives match as “http” instead of “httpsss” as the question mark when used after the Asterisk operator is non-greedy

Let’s see another example

package main

import (
	"fmt"
	"regexp"
)

func main() {
	sampleRegexp := regexp.MustCompile("(a+?)(a*)")

	match := sampleRegexp.FindStringSubmatch("aaaaaaa")
	fmt.Printf("Match: %s Length: %d\n", match, len(match))

	sampleRegexp = regexp.MustCompile("(a*?)(a*)")

	match = sampleRegexp.FindStringSubmatch("aaaaaaa")
	fmt.Printf("Match: %s Length: %d\n", match, len(match))
}

Output

Match: [aaaaaaa a aaaaaa] Length: 3
Match: [aaaaaaa  aaaaaaa] Length: 3

In the above program, again we have two cases

In the first case, we have regex with two capture groups

(a+?)(a*)

The first capture group gives a single match of ‘a’ while the second capture group gives the rest. This show question mark operator used after the Plus operator is non-greedy or lazy

In the second case, we have regex again with two capture groups

(a*?)(a*)

The first capture group gives a zero match of ‘a’ while the second capture group gives the rest. This show question mark operator used after Asterisk operator is non-greedy or lazy

This is all about the question mark operator in Go. Hope you have liked this article. Please share feedback in the comments.

Also, check out our Golang advance tutorial Series – Golang Advance Tutorial