There are two regular expression algorithms out there,
- Deterministic Finite Automaton (DFA) — Checks a character in a string only once.
- Nondeterministic Finite Automaton (NFA) — Checks a character multiple times until the best match is found.
To get a better understanding, let’s consider the following regex.
This regex seems simple. But don’t underestimate, it can cost you a lot 😯. So first, let’s understand the meaning behind this regex.
- (g|i+) — This is a group that checks if a given string starts with ‘g’ or one or more occurrences of ‘i’.
- The next ‘+’ will check for one or more appearances of the previous group.
- The string should end with the letter ‘t.’
The following texts will evaluate as valid under the above regex.
Now let’s check the time taken to execute the above regex on a valid string. I will use the
Here we can see that the execution is pretty fast, even though the string is a bit long.
But you will be surprised when you see the time taken to validate an invalid text.
In the below example, the string ends with a ‘v’ which is invalid according to the regex. Therefore, It took around 429 milliseconds, approximately 400 times slower than the execution time taken to check a valid string.
When backtracking becomes too complex, the algorithm will consume more computational power, resulting in catastrophic backtracking.
Note: To understand the complexity of backtracking, you can visit regex101.com and test your regex. regex101.com indicates that ten steps are needed to validate
giiiitusing the above-discussed regex, while it took 189 steps to validate