Threats of Using Regular Expressions in JavaScript | by Dulanka Karunasena | Sep, 2021


  • Deterministic Finite Automaton (DFA) — Checks a character in a string only once.
  • Nondeterministic Finite Automaton (NFA) — Checks a character multiple times until the best match is found.

JavaScript uses the NFA approach in its regex engine, and this NFA behavior causes catastrophic backtracking.

To get a better understanding, let’s consider the following regex.

/(g|i+)+t/

This regex seems simple. But don’t underestimate, it can cost you a lot 😯. So first, let’s understand the meaning behind this regex.

  • (g|i+) — This is a group that checks if a given string starts with ‘g’ or one or more occurrences of ‘i’.
  • The next ‘+’ will check for one or more appearances of the previous group.
  • The string should end with the letter ‘t.’

The following texts will evaluate as valid under the above regex.

git
giit
gggt
gigiggt
igggt

Now let’s check the time taken to execute the above regex on a valid string. I will use the console.time() method.

Valid text

Here we can see that the execution is pretty fast, even though the string is a bit long.

But you will be surprised when you see the time taken to validate an invalid text.

In the below example, the string ends with a ‘v’ which is invalid according to the regex. Therefore, It took around 429 milliseconds, approximately 400 times slower than the execution time taken to check a valid string.

Invalid text

The main reason for this performance difference is the NFA algorithm used by JavaScript.

JavaScript Regex engine validates a sequence of characters on its first successful validation attempt and continues. If it fails at any particular position, it will backtrack to a previous position and find an alternative path.

When backtracking becomes too complex, the algorithm will consume more computational power, resulting in catastrophic backtracking.

Note: To understand the complexity of backtracking, you can visit regex101.com and test your regex. regex101.com indicates that ten steps are needed to validate giiiit using the above-discussed regex, while it took 189 steps to validate giiiiv.



Source link

Latest articles

Related articles

Leave a reply

Please enter your comment!
Please enter your name here