A common misconception about regular expression performance is that lazy quantifiers (also called non-greedy, reluctant, minimal, or ungreedy) are faster than their greedy equivalents. That's generally not true, but with an important qualifier: in practice, lazy quantifiers often are faster. This seeming contradiction is due to the fact that many people rely on backtracking to compensate for the imprecision of their patterns, often without realizing this is what they're doing. Hence, the misconception stems from the fact that lazy quantifiers often result in much less backtracking when combined with long subject strings and the (mis)use of overly flexible regex tokens such as the dot.

Consider the following simple example: when the regexes <.*?> and <[^>]*> are applied to the subject string "<0123456789>", they are functionally equivalent. The only difference is how the regex engine goes about generating the match. However, the latter regex (which uses a greedy quantifier) is more efficient, because it describes what the user really means: match the character "<", followed by any number of characters which are not ">", and finally, match the character ">". Defined this way, it requires no backtracking in the case of any successful match, and only one backtracking step in the case of any unsuccessful match. Hand-optimization of regex patterns largely revolves around reducing backtracking and the steps which are potentially required to match or fail at any given character position, and here we've reduced both cases to the absolute minimum.

So what exactly is different about how a backtracking, leftmost-first regex engine (your traditional NFA engine type, which describes most modern, Perl-derivative flavors) goes about matching our subject text when working with these two regexes? Let's first take a look at what happens with the lazy regex, <.*?>. I'll use the RegexBuddy debugger style to illustrate each step.

Note: If you're reading this in a feed reader or aggregator which strips out inline styles, see the original post, which should make things much easier to follow.

In the traces, the text "ok" means the engine completed a successful, zero-length match, and "backtrack" shows where the engine backtracks to the last choice point. As you can see, with lazy quantification the engine backtracks forward after each character which is not followed by ">". With the greedy regex <[^>]*>, 20 fewer steps are required, and there is no backtracking.
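Since the styled debugger traces may not survive syndication, here is a minimal sketch of the same step-counting idea in Python. It hand-codes toy matchers for just these two patterns; the step totals are a simplified model and will not match RegexBuddy's counts exactly, and the function names are my own.

```python
import re

def lazy_steps(s):
    """Toy model of a backtracking engine matching <.*?> at position 0.
    Returns (match_text_or_None, step_count)."""
    steps = 1                            # attempt to match the literal '<'
    if not s or s[0] != '<':
        return None, steps
    i = 1
    while True:
        steps += 1                       # lazy: try to match '>' before expanding
        if i < len(s) and s[i] == '>':
            return s[:i + 1], steps      # success
        if i >= len(s):
            return None, steps           # nothing left for .*? to consume: fail
        steps += 1                       # backtrack forward: .*? consumes one char
        i += 1

def greedy_steps(s):
    """Toy model of matching <[^>]*> at position 0: the negated class consumes
    everything it can in one pass, leaving no choice points behind."""
    steps = 1                            # attempt to match the literal '<'
    if not s or s[0] != '<':
        return None, steps
    i = 1
    while i < len(s) and s[i] != '>':    # [^>]* cannot overshoot past '>'
        steps += 1
        i += 1
    steps += 1                           # single attempt to match '>'
    if i < len(s) and s[i] == '>':
        return s[:i + 1], steps
    return None, steps                   # one failed test ends the attempt

subject = '<0123456789>'
m_lazy, n_lazy = lazy_steps(subject)
m_greedy, n_greedy = greedy_steps(subject)
assert m_lazy == m_greedy == subject     # functionally equivalent matches
assert n_lazy > n_greedy                 # but the lazy version takes more steps
# real engines agree on the match text:
assert re.match(r'<.*?>', subject).group() == re.match(r'<[^>]*>', subject).group()
```

In this model the greedy pattern's failure cost is a single final test of ">", mirroring the "only one backtracking step" point above, while the lazy pattern pays an extra expand-and-retry cycle for every character inside the brackets.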