Implement regular expression matching with support for '.'
and '*'
.
'.' Matches any single character. '*' Matches zero or more of the preceding element. The matching should cover the entire input string (not partial). The function prototype should be: bool isMatch(const char *s, const char *p) Some examples: isMatch("aa","a") → false isMatch("aa","aa") → true isMatch("aaa","aa") → false isMatch("aa", "a*") → true isMatch("aa", ".*") → true isMatch("ab", ".*") → true isMatch("aab", "c*a*b") → true
Intuition
\nIf there were no Kleene stars (the *
wildcard character for regular expressions), the problem would be easier - we simply check from left to right if each character of the text matches the pattern.
When a star is present, we may need to check many different suffixes of the text and see if they match the rest of the pattern. A recursive solution is a straightforward way to represent this relationship.
\nAlgorithm
\nWithout a Kleene star, our solution would look like this:
\n\nIf a star is present in the pattern, it will be in the second position . Then, we may ignore this part of the pattern, or delete a matching character in the text. If we have a match on the remaining strings after any of these operations, then the initial inputs matched.
\n\n\nComplexity Analysis
\nTime Complexity: Let be the lengths of the text and the pattern respectively. In the worst case, a call to match(text[i:], pattern[2j:])
will be made times, and strings of the order and will be made. Thus, the complexity has the order . With some effort outside the scope of this article, we can show this is bounded by .
Space Complexity: For every call to match
, we will create those strings as described above, possibly creating duplicates. If memory is not freed, this will also take a total of space, even though there are only order unique suffixes of and that are actually required.
Intuition
\nAs the problem has an optimal substructure, it is natural to cache intermediate results. We ask the question : does and match? We can describe our answer in terms of answers to questions involving smaller strings.
\nAlgorithm
\nWe proceed with the same recursion as in Approach #1, except because calls will only ever be made to match(text[i:], pattern[j:])
, we use to handle those calls instead, saving us expensive string-building operations and allowing us to cache the intermediate results.
Top-Down Variation\n
\nBottom-Up Variation
\n\nComplexity Analysis
\nTime Complexity: Let be the lengths of the text and the pattern respectively. The work for every call to dp(i, j)
for ; is done once, and it is work. Hence, the time complexity is .
Space Complexity: The only memory we use is the boolean entries in our cache. Hence, the space complexity is .
\nAnalysis written by: @awice
\n