718. Maximum Length of Repeated Subarray


Given two integer arrays A and B, return the maximum length of an subarray that appears in both arrays.

Example 1:

Input:
A: [1,2,3,2,1]
B: [3,2,1,4,7]
Output: 3
Explanation: 
The repeated subarray with maximum length is [3, 2, 1].

Note:

  1. 1 <= len(A), len(B) <= 1000
  2. 0 <= A[i], B[i] < 100


b'
\n\n

Approach #1: Brute Force with Initial Character Map [Time Limit Exceeded]

\n

Intuition and Algorithm

\n

In a typical brute force, for all starting indices i of A and j of B, we will check for the longest matching subarray A[i:i+k] == B[j:j+k] of length k. This would look roughly like the following psuedocode:

\n
ans = 0\nfor i in [0 .. A.length - 1]:\n    for j in [0 .. B.length - 1]:\n        k = 0\n        while (A[i+k] == B[j+k]): k += 1 #and i+k < A.length etc.\n        ans = max(ans, k)\n
\n

Our insight is that in typical cases, most of the time A[i] != B[j]. We could instead keep a hashmap Bstarts[A[i]] = all j such that B[j] == A[i], and only loop through those in our j loop.

\n

Python

\n
class Solution(object):\n    def findLength(self, A, B):\n        ans = 0\n        Bstarts = collections.defaultdict(list)\n        for j, y in enumerate(B):\n            Bstarts[y].append(j)\n\n        for i, x in enumerate(A):\n            for j in Bstarts[x]:\n                k = 0\n                while i+k < len(A) and j+k < len(B) and A[i+k] == B[j+k]:\n                    k += 1\n                ans = max(ans, k)\n        return ans\n
\n

Java

\n
class Solution {\n    public int findLength(int[] A, int[] B) {\n        int ans = 0;\n        Map<Integer, ArrayList<Integer>> Bstarts = new HashMap();\n        for (int j = 0; j < B.length; j++) {\n            Bstarts.computeIfAbsent(B[j], x -> new ArrayList()).add(j);\n        }\n\n        for (int i = 0; i < A.length; i++) if (Bstarts.containsKey(A[i])) {\n            for (int j: Bstarts.get(A[i])) {\n                int k = 0;\n                while (i+k < A.length && j+k < B.length && A[i+k] == B[j+k]) {\n                    k++\n                }\n                ans = Math.max(ans, k);\n            }\n        }\n        return ans;\n    }\n}\n
\n

Complexity Analysis

\n
    \n
  • \n

    Time Complexity: , where are the lengths of A, B. The worst case is when all the elements are equal.

    \n
  • \n
  • \n

    Space Complexity: , the space used by Bstarts. (Of course, we could amend our algorithm to make this .)

    \n
  • \n
\n
\n

Approach #2: Binary Search with Naive Check [Time Limit Exceeded]

\n

Intuition

\n

If there is a length k subarray common to A and B, then there is a length j <= k subarray as well.

\n

Let check(length) be the answer to the question "Is there a subarray with length length, common to A and B?" This is a function with range that must take the form [True, True, ..., True, False, False, ..., False] with at least one True. We can binary search on this function.

\n

Algorithm

\n

Focusing on the binary search, our invariant is that check(hi) will always be False. We\'ll start with hi = min(len(A), len(B)) + 1; clearly check(hi) is False.

\n

Now we perform our check in the midpoint mi of lo and hi. When it is possible, then lo = mi + 1, and when it isn\'t, hi = mi. This maintains the invariant. At the end of our binary search, hi == lo and lo is the lowest value such that check(lo) is False, so we want lo - 1.

\n

As for the check itself, we can naively check whether any A[i:i+k] == B[j:j+k] using set structures.

\n

Python

\n
class Solution(object):\n    def findLength(self, A, B):\n        def check(length):\n            seen = set(tuple(A[i:i+length])\n                       for i in xrange(len(A) - length + 1))\n            return any(tuple(B[j:j+length]) in seen\n                       for j in xrange(len(B) - length + 1))\n\n        lo, hi = 0, min(len(A), len(B)) + 1\n        while lo < hi:\n            mi = (lo + hi) / 2\n            if check(mi):\n                lo = mi + 1\n            else:\n                hi = mi\n        return lo - 1\n
\n

Java

\n
class Solution {\n    public boolean check(int length, int[] A, int[] B) {\n        Set<String> seen = new HashSet();\n        for (int i = 0; i + length <= A.length; ++i) {\n            seen.add(Arrays.toString(Arrays.copyOfRange(A, i, i+length)));\n        }\n        for (int j = 0; j + length <= B.length; ++j) {\n            if (seen.contains(Arrays.toString(Arrays.copyOfRange(B, j, j+length)))) {\n                return true;\n            }\n        }\n        return false;\n    }\n\n    public int findLength(int[] A, int[] B) {\n        int lo = 0, hi = Math.min(A.length, B.length) + 1;\n        while (lo < hi) {\n            int mi = (lo + hi) / 2;\n            if (check(mi, A, B)) lo = mi + 1;\n            else hi = mi;\n        }\n        return lo - 1;\n    }\n}\n
\n

Complexity Analysis

\n
    \n
  • \n

    Time Complexity: , where are the lengths of A, B. The log factor comes from the binary search. The complexity of our naive check of a given is , as we will create the seen strings with complexity , then search for them with complexity , and our total complexity when performing our check is the addition of these two.

    \n
  • \n
  • \n

    Space Complexity: , the space used by seen.

    \n
  • \n
\n
\n

Approach #3: Dynamic Programming [Accepted]

\n

Intuition and Algorithm

\n

Since a common subarray of A and B must start at some A[i] and B[j], let dp[i][j] be the longest common prefix of A[i:] and B[j:]. Whenever A[i] == B[j], we know dp[i][j] = dp[i+1][j+1] + 1. Also, the answer is max(dp[i][j]) over all i, j.

\n

We can perform bottom-up dynamic programming to find the answer based on this recurrence. Our loop invariant is that the answer is already calculated correctly and stored in dp for any larger i, j.

\n

Python

\n
class Solution(object):\n    def findLength(self, A, B):\n        memo = [[0] * (len(B) + 1) for _ in range(len(A) + 1)]\n        for i in range(len(A) - 1, -1, -1):\n            for j in range(len(B) - 1, -1, -1):\n                if A[i] == B[j]:\n                    memo[i][j] = memo[i+1][j+1]+1\n        return max(max(row) for row in memo)\n
\n

Java

\n
class Solution {\n    public int findLength(int[] A, int[] B) {\n        int ans = 0;\n        int[][] memo = new int[A.length + 1][B.length + 1];\n        for (int i = A.length - 1; i >= 0; --i) {\n            for (int j = B.length - 1; j >= 0; --j) {\n                if (A[i] == B[j]) {\n                    memo[i][j] = memo[i+1][j+1] + 1;\n                    if (ans < memo[i][j]) ans = memo[i][j];\n                }\n            }\n        }\n        return ans;\n    }\n}\n
\n

Complexity Analysis

\n
    \n
  • \n

    Time Complexity: , where are the lengths of A, B.

    \n
  • \n
  • \n

    Space Complexity: , the space used by dp.

    \n
  • \n
\n
\n

Approach #4: Binary Search with Rolling Hash [Accepted]

\n

Intuition

\n

As in Approach #2, we will binary search for the answer. However, we will use a rolling hash (Rabin-Karp algorithm) to store hashes in our set structure.

\n

Algorithm

\n

For some prime , consider the following function modulo some prime modulus :

\n

\n\n

\n

Notably, . This shows we can get the hash of all in linear time. We will also use the fact that .

\n

For every i >= length - 1, we will want to record the hash of A[i-length+1], A[i-length+2], ..., A[i]. After, we will truncate the first element by h = (h - A[i - (length - 1)]) * Pinv % MOD to get ready to add the next element.

\n

To make our algorithm air tight, we also make a naive check when our work with rolling hashes says that we have found a match.

\n
class Solution(object):\n    def findLength(self, A, B):\n        P, MOD = 113, 10**9 + 7\n        Pinv = pow(P, MOD-2, MOD)\n        def check(guess):\n            def rolling(A, length):\n                if length == 0:\n                    yield 0, 0\n                    return\n\n                h, power = 0, 1\n                for i, x in enumerate(A):\n                    h = (h + x * power) % MOD\n                    if i < length - 1:\n                        power = (power * P) % MOD\n                    else:\n                        yield h, i - (length - 1)\n                        h = (h - A[i - (length - 1)]) * Pinv % MOD\n\n            hashes = collections.defaultdict(list)\n            for ha, start in rolling(A, guess):\n                hashes[ha].append(start)\n            for ha, start in rolling(B, guess):\n                iarr = hashes.get(ha, [])\n                if any(A[i:i+guess] == B[start:start+guess] for i in iarr):\n                    return True\n            return False\n\n        lo, hi = 0, min(len(A), len(B)) + 1\n        while lo < hi:\n            mi = (lo + hi) / 2\n            if check(mi):\n                lo = mi + 1\n            else:\n                hi = mi\n        return lo - 1\n
\n

Java

\n
import java.math.BigInteger;\n\nclass Solution {\n    int P = 113;\n    int MOD = 1_000_000_007;\n    int Pinv = BigInteger.valueOf(P).modInverse(BigInteger.valueOf(MOD)).intValue();\n\n    private int[] rolling(int[] source, int length) {\n        int[] ans = new int[source.length - length + 1];\n        long h = 0, power = 1;\n        if (length == 0) return ans;\n        for (int i = 0; i < source.length; ++i) {\n            h = (h + source[i] * power) % MOD;\n            if (i < length - 1) {\n                power = (power * P) % MOD;\n            } else {\n                ans[i - (length - 1)] = (int) h;\n                h = (h - source[i - (length - 1)]) * Pinv % MOD;\n                if (h < 0) h += MOD;\n            }\n        }\n        return ans;\n    }\n\n    private boolean check(int guess, int[] A, int[] B) {\n        Map<Integer, List<Integer>> hashes = new HashMap();\n        int k = 0;\n        for (int x: rolling(A, guess)) {\n            hashes.computeIfAbsent(x, z -> new ArrayList()).add(k++);\n        }\n        int j = 0;\n        for (int x: rolling(B, guess)) {\n            for (int i: hashes.getOrDefault(x, new ArrayList<Integer>()))\n                if (Arrays.equals(Arrays.copyOfRange(A, i, i+guess),\n                                  Arrays.copyOfRange(B, j, j+guess))) {\n                    return true;\n                }\n            j++;\n        }\n        return false;\n    }\n\n    public int findLength(int[] A, int[] B) {\n        int lo = 0, hi = Math.min(A.length, B.length) + 1;\n        while (lo < hi) {\n            int mi = (lo + hi) / 2;\n            if (check(mi, A, B)) lo = mi + 1;\n            else hi = mi;\n        }\n        return lo - 1;\n    }\n}\n
\n

Complexity Analysis

\n
    \n
  • \n

    Time Complexity: , where are the lengths of A, B. The log factor is contributed by the binary search, while creating the rolling hashes is . The checks for duplicate hashes are . If we perform a naive check to make sure our answer is correct, it adds a factor of to our cost of check, which keeps the complexity the same.

    \n
  • \n
  • \n

    Space Complexity: , the space used to store hashes and the subarrays in our final naive check.

    \n
  • \n
\n
\n

Analysis written by: @awice. Idea for Solution #2 by @StefanPochmann.

\n
'