Given a string, find the length of the longest substring without repeating characters.
Examples:
Given "abcabcbb"
, the answer is "abc"
, which the length is 3.
Given "bbbbb"
, the answer is "b"
, with the length of 1.
Given "pwwkew"
, the answer is "wke"
, with the length of 3. Note that the answer must be a substring, "pwke"
is a subsequence and not a substring.
Intuition
\nCheck all the substring one by one to see if it has no duplicate character.
\nAlgorithm
\nSuppose we have a function boolean allUnique(String substring)
which will return true if the characters in the substring are all unique, otherwise false. We can iterate through all the possible substrings of the given string s
and call the function allUnique
. If it turns out to be true, then we update our answer of the maximum length of substring without duplicate characters.
Now let\'s fill the missing parts:
\nTo enumerate all substrings of a given string, we enumerate the start and end indices of them. Suppose the start and end indices are and , respectively. Then we have (here end index is exclusive by convention). Thus, using two nested loops with from 0 to and from to , we can enumerate all the substrings of s
.
To check if one string has duplicate characters, we can use a set. We iterate through all the characters in the string and put them into the set
one by one. Before putting one character, we check if the set already contains it. If so, we return false
. After the loop, we return true
.
Java
\npublic class Solution {\n public int lengthOfLongestSubstring(String s) {\n int n = s.length();\n int ans = 0;\n for (int i = 0; i < n; i++)\n for (int j = i + 1; j <= n; j++)\n if (allUnique(s, i, j)) ans = Math.max(ans, j - i);\n return ans;\n }\n\n public boolean allUnique(String s, int start, int end) {\n Set<Character> set = new HashSet<>();\n for (int i = start; i < end; i++) {\n Character ch = s.charAt(i);\n if (set.contains(ch)) return false;\n set.add(ch);\n }\n return true;\n }\n}\n
Complexity Analysis
\nTo verify if characters within index range are all unique, we need to scan all of them. Thus, it costs time.
\nFor a given i
, the sum of time costed by each is
\n\n
\nThus, the sum of all the time consumption is:
\n\n\n
\nSet
. The size of the Set is upper bounded by the size of the string and the size of the charset/alphabet .Algorithm
\nThe naive approach is very straightforward. But it is too slow. So how can we optimize it?
\nIn the naive approaches, we repeatedly check a substring to see if it has duplicate character. But it is unnecessary. If a substring from index to is already checked to have no duplicate characters. We only need to check if is already in the substring .
\nTo check if a character is already in the substring, we can scan the substring, which leads to an algorithm. But we can do better.
\nBy using HashSet as a sliding window, checking if a character in the current can be done in .
\nA sliding window is an abstract concept commonly used in array/string problems. A window is a range of elements in the array/string which usually defined by the start and end indices, i.e. (left-closed, right-open). A sliding window is a window "slides" its two boundaries to the certain direction. For example, if we slide to the right by element, then it becomes (left-closed, right-open).
\nBack to our problem. We use HashSet to store the characters in current window ( initially). Then we slide the index to the right. If it is not in the HashSet, we slide further. Doing so until s[j] is already in the HashSet. At this point, we found the maximum size of substrings without duplicate characters start with index . If we do this for all , we get our answer.
\nJava
\npublic class Solution {\n public int lengthOfLongestSubstring(String s) {\n int n = s.length();\n Set<Character> set = new HashSet<>();\n int ans = 0, i = 0, j = 0;\n while (i < n && j < n) {\n // try to extend the range [i, j]\n if (!set.contains(s.charAt(j))){\n set.add(s.charAt(j++));\n ans = Math.max(ans, j - i);\n }\n else {\n set.remove(s.charAt(i++));\n }\n }\n return ans;\n }\n}\n
Complexity Analysis
\nTime complexity : . In the worst case each character will be visited twice by and .
\nSpace complexity : . Same as the previous approach. We need space for the sliding window, where is the size of the Set
. The size of the Set is upper bounded by the size of the string and the size of the charset/alphabet .
The above solution requires at most 2n steps. In fact, it could be optimized to require only n steps. Instead of using a set to tell if a character exists or not, we could define a mapping of the characters to its index. Then we can skip the characters immediately when we found a repeated character.
\nThe reason is that if have a duplicate in the range with index , we don\'t need to increase little by little. We can skip all the elements in the range and let to be directly.
\nJava (Using HashMap)
\npublic class Solution {\n public int lengthOfLongestSubstring(String s) {\n int n = s.length(), ans = 0;\n Map<Character, Integer> map = new HashMap<>(); // current index of character\n // try to extend the range [i, j]\n for (int j = 0, i = 0; j < n; j++) {\n if (map.containsKey(s.charAt(j))) {\n i = Math.max(map.get(s.charAt(j)), i);\n }\n ans = Math.max(ans, j - i + 1);\n map.put(s.charAt(j), j + 1);\n }\n return ans;\n }\n}\n
Java (Assuming ASCII 128)
\nThe previous implements all have no assumption on the charset of the string s
.
If we know that the charset is rather small, we can replace the Map
with an integer array as direct access table.
Commonly used tables are:
\nint[26]
for Letters \'a\' - \'z\' or \'A\' - \'Z\'int[128]
for ASCIIint[256]
for Extended ASCIIpublic class Solution {\n public int lengthOfLongestSubstring(String s) {\n int n = s.length(), ans = 0;\n int[] index = new int[128]; // current index of character\n // try to extend the range [i, j]\n for (int j = 0, i = 0; j < n; j++) {\n i = Math.max(index[s.charAt(j)], i);\n ans = Math.max(ans, j - i + 1);\n index[s.charAt(j)] = j + 1;\n }\n return ans;\n }\n}\n
Complexity Analysis
\nTime complexity : . Index will iterate times.
\nSpace complexity (HashMap) : . Same as the previous approach.
\nSpace complexity (Table): . is the size of the charset.
\n