Given an array of integers, find if the array contains any duplicates. Your function should return true if any value appears at least twice in the array, and it should return false if every element is distinct.
This article is for beginners. It introduces the following ideas:\nLoop Invariant, Linear Search, Sorting and Hash Table.
\nIntuition
\nFor an array of integers, there are pairs of integers. Thus, we may check all pairs and see if there is any pair with duplicates.
\nAlgorithm
\nTo apply this idea, we employ the linear search algorithm which is the simplest search algorithm. Linear search is a method of finding if a particular value is in a list by checking each of its elements, one at a time and in sequence until the desired one is found.
\nFor our problem, we loop through all integers. For the th integer nums[i]
, we search in the previous i-1
integers for the duplicate of nums[i]
. If we find one, we return true; if not, we continue. Return false at the end of the program.
To prove the correctness of the algorithm, we define the loop invariant. A loop invariant is a property that holds before (and after) each iteration. Knowing its invariant(s) is essential for understanding the effect of a loop. Here is the loop invariant:
\n\n\nBefore the next search, there are no duplicate integers in the searched integers.
\n
The loop invariant holds true before the loop because there is no searched integer.\nEach time through the loop we look for any any possible duplicate of the current element.\nIf we found a duplicate, the function exits by returning true; If not, the invariant still holds true.
\nTherefore, if the loop finishes, the invariant tells us that there is no duplicate in all integers.
\nJava
\npublic boolean containsDuplicate(int[] nums) {\n for (int i = 0; i < nums.length; ++i) {\n for (int j = 0; j < i; ++j) {\n if (nums[j] == nums[i]) return true; \n }\n }\n return false;\n}\n// Time Limit Exceeded\n
Complexity Analysis
\nTime complexity : . In the worst case, there are pairs of integers to check. Therefore, the time complexity is .
\nSpace complexity : .\nWe only used constant extra space.
\nNote
\nThis approach will get Time Limit Exceeded on LeetCode. Usually, if an algorithm is , it can handle up to around . It gets Time Limit Exceeded when .
\nIntuition
\nIf there are any duplicate integers, they will be consecutive after sorting.
\nAlgorithm
\nThis approach employs sorting algorithm. Since comparison sorting algorithm like heapsort is known to provide worst-case performance, sorting is often a good preprocessing step. After sorting, we can sweep the sorted array to find if there are any two consecutive duplicate elements.
\nJava
\npublic boolean containsDuplicate(int[] nums) {\n Arrays.sort(nums);\n for (int i = 0; i < nums.length - 1; ++i) {\n if (nums[i] == nums[i + 1]) return true;\n }\n return false;\n}\n
Complexity Analysis
\nTime complexity : .\nSorting is and the sweeping is . The entire algorithm is dominated by the sorting step, which is .
\nSpace complexity : .\nSpace depends on the sorting implementation which, usually, costs auxiliary space if heapsort
is used.
Note
\nThe implementation here modifies the original array by sorting it. In general, it is not a good practice to modify the input unless it is clear to the caller that the input will be modified. One may make a copy of nums
and operate on the copy instead.
Intuition
\nUtilize a dynamic data structure that supports fast search and insert operations.
\nAlgorithm
\nFrom Approach #1 we know that search operations is in an unsorted array and we did so repeatedly. Utilizing a data structure with faster search time will speed up the entire algorithm.
\nThere are many data structures commonly used as dynamic sets such as Binary Search Tree and Hash Table. The operations we need to support here are search()
and insert()
. For a self-balancing Binary Search Tree (TreeSet or TreeMap in Java), search()
and insert()
are both time. For a Hash Table (HashSet or HashMap in Java), search()
and insert()
are both on average. Therefore, by using hash table, we can achieve linear time complexity for finding the duplicate in an unsorted array.
Java
\npublic boolean containsDuplicate(int[] nums) {\n Set<Integer> set = new HashSet<>(nums.length);\n for (int x: nums) {\n if (set.contains(x)) return true;\n set.add(x);\n }\n return false;\n}\n
Complexity Analysis
\nTime complexity : .\nWe do search()
and insert()
for times and each operation takes constant time.
Space complexity : .\nThe space used by a hash table is linear with the number of elements in it.
\nNote
\nFor certain test cases with not very large , the runtime of this method can be slower than Approach #2. The reason is hash table has some overhead in maintaining its property. One should keep in mind that real world performance can be different from what the Big-O notation says. The Big-O notation only tells us that for sufficiently large input, one will be faster than the other. Therefore, when is not sufficiently large, an algorithm can be slower than an algorithm.
\n