Given an array of integers, find out whether there are two distinct indices i and j in the array such that the absolute difference between nums[i] and nums[j] is at most t and the absolute difference between i and j is at most k.
This article is for intermediate readers. It introduces the following ideas:\nBinary Search Tree, HashMap, and Buckets.
\nIntuition
\nCompare each element with the previous elements and see if their difference is at most .
\nAlgorithm
\nThis problem requires us to find and such that the following conditions are satisfied:
\n\nThe naive approach is the same as Approach #1 in Contains Duplicate II solution, which keeps a virtual sliding window that holds the newest elements. In this way, Condition 1 above is always satisfied. We then check if Condition 2 is also satisfied by applying linear search.
\nJava
\npublic boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) {\n for (int i = 0; i < nums.length; ++i) {\n for (int j = Math.max(i - k, 0); j < i; ++j) {\n if (Math.abs(nums[i] - nums[j]) <= t) return true;\n }\n }\n return false;\n}\n// Time limit exceeded.\n
Complexity Analysis
\nTime complexity : .\nIt costs time for each linear search. Note that we do at most comparisons in one search even if can be larger than .
\nSpace complexity : .\nWe only used constant auxiliary space.
\nIntuition
\nIf elements in the window are maintained in sorted order, we can apply binary search twice to check if Condition 2 is satisfied.
\nBy utilizing self-balancing Binary Search Tree, one can keep the window ordered at all times with logarithmic time insert
and delete
.
Algorithm
\nThe real bottleneck of Approach #1 is due to all elements in the sliding window are being scanned to check if Condition 2 is satisfied. Could we do better?
\nIf elements in the window are in sorted order, we can apply Binary Search twice to search for the two boundaries and for each element .
\nUnfortunately, the window is unsorted. A common mistake here is attempting to maintain a sorted array. Although searching in a sorted array costs only logarithmic time, keeping the order of the elements after insert
and delete
operation is not as efficient. Imagine you have a sorted array with elements and you are adding a new item . Even if you can find the correct position in time, you still need time to insert into the sorted array. The reason is that you need to shift all elements after the insert position one step backward. The same reasoning applies to removal as well. After removing an item from position , you need to shift all elements after one step forward. Thus, we gain nothing in speed compared to the naive linear search approach above.
To gain an actual speedup, we need a dynamic data structure that supports faster insert
, search
and delete
. Self-balancing Binary Search Tree (BST) is the right data structure. The term Self-balancing means the tree automatically keeps its height small after arbitrary insert
and delete
operations. Why does self-balancing matter? That is because most operations on a BST take time directly proportional to the height of the tree. Take a look at the following non-balanced BST which is skewed to the left:
6\n /\n 5\n /\n 4\n /\n 3\n /\n 2\n /\n 1\n
Figure 1. A non-balanced BST that is skewed to the left.
\nSearching in the above BST degrades to linear time, which is like searching in a linked list. Now compare to the BST below which is balanced:
\n4\n / \\\n 2 6\n / \\ /\n 1 3 5\n
Figure 2. A balanced BST.
\nAssume that is the total number of nodes in the tree, a balanced binary tree maintains its height in the order of . Thus it supports time for each of insert
, search
and delete
operations.
Here is the entire algorithm in pseudocode:
\nset
set
that is greater than or equal to , return true if \nset
that is smaller than or equal to , return true if \nset
One may consider the smallest element that is greater or equal to as the successor of in the BST, as in: "What is the next greater value of ?". Similarly, we consider the greatest element that is smaller or equal to as the predecessor of in the BST, as in: "What is the previous smaller value of ?". These two values and are the two closest neighbors from . Thus by checking the distance from them to , we can conclude if Condition 2 is satisfied.
\nJava
\npublic boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) {\n TreeSet<Integer> set = new TreeSet<>();\n for (int i = 0; i < nums.length; ++i) {\n // Find the successor of current element\n Integer s = set.ceiling(nums[i]);\n if (s != null && s <= nums[i] + t) return true;\n\n // Find the predecessor of current element\n Integer g = set.floor(nums[i]);\n if (g != null && nums[i] <= g + t) return true;\n\n set.add(nums[i]);\n if (set.size() > k) {\n set.remove(nums[i - k]);\n }\n }\n return false;\n}\n
Complexity Analysis
\nTime complexity : .\nWe iterate through the array of size . For each iteration, it costs time (search
, insert
or delete
) in the BST, since the size of the BST is upper bounded by both and .
Space complexity : .\nSpace is dominated by the size of the BST, which is upper bounded by both and .
\nNote
\nWhen the array\'s elements and \'s value are large, they can cause overflow in arithmetic operation. Consider using a larger size data type instead, such as long.
\nC++\'s std::set
, std::set::upper_bound
and std::set::lower_bound
are equivalent to Java\'s TreeSet
, TreeSet::ceiling
and TreeSet::floor
, respectively. Python does not provide a Self-balancing BST through its library.
Intuition
\nInspired by bucket sort
, we can achieve linear time complexity in our problem using buckets as window.
Algorithm
\nBucket sort is a sorting algorithm that works by distributing the elements of an array into a number of buckets. Each bucket is then sorted individually, using a different sorting algorithm. Here is an illustration of buckets.
\nFigure 3. Illustration of buckets.
\nFrom the above example, we have 8 unsorted integers. We create 5 buckets covering the inclusive ranges of individually. Each of the eight elements is in a particular bucket. For element with value , its bucket label is and here we have . Sort each bucket using some other sorting algorithm and then collect all of them bucket by bucket.
\nBack to our problem, the critical issue we are trying to solve is:
\n\n\n\n
\n- For a given element is there an item in the window that is within the range of ?
\n- Could we do this in constant time?
\n
Let us consider an example where each element is a person\'s birthday. Your birthday, say some day in March, is the new element . Suppose that each month has days and you want to know if anyone has a birthday within days of yours. Immediately, we can rule out all other months except February, March, April.
\nThe reason we know this is because each birthday belongs to a bucket we called month! And the range covered by the buckets are the same as distance which simplifies things a lot. Any two elements that are not in the same or adjacent buckets must have a distance greater than .
\nWe apply the above bucketing principle and design buckets covering the ranges of . We keep the window using this buckets. Then, each time, all we need to check is the bucket that belongs to and its two adjacent buckets. Thus, we have a constant time algorithm for searching almost duplicate in the window.
\nOne thing worth mentioning is the difference from bucket sort \xe2\x80\x93 Each of our buckets contains at most one element at any time, because two elements in a bucket means "almost duplicate" and we can return early from the function. Therefore, a HashMap with an element associated with a bucket label is enough for our purpose.
\nJava
\npublic class Solution {\n // Get the ID of the bucket from element value x and bucket width w\n // In Java, `-3 / 5 = 0` and but we need `-3 / 5 = -1`.\n private long getID(long x, long w) {\n return x < 0 ? (x + 1) / w - 1 : x / w;\n }\n\n public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) {\n if (t < 0) return false;\n Map<Long, Long> d = new HashMap<>();\n long w = (long)t + 1;\n for (int i = 0; i < nums.length; ++i) {\n long m = getID(nums[i], w);\n // check if bucket m is empty, each bucket may contain at most one element\n if (d.containsKey(m))\n return true;\n // check the neighbor buckets for almost duplicate\n if (d.containsKey(m - 1) && Math.abs(nums[i] - d.get(m - 1)) < w)\n return true;\n if (d.containsKey(m + 1) && Math.abs(nums[i] - d.get(m + 1)) < w)\n return true;\n // now bucket m is empty and no almost duplicate in neighbor buckets\n d.put(m, (long)nums[i]);\n if (i >= k) d.remove(getID(nums[i - k], w));\n }\n return false;\n }\n}\n
Complexity Analysis
\nTime complexity : .\nFor each of the elements, we do at most three searches, one insert, and one delete on the HashMap, which costs constant time on average. Thus, the entire algorithm costs time.
\nSpace complexity : .\nSpace is dominated by the HashMap, which is linear to the size of its elements. The size of the HashMap is upper bounded by both and . Thus the space complexity is .
\n