220. Contains Duplicate III


Given an array of integers, find out whether there are two distinct indices i and j in the array such that the absolute difference between nums[i] and nums[j] is at most t and the absolute difference between i and j is at most k.


b'
\n\n

Summary

\n

This article is for intermediate readers. It introduces the following ideas:\nBinary Search Tree, HashMap, and Buckets.

\n

Solutions

\n
\n

Approach #1 (Naive Linear Search) [Time Limit Exceeded]

\n

Intuition

\n

Compare each element with the previous elements and see if their difference is at most .

\n

Algorithm

\n

This problem requires us to find and such that the following conditions are satisfied:

\n
    \n
  1. \n
  2. \n
  3. \n
  4. \n
\n

The naive approach is the same as Approach #1 in Contains Duplicate II solution, which keeps a virtual sliding window that holds the newest elements. In this way, Condition 1 above is always satisfied. We then check if Condition 2 is also satisfied by applying linear search.

\n

Java

\n
public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) {\n    for (int i = 0; i < nums.length; ++i) {\n        for (int j = Math.max(i - k, 0); j < i; ++j) {\n            if (Math.abs(nums[i] - nums[j]) <= t) return true;\n        }\n    }\n    return false;\n}\n// Time limit exceeded.\n
\n

Complexity Analysis

\n
    \n
  • \n

    Time complexity : .\nIt costs time for each linear search. Note that we do at most comparisons in one search even if can be larger than .

    \n
  • \n
  • \n

    Space complexity : .\nWe only used constant auxiliary space.

    \n
  • \n
\n
\n

Approach #2 (Binary Search Tree) [Accepted]

\n

Intuition

\n
    \n
  • \n

    If elements in the window are maintained in sorted order, we can apply binary search twice to check if Condition 2 is satisfied.

    \n
  • \n
  • \n

    By utilizing self-balancing Binary Search Tree, one can keep the window ordered at all times with logarithmic time insert and delete.

    \n
  • \n
\n

Algorithm

\n

The real bottleneck of Approach #1 is due to all elements in the sliding window are being scanned to check if Condition 2 is satisfied. Could we do better?

\n

If elements in the window are in sorted order, we can apply Binary Search twice to search for the two boundaries and for each element .

\n

Unfortunately, the window is unsorted. A common mistake here is attempting to maintain a sorted array. Although searching in a sorted array costs only logarithmic time, keeping the order of the elements after insert and delete operation is not as efficient. Imagine you have a sorted array with elements and you are adding a new item . Even if you can find the correct position in time, you still need time to insert into the sorted array. The reason is that you need to shift all elements after the insert position one step backward. The same reasoning applies to removal as well. After removing an item from position , you need to shift all elements after one step forward. Thus, we gain nothing in speed compared to the naive linear search approach above.

\n

To gain an actual speedup, we need a dynamic data structure that supports faster insert, search and delete. Self-balancing Binary Search Tree (BST) is the right data structure. The term Self-balancing means the tree automatically keeps its height small after arbitrary insert and delete operations. Why does self-balancing matter? That is because most operations on a BST take time directly proportional to the height of the tree. Take a look at the following non-balanced BST which is skewed to the left:

\n
            6\n           /\n          5\n         /\n        4\n       /\n      3\n     /\n    2\n   /\n  1\n
\n

Figure 1. A non-balanced BST that is skewed to the left.

\n

Searching in the above BST degrades to linear time, which is like searching in a linked list. Now compare to the BST below which is balanced:

\n
          4\n        /   \\\n       2     6\n      / \\   /\n     1   3  5\n
\n

Figure 2. A balanced BST.

\n

Assume that is the total number of nodes in the tree, a balanced binary tree maintains its height in the order of . Thus it supports time for each of insert, search and delete operations.

\n

Here is the entire algorithm in pseudocode:

\n
    \n
  • Initialize an empty BST set
  • \n
  • Loop through the array, for each element \n
      \n
    • Find the smallest element in set that is greater than or equal to , return true if \n
    • \n
    • Find the greatest element in set that is smaller than or equal to , return true if \n
    • \n
    • Put in set
    • \n
    • If the size of the set is larger than , remove the oldest item.
    • \n
    \n
  • \n
  • Return false
  • \n
\n

One may consider the smallest element that is greater or equal to as the successor of in the BST, as in: "What is the next greater value of ?". Similarly, we consider the greatest element that is smaller or equal to as the predecessor of in the BST, as in: "What is the previous smaller value of ?". These two values and are the two closest neighbors from . Thus by checking the distance from them to , we can conclude if Condition 2 is satisfied.

\n

Java

\n
public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) {\n    TreeSet<Integer> set = new TreeSet<>();\n    for (int i = 0; i < nums.length; ++i) {\n        // Find the successor of current element\n        Integer s = set.ceiling(nums[i]);\n        if (s != null && s <= nums[i] + t) return true;\n\n        // Find the predecessor of current element\n        Integer g = set.floor(nums[i]);\n        if (g != null && nums[i] <= g + t) return true;\n\n        set.add(nums[i]);\n        if (set.size() > k) {\n            set.remove(nums[i - k]);\n        }\n    }\n    return false;\n}\n
\n

Complexity Analysis

\n
    \n
  • \n

    Time complexity : .\nWe iterate through the array of size . For each iteration, it costs time (search, insert or delete) in the BST, since the size of the BST is upper bounded by both and .

    \n
  • \n
  • \n

    Space complexity : .\nSpace is dominated by the size of the BST, which is upper bounded by both and .

    \n
  • \n
\n

Note

\n
    \n
  • \n

    When the array\'s elements and \'s value are large, they can cause overflow in arithmetic operation. Consider using a larger size data type instead, such as long.

    \n
  • \n
  • \n

    C++\'s std::set, std::set::upper_bound and std::set::lower_bound are equivalent to Java\'s TreeSet, TreeSet::ceiling and TreeSet::floor, respectively. Python does not provide a Self-balancing BST through its library.

    \n
  • \n
\n
\n

Approach #3 (Buckets) [Accepted]

\n

Intuition

\n

Inspired by bucket sort, we can achieve linear time complexity in our problem using buckets as window.

\n

Algorithm

\n

Bucket sort is a sorting algorithm that works by distributing the elements of an array into a number of buckets. Each bucket is then sorted individually, using a different sorting algorithm. Here is an illustration of buckets.

\n

Illustration of buckets

\n

Figure 3. Illustration of buckets.

\n

From the above example, we have 8 unsorted integers. We create 5 buckets covering the inclusive ranges of individually. Each of the eight elements is in a particular bucket. For element with value , its bucket label is and here we have . Sort each bucket using some other sorting algorithm and then collect all of them bucket by bucket.

\n

Back to our problem, the critical issue we are trying to solve is:

\n
\n
    \n
  1. For a given element is there an item in the window that is within the range of ?
  2. \n
  3. Could we do this in constant time?
  4. \n
\n
\n

Let us consider an example where each element is a person\'s birthday. Your birthday, say some day in March, is the new element . Suppose that each month has days and you want to know if anyone has a birthday within days of yours. Immediately, we can rule out all other months except February, March, April.

\n

The reason we know this is because each birthday belongs to a bucket we called month! And the range covered by the buckets are the same as distance which simplifies things a lot. Any two elements that are not in the same or adjacent buckets must have a distance greater than .

\n

We apply the above bucketing principle and design buckets covering the ranges of . We keep the window using this buckets. Then, each time, all we need to check is the bucket that belongs to and its two adjacent buckets. Thus, we have a constant time algorithm for searching almost duplicate in the window.

\n

One thing worth mentioning is the difference from bucket sort \xe2\x80\x93 Each of our buckets contains at most one element at any time, because two elements in a bucket means "almost duplicate" and we can return early from the function. Therefore, a HashMap with an element associated with a bucket label is enough for our purpose.

\n

Java

\n
public class Solution {\n    // Get the ID of the bucket from element value x and bucket width w\n    // In Java, `-3 / 5 = 0` and but we need `-3 / 5 = -1`.\n    private long getID(long x, long w) {\n        return x < 0 ? (x + 1) / w - 1 : x / w;\n    }\n\n    public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) {\n        if (t < 0) return false;\n        Map<Long, Long> d = new HashMap<>();\n        long w = (long)t + 1;\n        for (int i = 0; i < nums.length; ++i) {\n            long m = getID(nums[i], w);\n            // check if bucket m is empty, each bucket may contain at most one element\n            if (d.containsKey(m))\n                return true;\n            // check the neighbor buckets for almost duplicate\n            if (d.containsKey(m - 1) && Math.abs(nums[i] - d.get(m - 1)) < w)\n                return true;\n            if (d.containsKey(m + 1) && Math.abs(nums[i] - d.get(m + 1)) < w)\n                return true;\n            // now bucket m is empty and no almost duplicate in neighbor buckets\n            d.put(m, (long)nums[i]);\n            if (i >= k) d.remove(getID(nums[i - k], w));\n        }\n        return false;\n    }\n}\n
\n

Complexity Analysis

\n
    \n
  • \n

    Time complexity : .\nFor each of the elements, we do at most three searches, one insert, and one delete on the HashMap, which costs constant time on average. Thus, the entire algorithm costs time.

    \n
  • \n
  • \n

    Space complexity : .\nSpace is dominated by the HashMap, which is linear to the size of its elements. The size of the HashMap is upper bounded by both and . Thus the space complexity is .

    \n
  • \n
\n

See Also

\n\n
'