Given two sentences words1, words2
(each represented as an array of strings), and a list of similar word pairs pairs
, determine if two sentences are similar.
For example, words1 = ["great", "acting", "skills"]
and words2 = ["fine", "drama", "talent"]
are similar, if the similar word pairs are pairs = [["great", "good"], ["fine", "good"],
["acting","drama"], ["skills","talent"]]
.
Note that the similarity relation is transitive. For example, if "great" and "good" are similar, and "fine" and "good" are similar, then "great" and "fine" are similar.
Similarity is also symmetric. For example, "great" and "fine" being similar is the same as "fine" and "great" being similar.
Also, a word is always similar with itself. For example, the sentences words1 = ["great"], words2 = ["great"], pairs = []
are similar, even though there are no specified similar word pairs.
Finally, sentences can only be similar if they have the same number of words. So a sentence like words1 = ["great"]
can never be similar to words2 = ["doubleplus","good"]
.
Note:
words1
and words2
will not exceed 1000
.pairs
will not exceed 2000
.pairs[i]
will be 2
.words[i]
and pairs[i][j]
will be in the range [1, 20]
.Intuition
\nTwo words are similar if they are the same, or there is a path connecting them from edges represented by pairs
.
We can check whether this path exists by performing a depth-first search from a word and seeing if we reach the other word. The search is performed on the underlying graph specified by the edges in pairs
.
Algorithm
\nWe start by building our graph
from the edges in pairs
.
The specific algorithm we go for is an iterative depth-first search. The implementation we go for is a typical "visitor pattern": when searching whether there is a path from w1 = words1[i]
to w2 = words2[i]
, stack
will contain all the nodes that are queued up for processing, while seen
will be all the nodes that have been queued for processing (whether they have been processed or not).
Complexity Analysis
\nTime Complexity: , where is the maximum length of words1
and words2
, and is the length of pairs
. Each of searches could search the entire graph.
Space Complexity: , the size of pairs
.
Intuition
\nAs in Approach #1, we want to know if there is path connecting two words from edges represented by pairs
.
Our problem comes down to finding the connected components of a graph. This is a natural fit for a Disjoint Set Union (DSU) structure.
\nAlgorithm
\nDraw edges between words if they are similar. For easier interoperability between our DSU template, we will map each word
to some integer ix = index[word]
. Then, dsu.find(ix)
will tell us a unique id representing what component that word is in.
For more information on DSU, please look at Approach #2 in the article here. For brevity, the solutions showcased below do not use union-by-rank.
\nAfter putting each word in pairs
into our DSU template, we check successive pairs of words w1, w2 = words1[i], words2[i]
. We require that w1 == w2
, or w1
and w2
are in the same component. This is easily checked using dsu.find
.
Complexity Analysis
\nTime Complexity: , where is the maximum length of words1
and words2
, and is the length of pairs
. If we used union-by-rank, this complexity improves to , where is the Inverse-Ackermann function.
Space Complexity: , the size of pairs
.
Analysis written by: @awice.
\n