# UnionFind

ClassS04CS141 | recent changes | Preferences

Described the UNION-FIND data type, with operations: make-set, union, find.

Described the following implementation: each set is stored as a doubly linked list with a header containing the name of the set, with each element of the list having a "parent" pointer to the header.

Find and Makeset are constant time operations.

Union(A,B) is done by merging B into A: Appending B's linked list to A's and updating all the parent pointers of B's elements to point to A. Time for union is proportional to the number of elements in the set B.
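The implementation described above can be sketched roughly as follows. This is a minimal illustration, not the code used in class: the names `SetHeader`, `Element`, `make_set`, `find` and `union` are assumptions, and a Python list stands in for the doubly linked list (so the append step is not constant time here, although the union still costs time proportional to the size of B).

```python
class SetHeader:
    """Header of one set: holds the set's name and its members."""
    def __init__(self, name):
        self.name = name
        self.members = []      # stand-in for the doubly linked list


class Element:
    def __init__(self, value):
        self.value = value
        self.parent = None     # pointer to the header of the containing set


def make_set(value):
    """Create a singleton set named after its only element; constant time."""
    header = SetHeader(value)
    elem = Element(value)
    elem.parent = header
    header.members.append(elem)
    return elem


def find(elem):
    """Return the header of the set containing elem; constant time."""
    return elem.parent


def union(a, b):
    """Merge set b into set a; time proportional to the size of b."""
    if a is b:
        return a
    for elem in b.members:     # one parent-pointer update per element of b
        elem.parent = a
    a.members.extend(b.members)
    b.members = []
    return a


# Small usage example: merge {2} into {1}, then {1, 2} into {3}.
x, y, z = make_set(1), make_set(2), make_set(3)
union(find(x), find(y))
union(find(z), find(x))
```

After these two unions all three elements share the header named 3.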

For any sequence of N operations on the data structure, if there are M elements, the total time is O(NM) because each union operation takes O(M) time.

Class exercise: improve the bound to O(N + M^2).

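answer: observe that each element has its parent pointer changed at most M times, because every union that changes any pointers reduces the number of sets by one, so there are at most M-1 such unions. Summed over the M elements, the unions change at most M^2 pointers in total, and all remaining work is constant per operation.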

Class exercise: for any M, find a sequence of M operations on M elements that takes time proportional to M^2.

answer: starting with elements 1, 2, …, M, do

UNION(FIND(2),FIND(1)), UNION(FIND(3),FIND(1)), UNION(FIND(4),FIND(1)), …, UNION(FIND(M),FIND(1))

UNION(FIND(i),FIND(1)) takes time proportional to i-1, since the second set (the one containing element 1) has size i-1 at that point.

Thus, total time is proportional to 1+2+3 + … + (M-1) = M(M-1)/2, which is proportional to M^2. (To see this without the closed form, note that each of the M-1 terms in the sum is at most M, so the sum is at most M^2, and at least (M-1)/2 of the terms are at least M/2, so the sum is at least M(M-1)/4.)
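To check the quadratic growth concretely, the sequence above can be replayed while counting parent-pointer changes. The sketch below is an illustration only: the function name `bad_sequence_cost` and the dictionary-based representation (element to set name, set name to member list) are assumptions, not part of the lecture.

```python
def bad_sequence_cost(m):
    """Count parent-pointer changes for UNION(FIND(i), FIND(1)), i = 2..m,
    when the second set is always merged into the first."""
    parent = {x: x for x in range(1, m + 1)}       # element -> set name
    members = {x: [x] for x in range(1, m + 1)}    # set name -> elements
    changes = 0
    for i in range(2, m + 1):
        a, b = parent[i], parent[1]                # FIND(i), FIND(1)
        for x in members[b]:                       # merge b into a
            parent[x] = a
        changes += len(members[b])
        members[a].extend(members.pop(b))
    return changes


cost_10 = bad_sequence_cost(10)
cost_100 = bad_sequence_cost(100)
```

The count comes out to m(m-1)/2, so multiplying the number of elements by ten multiplies the work by roughly a hundred.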

Next we considered the following improvement to UNION: instead of merging the second set into the first, merge the smaller set into the larger set.
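A minimal sketch of this improved UNION, assuming each set's size can be read off in constant time (here via the length of a Python list; in the linked-list version one would presumably keep a size field in the header). The class name `WeightedUnionFind` and the `pointer_changes` counter are illustrative assumptions.

```python
class WeightedUnionFind:
    def __init__(self):
        self.parent = {}           # element -> set name
        self.members = {}          # set name -> list of elements
        self.pointer_changes = 0   # total parent-pointer updates so far

    def make_set(self, x):
        self.parent[x] = x
        self.members[x] = [x]

    def find(self, x):
        return self.parent[x]

    def union(self, a, b):
        """Merge the smaller of sets a and b into the larger one."""
        if a == b:
            return a
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a            # make a the larger (or equal) set
        for x in self.members[b]:
            self.parent[x] = a
        self.pointer_changes += len(self.members[b])
        self.members[a].extend(self.members.pop(b))
        return a


# Replay UNION(FIND(i), FIND(1)) for i = 2..100, the sequence that was
# quadratic when the second set was always merged into the first.
uf = WeightedUnionFind()
for x in range(1, 101):
    uf.make_set(x)
for i in range(2, 101):
    uf.union(uf.find(i), uf.find(1))
```

On this particular sequence every union now moves a single element, so the total number of pointer changes is 99 instead of 4950.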

What is the worst-case time for N operations on M elements now?

We considered an example:

UNION(FIND(1), FIND(2)), UNION(FIND(3), FIND(4)), ..., UNION(FIND(15), FIND(16)) (8 unions, each with sets of size 1)
then
UNION(FIND(1), FIND(3)), UNION(FIND(5), FIND(7)), ..., UNION(FIND(13), FIND(15)) (4 unions, each with sets of size 2)
then
UNION(FIND(1), FIND(5)), UNION(FIND(9), FIND(13)) (2 unions, each with sets of size 4)
then
UNION(FIND(1), FIND(9)) (1 union, with sets of size 8)

If we start with M elements (M a power of 2), there are log2(M) rounds of unions and each round changes M/2 parent pointers, so the time for an example of this kind is proportional to M log(M). (This is similar to analysing mergesort.)
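The round-by-round count for an example of this kind can be reproduced with a short simulation. This is a sketch under the assumption that M is a power of two and that, between two sets of equal size, the second is merged into the first; the function name `balanced_rounds_cost` is hypothetical.

```python
def balanced_rounds_cost(m):
    """Pair up equal-sized sets round by round and count pointer changes.
    Assumes m is a power of two."""
    sets = [[x] for x in range(1, m + 1)]
    changes = 0
    while len(sets) > 1:
        merged = []
        for i in range(0, len(sets), 2):
            a, b = sets[i], sets[i + 1]
            changes += len(b)      # every element of b gets a new parent
            merged.append(a + b)
        sets = merged
    return changes


cost_16 = balanced_rounds_cost(16)
cost_1024 = balanced_rounds_cost(1024)
```

For M = 16 the four rounds cost 8·1, 4·2, 2·4 and 1·8 pointer changes, 32 in total, which is (M/2)·log2(M).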

claim: for any sequence of N operations on M elements, the time is O(N+M log(M))

proof: The time for the makesets and the finds is O(N).

To count the time for the unions, note that each element X has its parent pointer changed at most log2(M) times. This is because each time the parent pointer of X changes, the size of the set containing X at least doubles (parent pointers change only for the elements of the smaller set, which is merged into a set at least as large). Since no set ever has more than M elements, this can happen at most log2(M) times.

Thus, the total time spent changing parent pointers throughout the entire sequence of operations is O(M log(M)). The time spent doing UNIONs is proportional to the time spent changing parent pointers, plus the number of unions, which is at most N. Thus, the time spent doing UNIONs is O(N+M log(M)).
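The per-element bound in the proof can also be checked empirically. The sketch below merges randomly chosen pairs of sets, always moving the smaller into the larger, and records how often each element's parent pointer changes. The function name `max_pointer_changes`, the fixed seed, and the random choice of unions are assumptions made for illustration.

```python
import math
import random


def max_pointer_changes(m, seed=0):
    """Union random pairs of sets (smaller into larger) until one set is left.
    Returns (largest per-element change count, total change count)."""
    rng = random.Random(seed)
    parent = {x: x for x in range(m)}       # element -> set name
    members = {x: [x] for x in range(m)}    # set name -> elements
    changes = {x: 0 for x in range(m)}      # element -> pointer changes
    while len(members) > 1:
        a, b = rng.sample(sorted(members), 2)   # two distinct set names
        if len(members[a]) < len(members[b]):
            a, b = b, a                         # a is the larger set
        for x in members[b]:
            parent[x] = a
            changes[x] += 1
        members[a].extend(members.pop(b))
    return max(changes.values()), sum(changes.values())


worst, total = max_pointer_changes(1000)
```

Whatever unions the random choices produce, `worst` stays at or below log2(1000) ≈ 9.97 and `total` stays at or below 1000·log2(1000), as the doubling argument predicts.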
