Disjoint Set Data Structure (Union–Find / DSU)

A Disjoint Set is a data structure that keeps track of a partition of elements into non-overlapping sets and efficiently supports two main operations:

Find(x) → Determine which set the element x belongs to
Union(x, y) → Merge the sets containing x and y

It is widely used in:

Graph algorithms (Kruskal’s MST)
Network connectivity
Image processing
Clustering
Tracking connected components

1. What is a Disjoint Set?

A Disjoint Set System represents a collection of sets where:

Each element belongs to exactly one set
Sets are disjoint (no common elements)
Sets change during operations (union operations)

2. Operations on Disjoint Set

1. Make-Set(x)

Creates a new set containing the single element x.

parent[x] = x
rank[x] = 0

2. Find(x)

Returns the representative (root) of the set containing x.

Used to check if two elements are in the same set.

3. Union(x, y)

Merges the sets containing x and y.

3. Efficient Implementations

There are two powerful optimizations used to make DSU extremely fast:

A. Path Compression (Optimization of Find)

Goal:
Flatten the structure of the tree during Find operation so future finds are faster.

Idea:

When doing Find(x), make every node on the path directly point to the root.

Example Before Compression

1 → 2 → 3 → 4   (root)

After Find(1):

Effect:
Find operation becomes nearly O(1).

B. Union by Rank / Union by Size

These are strategies to keep the tree shallow.

1. Union by Rank

Each root has a rank (an estimate of tree height).

Attach the shorter tree under the taller tree.
If ranks are equal → increase rank of new root by 1.

Example:

Tree A (rank 2)
Tree B (rank 1)

Union(A, B):

B attaches under A
rank[A] stays 2

2. Union by Size

Keep track of size of each tree.

Attach the smaller tree under the larger one.

Goal:
Minimize height → faster operations.

4. Combined: Path Compression + Union by Rank

When both are used together:

👉 Amortized time becomes almost constant

Formal time complexity:

[
O(\alpha(n))
]

Where α(n) is the inverse Ackermann function, one of the slowest-growing functions in computer science.

For all practical purposes:

[
\alpha(n) \le 4 \text{ for all realistic n}
]

So DSU is almost O(1).

5. Internal Representation

Disjoint Set is usually implemented using arrays:

parent[i]

Stores parent of node i.
If parent[i] = i → i is the root.

rank[i] or size[i]

Used for union strategies.

Example:

Node	Parent	Rank
1	1	0
2	2	0
3	3	0

After union(1,2):

Node	Parent
1	1
2	1

6. Example of Operations

Initial:

Make-Set for 1,2,3,4,5

1 2 3 4 5     (each is own root)

Union(1,2)

1
 \
  2

Union(3,4)

3
 \
  4

Union(2,3)

Attach root of 2’s set under root of 3’s set based on rank.

7. Applications of Disjoint Sets

A. Kruskal’s Minimum Spanning Tree Algorithm

Sort edges
Add edge if it doesn’t form a cycle
Detect cycle using DSU (Find operation)

B. Checking Connectivity in Graphs

Join nodes with Union
Query connectivity with Find

C. Detecting Cycles in Undirected Graphs

If Find(u) == Find(v) before union → cycle exists

D. Network / Social Network Connectivity

Friend groups
Connected components

E. Image Processing

Grouping pixels into connected regions

F. Clustering

Used in hierarchical clustering

8. Time Complexities

Operation	Without Optimizations	With Union by Rank + Path Compression
Make-Set	O(1)	O(1)
Find	O(n)	O(α(n)) ≈ O(1)
Union	O(n)	O(α(n)) ≈ O(1)

9. Advantages

✔ Extremely fast (nearly constant time)
✔ Very simple implementation
✔ Used in many graph algorithms

10. Disadvantages

✘ Does not support deletion of elements
✘ Only works for partitioned sets (non-overlapping)

Summary

A Disjoint Set / Union–Find structure supports:

Make-Set
Union
Find

Using:

Path Compression
Union by Rank / Size

It becomes one of the most efficient data structures in algorithms, almost O(1) per operation.

Data Structures for Disjoint Sets