Disjoint Set Data Structure (Union–Find / DSU)
A Disjoint Set is a data structure that keeps track of a partition of elements into non-overlapping sets and efficiently supports two main operations:
- Find(x) → Determine which set the element x belongs to
- Union(x, y) → Merge the sets containing x and y
It is widely used in:
- Graph algorithms (Kruskal’s MST)
- Network connectivity
- Image processing
- Clustering
- Tracking connected components
1. What is a Disjoint Set?
A Disjoint Set System represents a collection of sets where:
- Each element belongs to exactly one set
- Sets are disjoint (no common elements)
- Sets change during operations (union operations)
2. Operations on Disjoint Set
1. Make-Set(x)
Creates a new set containing the single element x.
parent[x] = x
rank[x] = 0
2. Find(x)
Returns the representative (root) of the set containing x.
Used to check if two elements are in the same set.
3. Union(x, y)
Merges the sets containing x and y.
3. Efficient Implementations
There are two powerful optimizations used to make DSU extremely fast:
A. Path Compression (Optimization of Find)
Goal:
Flatten the structure of the tree during Find operation so future finds are faster.
Idea:
When doing Find(x), make every node on the path directly point to the root.
Example Before Compression
1 → 2 → 3 → 4 (root)
After Find(1):
1 → 4
2 → 4
3 → 4
4 → 4
Effect:
Find operation becomes nearly O(1).
B. Union by Rank / Union by Size
These are strategies to keep the tree shallow.
1. Union by Rank
Each root has a rank (an estimate of tree height).
- Attach the shorter tree under the taller tree.
- If ranks are equal → increase rank of new root by 1.
Example:
Tree A (rank 2)
Tree B (rank 1)
Union(A, B):
- B attaches under A
- rank[A] stays 2
2. Union by Size
Keep track of size of each tree.
- Attach the smaller tree under the larger one.
Goal:
Minimize height → faster operations.
4. Combined: Path Compression + Union by Rank
When both are used together:
👉 Amortized time becomes almost constant
Formal time complexity:
[
O(\alpha(n))
]
Where α(n) is the inverse Ackermann function, one of the slowest-growing functions in computer science.
For all practical purposes:
[
\alpha(n) \le 4 \text{ for all realistic n}
]
So DSU is almost O(1).
5. Internal Representation
Disjoint Set is usually implemented using arrays:
parent[i]
Stores parent of node i.
If parent[i] = i → i is the root.
rank[i] or size[i]
Used for union strategies.
Example:
| Node | Parent | Rank |
|---|---|---|
| 1 | 1 | 0 |
| 2 | 2 | 0 |
| 3 | 3 | 0 |
After union(1,2):
| Node | Parent |
|---|---|
| 1 | 1 |
| 2 | 1 |
6. Example of Operations
Initial:
Make-Set for 1,2,3,4,5
1 2 3 4 5 (each is own root)
Union(1,2)
1
\
2
Union(3,4)
3
\
4
Union(2,3)
Attach root of 2’s set under root of 3’s set based on rank.
7. Applications of Disjoint Sets
A. Kruskal’s Minimum Spanning Tree Algorithm
- Sort edges
- Add edge if it doesn’t form a cycle
- Detect cycle using DSU (Find operation)
B. Checking Connectivity in Graphs
- Join nodes with Union
- Query connectivity with Find
C. Detecting Cycles in Undirected Graphs
- If Find(u) == Find(v) before union → cycle exists
D. Network / Social Network Connectivity
- Friend groups
- Connected components
E. Image Processing
- Grouping pixels into connected regions
F. Clustering
- Used in hierarchical clustering
8. Time Complexities
| Operation | Without Optimizations | With Union by Rank + Path Compression |
|---|---|---|
| Make-Set | O(1) | O(1) |
| Find | O(n) | O(α(n)) ≈ O(1) |
| Union | O(n) | O(α(n)) ≈ O(1) |
9. Advantages
✔ Extremely fast (nearly constant time)
✔ Very simple implementation
✔ Used in many graph algorithms
10. Disadvantages
✘ Does not support deletion of elements
✘ Only works for partitioned sets (non-overlapping)
Summary
A Disjoint Set / Union–Find structure supports:
- Make-Set
- Union
- Find
Using:
- Path Compression
- Union by Rank / Size
It becomes one of the most efficient data structures in algorithms, almost O(1) per operation.
