Skip to content
Home » Data Structures for Disjoint Sets

Data Structures for Disjoint Sets


Disjoint Set Data Structure (Union–Find / DSU)

A Disjoint Set is a data structure that keeps track of a partition of elements into non-overlapping sets and efficiently supports two main operations:

  1. Find(x) → Determine which set the element x belongs to
  2. Union(x, y) → Merge the sets containing x and y

It is widely used in:

  • Graph algorithms (Kruskal’s MST)
  • Network connectivity
  • Image processing
  • Clustering
  • Tracking connected components

1. What is a Disjoint Set?

A Disjoint Set System represents a collection of sets where:

  • Each element belongs to exactly one set
  • Sets are disjoint (no common elements)
  • Sets change during operations (union operations)

2. Operations on Disjoint Set

1. Make-Set(x)

Creates a new set containing the single element x.

parent[x] = x
rank[x] = 0

2. Find(x)

Returns the representative (root) of the set containing x.

Used to check if two elements are in the same set.

3. Union(x, y)

Merges the sets containing x and y.


3. Efficient Implementations

There are two powerful optimizations used to make DSU extremely fast:


A. Path Compression (Optimization of Find)

Goal:
Flatten the structure of the tree during Find operation so future finds are faster.

Idea:

When doing Find(x), make every node on the path directly point to the root.

Example Before Compression

1 → 2 → 3 → 4   (root)

After Find(1):

1 → 4
2 → 4
3 → 4
4 → 4

Effect:
Find operation becomes nearly O(1).


B. Union by Rank / Union by Size

These are strategies to keep the tree shallow.


1. Union by Rank

Each root has a rank (an estimate of tree height).

  • Attach the shorter tree under the taller tree.
  • If ranks are equal → increase rank of new root by 1.

Example:

Tree A (rank 2)
Tree B (rank 1)

Union(A, B):

  • B attaches under A
  • rank[A] stays 2

2. Union by Size

Keep track of size of each tree.

  • Attach the smaller tree under the larger one.

Goal:
Minimize height → faster operations.


4. Combined: Path Compression + Union by Rank

When both are used together:

👉 Amortized time becomes almost constant

Formal time complexity:

[
O(\alpha(n))
]

Where α(n) is the inverse Ackermann function, one of the slowest-growing functions in computer science.

For all practical purposes:

[
\alpha(n) \le 4 \text{ for all realistic n}
]

So DSU is almost O(1).


5. Internal Representation

Disjoint Set is usually implemented using arrays:

parent[i]

Stores parent of node i.
If parent[i] = i → i is the root.

rank[i] or size[i]

Used for union strategies.

Example:

NodeParentRank
110
220
330

After union(1,2):

NodeParent
11
21

6. Example of Operations

Initial:

Make-Set for 1,2,3,4,5

1 2 3 4 5     (each is own root)

Union(1,2)

1
 \
  2

Union(3,4)

3
 \
  4

Union(2,3)

Attach root of 2’s set under root of 3’s set based on rank.


7. Applications of Disjoint Sets

A. Kruskal’s Minimum Spanning Tree Algorithm

  • Sort edges
  • Add edge if it doesn’t form a cycle
  • Detect cycle using DSU (Find operation)

B. Checking Connectivity in Graphs

  • Join nodes with Union
  • Query connectivity with Find

C. Detecting Cycles in Undirected Graphs

  • If Find(u) == Find(v) before union → cycle exists

D. Network / Social Network Connectivity

  • Friend groups
  • Connected components

E. Image Processing

  • Grouping pixels into connected regions

F. Clustering

  • Used in hierarchical clustering

8. Time Complexities

OperationWithout OptimizationsWith Union by Rank + Path Compression
Make-SetO(1)O(1)
FindO(n)O(α(n)) ≈ O(1)
UnionO(n)O(α(n)) ≈ O(1)

9. Advantages

✔ Extremely fast (nearly constant time)
✔ Very simple implementation
✔ Used in many graph algorithms


10. Disadvantages

✘ Does not support deletion of elements
✘ Only works for partitioned sets (non-overlapping)


Summary

A Disjoint Set / Union–Find structure supports:

  • Make-Set
  • Union
  • Find

Using:

  • Path Compression
  • Union by Rank / Size

It becomes one of the most efficient data structures in algorithms, almost O(1) per operation.