Below is a clear, complete, exam-oriented explanation of the Rabin–Karp Algorithm, written exactly like a Data Structures & Algorithms expert, including an intuitive understanding, mathematical explanation, and Python code.
📘 Rabin–Karp Algorithm (String Matching using Hashing)
The Rabin–Karp algorithm is a string-matching algorithm that uses hashing to find occurrences of a pattern string P inside a text string T.
Instead of comparing characters one by one at each position (like brute force), Rabin–Karp compares hash values.
📌 Key Idea
- Compute hash of the pattern.
- Compute hash of the first m characters of the text.
- Slide the pattern over the text by one position:
- Quickly compute the next substring hash using rolling hash.
- If hashes match → compare characters to avoid false collisions.
- Continue until end of text.
Hashing makes the algorithm very efficient for searching.
📘 When Rabin–Karp Is Useful
- Multiple pattern searching
- Searching in large text files
- Plagiarism detection
- Bioinformatics (DNA search)
- Searching for multiple keywords efficiently
- When hashing is faster than character comparison
📘 Rolling Hash Concept (Important)
Given substring:
[
T[i \dots i+m-1]
]
Rolling hash allows computing:
[
T[i+1 \dots i+m]
]
without recomputing from scratch.
This makes sliding the pattern fast.
📘 Hash Formula (Polynomial Rolling Hash)
Let:
- Base = d (number of possible characters, e.g., 256 for ASCII)
- Mod = q (a large prime number)
For a substring of length m:
[
H = (T[i] \cdot d^{m-1} + T[i+1] \cdot d^{m-2} + \dots + T[i+m-1]) \mod q
]
📘 Time Complexity
| Case | Time |
|---|---|
| Best / Average | O(n + m) |
| Worst (many collisions) | O(nm) |
Worst-case occurs when many hash collisions happen, but choosing good q minimizes this.
📘 Space Complexity
- O(1) auxiliary space
- O(m) for storing hash / pattern
📌 Advantages
- Very fast for multiple patterns
- Efficient average-case
- Rolling hash avoids recomputation
- Works well for large texts
📌 Disadvantages
- Collisions possible
- Worst-case can degrade to O(nm)
- Requires good choice of base and mod values
📘 Rabin–Karp Algorithm Steps
- Compute hash of pattern
P. - Compute hash for the first window of text
T[0:m]. - Slide window from left to right:
- Update hash using rolling hash
- If hash matches → verify characters
- Collect all matching positions.
📘 Python Implementation of Rabin–Karp Algorithm
This implementation uses a classical rolling hash technique.
def rabin_karp(text, pattern):
n = len(text)
m = len(pattern)
if m > n:
return []
base = 256 # total characters (ASCII)
mod = 101 # a prime number
# Initial hash values for pattern and first text window
hash_p = 0
hash_t = 0
# The value of (base^(m-1)) % mod used for rolling hash
h = pow(base, m - 1, mod)
# Compute initial hash values
for i in range(m):
hash_p = (hash_p * base + ord(pattern[i])) % mod
hash_t = (hash_t * base + ord(text[i])) % mod
positions = []
# Slide over the text
for i in range(n - m + 1):
# If hash values match, compare characters
if hash_p == hash_t:
if text[i:i + m] == pattern:
positions.append(i)
# Rolling hash: compute hash of next window
if i < n - m:
hash_t = (hash_t - ord(text[i]) * h) * base + ord(text[i + 1])
hash_t %= mod # ensure non-negative
return positions
# Example usage
text = "ABCCDDAEFG"
pattern = "CDD"
print("Matches found at positions:", rabin_karp(text, pattern))
📘 Example Output
Matches found at positions: [2]
📘 Rabin–Karp vs Other Algorithms
| Algorithm | Best Case | Worst Case | Notes |
|---|---|---|---|
| Brute Force | O(n) | O(nm) | Simple but slow |
| KMP | O(n + m) | O(n + m) | Avoids re-scanning |
| Rabin–Karp | O(n + m) | O(nm) | Good for multiple patterns |
| Boyer–Moore | Sublinear | O(nm) | Fastest in practice |
📘 Summary (Exam Notes)
- Rabin–Karp is a string matching algorithm using hashing.
- Uses rolling hash for fast substring hash computation.
- Best/Average: O(n + m)
- Worst-case: O(nm) (due to collisions)
- Great for multiple pattern searching.
- Uses polynomial hash with base and modulo.
