Below is a clear, complete, exam-oriented explanation of the Rabin–Karp Algorithm, written exactly like a Data Structures & Algorithms expert, including an intuitive understanding, mathematical explanation, and Python code.

📘 Rabin–Karp Algorithm (String Matching using Hashing)

The Rabin–Karp algorithm is a string-matching algorithm that uses hashing to find occurrences of a pattern string P inside a text string T.

Instead of comparing characters one by one at each position (like brute force), Rabin–Karp compares hash values.

📌 Key Idea

Compute hash of the pattern.
Compute hash of the first m characters of the text.
Slide the pattern over the text by one position:
- Quickly compute the next substring hash using rolling hash.
If hashes match → compare characters to avoid false collisions.
Continue until end of text.

Hashing makes the algorithm very efficient for searching.

📘 When Rabin–Karp Is Useful

Multiple pattern searching
Searching in large text files
Plagiarism detection
Bioinformatics (DNA search)
Searching for multiple keywords efficiently
When hashing is faster than character comparison

📘 Rolling Hash Concept (Important)

Given substring:

[
T[i \dots i+m-1]
]

Rolling hash allows computing:

[
T[i+1 \dots i+m]
]

without recomputing from scratch.

This makes sliding the pattern fast.

📘 Hash Formula (Polynomial Rolling Hash)

Let:

Base = d (number of possible characters, e.g., 256 for ASCII)
Mod = q (a large prime number)

For a substring of length m:

[
H = (T[i] \cdot d^{m-1} + T[i+1] \cdot d^{m-2} + \dots + T[i+m-1]) \mod q
]

📘 Time Complexity

Case	Time
Best / Average	O(n + m)
Worst (many collisions)	O(nm)

Worst-case occurs when many hash collisions happen, but choosing good q minimizes this.

📘 Space Complexity

O(1) auxiliary space
O(m) for storing hash / pattern

📌 Advantages

Very fast for multiple patterns
Efficient average-case
Rolling hash avoids recomputation
Works well for large texts

📌 Disadvantages

Collisions possible
Worst-case can degrade to O(nm)
Requires good choice of base and mod values

📘 Rabin–Karp Algorithm Steps

Compute hash of pattern P.
Compute hash for the first window of text T[0:m].
Slide window from left to right:
- Update hash using rolling hash
- If hash matches → verify characters
Collect all matching positions.

📘 Python Implementation of Rabin–Karp Algorithm

This implementation uses a classical rolling hash technique.

def rabin_karp(text, pattern):
    n = len(text)
    m = len(pattern)

    if m > n:
        return []

    base = 256            # total characters (ASCII)
    mod = 101             # a prime number

    # Initial hash values for pattern and first text window
    hash_p = 0
    hash_t = 0

    # The value of (base^(m-1)) % mod used for rolling hash
    h = pow(base, m - 1, mod)

    # Compute initial hash values
    for i in range(m):
        hash_p = (hash_p * base + ord(pattern[i])) % mod
        hash_t = (hash_t * base + ord(text[i])) % mod

    positions = []

    # Slide over the text
    for i in range(n - m + 1):
        # If hash values match, compare characters
        if hash_p == hash_t:
            if text[i:i + m] == pattern:
                positions.append(i)

        # Rolling hash: compute hash of next window
        if i < n - m:
            hash_t = (hash_t - ord(text[i]) * h) * base + ord(text[i + 1])
            hash_t %= mod  # ensure non-negative

    return positions


# Example usage
text = "ABCCDDAEFG"
pattern = "CDD"
print("Matches found at positions:", rabin_karp(text, pattern))

📘 Example Output

Matches found at positions: [2]

📘 Rabin–Karp vs Other Algorithms

Algorithm	Best Case	Worst Case	Notes
Brute Force	O(n)	O(nm)	Simple but slow
KMP	O(n + m)	O(n + m)	Avoids re-scanning
Rabin–Karp	O(n + m)	O(nm)	Good for multiple patterns
Boyer–Moore	Sublinear	O(nm)	Fastest in practice

📘 Summary (Exam Notes)

Rabin–Karp is a string matching algorithm using hashing.
Uses rolling hash for fast substring hash computation.
Best/Average: O(n + m)
Worst-case: O(nm) (due to collisions)
Great for multiple pattern searching.
Uses polynomial hash with base and modulo.

Rabin Karp algorithm