Skip to content
Home » Rabin Karp algorithm

Rabin Karp algorithm

Below is a clear, complete, exam-oriented explanation of the Rabin–Karp Algorithm, written exactly like a Data Structures & Algorithms expert, including an intuitive understanding, mathematical explanation, and Python code.


📘 Rabin–Karp Algorithm (String Matching using Hashing)

The Rabin–Karp algorithm is a string-matching algorithm that uses hashing to find occurrences of a pattern string P inside a text string T.

Instead of comparing characters one by one at each position (like brute force), Rabin–Karp compares hash values.


📌 Key Idea

  1. Compute hash of the pattern.
  2. Compute hash of the first m characters of the text.
  3. Slide the pattern over the text by one position:
    • Quickly compute the next substring hash using rolling hash.
  4. If hashes match → compare characters to avoid false collisions.
  5. Continue until end of text.

Hashing makes the algorithm very efficient for searching.


📘 When Rabin–Karp Is Useful

  • Multiple pattern searching
  • Searching in large text files
  • Plagiarism detection
  • Bioinformatics (DNA search)
  • Searching for multiple keywords efficiently
  • When hashing is faster than character comparison

📘 Rolling Hash Concept (Important)

Given substring:

[
T[i \dots i+m-1]
]

Rolling hash allows computing:

[
T[i+1 \dots i+m]
]

without recomputing from scratch.

This makes sliding the pattern fast.


📘 Hash Formula (Polynomial Rolling Hash)

Let:

  • Base = d (number of possible characters, e.g., 256 for ASCII)
  • Mod = q (a large prime number)

For a substring of length m:

[
H = (T[i] \cdot d^{m-1} + T[i+1] \cdot d^{m-2} + \dots + T[i+m-1]) \mod q
]


📘 Time Complexity

CaseTime
Best / AverageO(n + m)
Worst (many collisions)O(nm)

Worst-case occurs when many hash collisions happen, but choosing good q minimizes this.


📘 Space Complexity

  • O(1) auxiliary space
  • O(m) for storing hash / pattern

📌 Advantages

  • Very fast for multiple patterns
  • Efficient average-case
  • Rolling hash avoids recomputation
  • Works well for large texts

📌 Disadvantages

  • Collisions possible
  • Worst-case can degrade to O(nm)
  • Requires good choice of base and mod values

📘 Rabin–Karp Algorithm Steps

  1. Compute hash of pattern P.
  2. Compute hash for the first window of text T[0:m].
  3. Slide window from left to right:
    • Update hash using rolling hash
    • If hash matches → verify characters
  4. Collect all matching positions.

📘 Python Implementation of Rabin–Karp Algorithm

This implementation uses a classical rolling hash technique.

def rabin_karp(text, pattern):
    n = len(text)
    m = len(pattern)

    if m > n:
        return []

    base = 256            # total characters (ASCII)
    mod = 101             # a prime number

    # Initial hash values for pattern and first text window
    hash_p = 0
    hash_t = 0

    # The value of (base^(m-1)) % mod used for rolling hash
    h = pow(base, m - 1, mod)

    # Compute initial hash values
    for i in range(m):
        hash_p = (hash_p * base + ord(pattern[i])) % mod
        hash_t = (hash_t * base + ord(text[i])) % mod

    positions = []

    # Slide over the text
    for i in range(n - m + 1):
        # If hash values match, compare characters
        if hash_p == hash_t:
            if text[i:i + m] == pattern:
                positions.append(i)

        # Rolling hash: compute hash of next window
        if i < n - m:
            hash_t = (hash_t - ord(text[i]) * h) * base + ord(text[i + 1])
            hash_t %= mod  # ensure non-negative

    return positions


# Example usage
text = "ABCCDDAEFG"
pattern = "CDD"
print("Matches found at positions:", rabin_karp(text, pattern))

📘 Example Output

Matches found at positions: [2]

📘 Rabin–Karp vs Other Algorithms

AlgorithmBest CaseWorst CaseNotes
Brute ForceO(n)O(nm)Simple but slow
KMPO(n + m)O(n + m)Avoids re-scanning
Rabin–KarpO(n + m)O(nm)Good for multiple patterns
Boyer–MooreSublinearO(nm)Fastest in practice

📘 Summary (Exam Notes)

  • Rabin–Karp is a string matching algorithm using hashing.
  • Uses rolling hash for fast substring hash computation.
  • Best/Average: O(n + m)
  • Worst-case: O(nm) (due to collisions)
  • Great for multiple pattern searching.
  • Uses polynomial hash with base and modulo.