Skip to content
Home » Boyer–Moore algorithm

Boyer–Moore algorithm

Below is a clear, complete, exam-oriented explanation of the Boyer–Moore Algorithm, written like a DSA expert. This includes intuition, heuristics, complexity, and workflow.


📘 Boyer–Moore String Matching Algorithm

The Boyer–Moore algorithm is one of the fastest string-matching algorithms in practice. It works by comparing characters from right to left and uses two powerful heuristics to skip large sections of the text.

In many real-world cases (natural language text), Boyer–Moore runs in sublinear time, faster than KMP and Rabin–Karp.


📌 Key Ideas of Boyer–Moore

  1. Pattern is matched from right to left.
  2. When mismatch occurs, the pattern is shifted intelligently, not by just 1.
  3. Two heuristics determine how far the pattern should be shifted:
    • Bad Character Rule
    • Good Suffix Rule

These allow big jumps → fewer comparisons → fast matching.


📘 Heuristic 1: Bad Character Rule

When mismatch occurs at pattern character P[j] and text character T[i]:

  • Shift pattern so that:
    • The mismatched text character aligns with its last occurrence in the pattern
    • OR skip past it if it doesn’t exist in the pattern

Example

Pattern: “ABCD”
Mismatch at ‘X’ (not in pattern)

Shift pattern entirely to the right because ‘X’ doesn’t exist.


📘 Heuristic 2: Good Suffix Rule

When mismatch happens after some suffix matched:

  • Shift pattern to align the next occurrence of that suffix in the pattern
    OR
  • Shift so that the suffix aligns with a prefix of the pattern

This allows even larger skips.


📘 Why Boyer–Moore Is Fast

  • It skips many unnecessary character comparisons
  • In English text, average complexity becomes O(n / m)
  • Often faster than KMP and Rabin-Karp in practice
  • Most efficient for large alphabets (ASCII, Unicode, text documents)

📘 Time Complexity

CaseTime
Best CaseO(n/m) (almost constant time per shift)
Average CaseO(n)
Worst CaseO(n × m)

Worst case happens rarely (repetitive patterns like “AAAAAA”).


📘 Space Complexity

  • O(m) for storing bad character table
  • O(m) for good suffix table

📘 Boyer–Moore Matching Steps (High-Level)

  1. Preprocess pattern to build:
    • Bad character table
    • Good suffix table
  2. Align pattern to text
  3. Compare pattern characters from right to left
  4. On mismatch:
    • Compute shift from bad character rule
    • Compute shift from good suffix rule
    • Choose the maximum shift
  5. Repeat until pattern passes the text end
  6. If match found, report position

📘 Simplified Python Implementation (Using Bad Character Rule Only)

(For exam demonstration. Full BM is longer.)

def bad_character_table(pattern):
    table = {}
    m = len(pattern)
    for i in range(m):
        table[pattern[i]] = i
    return table


def boyer_moore(text, pattern):
    n, m = len(text), len(pattern)
    if m == 0:
        return []

    bad_char = bad_character_table(pattern)
    positions = []

    shift = 0
    while shift <= n - m:
        j = m - 1

        # Compare pattern from right to left
        while j >= 0 and pattern[j] == text[shift + j]:
            j -= 1

        if j < 0:  # full match
            positions.append(shift)
            shift += m - bad_char.get(text[shift + m], -1) if shift + m < n else 1
        else:
            bad_char_shift = j - bad_char.get(text[shift + j], -1)
            shift += max(1, bad_char_shift)

    return positions


# Example usage
text = "ABAAABCD"
pattern = "ABC"
print("Pattern found at positions:", boyer_moore(text, pattern))

📘 Output

Pattern found at positions: [4]

📘 Comparison with Other Algorithms

AlgorithmBestAverageWorstNotes
Brute ForceO(n)O(nm)O(nm)Very slow
KMPO(n + m)O(n + m)O(n + m)No rechecking
Rabin–KarpO(n + m)O(n + m)O(nm)Uses hashing
Boyer–MooreO(n/m)O(n)O(nm)Fastest in practice

📘 Summary (Exam Notes)

  • Boyer–Moore compares pattern from right to left.
  • Uses Bad Character and Good Suffix heuristics.
  • Gives sublinear running time in many cases.
  • Best-case performance: O(n / m) — extremely fast.
  • Worst-case: O(n × m) but rare.
  • Popular in text editors, document search, grep command, etc.