Below is a clear, complete, exam-oriented explanation of the Boyer–Moore Algorithm, written like a DSA expert. This includes intuition, heuristics, complexity, and workflow.
📘 Boyer–Moore String Matching Algorithm
The Boyer–Moore algorithm is one of the fastest string-matching algorithms in practice. It works by comparing characters from right to left and uses two powerful heuristics to skip large sections of the text.
In many real-world cases (natural language text), Boyer–Moore runs in sublinear time, faster than KMP and Rabin–Karp.
📌 Key Ideas of Boyer–Moore
- Pattern is matched from right to left.
- When mismatch occurs, the pattern is shifted intelligently, not by just 1.
- Two heuristics determine how far the pattern should be shifted:
- Bad Character Rule
- Good Suffix Rule
These allow big jumps → fewer comparisons → fast matching.
📘 Heuristic 1: Bad Character Rule
When mismatch occurs at pattern character P[j] and text character T[i]:
- Shift pattern so that:
- The mismatched text character aligns with its last occurrence in the pattern
- OR skip past it if it doesn’t exist in the pattern
Example
Pattern: “ABCD”
Mismatch at ‘X’ (not in pattern)
Shift pattern entirely to the right because ‘X’ doesn’t exist.
📘 Heuristic 2: Good Suffix Rule
When mismatch happens after some suffix matched:
- Shift pattern to align the next occurrence of that suffix in the pattern
OR - Shift so that the suffix aligns with a prefix of the pattern
This allows even larger skips.
📘 Why Boyer–Moore Is Fast
- It skips many unnecessary character comparisons
- In English text, average complexity becomes O(n / m)
- Often faster than KMP and Rabin-Karp in practice
- Most efficient for large alphabets (ASCII, Unicode, text documents)
📘 Time Complexity
| Case | Time |
|---|---|
| Best Case | O(n/m) (almost constant time per shift) |
| Average Case | O(n) |
| Worst Case | O(n × m) |
Worst case happens rarely (repetitive patterns like “AAAAAA”).
📘 Space Complexity
- O(m) for storing bad character table
- O(m) for good suffix table
📘 Boyer–Moore Matching Steps (High-Level)
- Preprocess pattern to build:
- Bad character table
- Good suffix table
- Align pattern to text
- Compare pattern characters from right to left
- On mismatch:
- Compute shift from bad character rule
- Compute shift from good suffix rule
- Choose the maximum shift
- Repeat until pattern passes the text end
- If match found, report position
📘 Simplified Python Implementation (Using Bad Character Rule Only)
(For exam demonstration. Full BM is longer.)
def bad_character_table(pattern):
table = {}
m = len(pattern)
for i in range(m):
table[pattern[i]] = i
return table
def boyer_moore(text, pattern):
n, m = len(text), len(pattern)
if m == 0:
return []
bad_char = bad_character_table(pattern)
positions = []
shift = 0
while shift <= n - m:
j = m - 1
# Compare pattern from right to left
while j >= 0 and pattern[j] == text[shift + j]:
j -= 1
if j < 0: # full match
positions.append(shift)
shift += m - bad_char.get(text[shift + m], -1) if shift + m < n else 1
else:
bad_char_shift = j - bad_char.get(text[shift + j], -1)
shift += max(1, bad_char_shift)
return positions
# Example usage
text = "ABAAABCD"
pattern = "ABC"
print("Pattern found at positions:", boyer_moore(text, pattern))
📘 Output
Pattern found at positions: [4]
📘 Comparison with Other Algorithms
| Algorithm | Best | Average | Worst | Notes |
|---|---|---|---|---|
| Brute Force | O(n) | O(nm) | O(nm) | Very slow |
| KMP | O(n + m) | O(n + m) | O(n + m) | No rechecking |
| Rabin–Karp | O(n + m) | O(n + m) | O(nm) | Uses hashing |
| Boyer–Moore | O(n/m) | O(n) | O(nm) | Fastest in practice |
📘 Summary (Exam Notes)
- Boyer–Moore compares pattern from right to left.
- Uses Bad Character and Good Suffix heuristics.
- Gives sublinear running time in many cases.
- Best-case performance: O(n / m) — extremely fast.
- Worst-case: O(n × m) but rare.
- Popular in text editors, document search, grep command, etc.
