📘 String Matching (Pattern Matching)
String Matching (or Pattern Matching) is the process of finding one or more occurrences of a pattern string (P) inside a text string (T).
Example:
Text (T) = "ABAAABCD"
Pattern (P) = "ABC"
Goal: Find if P appears in T, and at which positions.
String-matching algorithms are fundamental in:
- Text editors (search feature)
- DNA sequence matching
- Information retrieval
- Spell checkers
- Compilers (lexical analysis)
- Web search engines
📌 Basic Terminology
- Text (T) → large string (length n)
- Pattern (P) → string to be searched (length m)
- Match → location where P fully appears in T
- Shift → how far the pattern moves along the text after a mismatch
📘 Types of String-Matching Algorithms
There are many algorithms, but the most important ones (exam focus) are:
- Naive (Brute Force) Algorithm
- Knuth–Morris–Pratt (KMP)
- Rabin–Karp Algorithm
- Boyer–Moore Algorithm
Below is an exam-level explanation of each.
1. Naive (Brute Force) String Matching
Idea:
Slide the pattern over text one position at a time and compare.
Example:
Check all:
T[0..m-1], T[1..m], T[2..m+1], ..., T[n-m..n-1]
Time Complexity:
- Worst case: O(n × m)
- Best case: O(n) (when a mismatch occurs at the first character of every alignment)
Advantages:
- Very simple
- Works with any pattern
Disadvantages:
- Very slow for large input
2. KMP Algorithm (Knuth–Morris–Pratt)
Most important in DSA exams.
Key Idea:
Use the LPS array (Longest Proper Prefix that is also a Suffix) to avoid re-checking characters.
On a mismatch, the pattern jumps ahead using the LPS value instead of restarting from scratch.
Time Complexity:
- O(n + m)
Advantages:
- Fast and guaranteed linear
- No re-scanning of characters
Disadvantages:
- Calculating LPS array is tricky
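The LPS idea above can be sketched in Python. This is a minimal illustrative implementation (the names `compute_lps` and `kmp_search` are chosen here, not standard library functions):

```python
def compute_lps(pattern):
    """Build the LPS table: lps[i] = length of the longest proper
    prefix of pattern[0..i] that is also a suffix of it."""
    m = len(pattern)
    lps = [0] * m
    length = 0          # length of the previous longest prefix-suffix
    i = 1
    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            length = lps[length - 1]   # fall back without advancing i
        else:
            lps[i] = 0
            i += 1
    return lps

def kmp_search(text, pattern):
    """Return all starting indices where pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    lps = compute_lps(pattern)
    positions = []
    i = j = 0               # i indexes text, j indexes pattern
    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == m:
                positions.append(i - m)
                j = lps[j - 1]    # keep going to find overlapping matches
        elif j > 0:
            j = lps[j - 1]        # reuse the already-matched prefix
        else:
            i += 1
    return positions

print(kmp_search("ABAAABCD", "ABC"))   # [4]
```

Note that `i` never moves backwards over the text, which is exactly why the total work is O(n + m).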
3. Rabin–Karp Algorithm
Key Idea:
Use hashing to compare strings:
- Compute the hash of the pattern
- Compute the hash of each length-m window of the text, updating it in O(1) with a rolling hash
- If the hashes match → compare the actual characters to rule out false positives
Time Complexity:
- Worst case: O(n × m)
- Average case: O(n + m)
- Best case: O(n + m)
Advantages:
- Fast for multiple pattern matching
- Good for plagiarism detection, web search
Disadvantage:
- Hash collisions possible
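A rolling-hash version can be sketched as follows. The `base` and `mod` values here are illustrative choices, not fixed by the algorithm; any base larger than the alphabet and a prime modulus work:

```python
def rabin_karp(text, pattern, base=256, mod=101):
    """Return all starting indices where pattern occurs in text,
    using a rolling hash over length-m windows."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(base, m - 1, mod)       # weight of the leading character
    p_hash = t_hash = 0
    for i in range(m):              # hashes of pattern and first window
        p_hash = (base * p_hash + ord(pattern[i])) % mod
        t_hash = (base * t_hash + ord(text[i])) % mod
    positions = []
    for i in range(n - m + 1):
        # On a hash hit, verify the characters to rule out a collision.
        if p_hash == t_hash and text[i:i + m] == pattern:
            positions.append(i)
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = (base * (t_hash - ord(text[i]) * h)
                      + ord(text[i + m])) % mod
    return positions

print(rabin_karp("ABAAABCD", "ABC"))   # [4]
```

The character-by-character verification after a hash hit is what keeps the worst case at O(n × m) while the average stays O(n + m).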
4. Boyer–Moore Algorithm
Idea:
Match from right to left and use two heuristics:
- Bad Character Heuristic
- Good Suffix Heuristic
It skips large parts of the text.
Time Complexity:
- Best case: O(n / m) (extremely fast)
- Worst case: O(n × m)
- Average: sublinear performance
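A sketch of the bad-character heuristic alone (the full algorithm also applies the good-suffix rule, omitted here to keep the example short):

```python
def boyer_moore_bad_char(text, pattern):
    """Return all starting indices where pattern occurs in text,
    using only the bad-character heuristic."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Last index at which each character appears in the pattern.
    last = {c: i for i, c in enumerate(pattern)}
    positions = []
    s = 0                           # current shift of the pattern over the text
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1                  # compare right to left
        if j < 0:
            positions.append(s)
            s += 1                  # simple shift after a match (sketch only)
        else:
            # Align the mismatched text character with its last occurrence
            # in the pattern, or jump past it entirely if it never occurs.
            s += max(1, j - last.get(text[s + j], -1))
    return positions

print(boyer_moore_bad_char("ABAAABCD", "ABC"))   # [4]
```

When the mismatched character does not occur in the pattern at all, the shift is `j + 1` positions at once, which is where the sublinear average behaviour comes from.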
📌 Python Example — Simple Naive String Matching
```python
def naive_string_match(text, pattern):
    n = len(text)
    m = len(pattern)
    positions = []
    for i in range(n - m + 1):
        match = True
        for j in range(m):
            if text[i + j] != pattern[j]:
                match = False
                break
        if match:
            positions.append(i)
    return positions

# Example usage
text = "ABAAABCD"
pattern = "ABC"
print("Matches at:", naive_string_match(text, pattern))
```
📘 Example Output
```
Matches at: [4]
```
📘 Applications of String Matching
- Bioinformatics (gene sequence search)
- Search engines
- Pattern detection in log files
- Spell checking
- Lexical analysis in compilers
- Word processors (Find & Replace)
📘 Summary (Exam Notes)
- String matching finds occurrences of a pattern in a text
- Naive method: O(n·m)
- KMP: uses prefix function → O(n + m)
- Rabin–Karp: hashing-based
- Boyer–Moore: right-to-left scanning with big jumps
- Applications: search engines, DNA matching, text processing
