๐ String Matching (Pattern Matching)
1๏ธโฃ Introduction
String Matching is the problem of finding the occurrence(s) of a pattern string within a text string.
๐น Definition
Given:
- A text string
Tof length n - A pattern string
Pof length m
Objective:
Find all positions where P occurs in T
2๏ธโฃ Applications of String Matching
- Text editors (find/replace)
- Search engines
- DNA sequence analysis
- Plagiarism detection
- Compiler design
3๏ธโฃ Types of String Matching
- Exact Matching โ Pattern must match exactly
- Approximate Matching โ Allows mismatches
4๏ธโฃ Naive String Matching Algorithm
๐น Idea
Check the pattern at every possible position in the text.
๐น Algorithm (Naive Approach)
NaiveStringMatch(T, P):
n = length(T)
m = length(P)
for i = 0 to n-m:
j = 0
while j < m and T[i+j] == P[j]:
j++
if j == m:
print "Pattern found at index", i
๐น Example
Text:
T = "AABAACAADAABAABA"
Pattern:
P = "AABA"
Matches at indices:
0, 9, 12
๐น Time Complexity
| Case | Complexity |
|---|---|
| Best Case | O(n) |
| Worst Case | O(nm) |
5๏ธโฃ Efficient String Matching Algorithms
๐น 1. Knuth-Morris-Pratt (KMP) Algorithm
๐ธ Idea
- Avoid unnecessary comparisons
- Use Longest Prefix Suffix (LPS) array
๐ธ Steps
- Preprocess pattern โ build LPS array
- Use LPS to skip comparisons
๐ธ Time Complexity
[
O(n + m)
]
๐น 2. Rabin-Karp Algorithm
๐ธ Idea
- Use hashing to compare strings
- Compare hash values instead of characters
๐ธ Steps
- Compute hash of pattern
- Compute hash of text substrings
- Compare hashes
๐ธ Time Complexity
- Average: O(n + m)
- Worst: O(nm) (due to collisions)
๐น 3. Boyer-Moore Algorithm
๐ธ Idea
- Compare from right to left
- Skip large portions using heuristics
๐ธ Techniques
- Bad character rule
- Good suffix rule
๐ธ Time Complexity
- Best: O(n/m)
- Worst: O(nm)
6๏ธโฃ Comparison of String Matching Algorithms
| Algorithm | Time Complexity | Technique |
|---|---|---|
| Naive | O(nm) | Brute Force |
| KMP | O(n + m) | Prefix function |
| Rabin-Karp | O(n + m) avg | Hashing |
| Boyer-Moore | O(n/m) best | Heuristics |
7๏ธโฃ Key Concepts
๐น Prefix
Beginning part of string
Example: “AB” is prefix of “ABCD”
๐น Suffix
Ending part of string
Example: “CD” is suffix of “ABCD”
8๏ธโฃ Advantages of Efficient Algorithms
โ Reduce redundant comparisons
โ Faster searching
โ Suitable for large texts
9๏ธโฃ Limitations
โ Complex implementation (KMP, BM)
โ Hash collisions (Rabin-Karp)
๐ Conclusion
String Matching is a fundamental problem in computer science, with multiple algorithms designed to improve efficiency over the naive approach.
Efficient algorithms like KMP and Rabin-Karp significantly reduce time complexity compared to brute force methods.
๐ Exam Tip
๐ Always include:
- Problem definition
- Naive algorithm
- One efficient algorithm (KMP preferred)
- Time complexity comparison
