📘 String Matching (Pattern Matching)

1️⃣ Introduction

String Matching is the problem of finding the occurrence(s) of a pattern string within a text string.

🔹 Definition

Given:

A text string T of length n
A pattern string P of length m

Objective:

Find every shift s (0 ≤ s ≤ n − m) such that T[s..s+m−1] = P, i.e. every position where P occurs in T

2️⃣ Applications of String Matching

Text editors (find/replace)
Search engines
DNA sequence analysis
Plagiarism detection
Compiler design

3️⃣ Types of String Matching

Exact Matching → Pattern must match exactly
Approximate Matching → Allows a limited number of mismatches (or edits) between the pattern and the text

4️⃣ Naive String Matching Algorithm

🔹 Idea

Slide the pattern across the text one position at a time and, at each of the n − m + 1 possible shifts, compare it with the text character by character.

🔹 Algorithm (Naive Approach)

NaiveStringMatch(T, P):
   n = length(T)
   m = length(P)

   for i = 0 to n-m:
       j = 0
       while j < m and T[i+j] == P[j]:
           j++

       if j == m:
           print "Pattern found at index", i

🔹 Example

Text:

T = "AABAACAADAABAABA"

Pattern:

P = "AABA"

Matches at indices:

0, 9, 12
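The pseudocode above translates almost line for line into Python. In this sketch the function name naive_search is hypothetical, and it returns the match positions as a list instead of printing them:

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every index where pattern occurs in text (brute force)."""
    n, m = len(text), len(pattern)
    matches = []
    # Try each of the n - m + 1 possible shifts (empty range if m > n).
    for i in range(n - m + 1):
        j = 0
        # Extend the match while characters agree.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:  # all m characters matched
            matches.append(i)
    return matches


print(naive_search("AABAACAADAABAABA", "AABA"))  # [0, 9, 12]
```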

🔹 Time Complexity

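Case	Complexity	When it happens
Best Case	O(n)	The first character comparison fails at every shift
Worst Case	O((n − m + 1) · m) = O(nm)	Almost the whole pattern matches at every shift, e.g. T = "AAAA…A", P = "AAAB"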
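To see where the two bounds come from, the sketch below counts the character comparisons the naive algorithm makes. The helper count_comparisons is hypothetical; it counts one comparison per evaluation of T[i+j] == P[j], including the failing one.

```python
def count_comparisons(text: str, pattern: str) -> int:
    """Count character comparisons made by the naive algorithm."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for i in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1          # one comparison T[i+j] vs P[j]
            if text[i + j] != pattern[j]:
                break                 # mismatch: move to the next shift
            j += 1
    return comparisons


text = "A" * 10
print(count_comparisons(text, "BAAA"))  # best case: 1 per shift -> 7
print(count_comparisons(text, "AAAB"))  # worst case: 4 per shift -> 28
```

With n = 10 and m = 4 there are 7 shifts, so the counts are n − m + 1 = 7 and (n − m + 1) · m = 28.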

5️⃣ Efficient String Matching Algorithms

🔹 1. Knuth-Morris-Pratt (KMP) Algorithm

🔸 Idea

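Avoid unnecessary comparisons: the text pointer never moves backward, because after a mismatch the characters already matched tell us how far the pattern can safely shift
Use the Longest Prefix Suffix (LPS) array: lps[q] is the length of the longest proper prefix of P[0..q] that is also a suffix of P[0..q]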

🔸 Steps

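Preprocess the pattern → build the LPS array in O(m)
Scan the text once; on a mismatch after j matched characters, continue with j = lps[j − 1] instead of re-examining earlier text characters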
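The two steps can be sketched in Python as follows. The names build_lps and kmp_search are hypothetical; this is one common way to write KMP, not the only one.

```python
def build_lps(pattern: str) -> list[int]:
    """lps[q] = length of the longest proper prefix of pattern[:q+1]
    that is also a suffix of it."""
    lps = [0] * len(pattern)
    length = 0          # length of the current matched prefix
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            length = lps[length - 1]   # fall back to a shorter border
        else:
            lps[i] = 0
            i += 1
    return lps


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return every index where pattern occurs in text."""
    if not pattern:
        return []
    lps = build_lps(pattern)
    matches = []
    j = 0               # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]             # reuse the matched prefix, never move i back
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = lps[j - 1]             # allow overlapping matches
    return matches


print(build_lps("AABA"))                           # [0, 1, 0, 1]
print(kmp_search("AABAACAADAABAABA", "AABA"))      # [0, 9, 12]
```

Because the text index i only moves forward and each fallback j = lps[j − 1] undoes an earlier increment of j, the scan does O(n) work in total.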

🔸 Time Complexity

O(m) to build the LPS array + O(n) to scan the text = O(n + m)

🔹 2. Rabin-Karp Algorithm

🔸 Idea

Use hashing to compare strings
Compare hash values instead of characters; characters are compared only when the hashes agree, to rule out collisions

🔸 Steps

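Compute the hash of the pattern
Compute the hash of each length-m window of the text, updating it in O(1) per shift with a rolling hash
Compare hashes; when they are equal, verify the window character by character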
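A minimal sketch of these steps, assuming a polynomial rolling hash with base 256 and the prime modulus 1_000_000_007 (both values are illustrative choices, not part of the algorithm's definition). The name rabin_karp is hypothetical.

```python
def rabin_karp(text: str, pattern: str, base: int = 256,
               mod: int = 1_000_000_007) -> list[int]:
    """Return every index where pattern occurs in text, using a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:      # assumption: an empty pattern reports no matches
        return []
    high = pow(base, m - 1, mod)   # weight of the window's leading character
    p_hash = t_hash = 0
    for k in range(m):             # hash the pattern and the first window
        p_hash = (p_hash * base + ord(pattern[k])) % mod
        t_hash = (t_hash * base + ord(text[k])) % mod
    matches = []
    for i in range(n - m + 1):
        # Equal hashes may be a collision, so confirm character by character.
        if p_hash == t_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Roll: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return matches


print(rabin_karp("AABAACAADAABAABA", "AABA"))  # [0, 9, 12]
```

With a deliberately tiny modulus (e.g. mod=3) many windows collide with the pattern's hash, but the verification step still filters them out; those extra O(m) checks are exactly what pushes the worst case toward O(nm).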

🔸 Time Complexity

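Average: O(n + m)
Worst: O(nm) (when many windows share the pattern's hash, through collisions or genuine matches, each one needs an O(m) character check)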

🔹 3. Boyer-Moore Algorithm

🔸 Idea

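Compare the pattern with the text from right to left
On a mismatch, shift the pattern by the larger of the shifts suggested by two heuristics, often skipping many text positions at once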

🔸 Techniques

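Bad character rule: align the mismatched text character with its last occurrence in the pattern, or shift the pattern past it if it does not occur
Good suffix rule: align the already-matched suffix with another occurrence of it in the pattern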
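The sketch below is a simplified Boyer-Moore that uses only the bad character rule; the good suffix rule is omitted to keep it short, so its shifts can be smaller than those of the full algorithm. The name boyer_moore_bad_char is hypothetical.

```python
def boyer_moore_bad_char(text: str, pattern: str) -> list[int]:
    """Boyer-Moore search using only the bad character rule."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Index of the last occurrence of each character in the pattern.
    last = {ch: k for k, ch in enumerate(pattern)}
    matches = []
    s = 0                                  # current shift of the pattern
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1                         # compare right to left
        if j < 0:
            matches.append(s)
            # Align the next text character with its last occurrence in P.
            s += m - last.get(text[s + m], -1) if s + m < n else 1
        else:
            # Align the bad character with its last occurrence (shift >= 1).
            s += max(1, j - last.get(text[s + j], -1))
    return matches


print(boyer_moore_bad_char("AABAACAADAABAABA", "AABA"))  # [0, 9, 12]
```

When the bad character does not appear in the pattern at all (here 'C' or 'D'), the pattern jumps past it entirely, which is where the sublinear O(n/m) best case comes from.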

🔸 Time Complexity

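Best: O(n/m) (the mismatched text characters rarely occur in the pattern, so most shifts are close to m)
Worst: O(nm) (e.g. when T and P consist of one repeated character and every shift is a match)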

6️⃣ Comparison of String Matching Algorithms

Algorithm	Best / Average	Worst	Technique
Naive	O(n) best	O(nm)	Brute force
KMP	O(n + m)	O(n + m)	Prefix function (LPS array)
Rabin-Karp	O(n + m) average	O(nm)	Hashing
Boyer-Moore	O(n/m) best	O(nm)	Heuristics (bad character, good suffix)

7️⃣ Key Concepts

🔹 Prefix

A prefix is any leading part of a string; a proper prefix is one shorter than the whole string
Example: “AB” is a prefix of “ABCD”

🔹 Suffix

A suffix is any trailing part of a string; a proper suffix is one shorter than the whole string
Example: “CD” is a suffix of “ABCD”
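These two definitions are exactly what the LPS array used by KMP is built from. The brute-force sketch below (the name lps_by_definition is hypothetical) computes it straight from the definitions; it runs in roughly O(m³) time, so it is only useful for checking small examples by hand.

```python
def lps_by_definition(pattern: str) -> list[int]:
    """For each prefix P[0..q], the length of its longest proper prefix
    that is also a suffix, found by trying every candidate length."""
    lps = []
    for q in range(1, len(pattern) + 1):
        s = pattern[:q]
        best = 0
        for k in range(1, q):          # proper: length strictly less than q
            if s[:k] == s[-k:]:        # prefix of length k equals suffix of length k
                best = k
        lps.append(best)
    return lps


print(lps_by_definition("AABA"))   # [0, 1, 0, 1]
```

For "AABA", the full string has “A” as both a proper prefix and a proper suffix, but “AA” ≠ “BA” and “AAB” ≠ “ABA”, so the last entry is 1.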

8️⃣ Advantages of Efficient Algorithms

✔ Reduce redundant comparisons
✔ Faster searching
✔ Suitable for large texts

9️⃣ Limitations

✖ More complex to implement than the naive approach (KMP, Boyer-Moore)
✖ Hash collisions force extra character-by-character checks (Rabin-Karp)

🔚 Conclusion

String Matching is a fundamental problem in computer science, with multiple algorithms designed to improve efficiency over the naive approach.

KMP guarantees O(n + m) time, while Rabin-Karp and Boyer-Moore are usually much faster than brute force in practice, even though their worst case remains O(nm).

📌 Exam Tip

👉 Always include:

Problem definition
Naive algorithm
One efficient algorithm (KMP preferred)
Time complexity comparison