📘 String Matching (Pattern Matching)

String Matching (or Pattern Matching) is the process of finding one or more occurrences of a pattern string (P) inside a text string (T).

Example:

Text (T)    = "ABAAABCD"
Pattern (P) = "ABC"

Goal: Find if P appears in T, and at which positions.

String-matching algorithms are fundamental in:

  • Text editors (search feature)
  • DNA sequence matching
  • Information retrieval
  • Spell checkers
  • Compilers (lexical analysis)
  • Web search engines

📌 Basic Terminology

  • Text (T) → large string (length n)
  • Pattern (P) → string to be searched (length m)
  • Match → location where P fully appears in T
  • Shift → how far pattern moves on text after mismatch

📘 Types of String-Matching Algorithms

There are many algorithms, but the most important ones (exam focus) are:

  1. Naive (Brute Force) Algorithm
  2. Knuth–Morris–Pratt (KMP)
  3. Rabin–Karp Algorithm
  4. Boyer–Moore Algorithm

Below is an exam-level explanation of each.


1. Naive (Brute Force) String Matching

Idea:

Slide the pattern over text one position at a time and compare.

Example:

Check all:

T[0..m-1], T[1..m], T[2..m+1], ..., T[n-m..n-1]

Time Complexity:

  • Worst case: O(n × m)
  • Best case: O(n)

Advantages:

  • Very simple
  • Works with any pattern

Disadvantages:

  • Very slow for large input

2. KMP Algorithm (Knuth–Morris–Pratt)

Most important in DSA exams.

Key Idea:

Use the LPS array (Longest Prefix Suffix) to avoid re-checking characters.

On a mismatch → shift the pattern by the LPS value instead of restarting the comparison from scratch.

Time Complexity:

  • O(n + m)

Advantages:

  • Fast and guaranteed linear
  • No re-scanning of characters

Disadvantages:

  • Calculating LPS array is tricky
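
The LPS construction and search loop described above can be sketched as follows (a minimal illustration; the function names `compute_lps` and `kmp_search` are chosen here, not taken from the original):

```python
def compute_lps(pattern):
    """Build the LPS array: lps[i] = length of the longest proper
    prefix of pattern[0..i] that is also a suffix of it."""
    m = len(pattern)
    lps = [0] * m
    length = 0  # length of the previous longest prefix-suffix
    i = 1
    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length != 0:
            length = lps[length - 1]  # fall back, do not advance i
        else:
            lps[i] = 0
            i += 1
    return lps


def kmp_search(text, pattern):
    """Return all start indices where pattern occurs in text, in O(n + m)."""
    n, m = len(text), len(pattern)
    lps = compute_lps(pattern)
    positions = []
    i = j = 0  # i indexes text, j indexes pattern
    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == m:
                positions.append(i - m)
                j = lps[j - 1]  # keep scanning for overlapping matches
        elif j != 0:
            j = lps[j - 1]  # reuse the already-matched prefix
        else:
            i += 1
    return positions
```

Note that on a mismatch only `j` falls back; `i` never moves backwards, which is why the total work stays linear.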

3. Rabin–Karp Algorithm

Key Idea:

Use hashing to compare strings:

  • Compute the hash of the pattern
  • Compute a rolling hash of each length-m window of the text
  • If the hashes match → compare the actual characters to rule out a false positive (hash collision)

Time Complexity:

  • Worst case: O(n × m)
  • Average case: O(n + m)
  • Best case: O(n + m)

Advantages:

  • Fast for multiple pattern matching
  • Good for plagiarism detection, web search

Disadvantage:

  • Hash collisions are possible, so every hash match still needs a character-by-character check
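
The rolling-hash idea can be sketched like this (a minimal version; the `base` and `mod` values and the function name `rabin_karp` are illustrative choices, not from the original):

```python
def rabin_karp(text, pattern, base=256, mod=101):
    """Return all start indices of pattern in text using a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(base, m - 1, mod)  # weight of the leading character
    p_hash = t_hash = 0
    for i in range(m):  # hash of pattern and of the first text window
        p_hash = (base * p_hash + ord(pattern[i])) % mod
        t_hash = (base * t_hash + ord(text[i])) % mod

    positions = []
    for i in range(n - m + 1):
        # hashes equal -> verify characters (guards against collisions)
        if p_hash == t_hash and text[i:i + m] == pattern:
            positions.append(i)
        if i < n - m:
            # roll the hash: drop text[i], append text[i + m]
            t_hash = (base * (t_hash - ord(text[i]) * h) + ord(text[i + m])) % mod
    return positions
```

The rolling update is what makes the average case O(n + m): each window's hash is derived from the previous one in constant time instead of being recomputed from scratch.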

4. Boyer–Moore Algorithm

Idea:

Match from right to left and use two heuristics:

  1. Bad Character Heuristic
  2. Good Suffix Heuristic

It skips large parts of the text.

Time Complexity:

  • Best case: O(n / m) (extremely fast)
  • Worst case: O(n × m)
  • Average case: sublinear in practice, since whole blocks of text are skipped
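
The bad-character heuristic alone already shows the right-to-left matching and the large skips (a simplified sketch using only that heuristic, omitting the good-suffix rule; function names are illustrative):

```python
def bad_char_table(pattern):
    """Map each character to its last (rightmost) index in the pattern."""
    return {ch: i for i, ch in enumerate(pattern)}


def boyer_moore_bad_char(text, pattern):
    """Boyer-Moore search using only the bad-character heuristic."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    last = bad_char_table(pattern)
    positions = []
    s = 0  # current shift of the pattern over the text
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1  # compare right to left
        if j < 0:
            positions.append(s)
            s += 1  # simplest safe shift after a full match
        else:
            # align the mismatched text character with its last
            # occurrence in the pattern (or jump past it entirely)
            s += max(1, j - last.get(text[s + j], -1))
    return positions
```

When the mismatched text character does not occur in the pattern at all, the shift is `j + 1`, which is how the algorithm can skip up to m characters at once and reach O(n / m) in the best case.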

📌 Python Example — Simple Naive String Matching

def naive_string_match(text, pattern):
    """Return every index where pattern starts in text (brute force)."""
    n = len(text)
    m = len(pattern)

    positions = []
    for i in range(n - m + 1):      # try every possible shift
        match = True
        for j in range(m):          # compare character by character
            if text[i + j] != pattern[j]:
                match = False
                break
        if match:
            positions.append(i)

    return positions


# Example usage
text = "ABAAABCD"
pattern = "ABC"
print("Matches at:", naive_string_match(text, pattern))

📘 Example Output

Matches at: [4]

📘 Applications of String Matching

  • Bioinformatics (gene sequence search)
  • Search engines
  • Pattern detection in log files
  • Spell checking
  • Lexical analysis in compilers
  • Word processors (Find & Replace)

📘 Summary (Exam Notes)

  • String matching finds occurrences of a pattern in a text
  • Naive method: O(n·m)
  • KMP: uses prefix function → O(n + m)
  • Rabin–Karp: hashing-based
  • Boyer–Moore: right-to-left scanning with big jumps
  • Applications: search engines, DNA matching, text processing