📘 String Matching (Pattern Matching)

String Matching (or Pattern Matching) is the process of finding one or more occurrences of a pattern string (P) inside a text string (T).

Example:

Text (T)    = "ABAAABCD"
Pattern (P) = "ABC"

Goal: Find if P appears in T, and at which positions.

String-matching algorithms are fundamental in:

  • Text editors (search feature)
  • DNA sequence matching
  • Information retrieval
  • Spell checkers
  • Compilers (lexical analysis)
  • Web search engines

📌 Basic Terminology

  • Text (T) → large string (length n)
  • Pattern (P) → string to be searched (length m)
  • Match → location where P fully appears in T
  • Shift → how far pattern moves on text after mismatch

📘 Types of String-Matching Algorithms

There are many algorithms, but the most important ones (exam focus) are:

  1. Naive (Brute Force) Algorithm
  2. Knuth–Morris–Pratt (KMP)
  3. Rabin–Karp Algorithm
  4. Boyer–Moore Algorithm

Below is an exam-level explanation of each.


1. Naive (Brute Force) String Matching

Idea:

Slide the pattern over text one position at a time and compare.

Example:

Check all:

T[0..m-1], T[1..m], T[2..m+1], ..., T[n-m..n-1]

Time Complexity:

  • Worst case: O(n × m)
  • Best case: O(n)

Advantages:

  • Very simple
  • Works with any pattern

Disadvantages:

  • Very slow for large input

2. KMP Algorithm (Knuth–Morris–Pratt)

Most important in DSA exams.

Key Idea:

Use the LPS array (Longest Prefix Suffix) to avoid re-checking characters.

On a mismatch → shift the pattern by the LPS value instead of restarting the comparison from scratch.

Time Complexity:

  • O(n + m)

Advantages:

  • Fast and guaranteed linear
  • No re-scanning of characters

Disadvantages:

  • Calculating LPS array is tricky
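
The LPS construction and search loop described above can be sketched as follows (a minimal illustration; the function names `compute_lps` and `kmp_search` are chosen here, not taken from the original):

```python
def compute_lps(pattern):
    """Build the LPS array: lps[i] = length of the longest proper
    prefix of pattern[0..i] that is also a suffix of it."""
    m = len(pattern)
    lps = [0] * m
    length = 0  # length of the previous longest prefix-suffix
    i = 1
    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length != 0:
            length = lps[length - 1]  # fall back, do not advance i
        else:
            lps[i] = 0
            i += 1
    return lps


def kmp_search(text, pattern):
    """Return all start indices where pattern occurs in text, in O(n + m)."""
    n, m = len(text), len(pattern)
    lps = compute_lps(pattern)
    positions = []
    i = j = 0  # i indexes text, j indexes pattern
    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == m:
                positions.append(i - m)
                j = lps[j - 1]  # keep scanning for overlapping matches
        elif j != 0:
            j = lps[j - 1]  # reuse the already-matched prefix
        else:
            i += 1
    return positions
```

Note that on a mismatch only `j` falls back; `i` never moves backwards, which is why the total work stays linear.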

3. Rabin–Karp Algorithm

Key Idea:

Use hashing to compare strings:

  • Compute the hash of the pattern
  • Compute a rolling hash of each length-m window of the text
  • If the hashes match → compare the actual characters to rule out a false positive (hash collision)

Time Complexity:

  • Worst case: O(n × m)
  • Average case: O(n + m)
  • Best case: O(n + m)

Advantages:

  • Fast for multiple pattern matching
  • Good for plagiarism detection, web search

Disadvantage:

  • Hash collisions are possible, so every hash match still needs a character-by-character check
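
The rolling-hash idea can be sketched like this (a minimal version; the `base` and `mod` values and the function name `rabin_karp` are illustrative choices, not from the original):

```python
def rabin_karp(text, pattern, base=256, mod=101):
    """Return all start indices of pattern in text using a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(base, m - 1, mod)  # weight of the leading character
    p_hash = t_hash = 0
    for i in range(m):  # hash of pattern and of the first text window
        p_hash = (base * p_hash + ord(pattern[i])) % mod
        t_hash = (base * t_hash + ord(text[i])) % mod

    positions = []
    for i in range(n - m + 1):
        # hashes equal -> verify characters (guards against collisions)
        if p_hash == t_hash and text[i:i + m] == pattern:
            positions.append(i)
        if i < n - m:
            # roll the hash: drop text[i], append text[i + m]
            t_hash = (base * (t_hash - ord(text[i]) * h) + ord(text[i + m])) % mod
    return positions
```

The rolling update is what makes the average case O(n + m): each window's hash is derived from the previous one in constant time instead of being recomputed from scratch.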

4. Boyer–Moore Algorithm

Idea:

Match from right to left and use two heuristics:

  1. Bad Character Heuristic
  2. Good Suffix Heuristic

It skips large parts of the text.

Time Complexity:

  • Best case: O(n / m) (extremely fast)
  • Worst case: O(n × m)
  • Average case: sublinear in practice, since whole blocks of text are skipped
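
The bad-character heuristic alone already shows the right-to-left matching and the large skips (a simplified sketch using only that heuristic, omitting the good-suffix rule; function names are illustrative):

```python
def bad_char_table(pattern):
    """Map each character to its last (rightmost) index in the pattern."""
    return {ch: i for i, ch in enumerate(pattern)}


def boyer_moore_bad_char(text, pattern):
    """Boyer-Moore search using only the bad-character heuristic."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    last = bad_char_table(pattern)
    positions = []
    s = 0  # current shift of the pattern over the text
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1  # compare right to left
        if j < 0:
            positions.append(s)
            s += 1  # simplest safe shift after a full match
        else:
            # align the mismatched text character with its last
            # occurrence in the pattern (or jump past it entirely)
            s += max(1, j - last.get(text[s + j], -1))
    return positions
```

When the mismatched text character does not occur in the pattern at all, the shift is `j + 1`, which is how the algorithm can skip up to m characters at once and reach O(n / m) in the best case.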

📌 Python Example — Simple Naive String Matching

def naive_string_match(text, pattern):
    """Return every index where pattern starts in text (brute force)."""
    n = len(text)
    m = len(pattern)

    positions = []
    for i in range(n - m + 1):      # try every possible shift
        match = True
        for j in range(m):          # compare character by character
            if text[i + j] != pattern[j]:
                match = False
                break
        if match:
            positions.append(i)

    return positions


# Example usage
text = "ABAAABCD"
pattern = "ABC"
print("Matches at:", naive_string_match(text, pattern))

📘 Example Output

Matches at: [4]

📘 Applications of String Matching

  • Bioinformatics (gene sequence search)
  • Search engines
  • Pattern detection in log files
  • Spell checking
  • Lexical analysis in compilers
  • Word processors (Find & Replace)

📘 Summary (Exam Notes)

  • String matching finds occurrences of a pattern in a text
  • Naive method: O(n·m)
  • KMP: uses prefix function → O(n + m)
  • Rabin–Karp: hashing-based
  • Boyer–Moore: right-to-left scanning with big jumps
  • Applications: search engines, DNA matching, text processing