Skip to content
Home » Knuth-Morris-Pratt (KMP) algorithm

Knuth-Morris-Pratt (KMP) algorithm

Below is a clear, step-by-step, exam-oriented explanation of the Knuth–Morris–Pratt (KMP) Algorithm, written exactly like a Data Structures & Algorithms expert. This includes intuition, the LPS table, algorithm flow, complexity, and code in Python.


📘 Knuth–Morris–Pratt (KMP) Algorithm

The KMP algorithm is a linear-time string matching algorithm used to find occurrences of a pattern (P) in a text (T).

KMP avoids re-checking characters by using a preprocessed table called LPS (Longest Prefix Suffix).


📌 Why Do We Need KMP?

The naive string-matching algorithm may take:

  • Worst case = O(n × m) (very slow)

KMP improves this by ensuring:

No character is compared twice
✔ Matching is done in O(n + m) time
✔ Efficient for long texts and patterns


📘 The Core Idea of KMP

KMP uses pattern information to skip unnecessary comparisons.

When a mismatch happens:

  • Instead of moving the pattern one step,
  • KMP jumps based on the LPS array.

This avoids re-checking characters that we already matched.


📘 LPS Array (Longest Prefix Suffix)

The heart of KMP

For a pattern P, LPS[i] = length of the longest proper prefix which is also a suffix in P[0..i].

Proper means prefix ≠ the entire string.

Example Pattern:

P = "ABABCABAB"

LPS array:

Index: 0 1 2 3 4 5 6 7 8
Char : A B A B C A B A B
LPS  : 0 0 1 2 0 1 2 3 4

This array tells us:

  • If mismatch at index 8, skip to LPS[7] = 3
  • This avoids unnecessary re-checking

📘 Steps of KMP Algorithm

Step 1: Build LPS array

This preprocesses the pattern.

Step 2: Search pattern in text

Compare text and pattern:

  • On match → move both pointers
  • On mismatch → use LPS array to decide next shift

📘 KMP Time Complexity

PartTime
LPS computationO(m)
MatchingO(n)
TotalO(n + m)

✔ Always linear
✔ No worst-case degradation
✔ Faster than brute force and often faster than Rabin–Karp


📘 Python Code — KMP Algorithm

def compute_lps(pattern):
    lps = [0] * len(pattern)
    length = 0        # length of previous longest prefix suffix
    i = 1

    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                # try next longest prefix
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1

    return lps


def kmp_search(text, pattern):
    n, m = len(text), len(pattern)
    
    if m == 0:
        return []

    lps = compute_lps(pattern)
    i = j = 0   # pointers for text and pattern
    positions = []

    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
        
        if j == m:
            positions.append(i - j)  # match found
            j = lps[j - 1]           # continue searching
        
        elif i < n and text[i] != pattern[j]:
            if j != 0:
                j = lps[j - 1]
            else:
                i += 1

    return positions


# Example:
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"
print("Pattern found at positions:", kmp_search(text, pattern))

📘 Expected Output

Pattern found at positions: [10]

📌 Example Illustration

Text:

ABABDABACDABABCABAB

Pattern:

ABABCABAB

KMP matches quickly by skipping unnecessary comparisons by using the LPS table.


📘 Advantages of KMP

✔ Linear time complexity
✔ No backtracking in the text
✔ Efficient for huge texts
✔ Works for repeated patterns
✔ Used in search engines, compilers, bioinformatics


📘 Summary (Exam Notes)

  • KMP is a fast pattern-matching algorithm
  • Uses LPS array to avoid redundant comparisons
  • Time = O(n + m)
  • LPS[i] gives the length of longest prefix which is also a suffix
  • KMP never re-checks characters in the text
  • Very important algorithm in DSA & exams