Below is a clear, step-by-step, exam-oriented explanation of the Knuth–Morris–Pratt (KMP) Algorithm, written exactly like a Data Structures & Algorithms expert. This includes intuition, the LPS table, algorithm flow, complexity, and code in Python.
📘 Knuth–Morris–Pratt (KMP) Algorithm
The KMP algorithm is a linear-time string matching algorithm used to find occurrences of a pattern (P) in a text (T).
KMP avoids re-checking characters by using a preprocessed table called LPS (Longest Prefix Suffix).
📌 Why Do We Need KMP?
The naive string-matching algorithm may take:
- Worst case = O(n × m) (very slow)
KMP improves this by ensuring:
✔ No character is compared twice
✔ Matching is done in O(n + m) time
✔ Efficient for long texts and patterns
📘 The Core Idea of KMP
KMP uses pattern information to skip unnecessary comparisons.
When a mismatch happens:
- Instead of moving the pattern one step,
- KMP jumps based on the LPS array.
This avoids re-checking characters that we already matched.
📘 LPS Array (Longest Prefix Suffix)
The heart of KMP
For a pattern P, LPS[i] = length of the longest proper prefix which is also a suffix in P[0..i].
Proper means prefix ≠ the entire string.
Example Pattern:
P = "ABABCABAB"
LPS array:
Index: 0 1 2 3 4 5 6 7 8
Char : A B A B C A B A B
LPS : 0 0 1 2 0 1 2 3 4
This array tells us:
- If mismatch at index 8, skip to LPS[7] = 3
- This avoids unnecessary re-checking
📘 Steps of KMP Algorithm
Step 1: Build LPS array
This preprocesses the pattern.
Step 2: Search pattern in text
Compare text and pattern:
- On match → move both pointers
- On mismatch → use LPS array to decide next shift
📘 KMP Time Complexity
| Part | Time |
|---|---|
| LPS computation | O(m) |
| Matching | O(n) |
| Total | O(n + m) |
✔ Always linear
✔ No worst-case degradation
✔ Faster than brute force and often faster than Rabin–Karp
📘 Python Code — KMP Algorithm
def compute_lps(pattern):
lps = [0] * len(pattern)
length = 0 # length of previous longest prefix suffix
i = 1
while i < len(pattern):
if pattern[i] == pattern[length]:
length += 1
lps[i] = length
i += 1
else:
if length != 0:
# try next longest prefix
length = lps[length - 1]
else:
lps[i] = 0
i += 1
return lps
def kmp_search(text, pattern):
n, m = len(text), len(pattern)
if m == 0:
return []
lps = compute_lps(pattern)
i = j = 0 # pointers for text and pattern
positions = []
while i < n:
if text[i] == pattern[j]:
i += 1
j += 1
if j == m:
positions.append(i - j) # match found
j = lps[j - 1] # continue searching
elif i < n and text[i] != pattern[j]:
if j != 0:
j = lps[j - 1]
else:
i += 1
return positions
# Example:
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"
print("Pattern found at positions:", kmp_search(text, pattern))
📘 Expected Output
Pattern found at positions: [10]
📌 Example Illustration
Text:
ABABDABACDABABCABAB
Pattern:
ABABCABAB
KMP matches quickly by skipping unnecessary comparisons by using the LPS table.
📘 Advantages of KMP
✔ Linear time complexity
✔ No backtracking in the text
✔ Efficient for huge texts
✔ Works for repeated patterns
✔ Used in search engines, compilers, bioinformatics
📘 Summary (Exam Notes)
- KMP is a fast pattern-matching algorithm
- Uses LPS array to avoid redundant comparisons
- Time = O(n + m)
- LPS[i] gives the length of longest prefix which is also a suffix
- KMP never re-checks characters in the text
- Very important algorithm in DSA & exams
