⭐ String — Detailed Explanation (Data Structure & Theory)
A String is a sequence of characters stored in contiguous memory.
Characters may include:
- Letters (A–Z, a–z)
- Digits (0–9)
- Symbols (!,@,#,$, etc.)
- Spaces, punctuation
- Unicode characters (emoji, regional languages)
Strings are one of the most fundamental data structures in programming and computer science.
⭐ 1. Definition
A string is a one-dimensional array of characters terminated by a special character ('\0' in C) or managed by length (in languages like C++, Java, Python).
Example (C-style):
char s[] = "HELLO";
This is stored as:
'H' 'E' 'L' 'L' 'O' '\0'
⭐ 2. Characteristics of Strings
- Strings are immutable in languages like Java, Python (once created, cannot be changed).
- Strings are mutable in languages like C, C++ (character array can be modified).
- Support indexing →
s[i] - Support lexicographical comparison
- Support substring extraction
- Stored in continuous memory
⭐ 3. Operations on Strings
Common operations include:
| Operation | Description | Time Complexity |
|---|---|---|
| Length | Count characters | O(n) or O(1) depending on language |
| Concatenation | Join two strings | O(n + m) |
| Copy | Copy contents | O(n) |
| Substring | Extract part of string | O(k) |
| Compare | Compare lexicographically | O(n) |
| Reverse | Reverse string | O(n) |
| Search | Find substring/pattern | O(n × m) naïve |
⭐ 4. String Representation in Memory
A. C-style Strings (Null-terminated)
char s[] = {'C','O','D','E','\0'};
- Terminated by null character
\0 - String length determined by scanning until
\0
B. Java/Python Strings (Length-based)
Store explicit length, making length operation O(1).
⭐ 5. String Matching (Pattern Searching)
One of the most important applications of strings.
Given:
- Text T of length n
- Pattern P of length m
Find occurrences of P in T.
5.1 Naïve String Matching
Try matching P at every position of T.
Time: O(n × m)
5.2 Knuth–Morris–Pratt (KMP)
Efficient pattern matching using LPS (Longest Prefix Suffix) array.
Time: O(n + m)
5.3 Rabin–Karp Algorithm
Uses hashing to match patterns.
Average Time: O(n + m)
Worst Case: O(n × m)
5.4 Boyer–Moore Algorithm
Skips multiple characters by matching from right to left.
Best for large alphabets.
Time: O(n/m)
⭐ 6. Advanced String Data Structures
Used heavily in text processing, compilers, bioinformatics, search engines:
A. Trie (Prefix Tree)
- Tree-like structure
- Fast prefix search
- Insert/Search → O(length of word)
Used in autocomplete systems.
B. Suffix Tree
- Stores all suffixes of a string
- substring search → O(m)
- heavy memory usage
C. Suffix Array
- Space-efficient alternative to suffix tree
- Used in pattern matching, LCP arrays, DNA sequencing
D. Suffix Automaton
- Minimal deterministic automaton for all substrings
- Used in competitive programming and compilers
E. Aho–Corasick Automaton
- Multiple-pattern matching
- Used in virus scanners and spam filters
⭐ 7. String Applications
Strings are used in:
- Text processing
- Data compression (Huffman, LZW)
- Compiler design (lexical analysis)
- Natural language processing (NLP)
- Pattern matching
- Searching in documents
- DNA and protein sequence analysis
- Networking protocols
- File path manipulation
- Web URLs and parsing
⭐ 8. Example String Problems
- Reverse a string
- Check palindrome
- Count vowels/consonants
- Compare two strings
- Pattern searching
- Longest substring without repeating characters
- Longest common prefix
- Longest palindromic substring
- Anagram detection
These questions often appear in exams and interviews.
⭐ Summary
A String is a sequence of characters stored in contiguous memory.
Major topics include:
- Representation (null-terminated / length-based)
- Common operations (concatenate, substring, compare)
- Pattern matching algorithms (Naïve, KMP, Rabin-Karp, Boyer-Moore)
- Advanced structures: Trie, Suffix Tree, Suffix Array
- Heavy use in compilers, NLP, networking, search engines
