Skip to content
Home » String

String


String — Detailed Explanation (Data Structure & Theory)

A String is a sequence of characters stored in contiguous memory.
Characters may include:

  • Letters (A–Z, a–z)
  • Digits (0–9)
  • Symbols (!,@,#,$, etc.)
  • Spaces, punctuation
  • Unicode characters (emoji, regional languages)

Strings are one of the most fundamental data structures in programming and computer science.


1. Definition

A string is a one-dimensional array of characters terminated by a special character ('\0' in C) or managed by length (in languages like C++, Java, Python).

Example (C-style):

char s[] = "HELLO";

This is stored as:

'H' 'E' 'L' 'L' 'O' '\0'

2. Characteristics of Strings

  • Strings are immutable in languages like Java, Python (once created, cannot be changed).
  • Strings are mutable in languages like C, C++ (character array can be modified).
  • Support indexing → s[i]
  • Support lexicographical comparison
  • Support substring extraction
  • Stored in continuous memory

3. Operations on Strings

Common operations include:

OperationDescriptionTime Complexity
LengthCount charactersO(n) or O(1) depending on language
ConcatenationJoin two stringsO(n + m)
CopyCopy contentsO(n)
SubstringExtract part of stringO(k)
CompareCompare lexicographicallyO(n)
ReverseReverse stringO(n)
SearchFind substring/patternO(n × m) naïve

4. String Representation in Memory

A. C-style Strings (Null-terminated)

char s[] = {'C','O','D','E','\0'};
  • Terminated by null character \0
  • String length determined by scanning until \0

B. Java/Python Strings (Length-based)

Store explicit length, making length operation O(1).


5. String Matching (Pattern Searching)

One of the most important applications of strings.

Given:

  • Text T of length n
  • Pattern P of length m

Find occurrences of P in T.

5.1 Naïve String Matching

Try matching P at every position of T.
Time: O(n × m)


5.2 Knuth–Morris–Pratt (KMP)

Efficient pattern matching using LPS (Longest Prefix Suffix) array.
Time: O(n + m)


5.3 Rabin–Karp Algorithm

Uses hashing to match patterns.
Average Time: O(n + m)
Worst Case: O(n × m)


5.4 Boyer–Moore Algorithm

Skips multiple characters by matching from right to left.
Best for large alphabets.
Time: O(n/m)


6. Advanced String Data Structures

Used heavily in text processing, compilers, bioinformatics, search engines:


A. Trie (Prefix Tree)

  • Tree-like structure
  • Fast prefix search
  • Insert/Search → O(length of word)

Used in autocomplete systems.


B. Suffix Tree

  • Stores all suffixes of a string
  • substring search → O(m)
  • heavy memory usage

C. Suffix Array

  • Space-efficient alternative to suffix tree
  • Used in pattern matching, LCP arrays, DNA sequencing

D. Suffix Automaton

  • Minimal deterministic automaton for all substrings
  • Used in competitive programming and compilers

E. Aho–Corasick Automaton

  • Multiple-pattern matching
  • Used in virus scanners and spam filters

7. String Applications

Strings are used in:

  • Text processing
  • Data compression (Huffman, LZW)
  • Compiler design (lexical analysis)
  • Natural language processing (NLP)
  • Pattern matching
  • Searching in documents
  • DNA and protein sequence analysis
  • Networking protocols
  • File path manipulation
  • Web URLs and parsing

8. Example String Problems

  1. Reverse a string
  2. Check palindrome
  3. Count vowels/consonants
  4. Compare two strings
  5. Pattern searching
  6. Longest substring without repeating characters
  7. Longest common prefix
  8. Longest palindromic substring
  9. Anagram detection

These questions often appear in exams and interviews.


Summary

A String is a sequence of characters stored in contiguous memory.
Major topics include:

  • Representation (null-terminated / length-based)
  • Common operations (concatenate, substring, compare)
  • Pattern matching algorithms (Naïve, KMP, Rabin-Karp, Boyer-Moore)
  • Advanced structures: Trie, Suffix Tree, Suffix Array
  • Heavy use in compilers, NLP, networking, search engines