B-Trees — Detailed Explanation
A B-Tree is a self-balancing multi-way search tree used extensively in databases, file systems, and indexing where data is stored on disk.
It is designed to minimize disk I/O by storing multiple keys in one node and having multiple children (not just 2 like in BST).
1. Why B-Trees?
In large databases, accessing a single disk block is slow.
A binary search tree would require many disk reads.
B-Trees reduce disk accesses because:
✔ Each node holds many keys
✔ Height is very small → fewer disk reads
✔ Perfect for storage systems (databases/OS)
2. Definition of B-Tree
A B-Tree of order m (also called m-way B-Tree) is an M-ary search tree where:
- Each node can have at most m children
- Each node stores at most m−1 keys
- Every node (except root) has at least ceil(m/2) children
- Keys in each node are stored in sorted order
- All leaf nodes are at the same level
3. Properties of B-Tree
- Every node contains multiple keys (up to m−1)
- All leaves are at the same level
- Keys in each node are sorted
- For each key, values in:
- left subtree < key
- right subtree > key
- Root:
- Has at least 1 key
- Can have minimum 2 children
- Each non-root node has:
- Minimum ceil(m/2) − 1 keys
- Maximum m − 1 keys
4. Structure of a B-Tree Node
A node typically contains:
[ K1 | K2 | K3 | ... | K(n) ]
| | | |
C0 C1 C2 Cn
- Keys: K1, K2, …, Kn
- Pointers: C0, C1, …, Cn
5. Operations on B-Trees
A. Searching
Works like search in BST but inside a node we do binary search:
- Search key inside node.
- If found → success.
- Otherwise follow the child pointer between key ranges.
Time Complexity:
- O(log n) (because tree height is small)
B. Insertion
Insertion may cause node overflow (more than m−1 keys).
To handle this, B-Trees use splitting.
Steps to Insert:
- Search correct leaf node.
- Insert key in sorted order.
- If node exceeds m−1 keys → Split:
- Median key moves up to parent
- Node splits into two nodes
Example Split:
For order m = 5 (max 4 keys):
Node before split:
[10 | 20 | 30 | 40 | 50]
Median = 30 (move to parent)
Split into:
Left child: [10 | 20]
Right child: [40 | 50]
C. Deletion
Deletion is more complex than insertion.
Deleting a key may cause underflow (less than ceil(m/2)−1 keys).
Two ways to fix:
Case 1 — Borrow from Sibling
Borrow a key from left or right sibling.
Case 2 — Merge Nodes
If siblings cannot give a key:
- Merge node with sibling
- Pull one key down from parent
Steps:
- Find the key
- If it’s in an internal node → swap with inorder predecessor/successor
- Delete from leaf
- Fix underflow by borrow/merge
6. Height of B-Tree
Height is very small.
Maximum height:
[
h = O(\log_m n)
]
For large m (like order 100), tree height may be 2 or 3, even for millions of records.
7. Applications of B-Trees
B-Trees are used where disk access is important:
- Database indexing (MySQL, PostgreSQL)
- File systems:
- NTFS
- HFS+
- ext4
- Operating systems directories
- Multilevel indexing
- Block storage devices
8. Advantages of B-Trees
✔ Balanced tree → fast searching
✔ Nodes store many keys → small height
✔ Designed for storage (minimizes disk I/O)
✔ Insert/delete cause fewer rotations than AVL/RBT
✔ Ideal for large datasets
9. Disadvantages
✘ Implementation is complex
✘ Key movement during split/merge can be costly
✘ Not suitable for in-memory small datasets
10. B-Tree vs B+ Tree (Important for Exams)
| Feature | B-Tree | B+ Tree |
|---|---|---|
| Keys in leaf | Yes | Yes |
| Keys in internal nodes | Yes | No (only index) |
| Searching | Slower | Faster (linked leaves) |
| In-order traversal | Hard | Easy |
| Used in DB | Less | More (standard) |
Summary
A B-Tree is a self-balancing multi-way search tree designed for efficient disk-based storage and indexing.
It maintains balance by using node splitting (overflow) and borrowing/merging (underflow).
