B-Trees — Detailed Explanation

A B-Tree is a self-balancing multi-way search tree used extensively in databases, file systems, and indexing where data is stored on disk.

It is designed to minimize disk I/O by storing multiple keys in one node and having multiple children (not just 2 like in BST).

1. Why B-Trees?

In large databases, accessing a single disk block is slow.
A binary search tree would require many disk reads.

B-Trees reduce disk accesses because:

✔ Each node holds many keys
✔ Height is very small → fewer disk reads
✔ Perfect for storage systems (databases/OS)

2. Definition of B-Tree

A B-Tree of order m (also called m-way B-Tree) is an M-ary search tree where:

Each node can have at most m children
Each node stores at most m−1 keys
Every node (except root) has at least ceil(m/2) children
Keys in each node are stored in sorted order
All leaf nodes are at the same level

3. Properties of B-Tree

Every node contains multiple keys (up to m−1)
All leaves are at the same level
Keys in each node are sorted
For each key, values in:
- left subtree < key
- right subtree > key
Root:
- Has at least 1 key
- Can have minimum 2 children
Each non-root node has:
- Minimum ceil(m/2) − 1 keys
- Maximum m − 1 keys

4. Structure of a B-Tree Node

A node typically contains:

[ K1 | K2 | K3 | ... | K(n) ]
  |    |    |          |
 C0   C1   C2         Cn

Keys: K1, K2, …, Kn
Pointers: C0, C1, …, Cn

5. Operations on B-Trees

A. Searching

Works like search in BST but inside a node we do binary search:

Search key inside node.
If found → success.
Otherwise follow the child pointer between key ranges.

Time Complexity:

O(log n) (because tree height is small)

B. Insertion

Insertion may cause node overflow (more than m−1 keys).
To handle this, B-Trees use splitting.

Steps to Insert:

Search correct leaf node.
Insert key in sorted order.
If node exceeds m−1 keys → Split:
- Median key moves up to parent
- Node splits into two nodes

Example Split:

For order m = 5 (max 4 keys):

Node before split:

[10 | 20 | 30 | 40 | 50]

Median = 30 (move to parent)

Split into:

Left child:  [10 | 20]
Right child: [40 | 50]

C. Deletion

Deletion is more complex than insertion.

Deleting a key may cause underflow (less than ceil(m/2)−1 keys).

Two ways to fix:

Case 1 — Borrow from Sibling

Borrow a key from left or right sibling.

Case 2 — Merge Nodes

If siblings cannot give a key:

Merge node with sibling
Pull one key down from parent

Steps:

Find the key
If it’s in an internal node → swap with inorder predecessor/successor
Delete from leaf
Fix underflow by borrow/merge

6. Height of B-Tree

Height is very small.

Maximum height:

[
h = O(\log_m n)
]

For large m (like order 100), tree height may be 2 or 3, even for millions of records.

7. Applications of B-Trees

B-Trees are used where disk access is important:

Database indexing (MySQL, PostgreSQL)
File systems:
- NTFS
- HFS+
- ext4
Operating systems directories
Multilevel indexing
Block storage devices

8. Advantages of B-Trees

✔ Balanced tree → fast searching
✔ Nodes store many keys → small height
✔ Designed for storage (minimizes disk I/O)
✔ Insert/delete cause fewer rotations than AVL/RBT
✔ Ideal for large datasets

9. Disadvantages

✘ Implementation is complex
✘ Key movement during split/merge can be costly
✘ Not suitable for in-memory small datasets

10. B-Tree vs B+ Tree (Important for Exams)

Feature	B-Tree	B+ Tree
Keys in leaf	Yes	Yes
Keys in internal nodes	Yes	No (only index)
Searching	Slower	Faster (linked leaves)
In-order traversal	Hard	Easy
Used in DB	Less	More (standard)

Summary

A B-Tree is a self-balancing multi-way search tree designed for efficient disk-based storage and indexing.
It maintains balance by using node splitting (overflow) and borrowing/merging (underflow).

B-trees