Skip to content
Home » B-trees

B-trees

B-Trees — Detailed Explanation

A B-Tree is a self-balancing multi-way search tree used extensively in databases, file systems, and indexing where data is stored on disk.

It is designed to minimize disk I/O by storing multiple keys in one node and having multiple children (not just 2 like in BST).


1. Why B-Trees?

In large databases, accessing a single disk block is slow.
A binary search tree would require many disk reads.

B-Trees reduce disk accesses because:

✔ Each node holds many keys
✔ Height is very small → fewer disk reads
✔ Perfect for storage systems (databases/OS)


2. Definition of B-Tree

A B-Tree of order m (also called m-way B-Tree) is an M-ary search tree where:

  • Each node can have at most m children
  • Each node stores at most m−1 keys
  • Every node (except root) has at least ceil(m/2) children
  • Keys in each node are stored in sorted order
  • All leaf nodes are at the same level

3. Properties of B-Tree

  1. Every node contains multiple keys (up to m−1)
  2. All leaves are at the same level
  3. Keys in each node are sorted
  4. For each key, values in:
    • left subtree < key
    • right subtree > key
  5. Root:
    • Has at least 1 key
    • Can have minimum 2 children
  6. Each non-root node has:
    • Minimum ceil(m/2) − 1 keys
    • Maximum m − 1 keys

4. Structure of a B-Tree Node

A node typically contains:

[ K1 | K2 | K3 | ... | K(n) ]
  |    |    |          |
 C0   C1   C2         Cn
  • Keys: K1, K2, …, Kn
  • Pointers: C0, C1, …, Cn

5. Operations on B-Trees

A. Searching

Works like search in BST but inside a node we do binary search:

  1. Search key inside node.
  2. If found → success.
  3. Otherwise follow the child pointer between key ranges.

Time Complexity:

  • O(log n) (because tree height is small)

B. Insertion

Insertion may cause node overflow (more than m−1 keys).
To handle this, B-Trees use splitting.

Steps to Insert:

  1. Search correct leaf node.
  2. Insert key in sorted order.
  3. If node exceeds m−1 keys → Split:
    • Median key moves up to parent
    • Node splits into two nodes

Example Split:

For order m = 5 (max 4 keys):

Node before split:

[10 | 20 | 30 | 40 | 50]

Median = 30 (move to parent)

Split into:

Left child:  [10 | 20]
Right child: [40 | 50]

C. Deletion

Deletion is more complex than insertion.

Deleting a key may cause underflow (less than ceil(m/2)−1 keys).

Two ways to fix:

Case 1 — Borrow from Sibling

Borrow a key from left or right sibling.

Case 2 — Merge Nodes

If siblings cannot give a key:

  • Merge node with sibling
  • Pull one key down from parent

Steps:

  1. Find the key
  2. If it’s in an internal node → swap with inorder predecessor/successor
  3. Delete from leaf
  4. Fix underflow by borrow/merge

6. Height of B-Tree

Height is very small.

Maximum height:

[
h = O(\log_m n)
]

For large m (like order 100), tree height may be 2 or 3, even for millions of records.


7. Applications of B-Trees

B-Trees are used where disk access is important:

  • Database indexing (MySQL, PostgreSQL)
  • File systems:
    • NTFS
    • HFS+
    • ext4
  • Operating systems directories
  • Multilevel indexing
  • Block storage devices

8. Advantages of B-Trees

✔ Balanced tree → fast searching
✔ Nodes store many keys → small height
✔ Designed for storage (minimizes disk I/O)
✔ Insert/delete cause fewer rotations than AVL/RBT
✔ Ideal for large datasets


9. Disadvantages

✘ Implementation is complex
✘ Key movement during split/merge can be costly
✘ Not suitable for in-memory small datasets


10. B-Tree vs B+ Tree (Important for Exams)

FeatureB-TreeB+ Tree
Keys in leafYesYes
Keys in internal nodesYesNo (only index)
SearchingSlowerFaster (linked leaves)
In-order traversalHardEasy
Used in DBLessMore (standard)

Summary

A B-Tree is a self-balancing multi-way search tree designed for efficient disk-based storage and indexing.
It maintains balance by using node splitting (overflow) and borrowing/merging (underflow).