
I/O PARALLELISM (Parallel Databases – Detailed Discussion)

In any database system, especially large-scale ones, disk I/O is the slowest component of query execution.
Even if many CPUs are available, performance will be limited unless disk operations (reading/writing data) are also done in parallel.

I/O Parallelism is the technique of using multiple disks and disk controllers to read/write data simultaneously, thereby increasing the throughput and speed of database operations.


1. INTRODUCTION TO I/O PARALLELISM

I/O Parallelism refers to distributing database data across multiple disks and performing disk accesses in parallel. Instead of storing all data on a single disk (which becomes a bottleneck), the DBMS distributes data across several disks so that:

✔ Data can be accessed faster
✔ Queries perform better
✔ Parallel scans, joins, and sorting become much faster
✔ System avoids single-disk bottleneck

Essentially:

I/O Parallelism = Using multiple disks to access data simultaneously


2. WHY IS I/O PARALLELISM IMPORTANT?

Disk I/O operations are inherently slow because of:

  • Disk seek time
  • Disk rotational delay
  • Transfer rate limitations

Even with fast CPUs, if data access is slow, query execution suffers.
I/O Parallelism reduces these delays and helps in:

✔ High-speed table scans
✔ Faster join operations
✔ High-performance OLAP workloads
✔ Efficient data warehousing
✔ Better throughput for multi-user systems


3. TYPES OF I/O PARALLELISM

There are two main types of I/O Parallelism:


A. Intra-Query I/O Parallelism

The data needed by a single query is read from multiple disks at the same time.

Example:
While scanning a table stored across 4 disks, the DBMS reads all 4 parts simultaneously.


B. Inter-Query I/O Parallelism

Different queries access data on different disks in parallel.

Example:
Query 1 scans Disk A, Query 2 scans Disk B → both run faster independently.


4. HOW IS I/O PARALLELISM ACHIEVED?

I/O parallelism is implemented using the following techniques:


1. Data Partitioning (Distribution)

Data is divided across multiple disks so that each disk stores a portion.

✔ Common Data Partitioning Methods:

(i) Horizontal Partitioning

Rows are divided across disks
Example: 10M rows of a table → divided into 4 partitions.

(ii) Vertical Partitioning

Columns are divided across disks
Example: frequently accessed columns stored separately.

(iii) Hash Partitioning

Hash function distributes rows uniformly across disks.

(iv) Range Partitioning

Based on value ranges (dates, numeric ranges)

(v) Round-robin Partitioning

Sequential distribution: Disk 1 gets row 1, Disk 2 gets row 2, Disk 3 gets row 3, and so on.
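The hash, range, and round-robin schemes above can be sketched as simple placement functions. This is a minimal illustration; the disk count, range boundaries, and sample keys are assumptions, not values from any real DBMS:

```python
# Sketch of hash, range, and round-robin placement across 4 hypothetical disks.
NUM_DISKS = 4
RANGE_BOUNDARIES = (250, 500, 750)  # illustrative upper bounds per range

def hash_partition(key):
    # Hash partitioning: a hash of the key picks the disk, spreading rows uniformly.
    return hash(key) % NUM_DISKS

def range_partition(key):
    # Range partitioning: the key's value range picks the disk.
    for disk, upper in enumerate(RANGE_BOUNDARIES):
        if key < upper:
            return disk
    return len(RANGE_BOUNDARIES)  # last disk takes everything >= 750

def round_robin_partition(row_number):
    # Round-robin partitioning: row i goes to disk i mod NUM_DISKS.
    return row_number % NUM_DISKS
```

Hash placement spreads rows uniformly but loses range locality; range placement keeps related values together (good for range scans) at the risk of skew; round-robin balances load but supports no targeted lookups.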


2. Disk Striping (RAID Technology)

Disk striping divides data into small blocks and stores them across multiple disks.

If a table scan needs 100 MB:

  • With 1 disk → 100 MB read
  • With 4 disks → each disk reads 25 MB in parallel
  • Time reduces by up to 4×

Used in RAID 0, RAID 5, RAID 10.
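A minimal RAID-0-style sketch of the idea, with Python lists standing in for disks; the 8-byte block size is an illustrative assumption (real stripe units are typically tens of kilobytes):

```python
NUM_DISKS = 4
BLOCK_SIZE = 8  # bytes per stripe unit; real systems use far larger blocks

def stripe(data: bytes):
    # Split data into fixed-size blocks and deal them round-robin across disks.
    disks = [[] for _ in range(NUM_DISKS)]
    for i in range(0, len(data), BLOCK_SIZE):
        disks[(i // BLOCK_SIZE) % NUM_DISKS].append(data[i:i + BLOCK_SIZE])
    return disks

def unstripe(disks):
    # Reassemble by reading blocks back in the same round-robin order.
    out = []
    for pos in range(max(len(d) for d in disks)):
        for disk in disks:
            if pos < len(disk):
                out.append(disk[pos])
    return b"".join(out)

payload = bytes(range(100))
assert unstripe(stripe(payload)) == payload  # round-trip check
```

In a real array the four disks would service their block reads concurrently, which is where the up-to-4× scan speedup comes from.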


3. Parallel Disk Controllers

Multiple disk controllers and channels ensure disks can operate simultaneously without interference.


4. Parallel Buffer Management

Data read from multiple disks is loaded into buffer pool pages in parallel, reducing bottlenecks.


5. Parallel Pre-fetching

DBMS anticipates future I/O requests and loads data in advance from multiple disks.
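The read-ahead pattern can be sketched with a background worker that fetches block i+1 while block i is being consumed; the in-memory block list below is a stand-in for real disk pages:

```python
from concurrent.futures import ThreadPoolExecutor

blocks = [f"page-{i}".encode() for i in range(5)]  # simulated disk pages

def read_block(i):
    return blocks[i]  # a real system would issue an actual disk read here

processed = []
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(read_block, 0)  # prefetch the first block
    for i in range(len(blocks)):
        data = future.result()  # block i is (ideally) already in memory
        if i + 1 < len(blocks):
            future = pool.submit(read_block, i + 1)  # read ahead in background
        processed.append(data)  # "process" the current block
```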


5. QUERY OPERATIONS BENEFITING FROM I/O PARALLELISM

✔ 1. Parallel Table Scans

Useful for large table scans:

  • SELECT * FROM large_table
  • Scans executed on multiple disks simultaneously
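A sketch of a parallel scan with the filter applied independently to each partition; the partitions and rows are made-up sample data, and threads stand in for per-disk scan workers:

```python
from concurrent.futures import ThreadPoolExecutor

partitions = [  # each inner list plays the role of one disk's rows
    [("alice", 30), ("bob", 17)],
    [("carol", 45), ("dave", 22)],
    [("erin", 16)],
]

def scan(rows):
    # Local scan: filter rows on this "disk" only (age >= 18).
    return [r for r in rows if r[1] >= 18]

with ThreadPoolExecutor() as pool:
    matches = [row for part in pool.map(scan, partitions) for row in part]
# matches: alice, carol, dave
```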

✔ 2. Parallel Join Algorithms

  • Parallel hash join
  • Parallel merge join
  • Partitioned nested loop join

Each partition is processed independently.
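The partitioned approach can be sketched for a hash join: both relations are hash-partitioned on the join key, then each partition pair is joined independently (and therefore could run in parallel). Relations R and S below are made-up sample data:

```python
NUM_PARTS = 3

def partition(rows, key_index):
    # Hash-partition rows on the join key so matching keys land in the same bucket.
    parts = [[] for _ in range(NUM_PARTS)]
    for row in rows:
        parts[hash(row[key_index]) % NUM_PARTS].append(row)
    return parts

def hash_join(left, right):
    # Classic build/probe hash join on the first column of each row.
    table = {}
    for l in left:
        table.setdefault(l[0], []).append(l)
    return [(l, r) for r in right for l in table.get(r[0], [])]

R = [(1, "a"), (2, "b"), (3, "c")]
S = [(2, "x"), (3, "y"), (4, "z")]
Rp, Sp = partition(R, 0), partition(S, 0)
# Each of the NUM_PARTS local joins could run on a separate worker.
joined = [pair for i in range(NUM_PARTS) for pair in hash_join(Rp[i], Sp[i])]
```

Because matching keys always hash to the same partition number, no cross-partition communication is needed during the join phase.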

✔ 3. Parallel Sorting

  • External merge-sort becomes much faster
  • Each disk sorts its local data first
  • Final merge stage is parallelized
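The two phases above can be sketched directly: local sorts per partition (independent, hence parallelizable), then a streaming merge of the sorted runs. Partition contents are made up:

```python
import heapq

partitions = [
    [42, 7, 19],   # disk 1's local rows
    [3, 88, 15],   # disk 2's local rows
    [21, 5, 60],   # disk 3's local rows
]

# Phase 1: each disk sorts its local data (independent -> parallelizable).
sorted_runs = [sorted(p) for p in partitions]

# Phase 2: merge the sorted runs without re-sorting (heapq.merge streams them).
result = list(heapq.merge(*sorted_runs))
# result == [3, 5, 7, 15, 19, 21, 42, 60, 88]
```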

✔ 4. Parallel Aggregation

  • SUM, AVG, COUNT, and GROUP BY are computed locally on each disk
  • The final merge is computed by the coordinator
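The two-step pattern (local partial aggregates, then a coordinator merge) can be sketched as follows; threads stand in for per-disk workers and the numbers are made up:

```python
from concurrent.futures import ThreadPoolExecutor

partitions = [[10, 20, 30], [5, 15], [40, 10, 25, 5]]

def local_agg(rows):
    # Phase 1: partial SUM and COUNT, computed independently per partition.
    return sum(rows), len(rows)

with ThreadPoolExecutor() as pool:
    partials = list(pool.map(local_agg, partitions))

# Phase 2: the coordinator merges the partial results into global aggregates.
total = sum(s for s, _ in partials)
count = sum(c for _, c in partials)
average = total / count
# total == 160, count == 9
```

Note that AVG cannot be merged from local averages directly; each worker must ship (SUM, COUNT) so the coordinator can recompute it exactly.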

6. BENEFITS OF I/O PARALLELISM

✔ Dramatically reduces disk bottleneck

✔ Speeds up large scans and joins

✔ Allows high-performance OLAP queries

✔ Improves throughput for many users

✔ Essential for big data and parallel DBMS

✔ Near-linear scalability as disks are added


7. CHALLENGES / DISADVANTAGES

  • Data skew (unequal partition sizes)

  • Expensive hardware (multiple disks, controllers)

  • Coordination overhead

  • Complex implementation

  • Fault tolerance required (RAID or replication)

If one disk holds more data than the others, it slows down the entire parallel query, because the scan finishes only when the slowest disk finishes.


8. EXAMPLE (MCA-Style)

Suppose a table has 100 million rows.

Without I/O Parallelism:

  • Stored on 1 disk
  • Sequential scan takes 10 seconds

With 4-disk striping:

  • Each disk holds 25 million rows
  • Each disk reads in parallel
  • Total time ≈ 2.5 seconds

Speedup = 4×
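The arithmetic above, as a tiny model (it ignores seek and coordination overhead, so it is an upper bound on the gain):

```python
def striped_scan_time(single_disk_seconds, num_disks):
    # Ideal striped scan: each disk reads 1/num_disks of the data in parallel.
    return single_disk_seconds / num_disks

assert striped_scan_time(10, 4) == 2.5  # the 4-disk example above
```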


Perfect 5–6 Mark Short Answer

I/O Parallelism refers to performing disk input/output operations in parallel by using multiple disks.
It is achieved through data partitioning, disk striping, parallel disk controllers, and parallel pre-fetching.
I/O Parallelism enables faster table scans, joins, and sorting by distributing data across multiple disks so that different parts can be accessed simultaneously.
It improves query performance, increases throughput, and provides high scalability in parallel databases.