
Distributed Data Storage


DISTRIBUTED DATA STORAGE (Detailed Discussion)

In a distributed database, managed by a Distributed Database Management System (DDBMS), data is physically stored at multiple sites, but to the user it appears as a single integrated database.

Distributed data storage defines how data is divided, how many copies are kept, and at which sites each piece is stored, so that it can be accessed efficiently.

It involves three major concepts:

  1. Fragmentation
  2. Replication
  3. Allocation

These determine how data is distributed across the network.


1. WHY DISTRIBUTED DATA STORAGE?

An organization spread over several locations (a bank, university, e-commerce company, or telecom operator) generates and uses data at each of those locations.
Distributed storage provides:

✔ Faster local access
✔ Reduced communication cost
✔ Improved reliability & availability
✔ Better performance
✔ Scalability
✔ Local autonomy

Example:
Each bank branch stores its own customers' data on a local server, but all branches together form one unified database.


2. COMPONENTS OF DISTRIBUTED DATA STORAGE

Distributed data storage involves:

  • Fragmentation (how data is divided)
  • Replication (how duplicates are maintained)
  • Allocation (where data is placed)

Let’s discuss each.


3. FRAGMENTATION (How Data Is Divided)

Fragmentation divides a relation (table) into smaller fragments and stores them across sites.

There are three types:


A. Horizontal Fragmentation

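Rows (tuples) are divided across sites, usually by a condition on some attribute. Each fragment has the same columns as the original relation, and the relation is reconstructed by taking the union of the fragments.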

Example:
Employees from Punjab → Site 1
Employees from Delhi → Site 2
Employees from Mumbai → Site 3

✔ Advantages

  • Faster local queries
  • Better performance
  • Reduced transfer of irrelevant data
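
The region-based split above can be sketched in a few lines of Python. This is only an illustration: the sample rows, the Region column, and the SITE_FOR_REGION mapping are assumptions made for the example, not part of any particular DDBMS.

```python
# Hypothetical EMPLOYEE rows; names and columns are made up for illustration.
employees = [
    {"EmpID": 1, "Name": "Asha", "Region": "Punjab"},
    {"EmpID": 2, "Name": "Ravi", "Region": "Delhi"},
    {"EmpID": 3, "Name": "Meera", "Region": "Mumbai"},
    {"EmpID": 4, "Name": "Karan", "Region": "Punjab"},
]

# Assumed fragmentation rule: the Region value decides the site.
SITE_FOR_REGION = {"Punjab": "Site 1", "Delhi": "Site 2", "Mumbai": "Site 3"}


def horizontal_fragments(rows):
    """Place each whole row in the fragment of the site its region maps to."""
    fragments = {site: [] for site in SITE_FOR_REGION.values()}
    for row in rows:
        fragments[SITE_FOR_REGION[row["Region"]]].append(row)
    return fragments


def reconstruct(fragments):
    """Rebuild the relation as the union of all fragments (sorted by key)."""
    rows = [row for fragment in fragments.values() for row in fragment]
    return sorted(rows, key=lambda r: r["EmpID"])


fragments = horizontal_fragments(employees)
```

A query about Punjab employees only needs fragments["Site 1"], while reconstruct(fragments) returns the full relation when a global query needs it.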

B. Vertical Fragmentation

Columns (attributes) are divided across sites.

Example:

Fragment 1: (EmpID, Name, Dept)
Fragment 2: (EmpID, Salary)

The primary key EmpID is kept in every fragment, so the original relation can be reconstructed by joining the fragments on EmpID.

✔ Advantages

  • Security: sensitive attributes (salary) stored separately
  • Better locality for attribute-specific queries
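
As a rough sketch of the example above, vertical fragmentation is a projection onto a subset of columns, and reconstruction is a join on the shared key. The helper names project and join_on_key and the sample rows are hypothetical.

```python
# Hypothetical EMPLOYEE rows; values are made up for illustration.
employees = [
    {"EmpID": 1, "Name": "Asha", "Dept": "Sales", "Salary": 50000},
    {"EmpID": 2, "Name": "Ravi", "Dept": "IT", "Salary": 65000},
]


def project(rows, attributes):
    """Keep only the listed attributes of every row."""
    return [{a: row[a] for a in attributes} for row in rows]


def join_on_key(left, right, key="EmpID"):
    """Rejoin two vertical fragments on their shared primary key."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**l_row, **right_by_key[l_row[key]]}
        for l_row in left
        if l_row[key] in right_by_key
    ]


fragment_1 = project(employees, ["EmpID", "Name", "Dept"])
fragment_2 = project(employees, ["EmpID", "Salary"])
rebuilt = join_on_key(fragment_1, fragment_2)
```

Because both fragments carry EmpID, the join recovers every original row; without the key in each fragment there would be no reliable way to match the pieces.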

C. Hybrid (Mixed) Fragmentation

Combination of horizontal + vertical fragmentation.

Example:
First apply vertical fragmentation → then horizontally distribute each fragment.

This is the most flexible type, but also the most complex to manage and reconstruct.
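
A minimal sketch of the vertical-then-horizontal order described above. It assumes, purely for illustration, that a Region column is kept in both vertical fragments so that each one can be split by region afterwards; the helper names and rows are hypothetical.

```python
# Hypothetical EMPLOYEE rows; values are made up for illustration.
employees = [
    {"EmpID": 1, "Name": "Asha", "Region": "Punjab", "Salary": 50000},
    {"EmpID": 2, "Name": "Ravi", "Region": "Delhi", "Salary": 65000},
    {"EmpID": 3, "Name": "Karan", "Region": "Punjab", "Salary": 58000},
]


def project(rows, attributes):
    """Vertical step: keep only the listed attributes."""
    return [{a: row[a] for a in attributes} for row in rows]


def split_by(rows, attribute):
    """Horizontal step: group rows by the value of one attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row)
    return groups


# Step 1 (vertical): separate salary data from profile data.
profile = project(employees, ["EmpID", "Name", "Region"])
pay = project(employees, ["EmpID", "Region", "Salary"])

# Step 2 (horizontal): split each vertical fragment by region.
hybrid = {
    "profile": split_by(profile, "Region"),
    "pay": split_by(pay, "Region"),
}
```

Each entry such as hybrid["pay"]["Punjab"] is one fragment that could be allocated to a site on its own.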


4. REPLICATION (Duplicate Copies Across Sites)

Replication means keeping multiple copies of the same data at different sites.

Types of Replication:


A. Full Replication

Entire database replicated at every site.

✔ Advantages:

  • Highest availability
  • Very fast read queries
  • Good for fault tolerance

✔ Disadvantages:

  • Very high update cost
  • Difficult synchronization
  • Storage overhead

B. Partial Replication

Only selected tables or fragments are replicated.

Example:
Branch data stored locally, but product catalog replicated everywhere.


C. No Replication

Each data item is stored at exactly one site.

✔ Advantages:

  • Simple to update
  • No duplicate maintenance

✔ Disadvantages:

  • Low availability: if a site fails, its data cannot be reached
  • Remote access needed whenever a query runs at a different site
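
The trade-off between the three options can be illustrated with a small model. It assumes a write-all policy (every copy is updated on each write), which is only one possible way to keep replicas in sync; the site names, the catalog item, and the helper functions are hypothetical.

```python
SITES = ["Site 1", "Site 2", "Site 3"]

# Hypothetical placements of one data item under each replication scheme.
placements = {
    "full": {"catalog": set(SITES)},
    "partial": {"catalog": {"Site 1", "Site 2"}},
    "none": {"catalog": {"Site 1"}},
}


def read_is_local(placement, item, site):
    """A read is local if the requesting site holds a copy."""
    return site in placement[item]


def sites_to_update(placement, item):
    """Under the assumed write-all policy, every copy is updated."""
    return len(placement[item])


def survives_failure(placement, item, failed_site):
    """The item stays available if any copy is on a non-failed site."""
    return bool(placement[item] - {failed_site})
```

With full replication every read is local but each write touches three sites; with no replication a write touches one site, but the item is lost to users while Site 1 is down.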

5. DATA ALLOCATION (Where to Store Data)

Data allocation determines which fragment or replica should be stored at which site.

Allocation can be:


A. Centralized Allocation

All fragments stored at one central site.
Not truly distributed.


B. Partitioned Allocation

Each fragment is stored at exactly one site, with no replicas.
Supports local autonomy and gives good performance for local queries when each fragment is placed at the site that uses it most.


C. Replicated Allocation

Fragments stored at multiple sites.
Good for availability but costly for updates.


Factors Affecting Allocation

  • Frequency of data access
  • Network cost
  • Storage cost
  • Processing capability of each site
  • Reliability requirements
  • Security constraints
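
As a simplified illustration of how access frequency and network cost drive allocation, the sketch below picks the single site that minimizes remote-access cost for one fragment. The access counts and the uniform per-access cost are assumptions; a realistic model would also weigh storage cost, site capacity, reliability, and security.

```python
# Hypothetical number of accesses per day to one fragment from each site.
access_frequency = {"Site 1": 120, "Site 2": 30, "Site 3": 10}

# Assumed uniform cost of one remote access; local accesses are treated as free.
REMOTE_ACCESS_COST = 1.0


def remote_cost(host, frequency):
    """Total cost of the accesses that come from sites other than the host."""
    return sum(
        REMOTE_ACCESS_COST * count
        for site, count in frequency.items()
        if site != host
    )


def best_site(frequency):
    """Choose the host site with the lowest total remote-access cost."""
    return min(frequency, key=lambda host: remote_cost(host, frequency))
```

Here the fragment is best placed at the site that accesses it most, since that leaves the fewest remote accesses to pay for.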

6. DISTRIBUTED DIRECTORY MANAGEMENT

The directory (catalog) is metadata that records where each data fragment and replica is stored.

Directory locations can be:

  • Centralized
  • Distributed
  • Fully replicated

This directory helps query processors know where to fetch data from.
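
A directory can be pictured as a mapping from fragment name to the sites that hold it. The fragment names and the locate helper below are hypothetical, and the rule "prefer a local copy, else the first listed site" is just one simple choice.

```python
# Hypothetical directory: fragment name -> sites holding a copy.
directory = {
    "EMP_PUNJAB": ["Site 1"],
    "EMP_DELHI": ["Site 2"],
    "PRODUCT_CATALOG": ["Site 1", "Site 2", "Site 3"],
}


def locate(fragment, local_site):
    """Return the site to read from, preferring a local copy if one exists."""
    sites = directory[fragment]  # raises KeyError for an unknown fragment
    return local_site if local_site in sites else sites[0]
```

A query processor at Site 3 would read the product catalog locally, but would have to go to Site 2 for the Delhi employee fragment.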


7. ADVANTAGES OF DISTRIBUTED DATA STORAGE

✔ Improved performance

Local queries served locally → less delay.

✔ Reduced network traffic

Only required fragments transferred.

✔ Fault tolerance & high availability

With replication, the system can keep serving data after a site failure, as long as another replica is still reachable.

✔ Scalability

Easy to add new sites.

✔ Local autonomy

Sites manage their own data.
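
The fault-tolerance benefit can be made concrete with a small calculation. Assuming each site is up with probability p and sites fail independently (a simplifying assumption that real systems rarely meet exactly), a data item with n replicas is reachable unless all n sites are down at once.

```python
def availability(p, n):
    """Probability that at least one of n independent replicas is up."""
    return 1 - (1 - p) ** n


single_copy = availability(0.9, 1)    # about 0.9
three_copies = availability(0.9, 3)   # about 0.999
```

Under these assumptions, going from one copy to three raises availability from roughly 90% to roughly 99.9%, at the price of updating three copies on every write.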


8. CHALLENGES OF DISTRIBUTED DATA STORAGE

✔ Complexity in managing fragmentation & replication

✔ Distributed deadlocks

✔ High cost of synchronization

✔ Difficult distributed recovery

✔ Security issues across sites

✔ Data inconsistency if replicas not synchronized


9. REAL-LIFE EXAMPLES

✔ Banking

Branches store local customer transactions.

✔ E-commerce

Warehouses store local product & inventory data.

✔ Telecom

Call data records stored region-wise.

✔ University

Different campuses store student data locally.


Perfect 5–6 Mark Short Answer

Distributed Data Storage refers to storing database data across multiple sites in a network. It involves three main techniques: fragmentation (horizontal, vertical, hybrid), replication (full, partial, none), and allocation (deciding where to place fragments and replicas). Distributed data storage improves performance, reduces communication cost, increases availability, and supports local autonomy, but also adds complexity in synchronization, concurrency control, and recovery.