⭐ DISTRIBUTED DATA STORAGE (Detailed Discussion)
In a Distributed Database Management System (DDBMS), data is physically stored at multiple sites, but to the user it appears as a single integrated database.
Distributed Data Storage defines how data is divided, replicated, and placed across sites so that it can be accessed efficiently.
It involves three major concepts:
- Fragmentation
- Replication
- Allocation
These determine how data is distributed across the network.
⭐ 1. WHY DISTRIBUTED DATA STORAGE?
A distributed organization (banks, universities, e-commerce, telecom) stores data across several locations.
Distributed storage provides:
✔ Faster local access
✔ Reduced communication cost
✔ Improved reliability & availability
✔ Better performance
✔ Scalability
✔ Local autonomy
Example:
Bank branches store their local customer data on local servers, but together all branches form one unified database.
⭐ 2. COMPONENTS OF DISTRIBUTED DATA STORAGE
Distributed data storage involves:
- Fragmentation (how data is divided)
- Replication (how duplicates are maintained)
- Allocation (where data is placed)
Let’s discuss each.
⭐ 3. FRAGMENTATION (How Data Is Divided)
Fragmentation divides a relation (table) into smaller parts called fragments, which are then stored at different sites.
There are three types:
⭐ A. Horizontal Fragmentation
Rows (tuples) are divided across sites according to a condition on attribute values, such as the employee's location.
Example:
Employees from Punjab → Site 1
Employees from Delhi → Site 2
Employees from Mumbai → Site 3
✔ Advantages
- Faster local queries
- Better performance
- Reduced transfer of irrelevant data
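A minimal sketch of the idea, assuming a hypothetical Employee table with a `Region` column and made-up site names. Each fragment is defined by a selection condition, and the original table is rebuilt as the union of the fragments. The union is exact only because every row matches exactly one condition.

```python
# Minimal sketch: horizontal fragmentation of a hypothetical Employee table.
# Rows, the 'Region' column and the site names are illustrative assumptions.

employees = [
    {"EmpID": 1, "Name": "Aman", "Region": "Punjab"},
    {"EmpID": 2, "Name": "Riya", "Region": "Delhi"},
    {"EmpID": 3, "Name": "Kabir", "Region": "Mumbai"},
    {"EmpID": 4, "Name": "Simran", "Region": "Punjab"},
]

# Each fragment is defined by a selection condition and assigned to one site.
fragment_conditions = {
    "Site1": lambda row: row["Region"] == "Punjab",
    "Site2": lambda row: row["Region"] == "Delhi",
    "Site3": lambda row: row["Region"] == "Mumbai",
}

def fragment_horizontally(rows, conditions):
    """Return {site: rows} with each row placed at the site whose condition it satisfies."""
    return {site: [r for r in rows if cond(r)] for site, cond in conditions.items()}

def reconstruct(fragments):
    """Rebuild the original table as the union of all horizontal fragments."""
    rows = [r for frag in fragments.values() for r in frag]
    return sorted(rows, key=lambda r: r["EmpID"])

fragments = fragment_horizontally(employees, fragment_conditions)
print([r["EmpID"] for r in fragments["Site1"]])  # [1, 4]
print(reconstruct(fragments) == employees)       # True
```

A query about Punjab employees only needs Site1's fragment, which is where the "faster local queries" advantage comes from.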
⭐ B. Vertical Fragmentation
Columns (attributes) are divided across sites.
Example:
Fragment 1: (EmpID, Name, Dept)
Fragment 2: (EmpID, Salary)
Each fragment keeps the primary key EmpID, so the original table can be reconstructed by joining the fragments on EmpID.
✔ Advantages
- Security: sensitive attributes (salary) stored separately
- Better locality for attribute-specific queries
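The split and the key-based reconstruction can be sketched as below, assuming a small hypothetical Employee table held in Python lists. Each fragment is a projection that keeps `EmpID`, and a join on `EmpID` restores the full rows.

```python
# Minimal sketch: vertical fragmentation of a hypothetical Employee table.
# Both fragments keep the primary key EmpID so the table can be rebuilt by a join.

employees = [
    {"EmpID": 1, "Name": "Aman", "Dept": "HR", "Salary": 50000},
    {"EmpID": 2, "Name": "Riya", "Dept": "IT", "Salary": 65000},
]

def project(rows, columns):
    """Keep only the given columns of each row (a relational projection)."""
    return [{c: r[c] for c in columns} for r in rows]

fragment1 = project(employees, ["EmpID", "Name", "Dept"])  # e.g. stored at Site 1
fragment2 = project(employees, ["EmpID", "Salary"])        # e.g. stored at a secured Site 2

def join_on_key(left, right, key="EmpID"):
    """Rebuild full rows by joining two vertical fragments on the primary key."""
    right_by_key = {r[key]: r for r in right}
    return [{**l, **right_by_key[l[key]]} for l in left if l[key] in right_by_key]

print(join_on_key(fragment1, fragment2) == employees)  # True
```

Because the salary column never appears in `fragment1`, a site that holds only that fragment cannot read salaries, which is the security benefit listed above.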
⭐ C. Hybrid (Mixed) Fragmentation
Combination of horizontal + vertical fragmentation.
Example:
First apply vertical fragmentation → then fragment each vertical fragment horizontally.
The most flexible type, but also the most complex.
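A minimal sketch of the two-step process, assuming a hypothetical Employee table and site names. The table is split vertically first, then one vertical fragment is split horizontally by `Region`. Reconstruction reverses the steps: a union for the horizontal split, then a join on `EmpID` for the vertical one.

```python
# Minimal sketch of hybrid fragmentation: vertical split first, then a horizontal
# split of one vertical fragment. Table, columns and sites are hypothetical.

employees = [
    {"EmpID": 1, "Name": "Aman", "Region": "Punjab", "Salary": 50000},
    {"EmpID": 2, "Name": "Riya", "Region": "Delhi", "Salary": 65000},
    {"EmpID": 3, "Name": "Kabir", "Region": "Punjab", "Salary": 58000},
]

def project(rows, columns):
    """Keep only the given columns of each row."""
    return [{c: r[c] for c in columns} for r in rows]

# Step 1: vertical fragmentation (both fragments keep the key EmpID).
general = project(employees, ["EmpID", "Name", "Region"])
payroll = project(employees, ["EmpID", "Salary"])

# Step 2: horizontal fragmentation of the 'general' fragment by Region.
placement = {
    "Site1": [r for r in general if r["Region"] == "Punjab"],
    "Site2": [r for r in general if r["Region"] == "Delhi"],
    "Site3": payroll,  # salary fragment kept whole at one site
}

def reconstruct(placement):
    # Undo step 2 with a union, then undo step 1 with a join on EmpID.
    general_rows = placement["Site1"] + placement["Site2"]
    salary_by_id = {r["EmpID"]: r for r in placement["Site3"]}
    rows = [{**g, **salary_by_id[g["EmpID"]]} for g in general_rows]
    return sorted(rows, key=lambda r: r["EmpID"])

print(reconstruct(placement) == employees)  # True
```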
⭐ 4. REPLICATION (Duplicate Copies Across Sites)
Replication means keeping multiple copies of the same data at different sites.
Types of Replication:
⭐ A. Full Replication
Entire database replicated at every site.
✔ Advantages:
- Highest availability
- Very fast read queries
- Good for fault tolerance
✔ Disadvantages:
- Very high update cost
- Difficult synchronization
- Storage overhead
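The trade-off between fast reads and costly updates can be sketched with a simple "read one, write all" rule. The sketch assumes hypothetical site names and uses in-memory dicts in place of site databases. It ignores failures during a write and concurrent updates, which a real DDBMS must handle (for example with a commit protocol).

```python
# Minimal sketch of full replication with a simple "read one, write all" rule.
# Site names and the dicts standing in for site databases are hypothetical.

class FullyReplicatedStore:
    def __init__(self, sites):
        # Every site holds a complete copy of the database.
        self.copies = {site: {} for site in sites}
        self.site_writes = 0  # counts per-site writes, to show update cost

    def write(self, key, value):
        # An update must be applied at every site to keep the copies identical.
        for copy in self.copies.values():
            copy[key] = value
            self.site_writes += 1

    def read(self, key, local_site):
        # Any single copy can answer a read, so reads stay local.
        return self.copies[local_site].get(key)

    def read_with_failures(self, key, failed_sites):
        # If some sites are down, any surviving copy can still answer.
        for site, copy in self.copies.items():
            if site not in failed_sites:
                return copy.get(key)
        raise RuntimeError("all sites are down")

store = FullyReplicatedStore(["Delhi", "Mumbai", "Punjab"])
store.write("acct:101", 5000)
print(store.site_writes)                     # 3: one write per site
print(store.read("acct:101", local_site="Mumbai"))  # 5000
print(store.read_with_failures("acct:101", failed_sites={"Delhi", "Mumbai"}))  # 5000
```

One logical update costs as many writes as there are sites, while a read succeeds as long as a single site is up.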
⭐ B. Partial Replication
Only selected tables or fragments are replicated.
Example:
Branch data stored locally, but product catalog replicated everywhere.
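This example can be written as a replica placement map. The sketch assumes hypothetical table and site names. Reads are local only where a replica exists, and an update must reach every site listed for that table.

```python
# Minimal sketch of partial replication: a hypothetical placement map that
# replicates the product catalog everywhere but keeps each branch's data at one site.

SITES = ["Delhi", "Mumbai", "Punjab"]

replica_map = {
    "ProductCatalog": set(SITES),  # replicated at every site
    "Orders_Delhi": {"Delhi"},     # stored only at its own branch
    "Orders_Mumbai": {"Mumbai"},
    "Orders_Punjab": {"Punjab"},
}

def can_read_locally(table, site):
    """A read is local only if the site holds a replica of the table."""
    return site in replica_map[table]

def sites_to_update(table):
    """An update must reach every site that holds a replica."""
    return sorted(replica_map[table])

print(can_read_locally("ProductCatalog", "Punjab"))  # True
print(can_read_locally("Orders_Delhi", "Punjab"))    # False -> remote access
print(sites_to_update("ProductCatalog"))             # ['Delhi', 'Mumbai', 'Punjab']
print(sites_to_update("Orders_Mumbai"))              # ['Mumbai']
```

The catalog is read everywhere but changes rarely, so replicating it is cheap. Branch orders change often and are mostly read locally, so they keep a single copy.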
⭐ C. No Replication
Each data item stored at exactly one site only.
✔ Advantages:
- Simple to update
- No duplicate maintenance
✔ Disadvantages:
- Low availability
- Remote access needed
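A minimal sketch of both sides of this trade-off, assuming a hypothetical placement and a `down_sites` set that simulates failures. Updates touch one copy only, but a read fails as soon as the item's single home site is down.

```python
# Minimal sketch of a non-replicated placement: each item lives at exactly one site.
# The placement, site names and 'down_sites' parameter are hypothetical.

placement = {"acct:101": "Delhi", "acct:202": "Mumbai"}
site_data = {"Delhi": {"acct:101": 5000}, "Mumbai": {"acct:202": 7000}}

def read(key, from_site, down_sites=frozenset()):
    home = placement[key]  # the only site holding this item
    if home in down_sites:
        raise LookupError(f"{key} unavailable: its only site {home} is down")
    access = "local" if home == from_site else "remote"
    return site_data[home][key], access

def write(key, value):
    # Updates touch a single copy, so there is nothing to synchronize.
    site_data[placement[key]][key] = value

print(read("acct:101", from_site="Delhi"))   # (5000, 'local')
print(read("acct:101", from_site="Mumbai"))  # (5000, 'remote')
try:
    read("acct:101", from_site="Mumbai", down_sites={"Delhi"})
except LookupError as err:
    print(err)  # acct:101 unavailable: its only site Delhi is down
```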
⭐ 5. DATA ALLOCATION (Where to Store Data)
Data allocation determines which fragment or replica should be stored at which site.
Allocation can be:
⭐ A. Centralized Allocation
All fragments stored at one central site.
Not truly distributed.
⭐ B. Partitioned Allocation
Each fragment stored at exactly one site.
Supports local autonomy and fast queries on local data, but a site failure makes its fragments unavailable.
⭐ C. Replicated Allocation
Fragments stored at multiple sites.
Good for availability but costly for updates.
⭐ Factors Affecting Allocation
- Frequency of data access
- Network cost
- Storage cost
- Processing capability of each site
- Reliability requirements
- Security constraints
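To show how the first two factors can drive a decision, here is a simple greedy heuristic for partitioned allocation. It is an illustrative choice, not a standard algorithm from these notes. Each fragment goes to the site that minimises total communication cost, computed as access frequency × network cost. All frequencies and costs are made-up numbers. Storage, processing, reliability and security factors would enter as extra constraints, which this sketch leaves out.

```python
# Minimal sketch of cost-based partitioned allocation (one copy per fragment).
# Fragment names, access frequencies and network costs are hypothetical.

SITES = ["S1", "S2", "S3"]

# access_freq[fragment][site] = how often queries at 'site' read 'fragment'
access_freq = {
    "Emp_Punjab": {"S1": 90, "S2": 5, "S3": 5},
    "Emp_Delhi": {"S1": 10, "S2": 70, "S3": 20},
}

# net_cost[a][b] = cost of shipping data from site a to site b (0 if local)
net_cost = {
    "S1": {"S1": 0, "S2": 4, "S3": 6},
    "S2": {"S1": 4, "S2": 0, "S3": 3},
    "S3": {"S1": 6, "S2": 3, "S3": 0},
}

def total_cost(fragment, host):
    """Total communication cost if 'fragment' is stored only at 'host'."""
    return sum(freq * net_cost[host][site]
               for site, freq in access_freq[fragment].items())

def allocate(fragments):
    """Place each fragment at the site with the lowest total access cost."""
    return {f: min(SITES, key=lambda s: total_cost(f, s)) for f in fragments}

print(total_cost("Emp_Punjab", "S1"))  # 50
print(allocate(access_freq))           # {'Emp_Punjab': 'S1', 'Emp_Delhi': 'S2'}
```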
⭐ 6. DISTRIBUTED DIRECTORY MANAGEMENT
Directory = metadata that records the location of every data fragment and its replicas.
Directory locations can be:
- Centralized
- Distributed
- Fully replicated
This directory helps query processors know where to fetch data from.
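A minimal sketch of the lookup a query processor might do, assuming a hypothetical directory that maps fragment names to the sites holding a copy. It prefers a local copy when one exists and otherwise falls back to the first listed site.

```python
# Minimal sketch of a directory (global catalog) that maps each fragment
# to the sites holding a copy. Fragment and site names are hypothetical.

directory = {
    "Emp_Punjab": ["Site1"],
    "Emp_Delhi": ["Site2"],
    "Emp_Salary": ["Site3"],
    "ProductCatalog": ["Site1", "Site2", "Site3"],  # replicated fragment
}

def locate(fragment, requesting_site):
    """Return the site to fetch 'fragment' from, preferring a local copy."""
    sites = directory.get(fragment)
    if not sites:
        raise KeyError(f"no directory entry for {fragment}")
    return requesting_site if requesting_site in sites else sites[0]

print(locate("ProductCatalog", "Site2"))  # Site2 (local copy)
print(locate("Emp_Salary", "Site1"))      # Site3 (remote fetch)
```

Whether this dictionary lives at one site, is split across sites, or is copied to every site corresponds to the centralized, distributed and fully replicated directory options above.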
⭐ 7. ADVANTAGES OF DISTRIBUTED DATA STORAGE
✔ Improved performance
Local queries served locally → less delay.
✔ Reduced network traffic
Only required fragments transferred.
✔ Fault tolerance & high availability
Replication lets the system keep running when a site fails.
✔ Scalability
Easy to add new sites.
✔ Local autonomy
Sites manage their own data.
⭐ 8. CHALLENGES OF DISTRIBUTED DATA STORAGE
✔ Complexity in managing fragmentation & replication
✔ Distributed deadlocks
✔ High cost of synchronization
✔ Difficult distributed recovery
✔ Security issues across sites
✔ Data inconsistency if replicas not synchronized
⭐ 9. REAL-LIFE EXAMPLES
✔ Banking
Branches store local customer transactions.
✔ E-commerce
Warehouses store local product & inventory data.
✔ Telecom
Call data records stored region-wise.
✔ University
Different campuses store student data locally.
⭐ Perfect 5–6 Mark Short Answer
Distributed Data Storage refers to storing database data across multiple sites in a network. It involves three main techniques: fragmentation (horizontal, vertical, hybrid), replication (full, partial, none), and allocation (deciding where to place fragments and replicas). Distributed data storage improves performance, reduces communication cost, increases availability, and supports local autonomy, but also adds complexity in synchronization, concurrency control, and recovery.
