⭐ INTRODUCTION TO DISTRIBUTED DATABASE CONCEPTS
A Distributed Database System (DDBMS) is a database system in which:
- Data is stored across multiple sites/locations
- Sites are connected via a computer network
- Users experience the system as if it were a single unified database
Each site can be a separate computer with local processors, storage, and DBMS software.
Despite being physically distributed, the system behaves logically as one database.
⭐ WHY DISTRIBUTED DATABASES?
Organizations today operate across multiple:
✔ Locations
✔ Branches
✔ Cities
✔ Countries
A centralized database can become slow, expensive, and a single point of failure when operations are geographically distributed.
Distributed databases address these issues.
✔ Main reasons include:
- Improved Reliability & Availability
If one site fails, others continue functioning.
- Better Performance
Local queries run faster because data is stored closer to users.
- Scalability
Add more sites as the organization grows.
- Reduced Communication Cost
Accessing local data reduces network usage.
- Local Autonomy
Each site can operate independently.
⭐ CHARACTERISTICS OF DISTRIBUTED DATABASES
A distributed database system exhibits the following characteristics:
✔ 1. Multiple Sites
Data is stored across several geographically distributed nodes.
✔ 2. Distributed Processing
Queries, transactions, and updates may be processed at multiple sites.
✔ 3. Logical Data Integration
Even though data is dispersed, to the user it appears as one database.
✔ 4. Autonomy
Each site can manage its own data and local DBMS.
✔ 5. Distributed Query Processing
Query optimization and execution occur across multiple sites.
✔ 6. Distributed Concurrency Control
Ensures correctness when multiple sites access shared data.
✔ 7. Distributed Recovery Management
Restores database consistency after site/network failures.
⭐ TYPES OF DISTRIBUTED DATABASES
Distributed databases are classified into two main types:
⭐ 1. Homogeneous Distributed Database
All sites use:
✔ Same DBMS
✔ Same OS
✔ Same data model
✔ Same query language
✔ Advantages:
- Easy communication
- Simplified query processing
- Uniform environment
✔ Example:
All branches use MySQL servers connected over a network.
⭐ 2. Heterogeneous Distributed Database
Sites use:
✔ Different DBMS software (Oracle + MySQL + SQL Server)
✔ Different data models (Relational + Object DB)
✔ Different OS
✔ Advantages:
- Flexibility in using best DBMS for each site
- Integrates pre-existing systems
✔ Challenges:
- Query translation
- Schema mismatches
- Data conversion issues
⭐ DISTRIBUTION STRATEGIES
A DDBMS can distribute data using:
✔ 1. Fragmentation
Data is divided into smaller parts:
a. Horizontal Fragmentation
Rows are partitioned across sites.
b. Vertical Fragmentation
Columns are partitioned.
c. Hybrid Fragmentation
Combination of both.
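The three fragmentation styles can be sketched in a few lines of Python. This is only an illustration: the EMPLOYEE rows, the column names, and the helper functions (horizontal_fragment, vertical_fragment, rejoin) are made up for this example and are not part of any real DBMS. Note that each vertical fragment keeps the key column so the original rows can be rebuilt by a join.

```python
# Hypothetical EMPLOYEE relation; names and values are illustrative only.
employees = [
    {"emp_id": 1, "name": "Asha", "city": "Delhi", "salary": 50000},
    {"emp_id": 2, "name": "Ravi", "city": "Mumbai", "salary": 62000},
    {"emp_id": 3, "name": "Meera", "city": "Delhi", "salary": 58000},
]

def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy the predicate (a subset of rows)."""
    return [row for row in rows if predicate(row)]

def vertical_fragment(rows, columns, key="emp_id"):
    """Keep only the chosen columns, always including the key column."""
    keep = [key] + [c for c in columns if c != key]
    return [{c: row[c] for c in keep} for row in rows]

def rejoin(left, right, key="emp_id"):
    """Rebuild rows from two vertical fragments by joining on the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]

# Horizontal: rows split by city, one fragment per site.
delhi_rows = horizontal_fragment(employees, lambda r: r["city"] == "Delhi")
mumbai_rows = horizontal_fragment(employees, lambda r: r["city"] == "Mumbai")

# Vertical: columns split, key repeated in both fragments.
public_cols = vertical_fragment(employees, ["name", "city"])
payroll_cols = vertical_fragment(employees, ["salary"])

# Hybrid: a horizontal fragment that is then split vertically.
delhi_payroll = vertical_fragment(delhi_rows, ["salary"])
```

The union of the horizontal fragments, or the join of the vertical fragments, gives back the original relation.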
✔ 2. Replication
Multiple copies of the same data stored at different sites.
Benefits:
- High availability
- Faster local access
Types:
- Full Replication
- Partial Replication
- No Replication
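How replication gives availability and fast local reads can be shown with a small sketch. It assumes a simple read-one, write-all policy, which is only one of several possible policies; the ReplicatedItem class and the site names are hypothetical.

```python
class ReplicatedItem:
    """One data item copied to several sites (read-one, write-all)."""

    def __init__(self, sites, value):
        self.copies = {site: value for site in sites}
        self.up = set(sites)  # sites currently reachable

    def fail(self, site):
        """Simulate a site failure."""
        self.up.discard(site)

    def read(self, preferred_site):
        # Prefer the local copy; fall back to any site that is still up.
        if preferred_site in self.up:
            return self.copies[preferred_site]
        for site in sorted(self.up):
            return self.copies[site]
        raise RuntimeError("no replica available")

    def write(self, value):
        # Write-all: every copy must be reachable, or the write is refused.
        if self.up != set(self.copies):
            raise RuntimeError("a replica is down; write refused")
        for site in self.copies:
            self.copies[site] = value


item = ReplicatedItem(["delhi", "mumbai", "pune"], value=100)
item.write(120)
item.fail("delhi")
print(item.read("delhi"))  # prints 120, served by another replica
```

The sketch also shows the trade-off: reads survive a site failure, but under write-all an update is refused while any replica is down, so more copies make updates harder.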
✔ 3. Allocation
Deciding which site should store which fragment or replica.
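One simple allocation heuristic is to place each fragment at the site that uses it most. The sketch below assumes that heuristic and uses invented access counts; real allocation also weighs storage cost, update cost, and replication, which are left out here.

```python
# Hypothetical access counts: how often each site touches each fragment.
access_counts = {
    "customers_north": {"delhi": 900, "mumbai": 50, "chennai": 20},
    "customers_south": {"delhi": 30, "mumbai": 100, "chennai": 700},
}

def allocate(access_counts):
    """Assign each fragment to the site that accesses it most often."""
    return {
        fragment: max(by_site, key=by_site.get)
        for fragment, by_site in access_counts.items()
    }

def remote_accesses(access_counts, allocation):
    """Count accesses that must cross the network under an allocation."""
    return sum(
        count
        for fragment, by_site in access_counts.items()
        for site, count in by_site.items()
        if site != allocation[fragment]
    )

allocation = allocate(access_counts)
print(allocation)
print(remote_accesses(access_counts, allocation))  # prints 200
```

Here customers_north goes to delhi and customers_south goes to chennai, leaving 200 of the 1800 accesses remote.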
⭐ ADVANTAGES OF DISTRIBUTED DATABASES
✔ 1. Reliability & Availability
The failure of one site does not bring down the whole system; the remaining sites keep operating.
✔ 2. Faster Response
Local data means quicker access.
✔ 3. Scalability
Easily add new sites.
✔ 4. Lower Communication Cost
Remote data transfer is minimized.
✔ 5. Local Autonomy
Each site controls its own operations.
⭐ CHALLENGES OF DISTRIBUTED DATABASES
✔ 1. Complex Query Processing
Must fetch and join data from multiple sites.
✔ 2. Distributed Concurrency Control
Managing locks across sites is difficult.
✔ 3. Distributed Deadlocks
Deadlocks can occur across nodes.
✔ 4. Distributed Recovery
Handling site/network failures becomes complex.
✔ 5. Security Issues
Multiple sites → more points of attack.
✔ 6. High Initial Setup Cost
Requires investment in network, hardware, synchronization mechanisms, and policies.
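The distributed deadlock challenge (point 3) is easiest to see with wait-for graphs. In the sketch below each site sees only its own edges and finds no cycle, but the merged global graph has one. It assumes a centralized detector that collects every site's graph; the transaction names and helper functions are hypothetical.

```python
def merge_wait_for(local_graphs):
    """Union the per-site wait-for edges into one global graph."""
    merged = {}
    for graph in local_graphs:
        for waiter, holders in graph.items():
            merged.setdefault(waiter, set()).update(holders)
    return merged

def has_cycle(graph):
    """Depth-first search; reaching a node on the current path is a cycle."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))

site_a = {"T1": {"T2"}}  # at site A, T1 waits for a lock held by T2
site_b = {"T2": {"T1"}}  # at site B, T2 waits for a lock held by T1

print(has_cycle(site_a), has_cycle(site_b))        # prints False False
print(has_cycle(merge_wait_for([site_a, site_b]))) # prints True
```

Neither site can detect the deadlock alone, which is why detection needs information from more than one site.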
⭐ DISTRIBUTED DBMS ARCHITECTURE (Components)
A DDBMS contains:
✔ Local DBMS at each site
✔ Communication manager
✔ Distributed transaction manager
✔ Distributed concurrency control
✔ Distributed query processor
✔ Global schema & metadata
✔ Directory (data location information)
✔ Global recovery manager
⭐ DISTRIBUTED TRANSACTIONS
A transaction may access data at multiple sites.
Such transactions need commit protocols such as:
- Two-Phase Commit (2PC)
- Three-Phase Commit (3PC)
These ensure atomicity across multiple machines: either every site commits the transaction or none does.
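The basic flow of Two-Phase Commit can be sketched as follows. This is a simplified, in-memory illustration: the Participant class and two_phase_commit function are hypothetical, and logging, timeouts, and coordinator failure (the hard parts of a real implementation) are left out.

```python
class Participant:
    """One site taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be committed.
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (voting): ask every site to prepare and collect the votes.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit only if every vote was yes.
    if all(votes):
        for p in participants:
            p.commit()
        return "commit"
    for p in participants:
        p.abort()
    return "abort"


ok_sites = [Participant("delhi"), Participant("mumbai")]
print(two_phase_commit(ok_sites))  # prints commit

bad_sites = [Participant("delhi"), Participant("mumbai", can_commit=False)]
print(two_phase_commit(bad_sites))  # prints abort
```

A single "no" vote aborts the transaction at every site, which is how atomicity is preserved.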
⭐ Perfect 5–6 Mark Short Answer
A Distributed Database System stores data across multiple sites connected through a network while presenting it to users as a single unified database. Distributed databases improve reliability, performance, scalability, and local access efficiency. They support data fragmentation, replication, distributed query execution, and distributed transaction management. However, they also introduce complexity in concurrency control, deadlock handling, recovery, and security.
