
Distributed Database Concepts


INTRODUCTION TO DISTRIBUTED DATABASE CONCEPTS

A distributed database system, managed by a Distributed Database Management System (DDBMS), is a database system in which:

  • Data is stored across multiple sites/locations
  • Sites are connected via a computer network
  • Users experience the system as if it were a single unified database

Each site can be a separate computer with local processors, storage, and DBMS software.
Despite being physically distributed, the system behaves logically as one database.


WHY DISTRIBUTED DATABASES?

Organizations today operate across multiple:

✔ Locations
✔ Branches
✔ Cities
✔ Countries

A centralized database can become slow, expensive, and a single point of failure when operations are geographically distributed.

Distributed databases address these issues.

✔ Main reasons include:

  1. Improved Reliability & Availability
    If one site fails, others continue functioning.
  2. Better Performance
    Local queries run faster because data is stored closer to users.
  3. Scalability
    Add more sites as the organization grows.
  4. Reduced Communication Cost
    Accessing local data reduces network usage.
  5. Local Autonomy
    Each site can operate independently.

CHARACTERISTICS OF DISTRIBUTED DATABASES

A distributed database system exhibits the following characteristics:

✔ 1. Multiple Sites

Data is stored across several nodes, which are often geographically distributed.

✔ 2. Distributed Processing

Queries, transactions, and updates may be processed at multiple sites.

✔ 3. Logical Data Integration

Even though data is dispersed, to the user it appears as one database.

✔ 4. Autonomy

Each site can manage its own data and local DBMS.

✔ 5. Distributed Query Processing

Query optimization and execution occur across multiple sites.

✔ 6. Distributed Concurrency Control

Ensures correctness when multiple sites access shared data.

✔ 7. Distributed Recovery Management

Restores database consistency after site/network failures.


TYPES OF DISTRIBUTED DATABASES

Distributed databases are classified into two main types:


⭐ 1. Homogeneous Distributed Database

All sites use:

✔ Same DBMS
✔ Same OS
✔ Same data model
✔ Same query language

✔ Advantages:

  • Easy communication
  • Simplified query processing
  • Uniform environment

✔ Example:

All branches run MySQL servers connected over a network.


⭐ 2. Heterogeneous Distributed Database

Sites use:

✔ Different DBMS software (Oracle + MySQL + SQL Server)
✔ Different data models (Relational + Object DB)
✔ Different OS

✔ Advantages:

  • Flexibility to use the best-suited DBMS at each site
  • Integrates pre-existing systems

✔ Challenges:

  • Query translation
  • Schema mismatches
  • Data conversion issues

DISTRIBUTION STRATEGIES

A DDBMS can distribute data using:


✔ 1. Fragmentation

Data is divided into smaller parts:

a. Horizontal Fragmentation

Rows (tuples) are partitioned across sites. Each fragment keeps every column but only a subset of the rows.

b. Vertical Fragmentation

Columns (attributes) are partitioned. Each fragment keeps the primary key so the original rows can be rebuilt with a join.

c. Hybrid Fragmentation

A combination of both: a relation is split by rows and the resulting fragments are split by columns, or vice versa.
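
The three kinds of fragmentation can be sketched in a few lines of Python. This is only an illustration: the EMPLOYEE rows, the column names, and the helper functions are hypothetical and do not come from any particular DBMS. Horizontal fragmentation is shown as a selection, vertical fragmentation as a projection that keeps the key, and hybrid fragmentation as one applied after the other.

```python
# Hypothetical EMPLOYEE relation; column names and values are illustrative only.
employees = [
    {"emp_id": 1, "name": "Asha", "city": "Delhi", "salary": 50000},
    {"emp_id": 2, "name": "Ben", "city": "Mumbai", "salary": 62000},
    {"emp_id": 3, "name": "Chen", "city": "Delhi", "salary": 58000},
]

def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy the predicate (a selection)."""
    return [row for row in rows if predicate(row)]

def vertical_fragment(rows, columns, key="emp_id"):
    """Keep only the chosen columns plus the key (a projection)."""
    keep = [key] + [c for c in columns if c != key]
    return [{c: row[c] for c in keep} for row in rows]

def rebuild_from_vertical(left, right, key="emp_id"):
    """Rejoin two vertical fragments on the shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]

# Horizontal: one fragment per city, each could live at that city's site.
delhi_rows = horizontal_fragment(employees, lambda r: r["city"] == "Delhi")
mumbai_rows = horizontal_fragment(employees, lambda r: r["city"] == "Mumbai")

# Vertical: general details and payroll details stored separately.
public_cols = vertical_fragment(employees, ["name", "city"])
payroll_cols = vertical_fragment(employees, ["salary"])

# Hybrid: payroll columns for the Delhi rows only.
hybrid = vertical_fragment(delhi_rows, ["salary"])
```

The union of the horizontal fragments, or the join of the vertical fragments on emp_id, gives back the original relation, which is the property a correct fragmentation is expected to have.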


✔ 2. Replication

Multiple copies of the same data are stored at different sites.

Benefits:

  • High availability
  • Faster local access

Types:

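  • Full Replication: every site holds a copy of the entire database
  • Partial Replication: only selected fragments are copied, and only to selected sites
  • No Replication: each fragment is stored at exactly one site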
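
The availability benefit and its cost can be seen in a toy model. The sketch below assumes a simple "read one, write all" policy, which is only one of several possible replication policies; the ReplicatedStore class, the site names, and the key are all hypothetical.

```python
class ReplicatedStore:
    """Toy key-value store that keeps a copy of each key at several sites."""

    def __init__(self, sites):
        self.copies = {site: {} for site in sites}   # site -> local copy
        self.down = set()                            # sites currently failed

    def write(self, key, value):
        # Write-all (assumed policy): update every replica so copies never diverge.
        if self.down:
            raise RuntimeError("write-all needs every replica reachable")
        for local in self.copies.values():
            local[key] = value

    def read(self, key, preferred_site):
        # Read-one: use the local copy if its site is up, else any live replica.
        candidates = [preferred_site] + [s for s in self.copies if s != preferred_site]
        for site in candidates:
            if site not in self.down:
                return site, self.copies[site][key]
        raise RuntimeError("no live replica")

store = ReplicatedStore(["delhi", "mumbai", "pune"])
store.write("acct:42", 900)
local_read = store.read("acct:42", "delhi")      # served by the local copy
store.down.add("delhi")                          # simulate a site failure
failover_read = store.read("acct:42", "delhi")   # served by another replica
```

Reads stay available after the Delhi site fails, but under this policy a write is rejected while any replica is down. More copies make reads faster and more available while making updates more expensive to keep consistent.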

✔ 3. Allocation

Deciding which site should store which fragment or replica.
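
One simple way to think about allocation is to place each fragment where it is used most. The sketch below is a greedy heuristic under that assumption, not a complete allocation algorithm: the fragment names, site names, and access counts are invented for illustration, and real allocation also weighs storage limits and update costs.

```python
# Hypothetical access counts: how often each site reads or updates each fragment.
access_counts = {
    "EMP_DELHI":  {"delhi": 120, "mumbai": 10, "pune": 5},
    "EMP_MUMBAI": {"delhi": 8, "mumbai": 200, "pune": 30},
    "PAYROLL":    {"delhi": 40, "mumbai": 45, "pune": 90},
}

def allocate(access_counts, copies=1):
    """Place each fragment at the `copies` sites that access it most often."""
    directory = {}
    for fragment, per_site in access_counts.items():
        ranked = sorted(per_site, key=per_site.get, reverse=True)
        directory[fragment] = ranked[:copies]
    return directory

directory = allocate(access_counts)             # no replication: one site each
replicated = allocate(access_counts, copies=2)  # partial replication: two sites each
```

The resulting mapping from fragment to sites is the kind of location information a DDBMS keeps in its directory.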


ADVANTAGES OF DISTRIBUTED DATABASES

✔ 1. Reliability & Availability

A single site failure need not bring down the whole system; the remaining sites keep serving the data they hold.

✔ 2. Faster Response

Local data means quicker access.

✔ 3. Scalability

Easily add new sites.

✔ 4. Lower Communication Cost

Remote data transfer is minimized.

✔ 5. Local Autonomy

Each site controls its own operations.


CHALLENGES OF DISTRIBUTED DATABASES

✔ 1. Complex Query Processing

Must fetch and join data from multiple sites.
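
A small sketch shows why this is harder than a local join. It assumes two hypothetical sites, one holding orders and one holding customers, and uses a semijoin-style plan: site A ships only the join keys it needs, site B ships back only the matching rows, and the join is finished at site A. The data and function names are illustrative.

```python
# Hypothetical data held at two different sites.
site_a_orders = [
    {"order_id": 1, "cust_id": 10, "amount": 250},
    {"order_id": 2, "cust_id": 11, "amount": 400},
    {"order_id": 3, "cust_id": 10, "amount": 120},
]
site_b_customers = [
    {"cust_id": 10, "name": "Asha"},
    {"cust_id": 11, "name": "Ben"},
    {"cust_id": 12, "name": "Chen"},
    {"cust_id": 13, "name": "Dia"},
]

def fetch_customers(cust_ids):
    """Runs at site B: return only the customers that site A asked for."""
    return [c for c in site_b_customers if c["cust_id"] in cust_ids]

def join_orders_with_customers():
    """Runs at site A: ship join keys out, ship matching rows back, join locally."""
    needed = {o["cust_id"] for o in site_a_orders}      # keys sent to site B
    shipped = fetch_customers(needed)                   # rows sent back to site A
    by_id = {c["cust_id"]: c for c in shipped}
    joined = [{**o, "name": by_id[o["cust_id"]]["name"]} for o in site_a_orders]
    return joined, len(shipped)

rows, rows_shipped = join_orders_with_customers()
```

Only two of the four customer rows cross the network here. Choosing which site does the work and what gets shipped is the core decision a distributed query optimizer has to make.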

✔ 2. Distributed Concurrency Control

Managing locks across sites is difficult.

✔ 3. Distributed Deadlocks

Deadlocks can span nodes: a cycle of waiting transactions may involve several sites and be invisible to any single one of them.
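
The sketch below illustrates this with wait-for graphs. It assumes each site records which transaction is waiting for which, and that some detector can collect those edges into a global graph; the site names, transaction names, and the centralized detection approach are all assumptions made for the example.

```python
# Hypothetical local wait-for edges: (waiting transaction, transaction holding the lock).
local_wait_for = {
    "site_1": [("T1", "T2")],   # at site 1, T1 waits for a lock held by T2
    "site_2": [("T2", "T3")],
    "site_3": [("T3", "T1")],
}

def has_cycle(edges):
    """Depth-first search for a cycle in a directed wait-for graph."""
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, []).append(holder)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True          # back edge: the wait chain loops back on itself
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, []))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))

# No single site sees a cycle in its own edges...
local_deadlocks = {site: has_cycle(edges) for site, edges in local_wait_for.items()}

# ...but the union of all local edges does contain one.
global_edges = [edge for edges in local_wait_for.values() for edge in edges]
global_deadlock = has_cycle(global_edges)
```

Each local graph is acyclic, yet the combined graph contains the cycle T1, T2, T3, T1, so the deadlock can only be found by looking across sites.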

✔ 4. Distributed Recovery

Handling site/network failures becomes complex.

✔ 5. Security Issues

Multiple sites → more points of attack.

✔ 6. High Initial Setup Cost

Requires investment in network infrastructure, hardware, synchronization mechanisms, and operating policies.


DISTRIBUTED DBMS ARCHITECTURE (Components)

A DDBMS contains:

✔ Local DBMS at each site
✔ Communication manager
✔ Distributed transaction manager
✔ Distributed concurrency control
✔ Distributed query processor
✔ Global schema & metadata
✔ Directory (data location information)
✔ Global recovery manager


DISTRIBUTED TRANSACTIONS

A single transaction may read or update data at multiple sites.

This requires atomic commit protocols such as:

  • Two-Phase Commit (2PC)
  • Three-Phase Commit (3PC)

These ensure atomicity across multiple machines: the transaction either commits at every participating site or aborts at all of them.
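
The core idea of Two-Phase Commit can be sketched as a coordinator that first collects votes and then broadcasts one decision. This is a minimal illustration, assuming no crashes, timeouts, or message loss; the Participant class and its methods are hypothetical, and a real implementation also writes each step to a log so it can recover after a failure.

```python
class Participant:
    """A site taking part in a distributed transaction (hypothetical interface)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self):
        # Phase 1: vote YES only if the local work can be made durable.
        self.state = "READY" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self):
        self.state = "COMMITTED"

    def abort(self):
        self.state = "ABORTED"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes YES."""
    # Phase 1 (voting): collect a vote from every site.
    votes = [p.prepare() for p in participants]
    decision = "COMMIT" if all(votes) else "ABORT"
    # Phase 2 (decision): send the same outcome to every site.
    for p in participants:
        if decision == "COMMIT":
            p.commit()
        else:
            p.abort()
    return decision


ok_sites = [Participant("delhi"), Participant("mumbai")]
ok_decision = two_phase_commit(ok_sites)

bad_sites = [Participant("delhi"), Participant("pune", can_commit=False)]
bad_decision = two_phase_commit(bad_sites)
```

A single NO vote aborts the transaction everywhere, which is how atomicity is preserved. In this basic form, participants that have voted YES must wait for the coordinator's decision, which is the blocking behaviour that Three-Phase Commit tries to reduce.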


Perfect 5–6 Mark Short Answer

A Distributed Database System stores data across multiple sites connected through a network while presenting it to users as a single unified database. Distributed databases improve reliability, performance, scalability, and local access efficiency. They support data fragmentation, replication, distributed query execution, and distributed transaction management. However, they also introduce complexity in concurrency control, deadlock handling, recovery, and security.