Skip to content

Structure of a Distributed Database

1. Introduction to Distributed Databases

A Distributed Database (DDB) is a collection of interrelated databases stored at different physical locations but connected via a network. The system manages and controls the distribution, replication, and processing of data across multiple nodes.

Why Use Distributed Databases?

Improved Performance: Data is stored closer to users, reducing query response time.
High Availability: Even if one site fails, others can continue operations.
Scalability: More nodes can be added as demand increases.
Fault Tolerance: Data replication ensures minimal data loss.
Geographical Distribution: Supports multi-location operations efficiently.


2. Architecture of Distributed Databases

The structure of a distributed database is based on how data is distributed, accessed, and managed across different sites.

1. Types of Distributed Database Systems

Distributed databases can be categorized into:

TypeDescriptionExample
Homogeneous DDBMSAll databases use the same DBMS software.Multiple MySQL servers in different locations.
Heterogeneous DDBMSDifferent DBMS software at different sites.A combination of MySQL, Oracle, and MongoDB.
Federated DDBMSIndependent databases connected by a global schema.A system integrating multiple banking databases.

2. Data Distribution Strategies

Data in a distributed database can be distributed using different methods:

A. Fragmentation (Partitioning)

  • The database is split into smaller parts (fragments) stored at different locations.
  • Each fragment contains data relevant to a specific region or function.
  • Types of Fragmentation:
    • Horizontal Fragmentation: Divides tables into rows.
    • Vertical Fragmentation: Divides tables into columns.
    • Hybrid Fragmentation: Combination of both.

Example: Horizontal Fragmentation

  • A customer database is divided into North Region, South Region, etc.
CREATE TABLE Customers_North AS 
SELECT * FROM Customers WHERE Region = 'North';

Reduces query load for each region.


B. Replication

  • Copies of data are stored at multiple sites to improve availability and fault tolerance.
  • Types of Replication:
    • Full Replication: Entire database is copied to multiple locations.
    • Partial Replication: Only selected data is replicated.
    • Synchronous Replication: Changes are updated instantly across all replicas.
    • Asynchronous Replication: Updates occur after a delay.

Example: Replicating a Table in MySQL

CREATE TABLE OrdersReplica AS SELECT * FROM Orders;

Ensures backup and faster access.


C. Allocation

  • Decides where each fragment or replica is stored for optimal performance.
  • Types:
    • Centralized Allocation: All data is stored in one central server.
    • Partitioned Allocation: Different fragments are stored in different sites.
    • Replicated Allocation: Some data is duplicated across sites for redundancy.

Example: Allocating Product Data Based on Region

  • Products sold in Asia are stored in a database in Singapore.
  • Products sold in Europe are stored in a database in Germany.

3. Components of a Distributed Database

A Distributed Database System (DDBMS) consists of multiple components.

ComponentFunction
Database ServersStores and processes database queries.
Transaction ManagerEnsures consistency using ACID properties.
Query ProcessorOptimizes and executes distributed queries.
Data DirectoryStores metadata about data locations.
Communication NetworkEnables data exchange between different sites.

Example: A Banking System with Distributed Data

  • Branch 1: Stores customer accounts.
  • Branch 2: Manages loan information.
  • Branch 3: Handles transaction records.
  • The DDBMS integrates all branches while maintaining data consistency.

4. Distributed Query Processing

  • Queries in distributed databases must fetch data from multiple locations.
  • The query is optimized for minimal communication and fast execution.

Example: Querying Distributed Data

SELECT Customers.Name, Orders.OrderID
FROM Customers, Orders
WHERE Customers.CustomerID = Orders.CustomerID
AND Customers.Region = 'North';

Retrieves only relevant data from the North region.


5. Distributed Transaction Management

A. ACID Properties in Distributed Systems

Transactions must follow:
Atomicity: All operations succeed or fail together.
Consistency: Database remains in a valid state.
Isolation: Transactions do not interfere with each other.
Durability: Committed transactions are saved permanently.

Example: Distributed Bank Transfer
1️⃣ Deduct $500 from Account A (Location 1).
2️⃣ Add $500 to Account B (Location 2).
3️⃣ Ensure both updates succeed or both fail.

Prevents incomplete transactions.


B. Two-Phase Commit Protocol (2PC)

Ensures distributed transactions are committed properly.

PhaseAction
Phase 1 (Prepare)The transaction manager asks all sites if they can commit.
Phase 2 (Commit/Rollback)If all sites agree → commit; Otherwise → rollback.

Example:
1️⃣ Bank transaction starts.
2️⃣ System checks if both locations can commit.
3️⃣ If yes → Money is transferred.
4️⃣ If no → Transaction is canceled.

Ensures data consistency across multiple locations.


6. Advantages & Disadvantages of Distributed Databases

Advantages

Faster Data Access – Queries are processed locally.
Scalability – Easy to add more servers.
High Availability – Data is replicated for fault tolerance.
Improved Performance – Workload is distributed.

Disadvantages

Complex Management – Requires careful planning and administration.
Data Synchronization Issues – Updates across multiple sites may be delayed.
High Setup Cost – Needs networking and multiple servers.
Security Risks – More entry points can lead to vulnerabilities.


7. Conclusion

🔹 A Distributed Database spreads data across multiple locations to improve performance, availability, and fault tolerance.
🔹 Uses fragmentation, replication, and allocation to distribute data efficiently.
🔹 Requires query optimization and transaction management to maintain consistency and reliability.
🔹 Despite challenges, it is widely used in banking, e-commerce, cloud storage, and global enterprises.