1. Introduction to Distributed Databases
A Distributed Database (DDB) is a collection of interrelated databases stored at different physical locations but connected via a network. The system manages and controls the distribution, replication, and processing of data across multiple nodes.
Why Use Distributed Databases?
✅ Improved Performance: Data is stored closer to users, reducing query response time.
✅ High Availability: Even if one site fails, others can continue operations.
✅ Scalability: More nodes can be added as demand increases.
✅ Fault Tolerance: Data replication ensures minimal data loss.
✅ Geographical Distribution: Supports multi-location operations efficiently.
2. Architecture of Distributed Databases
The structure of a distributed database is based on how data is distributed, accessed, and managed across different sites.
1. Types of Distributed Database Systems
Distributed databases can be categorized into:
Type | Description | Example |
---|---|---|
Homogeneous DDBMS | All databases use the same DBMS software. | Multiple MySQL servers in different locations. |
Heterogeneous DDBMS | Different DBMS software at different sites. | A combination of MySQL, Oracle, and MongoDB. |
Federated DDBMS | Independent databases connected by a global schema. | A system integrating multiple banking databases. |
2. Data Distribution Strategies
Data in a distributed database can be distributed using different methods:
A. Fragmentation (Partitioning)
- The database is split into smaller parts (fragments) stored at different locations.
- Each fragment contains data relevant to a specific region or function.
- Types of Fragmentation:
- Horizontal Fragmentation: Divides tables into rows.
- Vertical Fragmentation: Divides tables into columns.
- Hybrid Fragmentation: Combination of both.
✔ Example: Horizontal Fragmentation
- A customer database is divided into North Region, South Region, etc.
CREATE TABLE Customers_North AS
SELECT * FROM Customers WHERE Region = 'North';
✅ Reduces query load for each region.
B. Replication
- Copies of data are stored at multiple sites to improve availability and fault tolerance.
- Types of Replication:
- Full Replication: Entire database is copied to multiple locations.
- Partial Replication: Only selected data is replicated.
- Synchronous Replication: Changes are updated instantly across all replicas.
- Asynchronous Replication: Updates occur after a delay.
✔ Example: Replicating a Table in MySQL
CREATE TABLE OrdersReplica AS SELECT * FROM Orders;
✅ Ensures backup and faster access.
C. Allocation
- Decides where each fragment or replica is stored for optimal performance.
- Types:
- Centralized Allocation: All data is stored in one central server.
- Partitioned Allocation: Different fragments are stored in different sites.
- Replicated Allocation: Some data is duplicated across sites for redundancy.
✔ Example: Allocating Product Data Based on Region
- Products sold in Asia are stored in a database in Singapore.
- Products sold in Europe are stored in a database in Germany.
3. Components of a Distributed Database
A Distributed Database System (DDBMS) consists of multiple components.
Component | Function |
---|---|
Database Servers | Stores and processes database queries. |
Transaction Manager | Ensures consistency using ACID properties. |
Query Processor | Optimizes and executes distributed queries. |
Data Directory | Stores metadata about data locations. |
Communication Network | Enables data exchange between different sites. |
✔ Example: A Banking System with Distributed Data
- Branch 1: Stores customer accounts.
- Branch 2: Manages loan information.
- Branch 3: Handles transaction records.
- The DDBMS integrates all branches while maintaining data consistency.
4. Distributed Query Processing
- Queries in distributed databases must fetch data from multiple locations.
- The query is optimized for minimal communication and fast execution.
✔ Example: Querying Distributed Data
SELECT Customers.Name, Orders.OrderID
FROM Customers, Orders
WHERE Customers.CustomerID = Orders.CustomerID
AND Customers.Region = 'North';
✅ Retrieves only relevant data from the North region.
5. Distributed Transaction Management
A. ACID Properties in Distributed Systems
Transactions must follow:
✔ Atomicity: All operations succeed or fail together.
✔ Consistency: Database remains in a valid state.
✔ Isolation: Transactions do not interfere with each other.
✔ Durability: Committed transactions are saved permanently.
✔ Example: Distributed Bank Transfer
1️⃣ Deduct $500 from Account A (Location 1).
2️⃣ Add $500 to Account B (Location 2).
3️⃣ Ensure both updates succeed or both fail.
✅ Prevents incomplete transactions.
B. Two-Phase Commit Protocol (2PC)
Ensures distributed transactions are committed properly.
Phase | Action |
---|---|
Phase 1 (Prepare) | The transaction manager asks all sites if they can commit. |
Phase 2 (Commit/Rollback) | If all sites agree → commit; Otherwise → rollback. |
✔ Example:
1️⃣ Bank transaction starts.
2️⃣ System checks if both locations can commit.
3️⃣ If yes → Money is transferred.
4️⃣ If no → Transaction is canceled.
✅ Ensures data consistency across multiple locations.
6. Advantages & Disadvantages of Distributed Databases
✅ Advantages
✔ Faster Data Access – Queries are processed locally.
✔ Scalability – Easy to add more servers.
✔ High Availability – Data is replicated for fault tolerance.
✔ Improved Performance – Workload is distributed.
❌ Disadvantages
❌ Complex Management – Requires careful planning and administration.
❌ Data Synchronization Issues – Updates across multiple sites may be delayed.
❌ High Setup Cost – Needs networking and multiple servers.
❌ Security Risks – More entry points can lead to vulnerabilities.
7. Conclusion
🔹 A Distributed Database spreads data across multiple locations to improve performance, availability, and fault tolerance.
🔹 Uses fragmentation, replication, and allocation to distribute data efficiently.
🔹 Requires query optimization and transaction management to maintain consistency and reliability.
🔹 Despite challenges, it is widely used in banking, e-commerce, cloud storage, and global enterprises.