⭐ DISTRIBUTED TRANSACTIONS (Detailed Discussion)
A Distributed Transaction is a database transaction that accesses data stored at multiple sites (servers) in a distributed database system.
Even though data is located at different physical locations, the transaction must behave as one unified, atomic unit of work.
Example:
A banking transaction transferring money between accounts stored at different branches (each branch = a site).
A distributed transaction must preserve ACID properties across all sites.
⭐ WHY DISTRIBUTED TRANSACTIONS?
✔ Organizations operate in many branches/locations
✔ Data is fragmented or replicated across sites
✔ One transaction may require data from multiple sites
Examples:
- Airline booking (seat data at different servers)
- Online shopping (inventory at warehouse + payment server)
- Banking (accounts at different branches)
⭐ ACID PROPERTIES IN DISTRIBUTED TRANSACTIONS
Distributed transactions must maintain:
✔ Atomicity
All sub-transactions across sites commit or abort together.
✔ Consistency
Data should remain valid across all sites after execution.
✔ Isolation
Concurrent distributed transactions must not interfere.
✔ Durability
Committed changes survive failures at any site.
Guaranteeing ACID across multiple sites is complex and requires special protocols.
⭐ HOW DISTRIBUTED TRANSACTIONS WORK?
A distributed transaction involves:
- Global Transaction Manager (Coordinator)
- Local Transaction Managers at each site (Participants)
- Communication network
- Commit protocol (2PC or 3PC)
⭐ COMPONENTS OF DISTRIBUTED TRANSACTIONS
✔ 1. Coordinator
- Manages entire distributed transaction
- Initiates the commit process
- Communicates with all participant sites
✔ 2. Participants (Subordinates)
Each site involved has its own local manager that:
- Executes part of the transaction
- Reports status to coordinator
- Commits/aborts based on coordinator’s instructions
⭐ STEPS IN A DISTRIBUTED TRANSACTION
- Start – Coordinator begins the transaction
- Execute – Operations distributed to relevant sites
- Prepare to Commit – Each local site checks if it can commit
- Commit/Abort Decision – Coordinator decides and informs all sites
- Completion – All sites commit or abort together
⭐ FAILURE TYPES IN DISTRIBUTED TRANSACTIONS
Distributed environments face additional failure types:
✔ Site failures
One site crashes.
✔ Communication failures
Messages lost, network partition.
✔ Transaction failures
Local constraint violations.
✔ Coordinator failure
Coordinator crashes before its final decision reaches every participant.
⭐ SOLUTION: COMMIT PROTOCOLS
To maintain atomicity across sites, we use atomic commit protocols.
The main protocols are:
- Two-Phase Commit (2PC)
- Three-Phase Commit (3PC)
⭐ 1. TWO-PHASE COMMIT PROTOCOL (2PC) (Very important)
Ensures all sites either commit or abort together.
✔ Phase 1: PREPARE Phase
Coordinator → “Prepare to commit?”
Each participant:
- Force-writes a READY (prepared) record to its local log before voting YES
- Replies YES or NO
✔ Phase 2: COMMIT Phase
If ALL sites replied YES → Coordinator sends COMMIT
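If ANY site replied NO (or did not reply before the timeout) → Coordinator sends ABORT
The coordinator writes its decision to its own log before sending it.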
✔ Guarantees Atomicity
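But if the coordinator crashes after participants have voted YES → those participants cannot decide on their own; they keep their locks and wait until the coordinator recovers.
2PC is therefore a blocking protocol.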
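The two phases above can be sketched as a small in-memory simulation. This is only an illustration under simplifying assumptions: the `Participant` class, its `can_commit` and `reachable` flags, and the list-based logs are hypothetical stand-ins, and a real system would force log records to disk and exchange messages over the network.

```python
from enum import Enum


class Vote(Enum):
    YES = "YES"
    NO = "NO"


class Participant:
    """Hypothetical in-memory participant (no real disk or network)."""

    def __init__(self, name, can_commit=True, reachable=True):
        self.name = name
        self.can_commit = can_commit
        self.reachable = reachable
        self.log = []           # stands in for the local log
        self.state = "ACTIVE"

    def prepare(self):
        if not self.reachable:
            raise TimeoutError(f"{self.name} did not answer PREPARE")
        if self.can_commit:
            self.log.append("READY")   # logged before the YES vote is sent
            self.state = "READY"
            return Vote.YES
        self.log.append("ABORT")       # a NO voter can abort immediately
        self.state = "ABORTED"
        return Vote.NO

    def finish(self, decision):
        if self.state == "ABORTED":    # already aborted locally
            return
        self.log.append(decision)
        self.state = "COMMITTED" if decision == "COMMIT" else "ABORTED"


def two_phase_commit(participants, coordinator_log):
    # Phase 1: collect votes; a missing reply is treated as a NO vote.
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare())
        except TimeoutError:
            votes.append(Vote.NO)

    # Phase 2: log the global decision first, then send it to the sites.
    decision = "COMMIT" if all(v is Vote.YES for v in votes) else "ABORT"
    coordinator_log.append(decision)
    for p in participants:
        if p.reachable:
            p.finish(decision)
    return decision
```

Calling `two_phase_commit([Participant("site1"), Participant("site2")], [])` returns "COMMIT"; setting `can_commit=False` or `reachable=False` on any one participant turns the result into "ABORT" for every site.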
⭐ 2. THREE-PHASE COMMIT PROTOCOL (3PC)
Designed to overcome blocking in 2PC.
Phases:
- CanCommit?
- PreCommit
- Commit
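✔ Non-blocking (for site failures)
If coordinator fails → participants can still reach a decision among themselves, provided the network does not partition and failures can be detected reliably by timeouts.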
✔ More messages → More overhead
Rarely implemented in practice.
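How the surviving participants can decide without the coordinator may be sketched as a simple rule over their current states. This is a simplified sketch of the usual 3PC termination idea, not a full protocol: the function name and the state strings are hypothetical, and it assumes sites fail by stopping and the network does not partition.

```python
def terminate_3pc(states):
    """Decide the outcome from the states of the reachable participants.

    Assumed state names (illustrative only):
      "WAIT"      - voted YES to CanCommit?, no PreCommit received yet
      "PRECOMMIT" - PreCommit received, waiting for Commit
      "COMMITTED" / "ABORTED" - final states
    """
    if "COMMITTED" in states:
        return "COMMIT"
    if "ABORTED" in states:
        return "ABORT"
    # PreCommit is only sent after every site voted YES, so commit is safe.
    if "PRECOMMIT" in states:
        return "COMMIT"
    # Nobody saw PreCommit, so no site can have committed yet: abort is safe.
    return "ABORT"
```

The extra PreCommit round is what makes this rule possible: a site in the WAIT state knows that no other site has committed yet, which a prepared participant in 2PC cannot know.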
⭐ DISTRIBUTED CONCURRENCY CONTROL
For correctness, distributed transactions need concurrency control mechanisms:
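✔ Distributed Locking (Two-Phase Locking across sites)
- Centralized variant: one global lock manager grants all locks
- Distributed variant: each site's lock manager locks its own local data
- Locks are held until the global commit/abort decision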
✔ Distributed Timestamp Ordering
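- Each transaction gets a globally unique timestamp (e.g., local counter + site ID)
- Conflicting operations are executed in timestamp order, which ensures serializability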
✔ Distributed Deadlock Detection
- Using wait-for graphs across sites
- Edge-chasing (probe messages)
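For the timestamp-ordering item above, one common way to obtain globally unique timestamps without a central clock is to pair a local counter with the site ID. The sketch below assumes that scheme; the `SiteClock` class and its method names are hypothetical.

```python
class SiteClock:
    """Issues (counter, site_id) timestamps; tuples compare counter first."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.counter = 0

    def next_timestamp(self):
        self.counter += 1
        return (self.counter, self.site_id)

    def observe(self, timestamp):
        # Catch up when a message carries a larger counter, so a slow
        # site does not keep issuing timestamps that look very old.
        self.counter = max(self.counter, timestamp[0])


site1, site2 = SiteClock(1), SiteClock(2)
t1 = site1.next_timestamp()   # (1, 1)
t2 = site2.next_timestamp()   # (1, 2): same counter, site ID breaks the tie
```

Because the site ID is part of the timestamp, two sites can never issue the same value, and all sites agree on which of two transactions is older.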
⭐ DISTRIBUTED DEADLOCKS
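Occurs when transactions wait for each other across sites, e.g.:
- T1 waits for T2 at Site A
- T2 waits for T1 at Site B
Neither site sees a cycle in its local wait-for graph; the cycle exists only in the global graph.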
Detection methods:
- Centralized deadlock detector
- Distributed wait-for graph
- Edge-chasing algorithm
Resolution:
Abort one of the transactions in the cycle (the victim) so that the others can proceed.
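The centralized detection method can be illustrated by merging the local wait-for edges of each site and searching the combined graph for a cycle. This is a minimal sketch; the function name and the edge representation (waiter, holder) are assumptions made for the example.

```python
def find_cycle(edges):
    """Return one cycle in a wait-for graph as a list, or None if acyclic."""
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, set()).add(holder)

    visiting, done = [], set()

    def visit(node):
        if node in visiting:                       # back edge: cycle found
            return visiting[visiting.index(node):]
        if node in done:
            return None
        visiting.append(node)
        for nxt in sorted(graph.get(node, ())):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in sorted(graph):
        cycle = visit(start)
        if cycle:
            return cycle
    return None


site_a = [("T1", "T2")]   # at Site A, T1 waits for T2
site_b = [("T2", "T1")]   # at Site B, T2 waits for T1

local_a = find_cycle(site_a)             # None: no local deadlock
local_b = find_cycle(site_b)             # None: no local deadlock
global_cycle = find_cycle(site_a + site_b)
```

Each local graph is acyclic, but the merged graph contains the cycle T1 → T2 → T1, so the detector would pick one of the two transactions as the victim.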
⭐ DISTRIBUTED RECOVERY
If failure occurs:
✔ Use distributed logs
✔ Use WRITE-AHEAD logging (WAL)
✔ Recovery completed using the READY/COMMIT/ABORT records logged during 2PC
✔ Checkpointing used to reduce recovery overhead
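If a site crashes, on restart it checks its log for each transaction:
- COMMIT record found → redo the transaction's updates
- ABORT record found, or no READY record → undo the transaction's updates
- READY record only (in doubt) → ask the coordinator (or other participants) for the outcome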
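The restart decision can be written as a small lookup on a transaction's local log records. One case worth separating out is a participant that voted YES but crashed before learning the outcome: it cannot decide alone and must ask. The sketch below assumes the log uses the record names READY, COMMIT and ABORT; the function name and return values are hypothetical.

```python
def recovery_action(log_records):
    """Map one transaction's local log records to the action taken on restart."""
    if "COMMIT" in log_records:
        return "REDO"
    if "ABORT" in log_records:
        return "UNDO"
    if "READY" in log_records:
        # In doubt: voted YES, but the global outcome is not in the local log.
        return "ASK_COORDINATOR"
    # Never voted, so aborting locally cannot contradict a global commit.
    return "UNDO"
```

For example, `recovery_action(["READY", "COMMIT"])` gives "REDO", while `recovery_action(["READY"])` gives "ASK_COORDINATOR".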
⭐ ADVANTAGES OF DISTRIBUTED TRANSACTIONS
✔ Data consistency across sites
✔ Supports global applications
✔ Increases reliability
✔ Data can be processed closer to where it resides
✔ Reduces communication cost for local operations
⭐ DISADVANTAGES
✘ High communication overhead
✘ Concurrency control more complex
✘ Distributed deadlocks
✘ Complex recovery
✘ Slow commit due to network delays
✘ 2PC is blocking
⭐ EXAMPLE (MCA STYLE)
A transaction transfers money between:
- Account A stored at Bank Server 1
- Account B stored at Bank Server 2
Steps:
- Coordinator sends debit instruction to Site 1
- Sends credit instruction to Site 2
- Both sites prepare
- Coordinator decides commit/abort
- Both sites execute final decision
Ensures the transfer is either fully completed or not done at all.
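The transfer example can be traced with concrete balances. The sketch below is deliberately minimal and leaves out logging and failure handling; `AccountSite` and `transfer` are hypothetical names, and each site simply keeps a tentative change until the coordinator's decision arrives.

```python
class AccountSite:
    """Hypothetical bank server holding a single account balance."""

    def __init__(self, balance):
        self.balance = balance
        self.pending = None      # tentative change, not yet applied

    def prepare(self, delta):
        # Vote NO if the change would overdraw the account.
        if self.balance + delta < 0:
            return False
        self.pending = delta
        return True

    def commit(self):
        if self.pending is not None:
            self.balance += self.pending
        self.pending = None

    def abort(self):
        self.pending = None      # discard the tentative change


def transfer(source, target, amount):
    # Both sites prepare: debit at the source, credit at the target.
    votes = [source.prepare(-amount), target.prepare(+amount)]
    if all(votes):
        source.commit()
        target.commit()
        return True
    source.abort()
    target.abort()
    return False


server1, server2 = AccountSite(100), AccountSite(50)
ok = transfer(server1, server2, 30)        # balances become 70 and 80
failed = transfer(server1, server2, 500)   # Site 1 votes NO, nothing changes
```

In the second call the credit at Site 2 was already prepared, but it is discarded when Site 1 votes NO, so no money appears at one site without leaving the other.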
⭐ Perfect 5–6 Mark Short Answer
A distributed transaction is a transaction that accesses data stored at multiple sites in a distributed database system. It must maintain ACID properties across all nodes. Distributed transactions are managed using a coordinator and participant sites. Commit protocols like Two-Phase Commit (2PC) and Three-Phase Commit (3PC) ensure atomicity, while distributed concurrency control, deadlock detection, and failure recovery ensure correctness. These transactions enable consistent updates across geographically dispersed databases.
