⭐ DISTRIBUTED TRANSACTIONS (Detailed Discussion)

A Distributed Transaction is a database transaction that accesses data stored at multiple sites (servers) in a distributed database system.
Even though data is located at different physical locations, the transaction must behave as one unified, atomic unit of work.

Example:
A banking transaction transferring money between accounts stored at different branches (each branch = a site).

A distributed transaction must preserve ACID properties across all sites.

⭐ WHY DISTRIBUTED TRANSACTIONS?

✔ Organizations operate in many branches/locations
✔ Data is fragmented or replicated across sites
✔ One transaction may require data from multiple sites

Examples:

Airline booking (seat data at different servers)
Online shopping (inventory at warehouse + payment server)
Banking (accounts at different branches)

⭐ ACID PROPERTIES IN DISTRIBUTED TRANSACTIONS

Distributed transactions must maintain:

✔ Atomicity

All sub-transactions across sites commit or abort together.

✔ Consistency

Data should remain valid across all sites after execution.

✔ Isolation

Concurrent distributed transactions must not interfere.

✔ Durability

Committed changes survive failures at any site.

Guaranteeing ACID across multiple sites is complex and requires special protocols.

⭐ HOW DISTRIBUTED TRANSACTIONS WORK?

A distributed transaction involves:

Global Transaction Manager (Coordinator)
Local Transaction Managers at each site (Participants)
Communication network
Commit protocol (2PC or 3PC)

⭐ COMPONENTS OF DISTRIBUTED TRANSACTIONS

✔ 1. Coordinator

Manages entire distributed transaction
Initiates the commit process
Communicates with all participant sites

✔ 2. Participants (Subordinates)

Each site involved has its own local manager that:

Executes part of the transaction
Reports status to coordinator
Commits/aborts based on coordinator’s instructions

⭐ STEPS IN A DISTRIBUTED TRANSACTION

Start – Coordinator begins the transaction
Execute – Operations distributed to relevant sites
Prepare to Commit – Each local site checks if it can commit
Commit/Abort Decision – Coordinator decides and informs all sites
Completion – All sites commit or abort together

⭐ FAILURE TYPES IN DISTRIBUTED TRANSACTIONS

Distributed environments face additional failure types:

✔ Site failures

One site crashes.

✔ Communication failures

Messages lost, network partition.

✔ Transaction failures

Local constraint violations.

✔ Coordinator failure

Coordinator crashes before final decision.

⭐ SOLUTION: COMMIT PROTOCOLS

To maintain atomicity across sites, we use atomic commit protocols.

The main protocols are:

Two-Phase Commit (2PC)
Three-Phase Commit (3PC)

⭐ 1. TWO-PHASE COMMIT PROTOCOL (2PC) (Very important)

Ensures all sites either commit or abort together.

✔ Phase 1: PREPARE Phase

Coordinator → “Prepare to commit?”

Each participant:

Writes local logs
Replies YES or NO

✔ Phase 2: COMMIT Phase

If ALL sites replied YES → Coordinator sends COMMIT
If ANY site replied NO → Coordinator sends ABORT

✔ Guarantees Atomicity

But if the coordinator crashes → participants may block (wait forever).

2PC is blocking.

⭐ 2. THREE-PHASE COMMIT PROTOCOL (3PC)

Designed to overcome blocking in 2PC.

Phases:

CanCommit?
PreCommit
Commit

✔ Non-blocking

If coordinator fails → participants can still reach a decision.

✔ More messages → More overhead

Rarely implemented in practice.

⭐ DISTRIBUTED CONCURRENCY CONTROL

For correctness, distributed transactions need concurrency control mechanisms:

✔ Distributed Locking (Global Lock Manager)

Each site uses local locks
Coordinator resolves conflicts

✔ Distributed Timestamp Ordering

Global timestamps ensure serializability

✔ Distributed Deadlock Detection

Using wait-for graphs across sites
Edge-chasing (probe messages)

⭐ DISTRIBUTED DEADLOCKS

Occurs when:

T1 waits for T2 at Site A
T2 waits for T1 at Site B

Detection methods:

Centralized deadlock detector
Distributed wait-for graph
Edge-chasing algorithm

Resolution:

Abort one of the transactions.

⭐ DISTRIBUTED RECOVERY

If failure occurs:

✔ Use distributed logs

✔ Use WRITE-AHEAD logging (WAL)

✔ Recovery completed using 2PC logs

✔ Checkpointing used to reduce recovery overhead

If a site crashes:

Participants that have committed must redo
Participants that have not committed must undo

⭐ ADVANTAGES OF DISTRIBUTED TRANSACTIONS

✔ Data consistency across sites
✔ Supports global applications
✔ Increases reliability
✔ Data can be processed closer to where it resides
✔ Reduces communication cost for local operations

⭐ DISADVANTAGES

✘ High communication overhead
✘ Concurrency control more complex
✘ Distributed deadlocks
✘ Complex recovery
✘ Slow commit due to network delays
✘ 2PC is blocking

⭐ EXAMPLE (MCA STYLE)

A transaction transfers money between:

Account A stored at Bank Server 1
Account B stored at Bank Server 2

Steps:

Coordinator sends debit instruction to Site 1
Sends credit instruction to Site 2
Both sites prepare
Coordinator decides commit/abort
Both sites execute final decision

Ensures the transfer is either fully completed or not done at all.

⭐ Perfect 5–6 Mark Short Answer

A distributed transaction is a transaction that accesses data stored at multiple sites in a distributed database system. It must maintain ACID properties across all nodes. Distributed transactions are managed using a coordinator and participant sites. Commit protocols like Two-Phase Commit (2PC) and Three-Phase Commit (3PC) ensure atomicity, while distributed concurrency control, deadlock detection, and failure recovery ensure correctness. These transactions enable consistent updates across geographically dispersed databases.

Distributed Transactions