📡 Communication in Distributed Operating Systems
In a Distributed Operating System (DOS), processes running on different machines must communicate and coordinate with each other to perform tasks. Unlike a centralized system, where processes can share memory, a DOS relies on messages sent over the network.
🔄 Types of Communication
| Type | Description |
|---|---|
| Message Passing | Primary method; processes exchange messages over the network. |
| Remote Procedure Call (RPC) | A process calls a procedure on a remote machine as if it were local. |
| Remote Method Invocation (RMI) | The object-oriented counterpart of RPC: a process invokes a method on a remote object (e.g., Java RMI). |
| Sockets | Low-level communication endpoints built on TCP or UDP. |
| Shared Memory Emulation | Machines cannot share physical memory, so middleware emulates a shared address space (distributed shared memory). |
📬 Message Passing Mechanism
Distributed systems often use send and receive primitives.
send(destination, message)
receive(source, message)
Key challenges:
- Reliability: Messages can be lost or duplicated.
- Ordering: Messages can arrive out of order, so the system must deliver them in the correct order.
- Latency: Every message incurs a network delay that a local call does not.
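The reliability and ordering challenges above are commonly handled by tagging each message with a sequence number. The following is a minimal single-process sketch in Python, not a full protocol: the `Receiver` class and its fields are hypothetical names, and a real system would also need acknowledgements and retransmission to recover lost messages.

```python
class Receiver:
    """Delivers messages in order and drops duplicates, using sequence numbers."""

    def __init__(self):
        self.next_seq = 0      # sequence number expected next
        self.buffer = {}       # out-of-order messages, keyed by sequence number
        self.delivered = []    # messages handed to the application, in order

    def receive(self, seq, payload):
        # Ignore duplicates: already delivered or already buffered.
        if seq < self.next_seq or seq in self.buffer:
            return
        self.buffer[seq] = payload
        # Deliver every message that is now contiguous with what we have.
        while self.next_seq in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1


receiver = Receiver()
# Simulated network: message 2 overtakes message 1, and message 0 is duplicated.
for seq, payload in [(0, "a"), (2, "c"), (0, "a"), (1, "b")]:
    receiver.receive(seq, payload)
print(receiver.delivered)  # ['a', 'b', 'c']
```

Buffering out-of-order messages trades memory for fewer retransmissions; a simpler receiver could discard anything that is not the next expected message.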
⚙️ Remote Procedure Call (RPC)
- Simplifies communication by letting a process execute a function on a remote machine.
- Looks like a local call but actually sends a message over the network.
- Consists of:
  - Stubs (client and server), which pack and unpack arguments and results
  - Binder (name-to-address mapping)
  - Transport protocol (e.g., TCP)
Example:
int result = add(x, y); // Might be running on a remote server
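To make the role of the stubs concrete, here is a small Python sketch of the same `add` call. It is an illustration under stated assumptions, not a real RPC framework: `ClientStub`, `server_stub`, and `PROCEDURES` are hypothetical names, JSON stands in for the wire format, and the transport is a direct function call rather than TCP. The binder is omitted; the client is simply handed the server's entry point.

```python
import json


def add(x, y):
    """The actual procedure, which lives on the 'server'."""
    return x + y


PROCEDURES = {"add": add}  # server-side table of callable procedures


def server_stub(request_bytes):
    """Unpack the request, call the procedure, pack the result."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


class ClientStub:
    """Makes a remote procedure look like a local method call."""

    def __init__(self, transport):
        self.transport = transport  # callable: request bytes -> reply bytes

    def add(self, x, y):
        request = json.dumps({"proc": "add", "args": [x, y]}).encode("utf-8")
        reply = self.transport(request)  # would be a network round trip
        return json.loads(reply.decode("utf-8"))["result"]


# Assumption: the transport is a direct function call, standing in for TCP.
client = ClientStub(transport=server_stub)
result = client.add(2, 3)
print(result)  # 5
```

The caller only sees `client.add(2, 3)`; everything between the two stubs could be replaced by a socket without changing the calling code.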
🕒 Synchronization in Distributed Operating Systems
In a DOS, synchronization is crucial to ensure data consistency and correct process coordination.
🔁 Why Synchronization Is Needed
- Avoid race conditions when multiple processes access shared resources.
- Ensure mutual exclusion.
- Maintain consistency of replicated data.
- Coordinate execution order among processes.
⏰ Types of Synchronization
1. Clock Synchronization
A distributed system has no global clock, and each machine's local clock drifts, so machines must synchronize their clocks to order events consistently.
⏳ a. Cristian’s Algorithm
- A client requests the time from a time server.
- The client sets its clock to the server's time plus an estimate of the one-way delay, usually half the measured round-trip time.
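The adjustment can be sketched in a few lines of Python. The function name and the sample values are hypothetical, and the sketch assumes the request and the reply take equally long to travel.

```python
def cristian_adjusted_time(t_request, t_reply, server_time):
    """Estimate the current time from one request/reply exchange.

    t_request and t_reply are read from the client's own clock;
    server_time is the timestamp carried in the server's reply.
    """
    round_trip = t_reply - t_request
    # Assume the reply took half of the round trip to arrive.
    return server_time + round_trip / 2


# Hypothetical values, in seconds.
estimate = cristian_adjusted_time(t_request=100.0, t_reply=100.5, server_time=205.0)
print(estimate)  # 205.25
```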
⏰ b. Berkeley Algorithm
- A central node (the coordinator) polls all other nodes for their time.
- It calculates the average and tells each node how much to adjust its clock.
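A minimal Python sketch of the averaging step follows. The function name and the readings are hypothetical, and the sketch ignores network delay and faulty clocks, both of which a practical implementation would have to handle.

```python
def berkeley_offsets(clocks):
    """Return the adjustment each node should apply to its clock.

    clocks maps a node name to its reported time; the coordinator
    includes its own reading. Network delay is ignored here.
    """
    average = sum(clocks.values()) / len(clocks)
    return {node: average - reading for node, reading in clocks.items()}


# Hypothetical readings, in seconds.
offsets = berkeley_offsets({"coordinator": 180.0, "node_a": 185.0, "node_b": 175.0})
print(offsets)  # {'coordinator': 0.0, 'node_a': -5.0, 'node_b': 5.0}
```

Sending each node an offset rather than an absolute time means the adjustment is still valid after the message has spent time in transit.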
🕰️ c. Network Time Protocol (NTP)
- Used on the Internet.
- Synchronizes clocks using a hierarchy of time servers.
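For a single client/server exchange, NTP estimates the clock offset and the round-trip delay from four timestamps. The Python sketch below shows only that calculation; the function name and sample values are hypothetical, and the server hierarchy and filtering that NTP layers on top are not modeled.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Compute clock offset and round-trip delay from one exchange.

    t0: client sends request   (client clock)
    t1: server receives it     (server clock)
    t2: server sends reply     (server clock)
    t3: client receives reply  (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Hypothetical timestamps, in seconds: the server clock is 5 s ahead.
offset, delay = ntp_offset_and_delay(t0=10.0, t1=15.5, t2=15.75, t3=11.25)
print(offset, delay)  # 5.0 1.0
```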
2. Logical Clocks
Used to maintain event ordering when physical clocks can’t be perfectly synchronized.
a. Lamport Timestamps
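- Each event gets an integer timestamp from a counter kept by its process.
- If A → B (A happened before B), then timestamp(A) < timestamp(B).
- The converse does not hold: timestamp(A) < timestamp(B) does not imply A → B. The timestamps give an ordering that is consistent with causality.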
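The update rules behind Lamport timestamps can be sketched as a small Python class. `LamportClock` and its method names are hypothetical; the rules themselves are the standard ones: increment on a local event or send, and take the maximum plus one on receive.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event or message send: advance the counter."""
        self.time += 1
        return self.time

    def receive(self, sent_time):
        """Message receipt: jump past the sender's timestamp."""
        self.time = max(self.time, sent_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.tick()                       # local event on P, timestamp 1
send_ts = p.tick()             # P sends a message, timestamp 2
recv_ts = q.receive(send_ts)   # Q receives it, timestamp 3
print(send_ts, recv_ts)  # 2 3
```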
b. Vector Clocks
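- Each process maintains a vector of counters, one entry per process.
- Captures causality exactly: A → B if and only if the vector of A is less than the vector of B, so concurrent events can be detected.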
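A compact Python sketch of vector clocks and the comparison they enable is shown below. The class and function names are hypothetical, and the number of processes is assumed to be fixed and known in advance.

```python
class VectorClock:
    def __init__(self, n, pid):
        self.v = [0] * n   # one counter per process
        self.pid = pid     # index of the owning process

    def tick(self):
        """Local event or send: increment our own entry, return a snapshot."""
        self.v[self.pid] += 1
        return list(self.v)

    def receive(self, other):
        """Merge the sender's vector (element-wise max), then count the receive."""
        self.v = [max(a, b) for a, b in zip(self.v, other)]
        return self.tick()


def happened_before(a, b):
    """True if the event stamped a causally precedes the event stamped b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    """True if neither event causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


p0, p1, p2 = VectorClock(3, 0), VectorClock(3, 1), VectorClock(3, 2)
e_send = p0.tick()            # [1, 0, 0]
e_recv = p1.receive(e_send)   # [1, 1, 0]
e_other = p2.tick()           # [0, 0, 1]
print(happened_before(e_send, e_recv))  # True
print(concurrent(e_send, e_other))      # True
```

Unlike Lamport timestamps, the comparison can report that two events are concurrent instead of forcing an order on them.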
3. Mutual Exclusion in Distributed Systems
Mutual exclusion ensures that only one process accesses a critical section at a time.
🧮 Algorithms:
| Algorithm | Description |
|---|---|
| Centralized | One coordinator grants access to the critical section. Simple, but the coordinator is a single point of failure. |
| Token-Based | A token circulates among processes; only the token holder can enter the critical section. |
| Ricart-Agrawala | Processes exchange timestamped messages to request entry; a process enters once every other process has replied. |
| Maekawa’s Algorithm | Voting-based; nodes request permission from a subset (quorum). |
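As one illustration from the table, the reply rule at the heart of Ricart-Agrawala can be sketched in Python. This is a simplified model, not a complete implementation: `RicartAgrawalaNode` and its methods are hypothetical names, no messages are actually sent, and timestamps are assumed to come from correctly maintained Lamport clocks.

```python
class RicartAgrawalaNode:
    def __init__(self, pid):
        self.pid = pid
        self.requesting = False   # True while waiting for or holding the critical section
        self.request_ts = None    # Lamport timestamp of our own pending request
        self.deferred = []        # requesters we will answer after leaving

    def request(self, timestamp):
        """Record our own request for the critical section."""
        self.requesting = True
        self.request_ts = timestamp

    def on_request(self, timestamp, sender):
        """Return True to reply at once, False if the reply is deferred."""
        # The lower (timestamp, pid) pair wins; the pid breaks timestamp ties.
        if self.requesting and (self.request_ts, self.pid) < (timestamp, sender):
            self.deferred.append(sender)
            return False
        return True

    def release(self):
        """Leave the critical section; return the nodes now owed a reply."""
        self.requesting = False
        self.request_ts = None
        owed, self.deferred = self.deferred, []
        return owed


a, b = RicartAgrawalaNode(1), RicartAgrawalaNode(2)
a.request(timestamp=5)
b.request(timestamp=7)
a_replies = a.on_request(7, sender=2)   # A has the older request, so it defers
b_replies = b.on_request(5, sender=1)   # B yields to A's older request
owed = a.release()                      # A leaves and now answers B
print(a_replies, b_replies, owed)  # False True [2]
```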
4. Election Algorithms
Used to select a coordinator or leader in distributed systems.
Examples:
- Bully Algorithm
- Ring Algorithm
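When the current coordinator fails, an election lets the remaining processes agree on a replacement, which supports fault tolerance and keeps shared resources managed.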
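The outcome of the Bully Algorithm can be sketched in Python as follows. This is a simplified model under stated assumptions: `bully_election` is a hypothetical function, message exchange and timeouts are replaced by a set of process ids known to be alive, and the initiator is assumed to be in that set.

```python
def bully_election(initiator, alive):
    """Return the coordinator chosen when `initiator` starts an election.

    alive is the set of process ids that currently respond to messages.
    """
    higher = [pid for pid in alive if pid > initiator]
    if not higher:
        # Nobody with a higher id answered: the initiator wins.
        return initiator
    # A higher process takes over; each takeover repeats the same step.
    return bully_election(min(higher), alive)


# Hypothetical cluster: process 4 (the old coordinator) has crashed.
coordinator = bully_election(initiator=2, alive={1, 2, 3, 5})
print(coordinator)  # 5
```

Whichever process starts the election, the process with the highest id among those still alive ends up as coordinator.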
🧠 Summary Table
| Aspect | Description |
|---|---|
| Communication | Message passing, RPC/RMI, sockets, shared memory (emulated) |
| Synchronization | Ensures ordered execution, data consistency, and mutual exclusion |
| Time and Ordering | NTP, Cristian’s, Berkeley (physical clocks); Lamport and vector clocks (logical clocks) |
| Mutual Exclusion | Centralized, Token-based, Ricart-Agrawala, Maekawa’s |
| Tools Used | Timestamps, stubs, daemons, coordinators, tokens |
✅ Conclusion
In a Distributed Operating System:
- Communication ensures that processes across different machines can exchange data and collaborate.
- Synchronization ensures that concurrent operations happen in a correct, predictable order despite the absence of a global clock.
These two functions are critical to building reliable, consistent, and efficient distributed systems.