Three-phase commit protocol
In computer networking and databases, the three-phase commit protocol (3PC) is a distributed algorithm which lets all nodes in a distributed system agree to commit a transaction. It is a refinement of the two-phase commit protocol (2PC) which is more resilient to failures.
A two-phase commit protocol cannot dependably recover from a failure of both the coordinator and a cohort member during the Commit phase. If only the coordinator had failed, and no cohort members had received a commit message, it could safely be inferred that no commit had happened. If, however, both the coordinator and a cohort member failed, it is possible that the failed cohort member was the first to be notified, and had actually done the commit. Even if a new coordinator is selected, it cannot confidently proceed with the operation until it has received an agreement from all cohort members, and hence must block until all cohort members respond.
The three-phase commit protocol eliminates this problem by introducing the Prepared to commit state. If the coordinator fails before sending preCommit messages, the cohort will unanimously agree that the operation was aborted. The coordinator will not send out a doCommit message until all cohort members have ACKed that they are Prepared to commit. This eliminates the possibility that any cohort member actually completed the transaction before all cohort members were aware of the decision to do so (an ambiguity that necessitated indefinite blocking in the two-phase commit protocol).
Using Skeen's original three-phase commit protocol, it is possible that a quorum becomes connected without being able to make progress (this is not a deadlock situation; the system will still progress if the network partitioning is resolved). Keidar and Dolev's E3PC refines Skeen's three-phase commit protocol and solves this problem in a way which *always* allows a quorum to make progress.
The protocol requires at least three round trips to complete, needing a minimum of three round trip times (RTTs). This is potentially a long latency to complete each transaction.
- Skeen, Dale (February 1982). A Quorum-Based Commit Protocol (Technical report). Department of Computer Science, Cornell University.
- Keidar, Idit; Danny Dolev (December 1998). "Increasing the Resilience of Distributed and Replicated Database Systems". Journal of Computer and System Sciences (JCSS). 57 (3): 309–324. doi:10.1006/jcss.1998.1566.