Distributed transactions
Last updated
Last updated
Awesome Distributed transactions. A curated selection of distributed transactions protocols
ARTICLE is used for following content.
In general, there are three scenarios for distributed transactions:
Cross-database distributed transactions
Cross-service distributed transactions
Hybrid distributed transactions
4 core roles:
RM (Resource Manager): RDBMS (also message queues) Provides interfaces for data operation and management to ensure data consistency and integrity. A database management system is the most representative RM instance. Some file systems and message queue (MQ) systems can also be considered as RM instances.
TM (Transaction Manager): Serves as the coordinator, which coordinates the behaviours of all RMs associated with cross-database transactions.
Application Program (AP): Calls the RM interface in accordance with business rules to modify business model data. If a data change involves multiple RMs and transaction consistency and integrity must be protected, AP defines the boundary of a transaction through a TM, which is responsible for coordinating RMs involved in the transaction to complete a global transaction.
Communication Resource Manager (CRM): Responsible for transmitting transactions across services.
The XA specification divides the distributed transaction processing process into the following two phases, hence, it is also called a two-phase commit protocol:
1) Preparation phase
The TM logs transaction startup and queries each RM about whether it is ready to perform preparatory operations.
After an RM receive the order, it evaluates its own state and tries to perform preparatory operations for the local transaction, such as reserving resources, locking resources, and performing operations. Then, the RM waits for subsequent orders from the TM without committing the transaction. If the preceding attempt fails, the RM informs the TM that the execution of this phase fails and rolls back the performed operations. Then, the RM no longer participates in this transaction. For example, MySQL will lock resources and write the redo and undo logs at this phase.
The TM collects the RM's responses and logs the transaction preparation as completed.
2) Commit or rollback phase
This phase initiates the transaction commit or rollback operation based on the coordination result of the previous phase.
If all RMs responded success in the previous step, then the following actions take place:
The TM logs the transaction as committed and issues the transaction commit order to all RMs.
After RMs receive the order, they commit the transactions, release the resources, and respond the TM with "commit completed".
If the TM receives responses from all of these RMs, it logs the transaction as completed.
If any RM responds with an execution failure or does not respond timely in the previous step, the TM will regard the transaction as failed. Then, the following actions take place:
The TM logs the transaction as aborted and issues the transaction rollback order to all RMs.
After receiving the order, RMs roll back the transaction, release the resources, and respond the TM with "rollback completed".
If the TM receives responses from all of these RMs, it logs the transaction as completed.
The XA specification also defines the following optimization measures:
If the TM finds that only one RM is involved in the whole transaction, the entire process will be degraded to a one-phase commit.
If the data operation from the AP received by an RM is a read-only operation, the RM can complete the transaction in phase one and inform the TM that phase two is no longer needed. If this is the case, dirty reads may occur.
If the RM does not receive the order for proceeding to phase two long after phase one is completed, it can commit or roll back the local transaction on its own. This situation is called the Heuristic Completion. Note that this issue may undermine the consistency of transactions and lead to exceptions.
Errors and network timeouts may occur during transaction execution. For these exceptions, different implementations may have different exception handling methods, as described below.
If the TM encounters downtime before querying RMs in phase one, no operations will be needed after recovery from the downtime.
If the TM encounters downtime after querying RMs in phase one, as some of the RMs may have received the query, sending rollback requests to these RMs is required.
If the TM encounters downtime after querying RMs in phase one but before logging the preparatory operations as completed, sending rollback requests to the RMs is needed after recovery from the downtime, because the TM has no idea about the negotiation result before the downtime.
If the TM encounters downtime after logging the transaction preparation as completed in phase one, commit or rollback orders can be issued based on the resulting log after the recovery from the downtime.
If the TM encounters downtime before generating the commit or abort log in phase two, commit or rollback orders can be issued based on the log after the recovery from the downtime.
If the TM encounters downtime before logging the transaction as completed in phase two, commit or rollback orders can be issued based on the resulting log after the recovery from the downtime.
If the TM encounters downtime after logging the transaction as completed in phase two, no operation will be needed after the recovery from the downtime.
When any RM does not respond promptly in phase one, the TM issues the rollback order to all of the RMs.
When any RM does not respond promptly in phase two, the TM continually issues the rollback order to the unresponsive RMs.
Feature Analysis
The XA two-phase commit protocol is designed to implement the atomicity, consistency, isolation, and durability of a transaction like local transactions.
Atomicity: ensures that transactions are atomic in the preparation and commitment phases.
Consistency: The XA protocol implements strong consistency.
Isolation: XA transactions hold resource locks until they are completed, which achieves write isolation.
Durability: Durability can be ensured since XA transactions are based on local transactions.
The XA specification is the earliest distributed transaction specification. Mainstream database products, such as Oracle, MySQL, and SQL Server, support the XA specification. Note that the JTA specification in J2EE is also based on the XA specification, and therefore is compatible with the XA specification.
XA is a distributed transaction model implemented at the resource management layer. It features a low degree of intrusion into businesses.
XA's two-phase commit protocol can cover the three scenarios of distributed transactions. However, an RM keeps locking resources during the execution of a global transaction. If the transaction involves too many RMs, especially in cross-service scenarios, the number and time consumption of network communications will rapidly increase. Consequently, the blocking time will be prolonged, the throughput of the system declines, and the probability of transaction deadlocks increases. Therefore, the two-phase commit protocol is not suitable for the cross-service distributed transaction mode in microservice scenarios.
Each TM realm creates a single point, which may incur a single point of failure. If the TM crashes after phase one, the participating RMs will not receive the order for phase two and therefore hold the resource locks for a long time. This, as a result, affects the throughput of the business. On the other hand, in a complete global transaction, the TM interacts with RMs eight times, which causes complexity and performance decline.
In addition, the two-phase protocol may cause split-brain exceptions. If the TM goes down after instructing RMs to commit the transaction in phase two, and only some of the RMs receive the commit order, then, when the TM restores, it cannot coordinate all the RMs in maintaining the local transaction consistency.
XA has to deal with many exception scenarios, which are challenging to the framework implementation. Regarding open-source implementations, you can refer to Atomikos and Bitronix.
To solve problems in the two-phase commit protocol, an improved three-phase commit solution was proposed. This new solution removes the single point of failure (SPOF) and adds a timeout mechanism for RMs to avoid long-term locking of resources. However, the three-phase solution cannot solve the split-brain problem and is rarely applied to practical cases. If you are interested in this solution, you can read relevant material on it.
Try, Commit, and Cancel (TCC) is a model of compensating transactions. This model requires each service of an application to provide three interfaces, that is, the try, commit, and cancel interfaces. The core idea of this model is to release the locking of resources at the quickest possible time by reserving the resources (providing intermediate states). If the transaction can be committed, the reserved resources are confirmed. If the transaction needs to be rolled back, the reserved resources are released.
TCC is also a two-phase commit protocol, and can be considered to be a variant of the two-phase-commit XA, which does not hold resource locks for a long time.
The TCC model divides the commitment of transactions into two phases:
1) Phase one
In phase one, TCC performs business check (for consistency) and reserves business resources (for quasi-isolation). This defines the try operation of TCC.
2) Phase two
If the reservation of all business resources is successful in the try phase, the confirm operation is performed. Otherwise, the cancel operation is performed.
Confirm: The confirm operation acts only on the reserved resources and does not check the business. If the operation fails, the system keeps retrying.
Cancel: The cancel operation cancels the execution of a business operation, releases the reserved resources, and retries if it fails.
In the TCC model, both the initiator and participants of a transaction need to record logs for the transaction. The initiator records the status and information of the global transaction and each branch transaction. Participants record the status of the branch transactions.
At any stage of the TCC transaction execution process, exceptions such as downtime, restart, and network disconnection may occur. If this is the case, the transaction enters a non-atomic and non-consistent state. Hence, it is necessary to commit or roll back the remaining branch transactions based on the logs of the main transaction and the branch transactions. In this way, all transactions in the entire distributed system can reach the final consistency and atomicity.
Example of TCC
Take an e-commerce system as an example. Xiao Ming bought a book on Taobao for RMB 100 and received 10 credits for this purchase. The purchase involves the following operations in the system:
The order system creates a commodity order.
The payment system accepts payment from Xiao Ming.
The inventory system deducts the product inventory.
The membership system rewards credits to Xiao Ming's account.
These operations need to be executed as one transaction and must be all successful or all canceled.
If the TCC model is used in this case, the systems must be transformed as follows:
1) The order system
Try: The system creates an order whose status is "pending payment".
Confirm: The system updates the status of the order to "completed".
Cancel: The system updates the status of the order to "canceled".
2) The payment system
Try: Assume that Xiao Ming has RMB 1,000 in his account, and the system freezes RMB 100 upon the purchase. At this time, Xiao Ming still has a balance of RMB 1,000 in the account.
Confirm: The system changes the account balance to RMB 900 and clears the freeze record.
Cancel: The system clears the freeze record.
3) The inventory system
Try: Suppose there are 10 books in the inventory, and the system freezes one of them. In this step, there are still 10 books in the inventory.
Confirm: The system updates the inventory to 9 books and clears the freeze record.
**Cancel: The system clears the freeze record.
4) The membership system
**Try: Assume that Xiao Ming has 3,000 credits in the account, and the system now rewards him 10 credits. In this step, the credits in Xiao Ming's account are still 3,000.
**Confirm: The system updates the credit to 3,010 and clears the credit preparation record.
Cancel: The system clears the credit preparation record.
Feature Analysis
TCC transactions have four features:
Atomic: The initiator of the transaction coordinates and ensures that either all the branch transactions are committed or all of them are rolled back.
Consistency: TCC transactions ensure eventual consistency.
Isolation: TCC implements data isolation by pre-allocating resources in the try phase.
Durability: TCC implements durability by coordinating each branch transaction.
The TCC transaction model is intrusive for the business as the business side needs to split one interface to three, which results in high development costs.
At the same time, in order to avoid exceptions caused by communication failures or timeouts in asynchronous networks, TCC requires the business side to follow three policies in design and implementation:
Allow null rollbacks: When some participants do not receive the try request in phase one, the system will cancel the entire transaction. If a participant who fails or does not perform the try operation receives a cancel request, a null rollback operation is required.
Maintain idempotence: When exceptions like network timeouts occur in phase two, the confirm and cancel methods are repeatedly called. Therefore, the implementation of these two methods must be idempotent.
Prevent resource suspension: Network exceptions disturb the execution order of the two phases, which makes the try request arrive later than the cancel request on the participant side. The cancel operation will perform a null rollback to ensure the correctness of the transaction, and the try operation will not be executed.
TCC implements distributed transactions at the business layer rather than the resource layer, which allows businesses to flexibly select a resource locking granularity. In addition, locks will not always be held in the global transaction execution process, therefore, the system throughput is much higher than that in the two-phase-commit XA mode.
Open-source frameworks that support TCC transactions include ByteTCC, Himly, and TCC-transaction.
Saga, like TCC, is also a compensating transaction model, but it does not include a try phase. Saga regards distributed transactions as a transaction chain that is composed of a group of local transactions.
Each forward transaction operation in the transaction chain corresponds to a reversible transaction operation. The Saga transaction coordinator executes branch transactions in the transaction chain in sequence. After the branch transactions are all executed, resources are released. However, if a branch transaction fails, a compensating operation is performed in the opposite direction.
Assume that a Saga distributed transaction chain is composed of n branch transactions, that is, [T1, T2, ..., Tn]. Then, there are three possible conditions where the distributed transaction executes:
T1, T2, ..., Tn: A total of n transactions are executed successfully.
T1, T2, ..., Ti, Ci, ..., C2, C1: The execution failed at the i-th (i<=n) transaction. Then, compensating operations are called in sequence from i to 1. If any compensating operation fails, it will retry until it is successful. Compensating operations can be optimized for parallel execution.
T1, T2, ..., Ti (failure), Ti (retry), Ti (retry), ..., Tn: Applies to scenarios where transactions must succeed. If a failure occurs, the transaction will keep retrying and no compensating operation will be performed.
Example of Saga
Assume that Xiao Ming wants to take a trip on the National Day holiday. He plans to depart from Beijing, spend three days in London, and then pay a three-day visit to Paris before returning to Beijing. The whole trip involves ticket reservations from different airlines and hotel reservations in London and Paris. Xiao Ming's plan is to cancel the trip if any of the reservations fail. Assume that a comprehensive travel service platform can make all reservations with one click, which resembles a long transaction. If the service is arranged by using Saga, as shown in the following figure, the trip reservation will be canceled through compensating operations when any of the reservations fail.
Feature Analysis
Saga transactions guarantee three transaction features:
Atomicity: The Saga coordinator can ensure that local transactions in the transaction chain are all committed or all rolled back.
Consistency: Saga transactions ensure eventual consistency.
Durability: Durability can be ensured as Saga is based on local transactions.
However, Saga does not guarantee the isolation of transactions. A local transaction will be visible to other transactions after it is committed. If other transactions have changed the data that has been submitted successfully, the compensating operation may fail. For example, the deduction fails, but the money in the account has gone. Therefore, we need to consider this scenario and avoid this problem from business design.
Saga transactions, like TCC transactions, require the business design and implementation to follow three policies:
Allow null compensation: Transaction participants may receive the compensation order before performing normal operations due to network exceptions. In this case, null compensation is required.
Maintain idempotence: Forward operations and compensating operations can both be repeatedly triggered. Therefore, the idempotence of operations must be correct.
Prevent resource suspension: If the forward operation arrives later than the compensating operation due to network exceptions, the forward operation must be discarded. Otherwise, resource suspension occurs.
Although Saga and TCC are both compensating transaction models, they are different due to different commit phases.
Saga adopts imperfect compensation. The compensation operation will leave traces of the original transaction operations. Therefore, the impact on the business must be considered.
TCC adopts perfect compensation. The compensating operation will completely clean up the original transaction operations, and users will not be able to perceive the status information before the transaction is canceled.
TCC can better support asynchronization, whereas Saga is generally more suitable for asynchronization in the compensating phase.
The Saga mode is suitable for long transactions and microservices, as it is less intrusive to business. At the same time, Saga uses the one-phase commit mode, which does not lock resources for a long time and has no "cask effects." Therefore, systems with this architecture have high performance and high throughput.
Both Alibaba's open-source project Seata and Huawei's open-source project ServiceComb support Saga.
There are two main solutions for the message-based distributed transaction mode:
A solution based on transactional messages
A solution based on local messages
The following figure shows the processes for sending local transactions and transactional messages.
The transaction initiator sends a transactional message in advance.
After MQ receives the transactional message, it persists the message, updates the message state to "to be sent", and sends an acknowledge (ACK) message to the sender.
If the transaction initiator does not receive the ACK message, the execution of the local transaction is canceled. If the transaction initiator receives the ACK message, the local transaction is executed and another message is sent to the MQ system to notify the execution of local transactions.
After MQ receives the notification, it changes the message state of the transaction based on the execution result of the local transaction. If the execution is successful, MQ changes the message state to "consumable" and delivers it to the subscribers. If the execution fails, the message is deleted.
When a local transaction is executed, the notification message sent to MQ may be lost. Therefore, MQ, which supports transactional messages, has a regular scanning logic. By scanning, MQ identifies messages that stay in the "to be sent" state and initiates a query to the sender of the message for the final state of the message. Based on the query result, MQ updates the message state accordingly. Hence, the initiator of the transaction needs to provide the MQ system with an interface for querying the transactional message state.
If the state of the transactional message is "ready to send", MQ pushes the message to downstream participants. If the push fails, the system will keep retrying.
Upon receiving the message, the downstream participants execute the local transactions. If the execution of the local transactions is successful, an ACK message is sent to the MQ system. If the execution fails, no ACK message is sent. In this case, MQ continuously pushes messages to the downstream participant who does not return the ACK message.
The transactional-message-based model places high requirements on the MQ system. Not all MQ systems support transactional messages. RocketMQ is one of the few MQ systems that support transactional messages. If the MQ system does not support transactional messages, the local message mode can be used.
The core concept of this mode is that the transaction initiator maintains a local message table. Business and local message table operations are executed in the same local transaction. If the service is successfully executed, a message with the "to be sent" state is also recorded in the local message table. The system starts a scheduled task to regularly scan the local message table for messages that are in the "to be sent" state and sends them to MQ. If the sending fails or times out, the message will be resent until it is sent successfully. Then, the task will delete the state record from the local message table. The subsequent consumption and subscription process is similar to that of the transactional message mode.
Feature Analysis
The message-based distributed transaction mode supports ACID as follows:
Atomicity: Branch transactions are either all executed or all canceled.
Consistency: Eventual consistency is ensured.
Isolation: Isolation is not guaranteed.
Durability: Durability is guaranteed by local transactions.
Message-based distributed transactions can effectively decouple distributed systems from other systems, hence the calls between transaction participants are not synchronous calls.
Message-based distributed transactions have high requirements for the MQ system and bring certain intrusion to the business. For such transactions, either the interface for querying the transaction message status must be provided or the local message table needs to be maintained. In addition, message-based distributed transactions do not support transaction rollback. If a transaction fails, it must retry until it is successful. This feature makes it applicable to limited business scenarios that are less sensitive to eventual consistency. For example, calls between cross-enterprise systems.
Best-effort notification-based distributed transactions
The best-effort notification-based distributed transaction is another solution based on MQ systems, but it does not require MQ messages to be reliable.
Example of Message-Based Distributed Transaction
Suppose Xiao Ming uses the Unicom mobile app to pay phone bills, and the prepayment method is Alipay. The whole process is as follows:
Xiao Ming chooses "RMB 50" as the prepayment amount and "Alipay" as the prepayment method.
The Unicom app creates a prepayment order with the "paying" state and redirects to the Alipay payment page. That is, the process now enters the Alipay system.
Alipay confirms Xiao Ming's payment, and then deducts RMB 50 from his account and adds RMB 50 to Unicom's account. After the execution is completed, a message indicating whether the payment is successful is sent to the MQ system. In this case, the message sending is not guaranteed to be successful.
If the message is sent successfully, Alipay notification service subscribes to the message and calls the Unicom interface to notify Xiao Ming of the payment result. If the Unicom service fails, Alipay will call the Unicom interface repeatedly at increasing intervals of 5 minutes, 10 minutes, 30 minutes, 1 hour, ..., and 24 hours, until the interface is successfully called or the time window threshold is reached. This working principle reflects the meaning of best-effort notification.
When the Unicom service restores and receives the notification from Alipay, it prepays to the account if the payment is successful. Otherwise, the service will cancel the prepayment. After Unicom completes the operations, it responds to Alipay notification service with the confirmation. If the confirmation fails, Alipay will retry the request. Therefore, Unicom's prepayment interface needs to maintain idempotence.
If the Unicom service is restored long after the time window limit of Alipay notification service, then Unicom scans the order with the "paying" state and initiates a request to Alipay to verify the payment result of the order.
Alibaba provides two distributed transaction middleware:
The first middleware is XTS developed by the Ant Financial team, whose alias is DTX as a financial cloud product.
The other middleware is TXC developed by the Alibaba middleware team.
XTS and TXC have similar functions, and both support the TCC transaction mode. They also provide distributed transaction solutions that are less intrusive to business. Currently, the two teams are working on developing the open-source distributed transaction middleware, Simple Extensible Autonomous Transaction Architecture (Seata). Next, let's take a look at Seata.
Seata supports the TCC and Saga modes. Specially, Seata provides a solution with zero intrusion to TCC. This solution is called the Automatic Transaction (AT) mode. The mechanism of the AT mode is as follows:
The global transaction is still based on branch transactions. The Seata server coordinates that branch transactions will either commit together or roll back together.
When each branch transaction is running, the Seata client locates the data row by parsing the SQL statements of the transaction after SQL proxying and interception. The client then creates snapshots for the data row before and after the SQL execution, which are beforeImage and afterImage. These two images compose the rollback log that is stored in a separate table. The writing of the rollback log and the change of the business data are committed in the same local transaction.
When the branch transaction is completed, the lock on local resources is released immediately, and the result of the transaction execution is reported to the Seata coordinator.
Then, the Seata coordinator summarizes the completion of each branch transaction, generates the resolution for committing or rolling back the transaction, and sends the resolution to the Seata client.
If the resolution is to commit the transaction, the Seata client asynchronously clears the rollback log. If the resolution is to roll back the transaction, the Seata client performs compensating operations according to the rollback log. Before the compensation, the client compares the current snapshot with the afterImage. If they are inconsistent, the client will not be able to roll back and manual intervention is required.
The AT mode automatically generates rollback logs, which lowers access cost and business intrusion. However, the AT mode has the following limits:
The AT mode supports only relational databases based on ACID transactions.
The AT mode relies on SQL parsing but provides limited support for SQL syntax. In addition, compatibility needs to be considered for cases with complex SQL statements.
The AT mode currently does not support compound primary keys. Therefore, you must add an auto-increment primary key when designing business tables.
The default isolation level of global transactions is Read Uncommitted. However, statements such as SELECT...FOR UPDATE can be used to achieve the Read Committed isolation level. The isolation level between Read Uncommitted and Read Committed can be achieved through the global exclusive write lock.
BA - basic availability (i.e. service availability is at least 99.9%)
S - soft state
E - eventuel consistency
👎ANTI PATTERN!!!
we are loosing boundaries of the services
background process is aware of all DB schemas and every change in schema should be reflected in this component
this is a HUGE coupling of services
👎Also quite a bad idea of way of synchronisation.
Response time is high
data consistency and integrity suffers. E.g. if preference service failed => you have already a committed transaction in wish list service.
By increase of number of services to sync => complexity increases
Customer information service becomes very coupled to all other services
👍Best idea
customer service is not coupled to other services
response time is not impacted