Consistency
Consistency
Any payment bugs that are related to correctness would cause an unacceptable customer experience. When an error occurs it needs to be corrected immediately. Further, the process to remediate such mistakes is time consuming, and usually is complicated due to various legal and compliance constraints.
Distributed transactions
Applicable scenarios
In general, there are three scenarios for distributed transactions:
Cross-database distributed transactions
Cross-service distributed transactions
Hybrid distributed transactions
Use transactional outbox to implement SAGA
Reference: infoq.com/articles/saga-orchestration-outbox/
Transactional outbox
https://microservices.io/patterns/data/transactional-outbox.html
https://medium.com/engineering-varo/event-driven-architecture-and-the-outbox-pattern-569e6fba7216
Benefits
2PC is not used Messages are guaranteed to be sent if and only if the database transaction commits Messages are sent to the message broker in the order they were sent by the application
Drawbacks
This pattern has the following drawbacks:
Potentially error prone since the developer might forget to publish the message/event after updating the database.
More than once delivery
Problem: The Message Relay might publish a message more than once. It might, for example, crash after publishing a message but before recording the fact that it has done so. When it restarts, it will then publish the message again.
Solution:
A message consumer must be idempotent, perhaps by tracking the IDs of the messages that it has already processed. Fortunately, since Message Consumers usually need to be idempotent (because a message broker can deliver messages more than once) this is typically not a problem.
To allow consumers to detect and ignore duplicate messages, each message should have a unique id. This could for instance be a UUID or a monotonically increasing sequence specific to each message producer, propagated as a Kafka message header.
Example problem for SAGA
Apply outbox pattern on Saga orchestration
Ordering
For scaling purposes, Kafka topics can be organized into multiple partitions.
Only within a partition, it is guaranteed that a consumer will receive the messages in exactly the same order as they have been sent by the producer. As by default, all messages with the same key will go into the same partition, the unique id of a Saga is a natural choice for the Kafka message key. That way, the correct order of processing of the messages of one one Saga instance is ensured.
Several Saga instances can be processed in parallel if they end up in different partitions of the topics used for the Saga message exchange.
Two patterns for message relay
Transactional log tailing
https://microservices.io/patterns/data/transaction-log-tailing.html
The mechanism for trailing the transaction log depends on the database:
MySQL binlog
Postgres WAL
AWS DynamoDB table streams
Polling publisher
https://microservices.io/patterns/data/polling-publisher.html
Success and failure flow
Success case
Failure case
Storage
id: Unique identifier of a given Saga instance, representing the creation of one particular purchase order
currentStep: The step at which the Saga currently is, e.g., “credit-approval” or “payment”
payload: An arbitrary data structure associated with a particular Saga instance, e.g., containing the id of the corresponding purchase order and other information useful during the Saga lifecycle; while the example implementation uses JSON as the payload format, one could also think of using other formats, for instance, Apache Avro, with payload schemas stored in a schema registry
status: The current status of the Saga; one of STARTED, SUCCEEDED, ABORTING, or ABORTED
stepState: A stringified JSON structure describing the status of the individual steps, e.g., "{"credit-approval":"SUCCEEDED","payment":"STARTED"}"
type: A nominal type of a Saga, e.g., “order-placement”; useful to tell apart different kinds of Sagas supported by one system
version: An optimistic locking version, used to detect and reject concurrent updates to one Saga instance (in which case the message triggering the failing update needs to be retried, reloading the current state from the Saga log)
Example failure state transition for a purchase order whose payment fails
First, the order comes in and the “credit-approval” step gets started. At this point, a “credit-approval” request message has been persisted in the outbox table, too.
Once this has been sent to Kafka, the order service will process it and send a reply message. The order service processes this by updating the Saga state and starting the payment step:
Again a message is sent via the outbox table, now the “payment” request. This fails, and the payment system responds with a reply message indicating this fact. This means that the “credit-approval” step needs to be compensated via the customer system:
Once that has succeeded, the Saga is in its final state, ABORTED:
Validation
Account, entry, order
The Payments Platform inherits three key principles from double-entry bookkeeping:
Immutability of orders (once created, the payment orders are immutable: if an order was created in error, a new corrective order needs to be created),
Audatibility of all money movements (reliably stored and cannot be changed),
Error detection based on the zero-sum principle. Every entry to an account requires a corresponding and opposite entry to a different account.
The implementation of payments in Uber platform centers around three key concepts:
Entry - describes a single instance of a money movement to or from an entry (a customer, partner or Uber business),
Account represents the entity in payments, capturing all entries of that entity. The sum of money amounts in the account entries represents its balance,
Order - captures the payments for encapsulating all money movements among the involved parties (customers, partners, and Uber businesses).
Double entry bookkeeping
Conceptually, the Uber payment platform can be described as a generalized payments order processing system, based on the zero-sum principle. The processing of a payment order results in money movements to and from accounts. The zero-sum principle also originates from the double-entry bookkeeping and zero-proof bookkeeping, and in this context it means that sum of amounts (+ vs -) in each order has to be zero. For instance, a typical Uber payment order will involve the collection of the money from a customer for a service (e.g., ride-sharing), paying (disbursement) of a partner (e.g., a driver), as well as obtaining a service charge for a Uber business. These three entries will constitute an order, and the amount of money collected from the customer has to equal the amount of money obtained by partners and Uber businesses.
The zero-sum principle is a simple error detection mechanism, especially useful it in a loosely coupled distributed systems at the Uber scale. Processing of the order will results in several transactions, each potentially involving integration with different payment service providers and banks. As delays, network, and other failures will unavoidably happen, zero-sum principle provides a solid method to detect if any errors happened.
Validating side effects
Validation of processing results based on side-effects recording
References
Overview of distributed transactions
ACID distributed transactions
Assumptions
The protocol works in the following manner:
One node is designated the coordinator, which is the master site, and the rest of the nodes in the network are called cohorts.
Stable storage at each site and use of a write ahead log by each node.
The protocol assumes that no node crashes forever, and eventually any two nodes can communicate with each other. The latter is not a big deal since network communication can typically be rerouted. The former is a much stronger assumption; suppose the machine blows up!
Process
Success case
Failure case
TCC
https://www.zhihu.com/question/471722924
https://www.sofastack.tech/blog/sofa-channel-4-retrospect/
Transaction message based distributed transaction
Distributed Sagas
Motivation
Using distributed transaction to maintain data consistency suffers from the following two pitfalls
Many modern technologies including NoSQL databases such as MongoDB and Cassandra don't support them. Distributed transactions aren't supported by modern message brokers such as RabbitMQ and Apache Kafka.
It is a form of syncronous IPC, which reduces availability. In order for a distributed transaction to commit, all participating services must be available. If a distributed transaction involves two services that are 99.5% available, then the overall availability is 99\%. Each additional service involved in a distributed transaction further reduces availability.
Sagas are mechanisms to maintain data consistency in a microservice architecture without having to use distributed transactions.
Distributed sagas execute transactions that span multiple physical databases by breaking them into smaller transactions and compensating transactions that operate on single databases.
Definition
High entry bar: First need to build a state machine. A saga is a state machine.
A distributed saga is a collection of requests. Each request has a compensating request on failure. A dsitributed saga guarantees the following properties:
Either all Requests in the Saga are succesfully completed, or
A subset of Requests and their Compensating Requests are executed.
Limitation: Does not guarantee the separation
Solution 1: Semantic lock
Approaches
Event-driven choreography: When there is no central coordination, each service produces and listen to other service’s events and decides if an action should be taken or not.
Command/Orchestration: When a coordinator service is responsible for centralizing the saga’s decision making and sequencing business logic.
References
Distributed transactional middleware - Seata
Seata is an implementation of variants 2PC.
Uber Cadence
Cadence meetup: Introduction to Cadence
Use case: Long transaction example - UberEATS
Uber Cadence: Fault Tolerant Actor Framework
Use case: Long transaction example
Sagas
https://www.youtube.com/watch?v=txlSrGVCK18&t=2471s&ab_channel=InfoQ
TODO
Data Consistency in Microservices Architecture (Grygoriy Gonchar):https://www.youtube.com/watch?v=CFdPDfXy6Y0&ab_channel=Devoxx
Last updated