Consistency

Any payment bugs that are related to correctness would cause an unacceptable customer experience. When an error occurs it needs to be corrected immediately. Further, the process to remediate such mistakes is time consuming, and usually is complicated due to various legal and compliance constraints.

Distributed transactions

Applicable scenarios

In general, there are three scenarios for distributed transactions:
- Cross-database distributed transactions
- Cross-service distributed transactions
- Hybrid distributed transactions

Use transactional outbox to implement SAGA

Reference: infoq.com/articles/saga-orchestration-outbox/

Transactional outbox

https://microservices.io/patterns/data/transactional-outbox.html
https://medium.com/engineering-varo/event-driven-architecture-and-the-outbox-pattern-569e6fba7216

Benefits

2PC is not used Messages are guaranteed to be sent if and only if the database transaction commits Messages are sent to the message broker in the order they were sent by the application

Drawbacks

This pattern has the following drawbacks:

Potentially error prone since the developer might forget to publish the message/event after updating the database.

More than once delivery

Problem: The Message Relay might publish a message more than once. It might, for example, crash after publishing a message but before recording the fact that it has done so. When it restarts, it will then publish the message again.
Solution:
- A message consumer must be idempotent, perhaps by tracking the IDs of the messages that it has already processed. Fortunately, since Message Consumers usually need to be idempotent (because a message broker can deliver messages more than once) this is typically not a problem.
- To allow consumers to detect and ignore duplicate messages, each message should have a unique id. This could for instance be a UUID or a monotonically increasing sequence specific to each message producer, propagated as a Kafka message header.

Example problem for SAGA

Apply outbox pattern on Saga orchestration

Ordering

For scaling purposes, Kafka topics can be organized into multiple partitions.
Only within a partition, it is guaranteed that a consumer will receive the messages in exactly the same order as they have been sent by the producer. As by default, all messages with the same key will go into the same partition, the unique id of a Saga is a natural choice for the Kafka message key. That way, the correct order of processing of the messages of one one Saga instance is ensured.
Several Saga instances can be processed in parallel if they end up in different partitions of the topics used for the Saga message exchange.

Two patterns for message relay

Transactional log tailing

https://microservices.io/patterns/data/transaction-log-tailing.html
The mechanism for trailing the transaction log depends on the database:
- MySQL binlog
- Postgres WAL
- AWS DynamoDB table streams

Polling publisher

https://microservices.io/patterns/data/polling-publisher.html

Success and failure flow

Success case

Failure case

Storage

id: Unique identifier of a given Saga instance, representing the creation of one particular purchase order
currentStep: The step at which the Saga currently is, e.g., “credit-approval” or “payment”
payload: An arbitrary data structure associated with a particular Saga instance, e.g., containing the id of the corresponding purchase order and other information useful during the Saga lifecycle; while the example implementation uses JSON as the payload format, one could also think of using other formats, for instance, Apache Avro, with payload schemas stored in a schema registry
status: The current status of the Saga; one of STARTED, SUCCEEDED, ABORTING, or ABORTED
stepState: A stringified JSON structure describing the status of the individual steps, e.g., "{"credit-approval":"SUCCEEDED","payment":"STARTED"}"
type: A nominal type of a Saga, e.g., “order-placement”; useful to tell apart different kinds of Sagas supported by one system
version: An optimistic locking version, used to detect and reject concurrent updates to one Saga instance (in which case the message triggering the failing update needs to be retried, reloading the current state from the Saga log)

Example failure state transition for a purchase order whose payment fails

First, the order comes in and the “credit-approval” step gets started. At this point, a “credit-approval” request message has been persisted in the outbox table, too.

{
  "id": "73707ad2-0732-4592-b7e2-79b07c745e45",
  "currentstep": null,
  "payload": "\"order-id\": 2, \"customer-id\": 456, \"payment-due\": 4999, \"credit-card-no\": \"xxxx-yyyy-dddd-9999\"}",
  "sagastatus": "STARTED",
  "stepstatus": "{}",
  "type": "order-placement",
  "version": 0
}
{
  "id": "73707ad2-0732-4592-b7e2-79b07c745e45",
  "currentstep": "credit-approval",
  "payload": "{ \"order-id\": 2, \"customer-id\": 456, ... }",
  "sagastatus": "STARTED",
  "stepstatus": "{\"credit-approval\": \"STARTED\"}",
  "type": "order-placement",
  "version": 1
}

Once this has been sent to Kafka, the order service will process it and send a reply message. The order service processes this by updating the Saga state and starting the payment step:

{
  "id": "73707ad2-0732-4592-b7e2-79b07c745e45",
  "currentstep": "payment",
  "payload": "{ \"order-id\": 2, \"customer-id\": 456, ... }",
  "sagastatus": "STARTED",
  "stepstatus": "{\"payment\": \"STARTED\", \"credit-approval\": \"SUCCEEDED\"}",
  "type": "order-placement",
  "version": 2
}

Again a message is sent via the outbox table, now the “payment” request. This fails, and the payment system responds with a reply message indicating this fact. This means that the “credit-approval” step needs to be compensated via the customer system:

{
  "id": "73707ad2-0732-4592-b7e2-79b07c745e45",
  "currentstep": "credit-approval",
  "payload": "{ \"order-id\": 2, \"customer-id\": 456, ... }",
  "sagastatus": "ABORTING",
  "stepstatus": "{\"payment\": \"FAILED\", \"credit-approval\": \"COMPENSATING\"}",
  "type": "order-placement",
  "version": 3
}

Once that has succeeded, the Saga is in its final state, ABORTED:

{
  "id": "73707ad2-0732-4592-b7e2-79b07c745e45",
  "currentstep": null,
  "payload": "{ \"order-id\": 2, \"customer-id\": 456, ... }",
  "sagastatus": "ABORTED",
  "stepstatus": "{\"payment\": \"FAILED\", \"credit-approval\": \"COMPENSATED\"}",
  "type": "order-placement",
  "version": 4
}

Validation

Account, entry, order

The Payments Platform inherits three key principles from double-entry bookkeeping:
- Immutability of orders (once created, the payment orders are immutable: if an order was created in error, a new corrective order needs to be created),
- Audatibility of all money movements (reliably stored and cannot be changed),
- Error detection based on the zero-sum principle. Every entry to an account requires a corresponding and opposite entry to a different account.
The implementation of payments in Uber platform centers around three key concepts:
- Entry - describes a single instance of a money movement to or from an entry (a customer, partner or Uber business),
- Account represents the entity in payments, capturing all entries of that entity. The sum of money amounts in the account entries represents its balance,
- Order - captures the payments for encapsulating all money movements among the involved parties (customers, partners, and Uber businesses).

Double entry bookkeeping

Conceptually, the Uber payment platform can be described as a generalized payments order processing system, based on the zero-sum principle. The processing of a payment order results in money movements to and from accounts. The zero-sum principle also originates from the double-entry bookkeeping and zero-proof bookkeeping, and in this context it means that sum of amounts (+ vs -) in each order has to be zero. For instance, a typical Uber payment order will involve the collection of the money from a customer for a service (e.g., ride-sharing), paying (disbursement) of a partner (e.g., a driver), as well as obtaining a service charge for a Uber business. These three entries will constitute an order, and the amount of money collected from the customer has to equal the amount of money obtained by partners and Uber businesses.
The zero-sum principle is a simple error detection mechanism, especially useful it in a loosely coupled distributed systems at the Uber scale. Processing of the order will results in several transactions, each potentially involving integration with different payment service providers and banks. As delays, network, and other failures will unavoidably happen, zero-sum principle provides a solid method to detect if any errors happened.

Validating side effects

Validation of processing results based on side-effects recording

References

Overview of distributed transactions

In depth analysis

ACID distributed transactions

Assumptions

The protocol works in the following manner:
1. One node is designated the coordinator, which is the master site, and the rest of the nodes in the network are called cohorts.
2. Stable storage at each site and use of a write ahead log by each node.
3. The protocol assumes that no node crashes forever, and eventually any two nodes can communicate with each other. The latter is not a big deal since network communication can typically be rerouted. The former is a much stronger assumption; suppose the machine blows up!

Process

Success case

Failure case

TCC

https://www.zhihu.com/question/471722924
https://www.sofastack.tech/blog/sofa-channel-4-retrospect/

Transaction message based distributed transaction

Rocket MQ supports transactional message

Distributed Sagas

Motivation

Using distributed transaction to maintain data consistency suffers from the following two pitfalls
- Many modern technologies including NoSQL databases such as MongoDB and Cassandra don't support them. Distributed transactions aren't supported by modern message brokers such as RabbitMQ and Apache Kafka.
- It is a form of syncronous IPC, which reduces availability. In order for a distributed transaction to commit, all participating services must be available. If a distributed transaction involves two services that are 99.5% available, then the overall availability is 99\%. Each additional service involved in a distributed transaction further reduces availability.
Sagas are mechanisms to maintain data consistency in a microservice architecture without having to use distributed transactions.
Distributed sagas execute transactions that span multiple physical databases by breaking them into smaller transactions and compensating transactions that operate on single databases.

Definition

High entry bar: First need to build a state machine. A saga is a state machine.
A distributed saga is a collection of requests. Each request has a compensating request on failure. A dsitributed saga guarantees the following properties:
1. Either all Requests in the Saga are succesfully completed, or
2. A subset of Requests and their Compensating Requests are executed.
Limitation: Does not guarantee the separation
- Solution 1: Semantic lock

Approaches

Event-driven choreography: When there is no central coordination, each service produces and listen to other service’s events and decides if an action should be taken or not.
Command/Orchestration: When a coordinator service is responsible for centralizing the saga’s decision making and sequencing business logic.

References

Distributed transactional middleware - Seata

Seata is an implementation of variants 2PC.
https://github.com/seata/seata

Uber Cadence

TODO in Chinese
Cadence: The only workflow platform you'll ever need
Cadence meetup: Introduction to Cadence
- Use case: Long transaction example - UberEATS
Uber Cadence: Fault Tolerant Actor Framework
- Use case: Long transaction example

Sagas

MicroService.io
https://www.youtube.com/watch?v=txlSrGVCK18&t=2471s&ab_channel=InfoQ

TODO

Data Consistency in Microservices Architecture (Grygoriy Gonchar):https://www.youtube.com/watch?v=CFdPDfXy6Y0&ab_channel=Devoxx

PreviousResilience NextFlash sale

Last updated 3 years ago

Was this helpful?