Problem: How to prevent overselling for limited inventory products?
Solution 1: Set the serializable isolation level in the DB.
Set an optimistic lock on tables where concurrent writes to the same table happen often (see the sketch below).
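A minimal sketch of the DB-side optimistic lock, assuming a hypothetical `products(id, stock, version)` table and `pymysql`; the schema, connection parameters, and function name are illustrative, not part of the original notes:

```python
import pymysql

# Hypothetical schema: products(id INT PRIMARY KEY, stock INT, version INT)
conn = pymysql.connect(host="127.0.0.1", user="app", password="secret", database="shop")

def buy_one(product_id: int) -> bool:
    """Decrement stock by 1 only if the row has not changed since we read it."""
    with conn.cursor() as cur:
        cur.execute("SELECT stock, version FROM products WHERE id = %s", (product_id,))
        row = cur.fetchone()
        if row is None or row[0] <= 0:
            return False                   # unknown product or sold out
        _, version = row
        # The WHERE clause is the optimistic lock: the update only succeeds if no
        # concurrent transaction bumped the version (or drained the stock) meanwhile.
        affected = cur.execute(
            "UPDATE products SET stock = stock - 1, version = version + 1 "
            "WHERE id = %s AND version = %s AND stock > 0",
            (product_id, version),
        )
    conn.commit()
    return affected == 1                   # 0 rows => lost the race, caller may retry
```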
Redis transaction mechanism:
Different from a DB transaction, it is an atomic batch-processing mechanism for Redis.
Similar to putting an optimistic locking mechanism inside Redis (WATCH / MULTI / EXEC).
Flowchart
Implementation:
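A minimal sketch of this flow with redis-py's WATCH/MULTI/EXEC; the `stock:{product_id}` key naming and the retry count are assumptions:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def buy_one(product_id: str, max_retries: int = 5) -> bool:
    """Atomically decrement inventory without letting it go below zero."""
    key = f"stock:{product_id}"            # assumed key naming scheme
    for _ in range(max_retries):
        with r.pipeline() as pipe:
            try:
                pipe.watch(key)            # WATCH starts the optimistic lock
                stock = int(pipe.get(key) or 0)
                if stock <= 0:
                    pipe.unwatch()
                    return False           # sold out
                pipe.multi()               # queue commands as one transaction
                pipe.decr(key)
                pipe.execute()             # EXEC fails if the key changed since WATCH
                return True
            except redis.WatchError:
                continue                   # another buyer changed the stock; retry
    return False
```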
PXC is a strongly consistent MySQL cluster built on top of Galera. It can store data requiring high consistency.
Replication is a weakly consistent MySQL cluster shipped with MySQL, based on binlog replication. It can be used to store data that only requires low consistency.
Definition:
Master databases in city A serve all traffic. Backup databases are only for backup purposes.
To tolerate a failure of DC 1, deploy a backup DC 2 within the same city.
To tolerate a failure of the entire city A, deploy a backup DC 3 in city B.
Failover:
If master database goes down, fail over to backup database.
Pros:
Improve availability.
Cons:
Backup DC capacity is not fully utilized.
No confidence that the backup DC can actually serve the traffic after a failover, because it is never exercised.
Cannot serve a larger traffic volume.
Definition:
Master DCs serve read/write traffic. Slave DCs serve only read traffic. All master DCs have a full copy of the data.
Slave DCs redirect write traffic to master DCs.
Failover:
If DC 1 goes down, fail over to DC 2.
If entire city A goes down, fail over to DC 3.
Pros:
Can be horizontally scaled to multiple DCs.
Cons:
Each DC needs a full copy of the data to be fault tolerant.
To avoid write conflicts, two masters must not process the same copy of data.
The following table summarizes the differences between these two patterns.
Since the latency within the same city is low, it is fine to have one centralized database layer and make cross-DC remote calls.
Since the latency between two DCs across regions/continents is high, the data can only be synced asynchronously.
Definition:
Request routing:
API Router Layer: Route external API calls to the correct DC.
Inter-DC Call Router: Within a sharded DC, routes calls that need data owned by another DC.
Global Coordinator Service: Maintains the mapping of shard key -> shard id -> DC.
The shard key varies with each request.
The shard id -> DC mapping changes rarely.
Data:
Sharded DC: Contains eventually consistent sharded data. For example, in an e-commerce system, each buyer has their own orders, comments, and user behaviors.
Global zone DC: Contains strongly consistent global data. For example, in an e-commerce system, all users see the same inventory.
Typical flow:
Step 1. A request comes to the API router layer with its sharding keys (geographical location, user id, order id).
Step 2. The API router layer determines the DC that contains the shard (see the routing sketch after this list).
Step 3. The request is forwarded to that DC, where the local service handles it with the sharded data.
Step 4. (Optional) It calls the "Inter-DC Call Router" in case it needs data in another sharded DC (e.g. if sharding is by geographical location, a buyer on an e-commerce website wants to look at the products of a seller located in another city).
Step 5. (Optional) It calls the "Global zone" in case it needs to access the strongly consistent global data (e.g. the shared inventory).
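A sketch of the lookup behind steps 1-2, assuming the shard id is derived by hashing the shard key and that the coordinator's shard id -> DC table is cached by the router; the shard count, hashing scheme, and DC names are illustrative:

```python
import hashlib

NUM_SHARDS = 1024   # assumed fixed shard count

# Hypothetical shard id -> DC table maintained by the Global Coordinator Service.
# It changes rarely, so API routers can cache it.
SHARD_TO_DC = {shard_id: ("dc-east" if shard_id % 2 == 0 else "dc-west")
               for shard_id in range(NUM_SHARDS)}

def shard_id_for(shard_key: str) -> int:
    """Stable hash of the shard key (e.g. user id or order id) to a shard id."""
    digest = hashlib.md5(shard_key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def route(shard_key: str) -> str:
    """API router layer: pick the DC that owns this request's shard (steps 1-2)."""
    return SHARD_TO_DC[shard_id_for(shard_key)]

# Requests carrying the same buyer id always land in the same sharded DC.
print(route("buyer-12345"))
```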
Reship component: Forwards write requests arriving in the local DC to the remote DCs (see the sketch below).
Collector component: Reads write requests shipped from remote DCs and writes them to the local DC.
Elasticsearch component: All DC update requests are also written to Elasticsearch, which is used to guarantee strong consistency.
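A toy sketch of the reship/collector pair, with in-memory queues standing in for whatever replication channel the real system uses; all names here are illustrative:

```python
import queue
import threading
import time

# One inbound write queue per DC; a real system would use a message broker
# or binlog shipping instead of in-memory queues.
dc_queues = {"dc-local": queue.Queue(), "dc-remote": queue.Queue()}

def apply_locally(dc: str, write_request: dict) -> None:
    print(f"[{dc}] applied {write_request}")

def reship(write_request: dict, local_dc: str) -> None:
    """Reship component: apply the write locally, then forward it to every remote DC."""
    apply_locally(local_dc, write_request)
    for dc, q in dc_queues.items():
        if dc != local_dc:
            q.put(write_request)

def collector(local_dc: str) -> None:
    """Collector component: drain writes shipped from remote DCs into the local DC."""
    while True:
        write_request = dc_queues[local_dc].get()
        apply_locally(local_dc, write_request)

# The remote DC runs a collector; the local DC reships a write to it.
threading.Thread(target=collector, args=("dc-remote",), daemon=True).start()
reship({"order_id": 42, "op": "create"}, local_dc="dc-local")
time.sleep(0.2)   # give the collector a moment to apply the forwarded write
```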
For a distributed ACID database, the basic unit is a shard, and data consensus is achieved through the Raft protocol.
Disaster recovery support:
If any server room within city A becomes unavailable, the vote from city B's server room can still form a majority together with the remaining server room in city A (see the quorum check below).
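A quick check of the majority claim, assuming a common 2 + 2 + 1 placement of five replicas across the two server rooms in city A and one in city B (the placement itself is an assumption):

```python
# Assumed placement of five Raft replicas across three server rooms ("2 + 2 + 1").
placement = {"cityA-room1": 2, "cityA-room2": 2, "cityB-room1": 1}

total = sum(placement.values())   # 5 replicas
majority = total // 2 + 1         # 3 votes needed for Raft quorum

# Losing any single server room still leaves enough replicas for a majority.
for down_room, lost in placement.items():
    alive = total - lost
    status = "quorum kept" if alive >= majority else "quorum lost"
    print(f"{down_room} down -> {alive}/{total} replicas alive: {status}")
```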
If a single server provides timing, the Raft leaders of each shard need to stay close to that timing server; it is recommended to have multiple servers that can assign time.
Otherwise, anomalies like the following can happen (see the sketch after this example):
C1 talks to the timing server in server room A to get the time. The absolute time (AT) is 500 and the global time (CT) it receives is 500.
A1 then talks to the timing server to get the time. A1's request is later than C1's, so the AT is 510 and the CT is also 510.
A1 wants to write data to R2. The AT is 512 and its CT is 510.
C1 also wants to write data to R2. Since C1 is in another city and has a longer network latency, C1's write reaches R2 after A1's.
As a result of the above steps, although C1's global time (500) is earlier than A1's (510), its absolute write time is after A1's.
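A minimal sketch of a single timestamp oracle replaying the example above: C1 receives the smaller commit timestamp, yet its write arrives at R2 later because of the cross-city round trip. Class names, counter values, and latencies are illustrative assumptions:

```python
import itertools
import threading

class TimestampOracle:
    """Single timing server: hands out strictly increasing commit timestamps."""
    def __init__(self, start: int = 500):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def get_ts(self) -> int:
        with self._lock:
            return next(self._counter)

tso = TimestampOracle()          # lives in server room A

# Commit timestamps are handed out in request order: C1 first, then A1.
ct_c1 = tso.get_ts()             # C1 (remote city) -> 500
ct_a1 = tso.get_ts()             # A1 (same room as the oracle) -> 501

# Illustrative one-way latencies to replica R2 (absolute time units).
arrival_a1 = 510 + 2             # A1 is near R2, so its write lands quickly
arrival_c1 = 500 + 20            # C1 pays a cross-city round trip

# C1 holds the smaller commit timestamp but its write lands on R2 later,
# so commit-timestamp order and absolute arrival order disagree.
print(f"A1: ct={ct_a1}, arrives at AT={arrival_a1}")
print(f"C1: ct={ct_c1}, arrives at AT={arrival_c1}")
```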
Availability
Connectivity
Number of available connections
Performance (calculated from MySQL's built-in status variables; see the sketch after this list)
QPS / TPS
Deadlock
Master-slave replication delay (using the diff of binlog positions)
Disk space
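A sketch of deriving QPS/TPS from MySQL's built-in status counters over a sampling interval, plus a quick replication-delay check via `SHOW SLAVE STATUS`; connection parameters and the interval are placeholders:

```python
import time
import pymysql

# Connection parameters and the sampling interval are placeholders.
conn = pymysql.connect(host="127.0.0.1", user="monitor", password="secret",
                       cursorclass=pymysql.cursors.DictCursor)
INTERVAL = 10   # seconds between the two samples

def counters(cur):
    """Raw counters that QPS/TPS are derived from."""
    cur.execute("SHOW GLOBAL STATUS WHERE Variable_name IN "
                "('Questions', 'Com_commit', 'Com_rollback')")
    return {row["Variable_name"]: int(row["Value"]) for row in cur.fetchall()}

with conn.cursor() as cur:
    before = counters(cur)
    time.sleep(INTERVAL)
    after = counters(cur)

    qps = (after["Questions"] - before["Questions"]) / INTERVAL
    tps = ((after["Com_commit"] + after["Com_rollback"])
           - (before["Com_commit"] + before["Com_rollback"])) / INTERVAL
    print(f"QPS={qps:.1f}  TPS={tps:.1f}")

    # Quick replication-delay check on a slave (complements the binlog diff).
    cur.execute("SHOW SLAVE STATUS")
    slave = cur.fetchone()
    if slave:
        print("Seconds_Behind_Master =", slave["Seconds_Behind_Master"])
```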
Load balancing among read slaves
Master failover
Monitor MySQL states
Pros:
Easy config
Cons:
Not suitable for scenarios having high requirements on data consistency
Deployment: Although there are dual masters, only one master accepts writes at a time.
mmm_mond: The coordinator script. Runs on a monitoring machine.
Creates a set of virtual IPs: one write VIP binds to the master and multiple read VIPs bind to slaves.
When a MySQL instance goes down, it migrates the VIP to another MySQL machine.
mmm_agentd: Runs on the same machine as the MySQL server.
mmm_control: Provides administrative commands for mmm_mond.
Fast failover: Completes the failover within 0-30 seconds.
Best-effort consistency: When a master goes down, it tries to save the binlog of the failed master to keep data as consistent as possible. However, this is not reliable; for example, some hardware failures may make it impossible to save the binlog.
Compared with MMM,
Supports devops work like health check, suspend nodes
Supports semi-synchronous replication and GTID
Deployment:
The MHA manager can be deployed on a separate machine to manage several master-slave clusters, or it can be deployed on a single slave.
An MHA node runs on each MySQL server.
Cons:
Needs at least 3 machines
Split brain
Not suitable for scenarios having high requirements on data consistency
[Used at Github]()
golden ant: a set of scripts written in Perl providing the following capabilities:
Master discovery series
DNS
VPN and DNS
app and service discovery
Proxy heuristics
Service discovery and Proxy
Ele.me (饿了么):
Cross-region multi-active architecture:
Weibo cross-region multi-active:
Overview:
甜橙:
More:
Meizu (魅族):
From a migration perspective:
Li Yunhua (李运华):
Tang Yang (唐杨):
Microservices across multiple server rooms:
Caching across multiple server rooms:
Google Ads' highly available cross-region multi-active architecture:
TiDB:
Alipay (支付宝) architecture:
Three sites, five centers (三地五中心):
| | Same-city DCs | Cross-region DCs |
| :--- | :--- | :--- |
| Definition | Two DCs located geographically close to each other, e.g. within the same city | Two DCs located geographically distant from each other, e.g. across regions (New York and Los Angeles) or even across continents (USA and Australia) |
| Cost | High (the DC itself plus a dedicated line within the same city) | Extremely high (the DC itself plus a dedicated line across regions/continents) |
| Complexity | Low. Fine to call across DCs thanks to the low latency | High. Needs rearchitecting because of the high latency |
| Service quality | Slightly increased latency / increased availability | Decreased latency / increased availability |