Scalability

High concurrent write conflicts

  • Problem: How to prevent overselling of limited-inventory products?

V1: Serializable DB isolation

  • Solution 1: Set the serializable isolation level in the DB.
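
A minimal sketch of what this might look like against MySQL with PyMySQL; the connection settings, table, and column names are illustrative assumptions:

```python
import pymysql

# Assumed schema: inventory(product_id INT PRIMARY KEY, stock INT)
conn = pymysql.connect(host="127.0.0.1", user="app", password="secret", database="shop")

def buy_one(product_id):
    with conn.cursor() as cur:
        # Run the whole check-and-decrement under SERIALIZABLE so that two
        # concurrent purchases of the last item cannot both succeed.
        cur.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
        conn.begin()
        cur.execute("SELECT stock FROM inventory WHERE product_id = %s", (product_id,))
        (stock,) = cur.fetchone()
        if stock <= 0:
            conn.rollback()
            return False
        cur.execute(
            "UPDATE inventory SET stock = stock - 1 WHERE product_id = %s",
            (product_id,),
        )
        conn.commit()  # may raise a deadlock/lock-wait error that the caller retries
        return True
```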

V2: Optimistic lock

  • Add an optimistic lock (e.g. a version column) on tables where concurrent writes to the same rows happen often.
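
A minimal sketch of the version-column approach, again with PyMySQL; the `version` column, table name, and retry count are assumptions:

```python
import pymysql

# Assumed schema: inventory(product_id INT PRIMARY KEY, stock INT, version INT)
conn = pymysql.connect(host="127.0.0.1", user="app", password="secret", database="shop")

def buy_one_optimistic(product_id, max_retries=3):
    for _ in range(max_retries):
        with conn.cursor() as cur:
            cur.execute(
                "SELECT stock, version FROM inventory WHERE product_id = %s",
                (product_id,),
            )
            stock, version = cur.fetchone()
            if stock <= 0:
                conn.rollback()
                return False
            # The UPDATE only succeeds if nobody changed the row since we read it.
            cur.execute(
                "UPDATE inventory SET stock = stock - 1, version = version + 1 "
                "WHERE product_id = %s AND version = %s",
                (product_id, version),
            )
            if cur.rowcount == 1:   # our version matched, so our write won
                conn.commit()
                return True
            conn.rollback()         # another writer got there first; retry
    return False
```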

V3: Put inventory number inside Redis

  • Redis transaction mechanism:

    • Different from a DB transaction, it is an atomic batch-processing mechanism for Redis.

    • Similar to moving the optimistic locking mechanism into Redis.

  • Flowchart

  • Implementation:
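
A minimal sketch of such an implementation with redis-py, using WATCH/MULTI/EXEC to retry when the watched stock key changes; the key naming scheme and retry count are assumptions:

```python
import redis

r = redis.Redis(host="127.0.0.1", port=6379)

def buy_one_redis(product_id, max_retries=10):
    key = f"inventory:{product_id}"          # assumed key naming scheme
    with r.pipeline() as pipe:
        for _ in range(max_retries):
            try:
                pipe.watch(key)              # optimistic watch on the stock counter
                stock = int(pipe.get(key) or 0)
                if stock <= 0:
                    pipe.unwatch()
                    return False
                pipe.multi()                 # start the atomic batch
                pipe.decr(key)
                pipe.execute()               # raises WatchError if key changed meanwhile
                return True
            except redis.WatchError:
                continue                     # someone else bought in between; retry
    return False
```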

High concurrent reads but low concurrent writes - Read/Write separation

Architecture example - Replication + PXC + Sharding proxy

  • PXC (Percona XtraDB Cluster) is a strongly consistent MySQL cluster built on top of Galera. It can store data requiring high consistency.

  • Replication is a weakly consistent MySQL cluster mechanism shipped with MySQL, based on binlog replication. It can be used to store data that only requires low consistency.
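
A minimal sketch of how an application (or a sharding-proxy rule) might route statements in this architecture; the endpoints and the classification of tables are illustrative assumptions:

```python
import random

# Assumed endpoints; in practice these would sit behind the sharding proxy.
PXC_WRITER = "pxc.internal:3306"            # strongly consistent cluster (PXC/Galera)
REPL_MASTER = "repl-master.internal:3306"   # weakly consistent cluster (binlog replication)
REPL_REPLICAS = ["repl-1.internal:3306", "repl-2.internal:3306"]

STRONG_TABLES = {"orders", "payments"}      # assumption: tables that need strong consistency

def pick_endpoint(table, is_write):
    """Route strongly consistent data to PXC; everything else goes to the
    replication cluster, with reads load-balanced across replicas."""
    if table in STRONG_TABLES:
        return PXC_WRITER
    if is_write:
        return REPL_MASTER
    return random.choice(REPL_REPLICAS)

print(pick_endpoint("orders", is_write=True))    # -> PXC writer
print(pick_endpoint("comments", is_write=False)) # -> a read replica
```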

Architecture example - Disaster recovery

One active and one cold-backup machine

  • Definition:

    • The master databases in city A serve all traffic; the backup databases exist only for backup purposes.

    • To tolerate a failure of DC 1, deploy another backup DC 2 within the same city.

    • To tolerate a failure of the entire city A, deploy another backup DC 3 in city B.

  • Failover:

    • If the master database goes down, fail over to the backup database.

  • Pros:

    • Improve availability.

  • Cons:

    • The backup DC's capacity is largely wasted because it normally serves no traffic.

    • There is little confidence that the backup DC could actually serve traffic after a failover, since it is never exercised.

    • It cannot scale out to serve a larger traffic volume.

Two active DCs with full copy of data

  • Definition:

    • Master DCs serve read/write traffic; slave DCs only serve read traffic. All master DCs have a full copy of the data.

    • Slave DCs redirect write traffic to the master DCs.

  • Failover:

    • If DC 1 goes down, fail over to DC 2.

    • If entire city A goes down, fail over to DC 3.

  • Pros:

    • Can be horizontally scaled to multiple DCs.

  • Cons:

    • Each DC needs to have a full copy of the data to be fault tolerant.

    • To avoid write conflicts, the two masters cannot process writes on the same copy of data.

Same city vs different city

  • The following table summarizes the differences between these two patterns:

| Dimensions | Two data centers within the same city | Two data centers in different cities |
| --- | --- | --- |
| Definition | The two DCs are located geographically close to each other, e.g. within the same city. | The two DCs are located geographically distant from each other, e.g. cross-region (New York and Los Angeles) or even cross-continent (USA and Australia). |
| Cost | High (the DC itself plus a dedicated line within the same city). | Extremely high (the DC itself plus a dedicated line across regions/continents). |
| Complexity | Low. Fine to call across DCs due to low latency. | High. Need to rearchitect due to high latency. |
| Service quality | Increases latency a bit / increases availability. | Decreases latency / increases availability. |

Two active DCs within a city

  • Since the latency within the same city is low, it is fine to have one centralized database layer and make cross-DC remote calls.

Two active DCs in different cities

  • Since the latency between two DCs across regions/continents is high, it is only possible to sync the data asynchronously.

Multi active DCs with sharded data

  • Definition:

    • Request routing:

      • API Router Layer: Route external API calls to the correct DC.

      • Internal DC call Router: Within a sharded DC, route cross DC calls.

      • Global Coordinator Service: Maintains the mapping from shard key -> shard id -> DC

        • Shard key varies with each request.

        • Shard Id -> DC mapping does not change much.

    • Data:

      • Sharded DC: Contains eventually consistent sharded data. For example, in an e-commerce system, each buyer has their own orders, comments, and user behaviors.

      • Global zone DC: Contains strongly consistent global data. For example, in an e-commerce system, all users see the same inventory.

  • Typical flow:

    • Step 1. A request comes to the API router layer with sharding keys (geographical location, user ID, order ID).

    • Step 2. The API router layer determines which DC contains the shard (see the routing sketch after this flow).

    • Step 3. The request is forwarded to that sharded DC, which serves it with its local data.

    • Step 4. (Optional) It calls the "Inter DC Call Router" in case it needs data from another sharded DC (e.g. if sharding is based on geographical location and a buyer on an e-commerce website wants to look at products from a seller in another city).

    • Step 5. (Optional) It calls the "Global zone" in case it needs to access the globally strong consistent data (e.g. inventory in the e-commerce example above).
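
As referenced in Step 2, here is a minimal sketch of the API-router lookup through the Global Coordinator's mapping; the mapping contents, hash-based sharding, and DC names are illustrative assumptions:

```python
import hashlib

# Assumed mapping maintained by the Global Coordinator Service.
NUM_SHARDS = 8
SHARD_TO_DC = {0: "dc-east", 1: "dc-east", 2: "dc-west", 3: "dc-west",
               4: "dc-east", 5: "dc-west", 6: "dc-east", 7: "dc-west"}
GLOBAL_ZONE_DC = "dc-global"              # holds the strongly consistent global data

def shard_id(shard_key):
    """Shard key (e.g. user ID) -> shard id. Hash-based sharding is an assumption."""
    digest = hashlib.md5(str(shard_key).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def route(request):
    """Steps 1-3: pick the DC that owns the request's shard."""
    home_dc = SHARD_TO_DC[shard_id(request["user_id"])]
    if request.get("needs_global_data"):   # Step 5: e.g. inventory lives in the global zone
        return home_dc, GLOBAL_ZONE_DC
    return home_dc, None

print(route({"user_id": 42}))
```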

Synchronization mechanisms

  • Reship component: Forwards write requests arriving in the local DC to the remote DCs.

  • Collector component: Reads write requests from remote DCs and applies them to the local DC.

  • Elasticsearch component: Cross-DC update requests are all written to Elasticsearch to guarantee strong consistency.

Message queue based
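
A minimal sketch of the reship/collector pattern over a message queue; here a local `queue.Queue` stands in for the real per-DC queue (e.g. Kafka/RocketMQ), and the DC names and key/value shape are assumptions:

```python
import queue

# One outbound queue per remote DC; in production this would be a topic on a message queue.
outbound = {"dc-west": queue.Queue()}

def reship(write_request, local_dc="dc-east"):
    """Reship component: forward a write applied in the local DC to every remote DC."""
    write_request["origin_dc"] = local_dc
    for dc, q in outbound.items():
        q.put(write_request)

def collector(local_store, q):
    """Collector component: drain the remote DC's queue and apply the writes locally."""
    while not q.empty():
        req = q.get()
        local_store[req["key"]] = req["value"]   # idempotent apply is assumed

# Usage: dc-east reships a write, dc-west's collector applies it.
reship({"key": "order:1", "value": "paid"})
west_store = {}
collector(west_store, outbound["dc-west"])
print(west_store)   # {'order:1': 'paid'}
```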

RPC based

Distributed database (Two cities / three DCs and five copies)

  • For a distributed ACID database, the basic unit is the shard, and data consensus within each shard is achieved via the Raft protocol.

Pros

  • Disaster recovery support:

    • If any server room within city A becomes unavailable, the vote from city B's server room can still form a majority together with the remaining server room in city A (see the quick check below).
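
A quick check of this claim, assuming the usual 2/2/1 replica placement across the three server rooms (the placement itself is an assumption):

```python
# Assumed placement of the five Raft replicas across the three server rooms:
# two rooms in city A, one room in city B ("two cities / three DCs / five copies").
placement = {"A-room1": 2, "A-room2": 2, "B-room1": 1}
total = sum(placement.values())           # 5 copies
majority = total // 2 + 1                 # 3 votes needed

for down in placement:                    # simulate losing any single server room
    remaining = total - placement[down]
    verdict = "still has quorum" if remaining >= majority else "no quorum"
    print(f"{down} down -> {remaining} replicas left, {verdict}")
```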

Cons

  • If a single server provides timing, then the Raft leaders for every shard need to stay close to that timing server. It is recommended to have multiple servers that can assign timestamps.

  • Otherwise, anomalies can happen. For example:

    1. C1 talks to the timing server in server room A to get the time. Its absolute time (AT) is 500 and its global time (Ct) is 500.

    2. Node A1 talks to the timing server to get the time. A1's request arrives later than C1's, so its AT is 510 and its Ct is also 510.

    3. A1 wants to write data to R2. Its AT is 512 and its Ct is 510.

    4. C1 wants to write data to R2. Since C1 is in another city and has higher latency, its write reaches R2 after A1's.

  • As a result of the above steps, although C1's global time is before A1's, its write is actually applied on R2 after A1's.

Parameters to monitor

  • Availability

    • Connectability

    • Number of available connections

  • Performance (using MySQL built-in status variables to calculate)

    • QPS / TPS

    • Deadlock

  • Master-slave replication delay (using the diff of binlog positions)

  • Disk space
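
A minimal sketch of how some of these parameters could be collected from MySQL's built-in status variables with PyMySQL; the connection settings are assumptions, and the replication-delay query is only meaningful on a slave:

```python
import pymysql
import pymysql.cursors

conn = pymysql.connect(host="127.0.0.1", user="monitor", password="secret",
                       cursorclass=pymysql.cursors.DictCursor)

def status_value(cur, name):
    cur.execute("SHOW GLOBAL STATUS LIKE %s", (name,))
    return int(cur.fetchone()["Value"])

with conn.cursor() as cur:
    # Availability: connect() above already checks connectability;
    # Threads_connected shows how many connections are currently in use.
    threads = status_value(cur, "Threads_connected")

    # Performance: QPS/TPS derived from built-in counters divided by uptime.
    uptime = status_value(cur, "Uptime")
    qps = status_value(cur, "Questions") / uptime
    tps = (status_value(cur, "Com_commit") + status_value(cur, "Com_rollback")) / uptime

    # Master-slave replication delay.
    cur.execute("SHOW SLAVE STATUS")
    slave = cur.fetchone()
    delay = slave["Seconds_Behind_Master"] if slave else None

    print(f"threads={threads} qps={qps:.1f} tps={tps:.1f} delay={delay}")
```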

Real world

Past utility: MMM (Multi-master replication manager)

  • MMM is a set of scripts written in Perl providing the following capabilities:

    • Load balancing among read slaves

    • Master failover

    • Monitor MySQL state

  • Pros:

    • Easy config

  • Cons:

    • Not suitable for scenarios having high requirements on data consistency

  • Deployment: Although it is a dual-master setup, it only allows writing to a single master at a time.

    • mmm_mond: Coordinator scripts. Run on top of a monitoring machine

      • Creates a set of virtual IPs: one write VIP binds to the master and multiple read VIPs bind to the slaves.

      • When a MySQL instance goes down, it migrates the VIP to another MySQL machine.

    • mmm_agentd: Runs on the same machine as the MySQL server

    • mmm_control: Provides administrative commands for mmm_mond

Past utility: MHA (Master High Availability)

  • MHA

    • Fast failover: Complete the failover within 0-30 seconds

    • Max-effort consistency: When a master goes down, it tries to save the binlog from the failed master in order to preserve as much data consistency as possible. However, this is not a reliable method; for example, some hardware failures may prevent the binlogs from being saved.

    • Compared with MMM,

      • Supports devops work like health check, suspend nodes

      • Supports semi-synchronous replication and GTID

  • Deployment:

    • The MHA manager can be deployed on a separate machine to manage several master-slave clusters; it can also be deployed on a single slave.

    • An MHA node runs on each MySQL server.

  • Cons:

    • Needs a minimum of 3 machines

    • Split brain

    • Not suitable for scenarios having high requirements on data consistency

Wechat Red pocket

WePay MySQL high availability

High availability at Github

MySQL HA github

Multi DC for disaster recovery
