🐝
Mess around software system design
  • README
  • ArchitectureTradeOffAnalysis
    • Estimation
    • Middleware
    • Network
    • Server
    • Storage
  • Conversion cheat sheet
  • Scenarios
    • TinyURL
      • Estimation
      • Flowchart
      • Shortening mechanisms
      • Rest API
      • Performance
      • Storage
      • Follow-up
    • TaskScheduler
      • JDK delay queue
      • Timer based
      • RabbitMQ based
      • Kafka-based fixed delay time
      • Redis-based customized delay time
      • MySQL-based customized delay time
      • Timer TimingWheel
      • Industrial Scheduler
      • Workflow Engine
      • Airflow Arch
    • GoogleDrive
      • Estimation
      • Flowchart
      • Storage
      • Follow-up
    • Youtube
      • Estimation
      • Flowchart
      • Performance
      • Storage
      • Follow-up
      • Netflix
    • Uber
      • Estimation
      • Rest api
      • Flowchart
      • KNN algorithms
      • Geohash-based KNN mechanism
      • Redis implementation
      • Storage
    • Twitter
      • Estimation
      • Flowchart
      • Storage
      • Scalability
      • Follow-up
    • Instant messenger
      • Architecture overview
      • Presence
      • Unread count
      • Notifications
      • Read receipt
      • Large group chat
      • Storage-Offline 1:1 Chat
      • Storage-Offline group chat
      • Storage-Message roaming
      • NonFunc-Realtime
      • NonFunc-Reliability
      • NonFunc-Ordering
      • NonFunc-Security
      • Livecast-LinkedIn
    • Distributed Lock
      • Single machine
      • AP model based
      • CP model based
      • Chubby-TODO
    • Payment system
      • Resilience
      • Consistency
      • Flash sale
    • Key value store
      • Master-slave KV
      • Peer-to-peer KV
      • Distributed cache
  • Time series scenarios
    • Observability
      • TimeSeries data
      • Distributed traces
      • Logs
      • Metrics
      • NonFunc requirments
  • Search engine
    • Typeahead
    • Search engine
    • Distributed crawler
      • Estimation
      • Flowchart
      • Efficiency
      • Robustness
      • Performance
      • Storage
      • Standalone implementation
      • Python Scrapy framework
    • Stream search
  • Big data
    • GFS/HDFS
      • Data flow
      • High availability
      • Consistency
    • Map reduce
    • Big table/Hbase
    • Haystack
    • TopK
    • Stateful stream
    • Lambda architecture
    • storm架构
    • Beam架构
    • Comparing stream frameworks
    • Instagram-[TODO]
  • MicroSvcs
    • Service Registry
      • Flowchart
      • Data model
      • High availability
      • Comparison
      • Implementation
    • Service governance
      • Load balancing
      • Circuit breaker
      • Bulkhead
      • Downgrade
      • Timeout
      • API gateway
      • RateLimiter
        • Config
        • Algorithm comparison
        • Sliding window
        • Industrial impl
    • MicroSvcs_ConfigCenter-[TODO]
    • MicroSvcs_Security
      • Authentication
      • Authorization
      • Privacy
  • Cache
    • Typical topics
      • Expiration algorithm
      • Access patterns
      • Cache penetration
      • Big key
      • Hot key
      • Distributed lock
      • Data consistency
      • High availability
    • Cache_Redis
      • Data structure
      • ACID
      • Performance
      • Availability
      • Cluster
      • Applications
    • Cache_Memcached
  • Message queue
    • Overview
    • Kafka
      • Ordering
      • At least once
      • Message backlog
      • Consumer idempotency
      • High performance
      • Internal leader election
    • MySQL-based msg queue
    • Other msg queues
      • ActiveMQ-TODO
      • RabbitMQ-TODO
      • RocketMQ-TODO
      • Comparison between MQ
  • Traditional DB
    • Index data structure
    • Index categories
    • Lock
    • MVCC
    • Redo & Undo logs
    • Binlog
    • Schema design
    • DB optimization
    • Distributed transactions
    • High availability
    • Scalability
    • DB migration
    • Partition
    • Sharding
      • Sharding strategies
      • Sharding ID generator overview
        • Auto-increment key
        • UUID
        • Snowflake
        • Implement example
      • Cross-shard pagination queries
      • Non-shard key queries
      • Capacity planning
  • Non-Traditional DB
    • NoSQL overview
    • Rum guess
    • Data structure
    • MySQL based key value
    • KeyValueStore
    • ObjectStore
    • ElasticSearch
    • TableStore-[TODO]
    • Time series DB
    • DistributedAcidDatabase-[TODO]
  • Java basics
    • IO
    • Exception handling
  • Java concurrency
    • Overview
      • Synchronized
      • Reentrant lock
      • Concurrent collections
      • CAS
      • Others
    • Codes
      • ThreadLocal
      • ThreadPool
      • ThreadLifeCycle
      • SingletonPattern
      • Future
      • BlockingQueue
      • Counter
      • ConcurrentHashmap
      • DelayedQueue
  • Java JVM
    • Overview
    • Dynamic proxy
    • Class loading
    • Garbage collection
    • Visibility
  • Server
    • Nginx-[TODO]
  • Distributed system theories
    • Elementary school with CAP
    • Consistency
      • Eventual with Gossip
      • Strong with Raft
      • Tunable with Quorum
      • Fault tolerant with BFT-TODO
      • AutoMerge with CRDT
    • Time in distributed system
      • Logical time
      • Physical time
    • DDIA_Studying-[TODO]
  • Protocols
    • ApiDesign
      • REST
      • RPC
    • Websockets
    • Serialization
      • Thrift
      • Avro
    • HTTP
    • HTTPS
    • Netty-TODO
  • Statistical data structure
    • BloomFilter
    • HyperLoglog
    • CountMinSketch
  • DevOps
    • Container_Docker
    • Container_Kubernetes-[TODO]
  • Network components
    • CDN
    • DNS
    • Load balancer
    • Reverse proxy
    • 云中网络-TODO
  • Templates
    • interviewRecord
  • TODO
    • RecommendationSystem-[TODO]
    • SessionServer-[TODO]
    • Disk
    • Unix philosophy and Kafka
    • Bitcoin
    • Design pattern
      • StateMachine
      • Factory
    • Akka
    • GoogleDoc
      • CRDT
Powered by GitBook
On this page
  • Flowchart
  • Initial load phase
  • mysqldump
  • XtraBackup
  • Dual write phase
  • Primary key
  • Dual write flag in ORM/AOP

Was this helpful?

  1. Traditional DB

DB migration

PreviousScalabilityNextPartition

Last updated 1 year ago

Was this helpful?

Flowchart

  • The motivation for introduce the third phase: In case something goes wrong on the target table, you could rollback to the original table.

Initial load phase

  • There are two typically used DB backup tools: mysqldump and XtraBackup.

mysqldump

  • Pros:

    • Free

    • Could backup the entire DB, including table structure and data

  • Cons:

    • Backup time could be long

Best practices

  • To increase speed when using mysqldump tool, there are a couple options:

    • Close the unique key check and external key check because source table already guarantees this.

    • Close binlog

    • Adjust the disk flush time of redo log by setting innodb_flush_log_at_trx_commit = 0

XtraBackup

  • Pros:

    • Faster backup speed

    • Support incremental backup

    • Backup process won't interrupt executing transactions

  • Cons:

    • Only applicable to InnoDB engine

    • Poor cross-platform compatibility.

Dual write phase

Primary key

innodb_autoinc_lock_mode parameter

  • "innodb_autoinc_lock_mode" parameter determines how InnoDB generates primary key. All three cases rely on table-wise lock.

    • 0: The table lock will be released after insert statements are executed.

    • 1: The table lock release times depend on SQL statements:

      • For INSERT INTO VALUE / INSERT INTO VALUES, the primary key lock will be released immediately after primary keys are generated, instead of after SQL statements are executed.

      • For INSERT / SELECT, the primary key lock will be released after SQL statements are executed.

    • 2: The table lock will be released immediately after primary keys are generated.

Consistency

  • When writing to the first table of dual write, return the primary key and use it in subsequent table.

Dual write flag in ORM/AOP

  • Typically there is a dual write flag inside the AOP/ORM.

  • This flag controls the dual write mode.

UpdateTime field based approach

Flowchart

for {
  // Execute the SQL query
  // SELECT * FROM xx WHERE update_time >= last_time
  rows := findUpdatedRows()

  for row in rows {
    // Use row primary key / id to find corresponding rows in target table
    tgtRow = findTgt(row.id)
    if row != tgtRow {
      // fix data
      fix()
    }
  }

  // Record the maximum timestamp of these records, use for next query. 
  last_time = maxUpdateTime(row)

  // Sleep for a while
  sleep(1s)
}

Soft delete on source table

  • If using hard delete on source table, delete failures on target table won't be discovered.

  • If we need to use hard delete on source table, there needs to be a reverse check from target table to source table.

Binlog triggered

Flowchart
Initial load phase
mysqldump
Best practices
XtraBackup
Dual write phase
Primary key
innodb_autoinc_lock_mode parameter
Consistency
Dual write flag in ORM/AOP
UpdateTime field based approach
Flowchart
Soft delete on source table
Binlog triggered
DB migration flowchart
Primary key consistency
Dual write flag structure
Hard delete problems
Hard delete with reverse check
Binlog triggered consistency