High availability
The master keeps all of its metadata in memory. At each checkpoint, the entire in-memory state is dumped to disk.
If the master suffers a software failure, it first recovers its state from the latest checkpoint and then replays the operation log entries recorded after the checkpoint's timestamp.
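The recovery path described above can be sketched in a few lines. This is a minimal illustration, not the actual master implementation: the `Checkpoint` and `LogEntry` types, the integer timestamps, and the key/value shape of the metadata are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LogEntry:
    # Hypothetical operation log record: a timestamped write or delete.
    timestamp: int
    key: str
    value: Optional[str]  # None means "delete this key"


@dataclass
class Checkpoint:
    # Hypothetical snapshot of the in-memory metadata at a point in time.
    timestamp: int
    metadata: Dict[str, str] = field(default_factory=dict)


def recover(checkpoint: Checkpoint, log: List[LogEntry]) -> Dict[str, str]:
    """Rebuild the metadata from a checkpoint plus the later log entries."""
    state = dict(checkpoint.metadata)  # start from the on-disk snapshot
    for entry in sorted(log, key=lambda e: e.timestamp):
        if entry.timestamp <= checkpoint.timestamp:
            continue  # already reflected in the checkpoint
        if entry.value is None:
            state.pop(entry.key, None)
        else:
            state[entry.key] = entry.value
    return state
```

Replaying only the entries newer than the checkpoint keeps recovery time proportional to the amount of work done since the last checkpoint rather than to the full history.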
This procedure can handle software failures but not hardware failures.
If the master suffers a hardware failure, it can fail over to a backup to which the master synchronously replicates.
The switch from the master to the master backup is made by repointing the master's canonical name to the backup.
The switch can take seconds or minutes to complete. In the worst case, it goes through the following steps:
1. The monitor program detects the master failure.
2. Restarting the master, loading the disk checkpoint, and replaying the operation log entries after the checkpoint's timestamp fail to bring it back.
3. The switch from the master to the master backup begins by changing the canonical name.
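The worst-case sequence above can be sketched as a single decision function. This is a hedged sketch under stated assumptions: `is_alive` and `try_restart` are hypothetical callbacks standing in for the monitor's health check and the restart-and-replay attempt, and `name_table` is a plain dictionary standing in for whatever name service resolves the canonical name.

```python
from typing import Callable, Dict


def handle_master_failure(
    is_alive: Callable[[str], bool],
    try_restart: Callable[[str], bool],
    name_table: Dict[str, str],
    canonical_name: str,
    backup_host: str,
) -> str:
    """Return which path was taken: 'healthy', 'restarted' or 'failed_over'."""
    current = name_table[canonical_name]

    # Step 1: the monitor checks whether the master is still up.
    if is_alive(current):
        return "healthy"

    # Step 2: try local recovery (restart, load checkpoint, replay log).
    if try_restart(current):
        return "restarted"

    # Step 3: recovery did not help, so repoint the canonical name.
    name_table[canonical_name] = backup_host
    return "failed_over"
```

Because clients address the master only through the canonical name in this sketch, they need no reconfiguration after the switch; they simply resolve the same name to a different host.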
The data on the shadow backup might be stale. However, the chance that a client reads stale metadata from the shadow backup is slim, because it happens only when all three of these conditions hold:
1. The master is dead.
2. The metadata on the master has not been completely replicated to the shadow backup.
3. The metadata the client is trying to read is exactly the part that has not been replicated yet.
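To make the conjunction explicit, the three conditions can be written as one predicate. This is only an illustrative sketch: `unreplicated_keys` is an assumed way of modelling the metadata entries that the shadow backup has not yet received, and the function name is hypothetical.

```python
from typing import Set


def read_is_stale(
    master_alive: bool,
    unreplicated_keys: Set[str],
    requested_key: str,
) -> bool:
    """True only when all three stale-read conditions hold at once."""
    master_dead = not master_alive                       # condition 1
    replication_incomplete = len(unreplicated_keys) > 0  # condition 2
    hits_missing_part = requested_key in unreplicated_keys  # condition 3
    return master_dead and replication_incomplete and hits_missing_part
```

If any one of the three inputs is favorable (the master is up, replication has caught up, or the request touches already-replicated metadata), the read is not stale.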