Skip to content

Latest commit

 

History

History
26 lines (20 loc) · 1.3 KB

85-improving-availability-with-failover.md

File metadata and controls

26 lines (20 loc) · 1.3 KB
slug id title date comments tags description references
85-improving-availability-with-failover
85-improving-availability-with-failover
Improving availability with failover
2018-10-26 12:02
true
system design
To improve availability with failover, there are serval ways to achieve the goal such as cold standby, hot standby, warm standby, checkpointing and all active.

Cold Standby: Use heartbeat or metrics/alerts to track failure. Provision new standby nodes when a failure occurs. Only suitable for stateless services.

Hot Standby: Keep two active systems undertaking the same role. Data is mirrored in near real time, and both systems will have identical data.

Warm Standby: Keep two active systems but the secondary one does not take traffic unless the failure occurs.

Checkpointing (or like Redis snapshot): Use write-ahead log (WAL) to record requests before processing. Standby node recovers from the log during the failover.

  • cons
    • time-consuming for large logs
    • lose data since the last checkpoint
  • usercase: Storm, WhillWheel, Samza

Active-active (or all active): Keep two active systems behind a load balancer. Both of them take in parallel. Data replication is bi-directional.