slug

id

title

date

comments

tags

description

references

85-improving-availability-with-failover

Improving availability with failover

2018-10-26 12:02

true

system design

Failover can improve availability in several ways: cold standby, hot standby, warm standby, checkpointing, and all active.

https://www.ibm.com/developerworks/community/blogs/RohitShetty/entry/high_availability_cold_warm_hot

Cold Standby: Use heartbeat or metrics/alerts to track failure. Provision new standby nodes when a failure occurs. Only suitable for stateless services.
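As a rough illustration of heartbeat-based detection followed by on-demand provisioning, here is a minimal sketch. The names `HeartbeatMonitor`, `failover`, and the `provision` callback are hypothetical and not taken from any particular library; the timeout value is an assumption, and a real system would also have to deal with network partitions and false positives.

```python
import time


class HeartbeatMonitor:
    """Hypothetical monitor: a node counts as failed after `timeout` seconds of silence."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock          # injectable clock so the logic is easy to test
        self.last_seen = {}         # node name -> time of last heartbeat

    def beat(self, node):
        """Record a heartbeat from `node`."""
        self.last_seen[node] = self.clock()

    def failed_nodes(self):
        """Return the nodes whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


def failover(monitor, provision):
    """Cold standby: provision a replacement only after a failure is detected."""
    replacements = {}
    for node in monitor.failed_nodes():
        replacements[node] = provision(node)   # assumed to start a fresh, stateless node
        del monitor.last_seen[node]            # stop tracking the dead node
    return replacements
```

Because the replacement starts from scratch, nothing held in the failed node's memory survives, which is why this approach fits stateless services.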

Hot Standby: Keep two active systems undertaking the same role. Data is mirrored in near real time, and both systems will have identical data.

Warm Standby: Keep two active systems, but the secondary one does not take traffic unless a failure occurs.
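The routing rule for warm standby can be sketched as follows. `WarmStandbyRouter` is a hypothetical name, and the primary and secondary are modeled as plain callables; how the primary gets marked as down (heartbeats, alerts) is left out and assumed to happen elsewhere.

```python
class WarmStandbyRouter:
    """Hypothetical router: all traffic goes to the primary until it is marked down."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary      # running, but idle until failover
        self.primary_healthy = True

    def mark_primary_down(self):
        """Called by whatever failure detector is in use (assumed external)."""
        self.primary_healthy = False

    def route(self, request):
        """Send the request to the primary, or to the secondary after a failure."""
        target = self.primary if self.primary_healthy else self.secondary
        return target(request)
```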

Checkpointing (similar to a Redis snapshot): Use a write-ahead log (WAL) to record requests before processing them. The standby node recovers from the log during failover.

Cons:
- Recovery is time-consuming for large logs.
- Data written since the last checkpoint is lost.

Use cases: Storm, MillWheel, Samza
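To make the checkpoint-plus-log idea concrete, here is a minimal sketch of a toy key-value store. `Store` and `recover` are hypothetical names, and an in-memory list stands in for a durable log file; this is an assumption made for brevity, not how Storm, MillWheel, or Samza implement it.

```python
import json


class Store:
    """Toy key-value store that logs every write before applying it."""

    def __init__(self):
        self.state = {}
        self.log = []                # stand-in for a durable write-ahead log
        self.checkpoint = ({}, 0)    # (snapshot of state, log offset at snapshot time)

    def put(self, key, value):
        # Append to the log first, then apply the change to in-memory state.
        self.log.append(json.dumps({"key": key, "value": value}))
        self.state[key] = value

    def take_checkpoint(self):
        # Snapshot the state and remember how much of the log it already covers.
        self.checkpoint = (dict(self.state), len(self.log))


def recover(checkpoint, log):
    """Rebuild state on a standby: load the snapshot, then replay later log entries."""
    snapshot, offset = checkpoint
    state = dict(snapshot)
    for line in log[offset:]:
        entry = json.loads(line)
        state[entry["key"]] = entry["value"]
    return state
```

Replaying only the entries after the checkpoint offset keeps recovery time bounded by the checkpoint interval. If the log entries written after the checkpoint are unavailable, the standby only gets the snapshot, which is the data-loss case listed above.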

Active-active (or all active): Keep two active systems behind a load balancer. Both of them take traffic in parallel. Data replication is bi-directional.
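A minimal sketch of this setup, assuming round-robin balancing and synchronous replication to peers. `Node` and `LoadBalancer` are hypothetical names, and conflict resolution for concurrent writes to the same key is deliberately left out.

```python
import itertools


class Node:
    """Hypothetical active node that replicates each write to its peers."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peers = []

    def write(self, key, value, replicate=True):
        self.data[key] = value
        if replicate:
            # Forward to peers with replicate=False so the write does not bounce back.
            for peer in self.peers:
                peer.write(key, value, replicate=False)


class LoadBalancer:
    """Round-robin balancer over all active nodes."""

    def __init__(self, nodes):
        self._cycle = itertools.cycle(nodes)

    def pick(self):
        return next(self._cycle)
```

With two nodes that list each other as peers, a write routed to either node ends up on both, so either one can keep serving if the other fails.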
