BigQuery Managed Disaster Recovery adds soft failover

Summary of differences between hard failover and soft failover

	Hard failover	Soft failover
Use case	Unplanned outages, region down	Failover testing; requires both primary and secondary to be available
Failover timing	As soon as possible, ignoring any pending replication between primary and secondary; data loss possible	Only after primary and secondary quiesce (pending replication completes), minimizing the potential for data loss
RPO/RTO	15 minutes / 5 minutes*	N/A

*Supported objective depending on configuration

BigQuery soft failover in action

Imagine a large financial services company, "SecureBank," which uses BigQuery for its mission-critical analytics and reporting. Because robust disaster recovery is a top priority, SecureBank requires a reliable Recovery Time Objective (RTO) and a 15-minute Recovery Point Objective (RPO) for its primary BigQuery datasets. It regularly conducts DR drills with BigQuery Managed DR to ensure compliance and readiness for unforeseen outages.

Before soft failover was introduced in BigQuery Managed DR, SecureBank faced a dilemma in how to perform its DR drills. While BigQuery Managed DR handled the failover of compute and associated datasets, a full "hard failover" drill meant accepting the risk of up to 15 minutes of data loss if replication was incomplete when the failover was initiated, or accepting significant operational disruption if the team first verified data synchronization across regions manually. This often led to less realistic or more complex drills that consumed valuable engineering time and caused anxiety.

New solution:

With soft failover in BigQuery Managed DR, administrators can choose the failover procedure that fits the situation. Unlike hard failover, which is intended for unplanned outages, soft failover is initiated only after all data has been replicated to the secondary region, helping to ensure data integrity.
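To make the timing difference concrete, here is a toy Python simulation. It is not BigQuery's implementation or API: it assumes a single asynchronously replicated dataset whose writes are paused during failover, and the names `Region`, `ReplicatedDataset`, `hard_failover`, `soft_failover`, `demo`, and the region names are hypothetical. Hard failover promotes the secondary immediately and drops whatever was still in flight; soft failover first drains the replication backlog, which is only possible while both regions are reachable.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Region:
    """A hypothetical region holding the records it has durably committed."""
    name: str
    records: List[str] = field(default_factory=list)
    available: bool = True


@dataclass
class ReplicatedDataset:
    """Toy model (assumption) of a dataset replicated asynchronously from primary to secondary."""
    primary: Region
    secondary: Region
    # Writes committed on the primary but not yet copied to the secondary.
    backlog: List[str] = field(default_factory=list)

    def write(self, record: str) -> None:
        self.primary.records.append(record)
        self.backlog.append(record)

    def replicate_one(self) -> None:
        """Copy the oldest pending write to the secondary."""
        if self.backlog:
            self.secondary.records.append(self.backlog.pop(0))

    def _swap_roles(self) -> None:
        self.primary, self.secondary = self.secondary, self.primary

    def hard_failover(self) -> List[str]:
        """Promote the secondary immediately; un-replicated writes are lost.

        No availability check: this path must work even if the primary is down.
        """
        lost = list(self.backlog)
        self.backlog.clear()
        self._swap_roles()
        return lost

    def soft_failover(self, max_steps: int = 10_000) -> List[str]:
        """Drain replication first, then promote; requires both regions to be up."""
        if not (self.primary.available and self.secondary.available):
            raise RuntimeError("soft failover requires primary and secondary to be available")
        steps = 0
        while self.backlog:
            if steps >= max_steps:
                raise TimeoutError("replication did not catch up")
            self.replicate_one()
            steps += 1
        self._swap_roles()
        return []  # nothing lost: the new primary holds every committed write


def demo(mode: str) -> Tuple[str, int, List[str]]:
    """Write 5 records, replicate only 2, then fail over in the given mode."""
    ds = ReplicatedDataset(Region("region-a"), Region("region-b"))
    for i in range(5):
        ds.write(f"txn-{i}")
    ds.replicate_one()
    ds.replicate_one()  # only txn-0 and txn-1 have reached the secondary
    lost = ds.hard_failover() if mode == "hard" else ds.soft_failover()
    return ds.primary.name, len(ds.primary.records), lost


if __name__ == "__main__":
    print(demo("hard"))  # ('region-b', 2, ['txn-2', 'txn-3', 'txn-4'])
    print(demo("soft"))  # ('region-b', 5, [])
```

In a drill like SecureBank's, the soft path exercises a real change of primary region without exposure to the replication lag that hard failover accepts; the trade-off, reflected in the availability check, is that it cannot be used when the primary region is actually down.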

Source Credit: https://cloud.google.com/blog/products/data-analytics/bigquery-managed-disaster-recovery-adds-soft-failover/