Backup and Recovery Policy

This document describes WiserReview's approach to data backup, recovery, and business continuity. Every layer of the platform has defined recovery capabilities designed to protect customer data from loss, corruption, or infrastructure failure.

RTO target: 4 hours
RPO target (database): near-zero
Owner: Security Team

1. Backup Strategy

Primary Database: MongoDB Atlas

Stores all review data, merchant accounts, customer emails, product catalogs, and configuration.

  • Backup type: continuous automated backups, managed by MongoDB Atlas
  • Point-in-time recovery: data can be restored to any point within the backup window
  • Backup frequency: continuous (near real-time)
  • Geographic redundancy: replica sets distribute data across multiple availability zones
  • Encryption: AES-256, consistent with primary data encryption
  • High availability: minimum 3-node replica sets; automatic failover within seconds if the primary node fails, with zero data loss
  • Access control: backup access restricted to the Security Officer and Engineering Lead only

File Storage: Azure Blob Storage and AWS S3

Stores review photos and videos uploaded by reviewers.

  • Redundancy: Azure Blob Storage uses Locally Redundant Storage (LRS) at minimum, with Geo-Redundant Storage (GRS) where configured; AWS S3 provides 99.999999999% (11 nines) durability by design
  • Encryption: server-side AES-256 encryption at rest
  • Access control: time-limited signed URLs for file access; no public bucket access
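The time-limited signed-URL pattern above can be sketched in miniature. This is an illustrative stand-in for the concept only, not the provider mechanisms actually in use (AWS presigned URLs use Signature Version 4, and Azure uses SAS tokens); the signing key and function names here are hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

# Hypothetical signing key; real keys come from the cloud provider.
SECRET_KEY = b"example-signing-key"

def make_signed_url(base_url: str, path: str, ttl_seconds: int = 900) -> str:
    """Build a URL that expires after ttl_seconds (illustrative only)."""
    expires = int(time.time()) + ttl_seconds
    payload = f"{path}:{expires}".encode()
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires, "sig": signature})
    return f"{base_url}{path}?{query}"

def verify_signed_url(path: str, expires: int, sig: str) -> bool:
    """Reject links that are expired or whose signature does not match."""
    if time.time() > expires:
        return False
    payload = f"{path}:{expires}".encode()
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```

Because the signature covers both the object path and the expiry, a link cannot be reused for a different file or after its window closes.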

Message Queue: Azure Service Bus

Handles asynchronous email campaign delivery and event-driven workflows.

  • Dead-letter queue: messages that cannot be processed are automatically moved to a dead-letter queue; no messages are silently dropped
  • Retry logic: automatic retry with configurable intervals before dead-lettering
  • Recovery: dead-letter messages can be inspected and reprocessed after the underlying issue is resolved
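The retry-then-dead-letter lifecycle above can be sketched with a toy in-memory queue. This is a simplified model of the described behavior, not the Azure Service Bus SDK; the class names and the max-delivery threshold are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    body: str
    delivery_count: int = 0

@dataclass
class Queue:
    """Toy queue mirroring the retry/dead-letter pattern described above."""
    max_delivery_count: int = 3  # hypothetical; configurable in Service Bus
    active: list = field(default_factory=list)
    dead_letter: list = field(default_factory=list)

    def process(self, handler) -> None:
        """One delivery attempt per active message; failures are retried
        until the delivery count threshold, then dead-lettered."""
        remaining = []
        for msg in self.active:
            try:
                handler(msg)
            except Exception:
                msg.delivery_count += 1
                if msg.delivery_count >= self.max_delivery_count:
                    self.dead_letter.append(msg)  # never silently dropped
                else:
                    remaining.append(msg)  # retried on a later pass
        self.active = remaining

    def resubmit_dead_letters(self) -> None:
        """After the underlying issue is fixed, move messages back for reprocessing."""
        for msg in self.dead_letter:
            msg.delivery_count = 0
            self.active.append(msg)
        self.dead_letter.clear()
```

The key property is that a failing message changes location rather than disappearing, so operators can inspect and resubmit it later.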

Application Layer: Docker Containers

Deployments are reproducible via CI/CD rather than backed up in the traditional sense.

  • Container images: all Docker images stored in Azure Container Registry with version history
  • Previous versions: prior deployment images retained in the registry and redeployable at any time
  • Source code: all source code stored in GitHub with full version history
  • Rollback: Azure App Services supports deployment slot swapping for instant rollback to the previous version

2. Recovery Objectives

  • RTO (Recovery Time Objective): 4 hours. Basis: Azure App Services auto-recovery, MongoDB Atlas failover, and CI/CD redeployment capabilities.
  • RPO, database: near-zero. Basis: MongoDB Atlas continuous backups and replica set replication provide near real-time redundancy.
  • RPO, other systems: 1 hour. Basis: file storage and queue-based systems; messages in the Azure Service Bus dead-letter queue are recoverable.
  • Widget delivery RTO: near-zero. Basis: Cloudflare CDN continues serving cached widget assets from 300+ edge locations even during origin outages.

RTO and RPO targets are based on infrastructure capabilities. Actual recovery times depend on the nature and scope of the incident and will be refined as operational history is established.

3. Recovery Capabilities

Database Recovery

  • Primary node failure: replica set automatic failover (secondary promoted to primary). Expected time: seconds to 1–2 minutes (automatic).
  • Data corruption or accidental deletion: point-in-time recovery to any point within the backup window. Expected time: 1–4 hours depending on data volume.
  • Full cluster failure: MongoDB Atlas cluster restoration from a backup snapshot. Expected time: depends on data volume and Atlas tier.

Application Recovery

  • Crashed container: Azure App Services health checks trigger an automatic container restart. Expected time: 1–3 minutes (automatic).
  • Faulty deployment: rollback to the previous Docker image via Azure App Services deployment slot swap. Expected time: 5–15 minutes.
  • Full service rebuild: GitHub Actions CI/CD pipeline (code → Docker build → Azure Container Registry → Azure App Services). Expected time: 20–30 minutes.
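The full-rebuild pipeline above might look roughly like the following GitHub Actions workflow. This is a sketch, not WiserReview's actual pipeline; the registry, app, and secret names are placeholders, and the real workflow may include additional steps such as tests.

```yaml
name: deploy
on:
  push:
    branches: [main]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      # Build the image and push it to Azure Container Registry,
      # tagged by commit SHA so prior versions remain redeployable.
      - name: Build and push image
        run: |
          az acr login --name myregistry
          docker build -t myregistry.azurecr.io/app:${{ github.sha }} .
          docker push myregistry.azurecr.io/app:${{ github.sha }}

      # Point Azure App Services at the newly pushed image.
      - name: Deploy to Azure App Services
        uses: azure/webapps-deploy@v3
        with:
          app-name: my-app-service
          images: myregistry.azurecr.io/app:${{ github.sha }}
```

Tagging images by commit SHA is what makes the rollback row above cheap: any previous tag in the registry can be redeployed as-is.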

Infrastructure Recovery

  • Azure availability zone outage: Azure App Services with zone redundancy, or manual failover to a secondary region
  • Cloudflare edge issue: 300+ global edge nodes provide inherent redundancy; widget delivery continues from other edges
  • Redis cache failure: the application falls back to direct database queries; the Redis cache is rebuilt from the database on restart
  • Azure Service Bus issue: the dead-letter queue preserves all messages, which are reprocessed after service restoration
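The Redis fallback described above follows a cache-aside pattern, sketched below. Class and function names are illustrative; the real service talks to Redis and MongoDB rather than these in-memory stand-ins.

```python
class CacheUnavailable(Exception):
    """Raised when the cache cannot be reached."""

class InMemoryCache:
    """Stand-in for Redis that can simulate an outage via the `up` flag."""
    def __init__(self):
        self.store = {}
        self.up = True
    def get(self, key):
        if not self.up:
            raise CacheUnavailable(key)
        return self.store.get(key)
    def set(self, key, value):
        if not self.up:
            raise CacheUnavailable(key)
        self.store[key] = value

def get_review(review_id, cache, db):
    """Serve from cache when possible; fall back to the database otherwise."""
    try:
        cached = cache.get(review_id)
        if cached is not None:
            return cached
    except CacheUnavailable:
        return db[review_id]  # cache down: direct database query
    value = db[review_id]
    try:
        cache.set(review_id, value)  # rebuild the cache from the database
    except CacheUnavailable:
        pass  # cache still down; the read succeeded regardless
    return value
```

Because the database remains the source of truth, losing the cache degrades performance but never availability, and the cache repopulates on normal reads once Redis returns.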

4. Disaster Recovery Scenarios

  • Azure region partial outage: auto-scaling redistributes load; MongoDB Atlas replica sets span availability zones for database resilience
  • Full Azure region outage: failover to a secondary Azure region; MongoDB Atlas can be configured for cross-region replicas
  • Cloudflare outage: widget delivery may be impacted, but review data and the dashboard remain accessible through direct DNS fallback
  • MongoDB Atlas outage: Atlas carries a 99.995% uptime SLA and replica sets provide database-level redundancy; a severe outage is handled per Atlas disaster recovery procedures
  • Mass data corruption: MongoDB Atlas point-in-time recovery to the last known good state; scope is assessed before restoration to minimize data loss
  • Security breach requiring platform shutdown: Cloudflare can enable maintenance mode at the edge; the Engineering Lead coordinates shutdown and recovery per the Incident Response Plan

5. Backup Testing

Current Practice

  • MongoDB Atlas automatic failover is inherently tested through replica set operations
  • Application rollback is exercised through normal deployment operations (deployment slot swaps)
  • Recovery procedures reviewed and updated as infrastructure evolves

Planned Formalization (2027)

  • Periodic scheduled recovery drills: database point-in-time recovery test and application rollback test
  • Formalized as part of the 2027 SOC 2 audit preparation
  • Recovery tests documented with results and improvement notes

6. Roles and Responsibilities

  • Security Officer: oversight of the backup and recovery policy; approves changes to recovery objectives; accountable for recovery in a crisis
  • Engineering Lead: implements and maintains backup configurations; executes recovery procedures; monitors backup health; maintains Azure, MongoDB Atlas, and AWS S3 configurations
  • Development Team: ensures application code handles infrastructure failures gracefully (retry logic, fallback behavior); participates in recovery procedures as directed

7. Monitoring and Alerting

  • Database health: health-check endpoints monitor MongoDB Atlas connectivity; failure triggers an immediate Slack alert
  • Cache health: health-check endpoints monitor Redis connectivity; failure triggers a Slack alert
  • Application health: Azure App Services health checks trigger automatic container restarts; Sentry monitors error rates
  • Backup status: MongoDB Atlas surfaces backup status and alerts through the Atlas portal; the Engineering Lead reviews these periodically
  • Queue health: Azure Service Bus dead-letter queue depth is monitored; elevated counts indicate processing issues
  • Storage health: Azure Blob Storage and AWS S3 availability is monitored through the respective cloud portals

All critical infrastructure alerts flow to the engineering team's Slack #alerts channel.
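A health-check aggregator of the sort described above might look like this sketch. Function names are hypothetical, and Slack delivery is reduced to an injected callable standing in for a webhook post.

```python
def overall_health(checks):
    """Run each named component check; any failure marks the service degraded.

    `checks` maps component names (e.g. "mongodb", "redis") to zero-argument
    callables returning True when the component is reachable.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "fail"
        except Exception:
            results[name] = "fail"  # an exception counts as a failed check
    status = "healthy" if all(v == "ok" for v in results.values()) else "degraded"
    return {"status": status, "components": results}

def alert_on_failure(report, notify):
    """Emit one alert line per failed component (e.g. to a Slack webhook)."""
    for name, state in report["components"].items():
        if state != "ok":
            notify(f"ALERT: {name} health check failed")
```

Aggregating per-component results into a single status is what lets a load balancer or App Services health probe act on one endpoint while the alert channel still names the failing dependency.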

Backup and Recovery Inquiries

Tatvam Cloud Solutions, LLP

Email: [email protected]