Backup and Recovery
This document describes WiserReview's approach to data backup, recovery, and business continuity. Every layer of the platform has defined recovery capabilities designed to protect customer data from loss, corruption, or infrastructure failure.
| Attribute | Value |
|---|---|
| RTO Target | 4 hours |
| RPO Target (Database) | Near-zero |
| Owner | Security Team |
1. Backup Strategy
Primary Database: MongoDB Atlas
All review data, merchant accounts, customer emails, product catalogs, and configuration
| Attribute | Detail |
|---|---|
| Backup type | Continuous automated backups, managed by MongoDB Atlas |
| Point-in-time recovery | Data can be restored to any point within the backup window |
| Backup frequency | Continuous (near real-time) |
| Geographic redundancy | Replica sets distribute data across multiple availability zones |
| Encryption | AES-256, consistent with primary data encryption |
| High availability | Minimum 3-node replica sets; automatic failover within seconds if the primary node fails; no loss of majority-acknowledged writes |
| Access control | Backup access restricted to the Security Officer and Engineering Lead only |
File Storage: Azure Blob Storage and AWS S3
Review photos and videos uploaded by reviewers
| Attribute | Detail |
|---|---|
| Redundancy | Azure Blob Storage: Locally Redundant Storage (LRS) minimum; Geo-Redundant Storage (GRS) where configured. AWS S3: 99.999999999% (11 nines) durability by design. |
| Encryption | Server-side AES-256 encryption at rest |
| Access control | Time-limited signed URLs for file access; no public bucket access |
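The time-limited signed-URL pattern can be sketched with a stdlib-only example. This is a concept illustration only — the HMAC construction, the `SECRET` constant, and the `expires`/`sig` query parameters are assumptions for the sketch; production access uses S3 presigned URLs or Azure SAS tokens, which the storage services generate and validate themselves.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"illustrative-signing-key"  # assumption: real keys live in a secret store


def sign_url(path: str, ttl_seconds: int = 300) -> str:
    """Build a time-limited signed URL (concept sketch, not real S3/Azure SAS)."""
    expires = int(time.time()) + ttl_seconds
    msg = f"{path}:{expires}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'expires': expires, 'sig': sig})}"


def verify_url(path: str, expires: int, sig: str) -> bool:
    """Reject expired links or links whose signature does not match the path."""
    if time.time() > expires:
        return False
    msg = f"{path}:{expires}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```

Because the signature binds both the path and the expiry, a leaked URL grants access to exactly one object for a bounded window — the property the access-control row above relies on.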
Message Queue: Azure Service Bus
Asynchronous email campaign delivery and event-driven workflows
| Attribute | Detail |
|---|---|
| Dead-letter queue | Messages that cannot be processed are automatically moved to a dead-letter queue; no messages are silently dropped |
| Retry logic | Automatic retry with configurable intervals before dead-lettering |
| Recovery | Dead-letter messages can be inspected and reprocessed after underlying issues are resolved |
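The retry-then-dead-letter behavior can be mimicked in a few lines. The in-memory queues and the `max_deliveries` parameter are assumptions for illustration — Azure Service Bus applies this logic server-side via its max delivery count setting, so application code never implements it directly.

```python
from collections import deque
from typing import Any, Callable, Iterable


def process_queue(messages: Iterable[Any],
                  handler: Callable[[Any], None],
                  max_deliveries: int = 3) -> list:
    """Deliver each message up to max_deliveries times, then dead-letter it.

    Mirrors, in miniature, what Azure Service Bus does server-side: failed
    deliveries are retried, and exhausted messages are moved to a dead-letter
    queue rather than dropped.
    """
    queue = deque((msg, 0) for msg in messages)
    dead_letter = []
    while queue:
        msg, attempts = queue.popleft()
        try:
            handler(msg)
        except Exception:
            attempts += 1
            if attempts >= max_deliveries:
                dead_letter.append(msg)        # preserved for later inspection
            else:
                queue.append((msg, attempts))  # re-queued for another attempt
    return dead_letter
```

The key property for recovery is the return value: every message that exhausts its retries survives in the dead-letter list, so it can be inspected and reprocessed after the underlying issue is fixed.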
Application Layer: Docker Containers
Deployments are reproducible via CI/CD, not backed up in the traditional sense
| Attribute | Detail |
|---|---|
| Container images | All Docker images stored in Azure Container Registry with version history |
| Previous versions | Prior deployment images retained in the registry and redeployable at any time |
| Source code | All source code stored in GitHub with full version history |
| Rollback | Azure App Services supports deployment slot swapping for instant rollback to the previous version |
2. Recovery Objectives
| Metric | Target | Basis |
|---|---|---|
| RTO (Recovery Time Objective) | 4 hours | Azure App Services auto-recovery, MongoDB Atlas failover, and CI/CD redeployment capabilities |
| RPO: Database | Near-zero | MongoDB Atlas continuous backups and replica set replication provide near real-time redundancy |
| RPO: Other systems | 1 hour | File storage and queue-based systems; messages in Azure Service Bus dead-letter queue are recoverable |
| Widget delivery RTO | Near-zero | Cloudflare CDN continues serving cached widget assets from 300+ edge locations even during origin outages |
RTO and RPO targets are based on infrastructure capabilities. Actual recovery times depend on the nature and scope of the incident and will be refined as operational history is established.
3. Recovery Capabilities
Database Recovery
| Scenario | Mechanism | Expected Time |
|---|---|---|
| Primary node failure | Replica set automatic failover: secondary promoted to primary | Seconds to 1–2 minutes (automatic) |
| Data corruption or accidental deletion | Point-in-time recovery to any point within backup window | 1–4 hours depending on data volume |
| Full cluster failure | MongoDB Atlas cluster restoration from backup snapshot | Depends on data volume and Atlas tier |
Application Recovery
| Scenario | Mechanism | Expected Time |
|---|---|---|
| Crashed container | Azure App Services health checks trigger automatic container restart | 1–3 minutes (automatic) |
| Faulty deployment | Rollback to previous Docker image via Azure App Services deployment slot swap | 5–15 minutes |
| Full service rebuild | GitHub Actions CI/CD pipeline: code → Docker build → Azure Container Registry → Azure App Services | 20–30 minutes |
Infrastructure Recovery
| Scenario | Mechanism |
|---|---|
| Azure availability zone outage | Azure App Services with zone redundancy or manual failover to secondary region |
| Cloudflare edge issue | 300+ global edge nodes provide inherent redundancy: widget delivery continues from other edges |
| Redis cache failure | Application falls back to direct database queries; Redis cache is rebuilt from database on restart |
| Azure Service Bus issue | Dead-letter queue preserves all messages; reprocessed after service restoration |
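The Redis fallback in the table above follows a cache-aside pattern. The sketch below assumes hypothetical `cache` and `db` interfaces and treats any cache error as a miss, so reads degrade to direct database queries when Redis is unavailable — which is why a cache failure costs performance, not availability.

```python
from typing import Any


def get_review(review_id: str, cache: Any, db: Any) -> Any:
    """Cache-aside read that survives a cache outage.

    Any cache failure is treated as a miss: the read falls through to the
    database (the source of truth), and the cache is repopulated on a
    best-effort basis once it is reachable again.
    """
    try:
        cached = cache.get(review_id)
        if cached is not None:
            return cached
    except ConnectionError:
        pass  # cache down: fall through to the database
    value = db.get(review_id)        # source of truth
    try:
        cache.set(review_id, value)  # best-effort repopulation
    except ConnectionError:
        pass  # ignore: cache is rebuilt naturally once Redis recovers
    return value
```

Because every write path ultimately lands in the database, the cache holds no unique state and needs no backup — restarting Redis and letting reads repopulate it is the entire recovery procedure.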
4. Disaster Recovery Scenarios
| Scenario | Response |
|---|---|
| Azure region partial outage | Auto-scaling redistributes load; MongoDB Atlas replica sets span availability zones for database resilience |
| Full Azure region outage | Failover to secondary Azure region; MongoDB Atlas can be configured for cross-region replicas |
| Cloudflare outage | Widget delivery may be impacted; review data and dashboard remain accessible through direct DNS fallback |
| MongoDB Atlas outage | Atlas SLA at 99.995% uptime; replica sets provide database-level redundancy; severe outage handled per Atlas DR procedures |
| Mass data corruption | MongoDB Atlas point-in-time recovery to last known good state; scope assessed before restoration to minimize data loss |
| Security breach requiring platform shutdown | Cloudflare can enable maintenance mode at the edge; Engineering Lead coordinates shutdown and recovery per Incident Response Plan |
5. Backup Testing
Current Practice
- MongoDB Atlas automatic failover is exercised routinely through normal replica set operations (e.g. elections during rolling maintenance)
- Application rollback is exercised through normal deployment operations (deployment slot swaps)
- Recovery procedures reviewed and updated as infrastructure evolves
Planned Formalization (2027)
- Periodic scheduled recovery drills: database point-in-time recovery test and application rollback test
- Formalized as part of the 2027 SOC 2 audit preparation
- Recovery tests documented with results and improvement notes
6. Roles and Responsibilities
| Role | Responsibility |
|---|---|
| Security Officer | Oversight of backup and recovery policy; approves changes to recovery objectives; accountable for recovery in a crisis |
| Engineering Lead | Implements and maintains backup configurations; executes recovery procedures; monitors backup health; maintains Azure, MongoDB Atlas, and AWS S3 configurations |
| Development Team | Ensures application code handles infrastructure failures gracefully (retry logic, fallback behavior); participates in recovery procedures as directed |
7. Monitoring and Alerting
| Monitoring | Implementation |
|---|---|
| Database health | Health-check endpoints monitor MongoDB Atlas connectivity; failure triggers immediate Slack alert |
| Cache health | Health-check endpoints monitor Redis connectivity; failure triggers Slack alert |
| Application health | Azure App Services health checks trigger automatic container restart; Sentry monitors error rates |
| Backup status | MongoDB Atlas provides backup status and alerts through the Atlas portal; Engineering Lead reviews periodically |
| Queue health | Azure Service Bus dead-letter queue depth monitored; elevated counts indicate processing issues |
| Storage health | Azure Blob Storage and AWS S3 availability monitoring through respective cloud portals |
All critical infrastructure alerts flow to the engineering team's Slack #alerts channel.
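A minimal health-check shape consistent with the table above — the component names, probe callables, and `slack_payload` format are illustrative assumptions; real probes would ping MongoDB Atlas and Redis, and the payload would be POSTed to the team's Slack incoming-webhook URL.

```python
import json
from typing import Callable, Dict, Optional


def health_report(probes: Dict[str, Callable[[], None]]) -> Dict[str, str]:
    """Run each named probe and collect per-component status.

    `probes` maps a component name to a zero-argument callable that raises
    on failure (e.g. a MongoDB ping or Redis PING in production).
    """
    report = {}
    for name, probe in probes.items():
        try:
            probe()
            report[name] = "healthy"
        except Exception as exc:
            report[name] = f"unhealthy: {exc}"
    return report


def slack_payload(report: Dict[str, str]) -> Optional[str]:
    """Format failing components into a Slack-webhook-style JSON payload."""
    failing = {k: v for k, v in report.items() if v != "healthy"}
    if not failing:
        return None  # nothing to alert on
    lines = "\n".join(f"- {k}: {v}" for k, v in failing.items())
    return json.dumps({"text": f"Infrastructure alert\n{lines}"})
```

Returning `None` when everything is healthy keeps the #alerts channel quiet by construction: only genuine failures produce a payload.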