MSP Disaster Recovery Testing: Don't Wait Until It's Too Late
Your MSP says you have a disaster recovery plan. Your backups are running. Your systems are protected. But when was the last time anyone actually tested whether recovery works?
In the Australian MSP industry, disaster recovery testing is one of the most neglected activities. Many MSPs back up data religiously but never verify that the backups can be restored in a real disaster. Here is why testing matters and how to ensure it happens.
Why Untested DR Plans Are Dangerous
A disaster recovery plan that has not been tested is a hypothesis, not a plan. It may look good on paper, but until you have actually recovered systems from backups and verified they work, you have no evidence that recovery will succeed when you need it.
Common DR Testing Failures
- Backup corruption — backups were running but the data was corrupted, making restoration impossible
- Missing dependencies — the primary system was restored but a critical dependency was not
- Insufficient infrastructure — the DR site cannot handle the load of production systems
- Configuration drift — the DR environment was not updated to match production changes
- Documentation gaps — the recovery steps were incomplete or outdated
- Personnel issues — the people who wrote the plan are no longer available
The Real-World Consequence
When a disaster strikes and recovery fails, the impact is catastrophic:
- Extended downtime (days instead of hours)
- Data loss beyond the backup window
- Revenue loss from inability to serve customers
- Regulatory penalties for breach notification delays
- Reputational damage that persists long after systems are restored
Types of DR Testing
Tabletop Exercises
A guided discussion where stakeholders walk through a disaster scenario:
- What happens if your primary data centre goes offline?
- Who is responsible for initiating recovery?
- How do you communicate with staff and customers?
- What are the recovery priorities?
Tabletop exercises are low-cost and reveal gaps in planning without any technical risk. They should be conducted at least twice per year, and ideally quarterly.
Parallel Tests
Recovery systems are built and tested alongside production:
- Restore data to a separate environment
- Verify the restored systems function correctly
- Measure actual recovery time against targets
- Identify issues without impacting production
Parallel tests provide meaningful validation without the risk of disrupting live operations.
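One way to make the "verify the restored systems" step concrete is to compare restored files against checksums recorded at backup time. The sketch below is a minimal illustration, not a feature of any particular backup product: it assumes a hypothetical manifest that maps relative file paths to SHA-256 digests, and `verify_restore` is an illustrative name.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(restore_root: Path, manifest: dict[str, str]) -> list[str]:
    """Compare restored files against a manifest of expected digests.

    The manifest format (relative path -> SHA-256 hex digest) is an
    assumption made for this sketch. Returns a list of problems; an
    empty list means every listed file was restored with matching content.
    """
    problems = []
    for relative_path, expected in manifest.items():
        restored = restore_root / relative_path
        if not restored.is_file():
            problems.append(f"missing: {relative_path}")
        elif sha256_of(restored) != expected:
            problems.append(f"checksum mismatch: {relative_path}")
    return problems
```

A clean result only shows that the data came back intact. The restored application still has to be started and exercised before the parallel test counts as passed.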
Full Failover Tests
Production traffic is actually switched to recovery systems:
- All critical systems fail over to DR infrastructure
- Production operations continue on DR systems
- Systems fail back to primary after verification
Full failover tests are the most realistic but carry the highest risk. They require careful planning and should only be conducted when the business can tolerate potential disruption.
What Recovery Metrics Mean
Recovery Time Objective (RTO)
The maximum acceptable time from disaster declaration to restored operations:
| RTO | What It Means | Relative Cost |
|---|---|---|
| 1 hour | Mission-critical systems | Premium |
| 4 hours | Business-critical systems | High |
| 24 hours | Important but not urgent | Standard |
| 72 hours | Can tolerate significant downtime | Budget |
Recovery Point Objective (RPO)
The maximum acceptable data loss measured in time:
| RPO | What It Means | Minimum Backup Frequency |
|---|---|---|
| 1 hour | Can lose up to 1 hour of data | At least hourly |
| 4 hours | Can lose up to 4 hours of data | At least every 4 hours |
| 24 hours | Can lose up to 1 day of data | At least daily |
| 1 week | Can lose up to 1 week of data | At least weekly |
Your MSP should define RTO and RPO for each critical system and test against these targets. Our MSP Backup and Disaster Recovery guide covers backup strategy in detail.
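Testing against these targets means calculating the achieved figures from timestamps recorded during the test. The following is a minimal sketch under stated assumptions: the function names are illustrative, the timestamps are made-up example values, and it assumes the test log records when the failure occurred, when the disaster was declared, when service was restored, and when the last usable backup was taken.

```python
from datetime import datetime, timedelta


def achieved_rto(declared_at: datetime, restored_at: datetime) -> timedelta:
    """Time from disaster declaration to restored operations."""
    return restored_at - declared_at


def achieved_rpo(last_good_backup_at: datetime, failure_at: datetime) -> timedelta:
    """Window of lost data: time between the last usable backup and the failure."""
    return failure_at - last_good_backup_at


def meets_target(achieved: timedelta, target: timedelta) -> bool:
    """A target is met when the achieved figure does not exceed it."""
    return achieved <= target


# Illustrative values for a system with a 4-hour RTO and a 1-hour RPO.
failure = datetime(2024, 3, 1, 9, 0)
declared = datetime(2024, 3, 1, 9, 15)
restored = datetime(2024, 3, 1, 12, 45)
last_backup = datetime(2024, 3, 1, 8, 10)

rto = achieved_rto(declared, restored)    # 3 hours 30 minutes
rpo = achieved_rpo(last_backup, failure)  # 50 minutes

print(meets_target(rto, timedelta(hours=4)))  # True
print(meets_target(rpo, timedelta(hours=1)))  # True
```

Note that the RPO is measured back from the moment of failure, not from the declaration, so a slow declaration lengthens downtime without changing how much data is lost.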
How to Verify Your MSP's DR Testing
Ask for Test Reports
Request documentation of the most recent DR test, including:
- Date of the test
- Scope (what systems were tested)
- Methodology (tabletop, parallel, or failover)
- Results (what worked, what failed)
- Issues found and remediation actions
- Actual RTO and RPO achieved vs targets
If your MSP cannot produce this documentation, testing is either not happening or not being recorded.
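If you track these reports yourself, the items listed above map naturally onto a simple record with a completeness check. This is only a sketch of one possible structure: `DRTestReport` and `missing_fields` are hypothetical names, and the rule that tabletop exercises need no measured timings is an assumption of this example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional

VALID_METHODS = {"tabletop", "parallel", "failover"}


@dataclass
class DRTestReport:
    test_date: date
    scope: list[str]                 # systems covered by the test
    methodology: str                 # "tabletop", "parallel", or "failover"
    results: str                     # what worked and what failed
    issues: list[str] = field(default_factory=list)
    remediation_actions: list[str] = field(default_factory=list)
    achieved_rto: Optional[timedelta] = None
    achieved_rpo: Optional[timedelta] = None


def missing_fields(report: DRTestReport) -> list[str]:
    """List the parts of a report that are absent or unusable."""
    gaps = []
    if not report.scope:
        gaps.append("scope")
    if report.methodology not in VALID_METHODS:
        gaps.append("methodology")
    if not report.results.strip():
        gaps.append("results")
    # Issues without any remediation action are an incomplete report.
    if report.issues and not report.remediation_actions:
        gaps.append("remediation_actions")
    # Assumption: a tabletop exercise restores nothing, so no timings are expected.
    if report.methodology != "tabletop":
        if report.achieved_rto is None:
            gaps.append("achieved_rto")
        if report.achieved_rpo is None:
            gaps.append("achieved_rpo")
    return gaps
```

A report that comes back with gaps is a prompt for a follow-up question to your MSP, not proof that the test was skipped.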
Request RTO and RPO Commitments
Ensure your MSP contract specifies RTO and RPO targets for each service tier. Without contractual commitments, there is no accountability for recovery performance.
Attend DR Tests
Request to participate in tabletop exercises. This ensures you understand the recovery process and can provide input on business priorities.
Review DR Infrastructure
Ask your MSP to demonstrate the DR environment. Where is it located? How is it maintained? How current is the data? What capacity does it have?
The Testing Calendar
A practical testing schedule for MSP clients:
| Test Type | Frequency | Participants |
|---|---|---|
| Tabletop exercise | Quarterly | MSP + client leadership |
| Backup restoration test | Monthly | MSP technical team |
| Parallel DR test | Semi-annually | MSP + client IT |
| Full failover test | Annually | All stakeholders |
| Communication test | Quarterly | MSP + client leadership |
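
A schedule like this is easy to let slip, so it can help to check it mechanically. The sketch below flags test types that are overdue. The day counts are an approximation of the frequencies in the table (for example, 92 days for "quarterly"), and `overdue_tests` is an illustrative name.

```python
from datetime import date

# Maximum days allowed between runs, approximating the calendar above.
MAX_INTERVAL_DAYS = {
    "Tabletop exercise": 92,
    "Backup restoration test": 31,
    "Parallel DR test": 183,
    "Full failover test": 366,
    "Communication test": 92,
}


def overdue_tests(last_run: dict[str, date], today: date) -> list[str]:
    """Return test types never run, or last run longer ago than allowed."""
    overdue = []
    for test_type, max_days in MAX_INTERVAL_DAYS.items():
        last = last_run.get(test_type)
        if last is None or (today - last).days > max_days:
            overdue.append(test_type)
    return overdue
```

Passing in the dates from your MSP's most recent test reports gives a quick list of which tests to ask about at the next service review.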
Red Flags in MSP DR Testing
"We Test Automatically"
Some MSPs claim their monitoring tools "automatically test" backups. Automated verification usually confirms only that backup files exist and are readable. It does not show that systems can actually be restored and function correctly. Automated checks are useful but insufficient.
No Client Involvement
If your MSP tests DR without any involvement from your team, you are not prepared for a real disaster. Recovery requires coordination between the MSP and the business.
No Test Documentation
If there are no written test reports, the testing is not being taken seriously. Documentation creates accountability and provides evidence of due diligence.
Always "Successful"
If every DR test is reported as "successful" with no issues found, either the tests are too superficial or the reports are not honest. Real testing reveals real issues — and that is the point.
No Improvement Actions
If the same issues appear in consecutive test reports without remediation, the testing process is not driving improvement.
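Spotting repeated issues is straightforward if each report lists its issues in consistent wording. The sketch below is a rough illustration that assumes exactly that: it matches issues by normalised text, which will miss the same problem described differently, and `recurring_issues` is a hypothetical helper.

```python
def normalise(issues: list[str]) -> set[str]:
    """Lower-case and trim issue descriptions so trivial differences match."""
    return {issue.strip().lower() for issue in issues}


def recurring_issues(reports: list[list[str]]) -> set[str]:
    """Return issues that appear in two consecutive reports.

    `reports` holds the issue list from each test report, oldest first.
    """
    repeated = set()
    for previous, current in zip(reports, reports[1:]):
        repeated |= normalise(previous) & normalise(current)
    return repeated
```

In practice a stable issue identifier in each report would make this comparison more reliable than text matching.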
DR Testing and Compliance
For regulated industries, DR testing is often a compliance requirement, and several common frameworks call for it:
- Essential 8 — data recovery and backup controls require testing
- ISO 27001 — requires testing of business continuity and disaster recovery plans
- SOC 2 — recovery plan testing falls under the Availability trust services criteria, when Availability is in scope
- APRA CPS 234 — requires testing of information security controls including incident response
Our Essential 8 Implementation Checklist includes data recovery requirements.
The Bottom Line
Disaster recovery is not a set-and-forget activity. It requires regular testing, documentation, and improvement. Your MSP should be conducting DR tests and providing you with evidence of results.
If your MSP cannot demonstrate that they test DR — with documentation, results, and improvement actions — you have an untested recovery plan. And an untested plan is no plan at all.
Use our MSP Health Score to evaluate your MSP's operational maturity, or review our Essential 8 Guide for data recovery requirements.