This week’s blog post was co-written by Cloud solutions experts, Anando Chatterjee and Ashish Patel.
Outages are Expensive!
Consider the recent IT outages that hit two major airlines in the United States – Southwest Airlines and Delta Air Lines. A failed networking device at Southwest’s operations centre took its computer systems offline for several hours, which led to the cancellation of nearly 2,000 flights over the next few days. Major news outlets predicted the loss to be somewhere between $54M and $82M. Similarly, a power outage at Delta’s main operations centre in Atlanta led to the cancellation of about 2,000 flights over the next couple of days. The cost? About $150M in passenger revenue.
Both of these events were clearly caused by infrastructure failures rather than natural causes such as floods, fires or earthquakes. In fact, a recent iLand study of 250 IT decision makers in the UK indicated that only 20% of outages were caused by natural disasters. The two biggest contributors to IT infrastructure downtime were system failures and human error.
If most outages are caused by system failures and human error, then every organization is vulnerable to disasters regardless of its geographic location or the nature of its business. Planning for a disaster should therefore be an organizational priority, not something left to IT departments alone.
Start planning now
To start off, it’s important to assess your IT department’s existing ability to withstand and recover from disasters. Some things can be assessed quickly: how often data is backed up, whether there are multiple data centres, and what processes are currently in place to address technology issues. This provides a baseline on which the organization’s future disaster recovery (DR) capabilities can be built.
The next step is to determine the business’s actual needs in terms of Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) – the acceptable amount of downtime and data loss, respectively. A Business Impact Assessment (BIA) should be conducted at this point to better understand the potential annual financial risk caused by downtime. Because the BIA is based on facts and statistics, it shows whether the RTOs and RPOs the business wants would cost more to achieve than the downtime they are meant to prevent. It is an essential tool for reaching consensus between business and IT, and it allows IT to begin budgeting for the required DR capabilities. It’s important to remember that how money is spent is more crucial than how much is spent. Numerous studies have shown that organizations that dedicated a larger percentage of their IT budget to DR actually had longer RTOs and RPOs.
The third and final step in a DR planning process should be to set Key Performance Indicators (KPIs) and Key Risk Indicators (KRIs). KPIs help business and IT decision makers focus on and evaluate which Disaster Recovery (DR) activities should be performed, the timeframe in which they should be completed and the level of success achieved in each activity. KRIs, on the other hand, help identify and mitigate risks and recognize opportunities for future improvement within the DR program.
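To make the BIA trade-off concrete, here is a rough sketch in Python of comparing the yearly cost of a DR option against the downtime loss it leaves behind. Every name and figure in it (the `DrOption` class, the outage frequency, the hourly cost and the three options) is a made-up assumption for illustration, and it only models downtime (RTO), not the cost of lost data (RPO). A real BIA would use your own numbers.

```python
from dataclasses import dataclass


@dataclass
class DrOption:
    """A hypothetical DR option with its yearly cost and the RTO it achieves."""
    name: str
    annual_cost: float  # yearly spend on this option, in dollars
    rto_hours: float    # downtime per outage once this option is in place


def expected_annual_downtime_cost(outages_per_year, rto_hours, cost_per_hour):
    """Expected yearly loss: outages x hours down per outage x cost per hour."""
    return outages_per_year * rto_hours * cost_per_hour


def total_annual_cost(option, outages_per_year, cost_per_hour):
    """DR spend plus the downtime loss that remains with this option."""
    return option.annual_cost + expected_annual_downtime_cost(
        outages_per_year, option.rto_hours, cost_per_hour
    )


# All figures below are invented for illustration only.
OUTAGES_PER_YEAR = 2
COST_PER_HOUR = 10_000.0

options = [
    DrOption("nightly backups only", annual_cost=20_000, rto_hours=24),
    DrOption("warm standby site", annual_cost=150_000, rto_hours=4),
    DrOption("active-active sites", annual_cost=600_000, rto_hours=0.25),
]

for option in options:
    total = total_annual_cost(option, OUTAGES_PER_YEAR, COST_PER_HOUR)
    print(f"{option.name}: ${total:,.0f} per year")

# The option with the lowest combined cost, under these assumptions.
best = min(
    options,
    key=lambda o: total_annual_cost(o, OUTAGES_PER_YEAR, COST_PER_HOUR),
)
print(f"Lowest combined cost: {best.name}")
```

With these assumed figures, the option with the shortest RTO is also the most expensive overall, because it costs more than the downtime it prevents. That is the kind of result a BIA is meant to surface before budgets are set.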
Some sample KPIs to consider:
- Number of testing exercises completed annually
- Updates to risk assessments
- Review of roles and responsibilities for DR teams
- Training received by key DR personnel
Some sample KRIs to consider:
- Inability to meet pre-defined RTOs/RPOs and by how much
- Delays in completing risk assessments and BIAs on schedule
- Delays in updating DR documentation and runbooks
- Number of natural disasters in the vicinity
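The first KRI above, missing a pre-defined RTO or RPO and by how much, is simple to track after each test exercise. The following is a minimal Python sketch; the function name `objective_overrun` and all targets and measurements are hypothetical values chosen for illustration.

```python
from datetime import timedelta


def objective_overrun(target, measured):
    """Return how far `measured` exceeds `target`, or zero if the target was met."""
    return max(measured - target, timedelta(0))


# Hypothetical targets and results from one DR test exercise.
rto_target = timedelta(hours=4)
rpo_target = timedelta(minutes=15)
measured_recovery_time = timedelta(hours=5, minutes=30)
measured_data_loss = timedelta(minutes=10)

rto_overrun = objective_overrun(rto_target, measured_recovery_time)
rpo_overrun = objective_overrun(rpo_target, measured_data_loss)

print(f"RTO missed by: {rto_overrun}")
print(f"RPO missed by: {rpo_overrun}")
```

In this made-up exercise the recovery took 90 minutes longer than the RTO allows, while the data loss stayed within the RPO. Recording the overrun after every test shows whether the gap is shrinking or growing over time.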
TeraGo is there to help you along the way
Get in touch with us so that our experts can guide you all the way from a DR assessment to solution design, implementation, maintenance and testing and, should you ever require it, a fully managed recovery using leading-edge technologies on our fully owned and managed cloud infrastructure. Call 1-800-TERAGO-1 (837-24651)
http://fortune.com/2016/08/11/southwest-airlines-delays-will-cost-millions/
http://money.cnn.com/2016/09/07/technology/delta-computer-outage-cost/
The State of IT Disaster recovery amongst UK Businesses – survey conducted, reviewed and audited by Opinion Matters Inc.