A security event can have a significant impact on the organization. In this video, you’ll learn how to determine a recovery time objective, recovery point objective, mean time to repair, and mean time between failures.
When recovering from an outage, one of the figures that managers want most of all is how long it will be before we are back up and running. The technical term for this is the recovery time objective, or RTO. The RTO is the target time frame for getting back to an agreed level of service. For example, your organization may not consider you up and running until both the database server and the web server are operational. The time it takes to get both of those systems running would be measured against the recovery time objective.
Another useful measurement would be the recovery point objective, or RPO. The recovery point objective describes how much data we need to have recovered before we can say that we are up and running. For example, we may only consider ourselves operational if we have at least the last 12 months of data available for our customers to reference. If we have to reload data from our backups, we know that we have to get at least 12 months of data into the database before we can say that we are up and running. That 12 months of data is referred to as our recovery point objective.
When planning for outages, we need to understand how long it will take to fix a problem that has occurred. This is the mean time to repair, or MTTR, and it describes the average amount of time it takes to resolve a problem. That includes the time to diagnose the problem, the time to get replacement equipment, the time to install the replacement equipment, and the time to get that equipment configured.
This is often a value that we can change based on what resources we might have available. For example, you might have a contract with a third party where they provide replacement equipment within two hours if there’s an outage. Or you might purchase additional equipment to have on site, so if an outage occurs, you can simply pull that new equipment out of inventory. This means that you may be able to spend a little bit more money now to decrease the total mean time to repair.
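To make the mean time to repair a bit more concrete, here is a minimal Python sketch. The incident records, the phase names, and the `mean_time_to_repair` function are all hypothetical and made up for illustration. The second calculation assumes that on-site spares remove the wait for replacement equipment entirely, which is a simplification.

```python
from statistics import mean

# Hypothetical repair records, in hours. The phases mirror the steps
# described above: diagnose, get replacement equipment, install, configure.
incidents = [
    {"diagnose": 1.0, "procure": 4.0, "install": 1.5, "configure": 0.5},
    {"diagnose": 0.5, "procure": 2.0, "install": 1.0, "configure": 0.5},
    {"diagnose": 2.0, "procure": 5.0, "install": 1.5, "configure": 1.5},
]


def mean_time_to_repair(records):
    """Average total repair time across all incidents, in hours."""
    return mean(sum(phases.values()) for phases in records)


# Assumption: spare equipment on site means no waiting for a replacement.
with_spares = [{**phases, "procure": 0.0} for phases in incidents]

print(f"MTTR today:       {mean_time_to_repair(incidents):.2f} hours")
print(f"MTTR with spares: {mean_time_to_repair(with_spares):.2f} hours")
```

Comparing the two numbers is one way to put a value on a support contract or a shelf of spare equipment.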
If you’re purchasing new equipment for your network, you may notice that the equipment also includes a value for MTBF, or mean time between failures. This is the estimated time that the system will run before there is another outage, and it’s commonly used for planning purposes to know how risky it might be to use that particular piece of equipment. This might be provided by the manufacturer as a prediction based on the type of equipment that you’re using, or it may be based on the historical performance of that equipment over time.
You can perform a rough calculation of mean time between failures by taking the total uptime for that equipment and dividing it by the total number of breakdowns. This allows you to manage the risk of that downtime and predict when there may be issues associated with that particular piece of equipment.
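As a rough sketch of that calculation in Python, the function below divides total uptime by the number of breakdowns. The function name and the example figures are assumptions chosen for illustration, not values from any real device.

```python
def mean_time_between_failures(total_uptime_hours, breakdowns):
    """Rough MTBF estimate: total uptime divided by number of breakdowns."""
    if breakdowns <= 0:
        # With no recorded breakdowns there is nothing to divide by.
        raise ValueError("at least one breakdown is needed to estimate MTBF")
    return total_uptime_hours / breakdowns


# Hypothetical example: 8,700 hours of uptime with 3 breakdowns.
mtbf = mean_time_between_failures(8_700, 3)
print(f"Estimated MTBF: {mtbf:.0f} hours")
```

With these made-up numbers, the estimate works out to 2,900 hours between failures.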