When designing a network, there are many different considerations. In this video, you’ll learn about resilience, cost, responsiveness, scalability, and more.
When we connect to a website or start an application, we want to log in, run transactions, or use the capabilities of that website. We expect that those resources will be up and running, and we refer to that as “availability.”
In the world of security, there’s a bit of a nuance with availability. We want to make sure that our systems and resources are available, but we only want to be sure they’re available to the right people.
Information technology is very focused on availability. We spend a lot of money on purchasing redundant systems. We install very large and complex monitoring systems just so we can get information on how available our systems and resources might be. This is such an important metric that our success is often described as how much availability we’ve had.
You’ll often hear the summary of uptime described as a percentage; for example, our network has been available 99.999% of the time over the last 12 months.
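To put a number like that in perspective, here’s a minimal Python sketch that converts an availability percentage into the amount of downtime it allows over a year:

```python
# Convert an availability percentage into the downtime it allows per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

def downtime_minutes(availability_percent: float) -> float:
    """Minutes of allowed downtime per year at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% uptime allows {downtime_minutes(pct):.1f} minutes of downtime per year")
```

At “five nines,” that works out to roughly 5.3 minutes of downtime across an entire year.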
Even with this emphasis on providing availability, there will still be times when a device or resource is no longer available. When that occurs, we want to know how quickly we’re able to recover. This is probably one of the most common questions you’ll get whenever you run into an outage or experience downtime. The problem is that it’s difficult to determine just how long it’s going to take to get everything back up and running.
We have to first determine what the root cause of the outage happened to be. If hardware is our issue, we may need to replace that hardware. There’s a process for that to occur. If the issue is with software, we would need to provide a patch or a fix for that software. And if there are redundant systems, we may need to install or replace those redundant components.
One good measurement of resilience is MTTR, or mean time to repair. This describes the average length of time it takes to repair or replace something that is no longer available and return the system to service.
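Since MTTR is simply the average of your repair times, the calculation itself is straightforward. Here’s a minimal Python sketch, using hypothetical outage durations:

```python
# Mean time to repair: the average time it takes to restore service.
# The outage durations below are hypothetical, just for illustration.
repair_times_hours = [2.0, 0.5, 4.0, 1.5]  # time to restore after each outage

mttr = sum(repair_times_hours) / len(repair_times_hours)
print(f"MTTR: {mttr:.1f} hours")  # MTTR: 2.0 hours
```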
Although the cost of a component or of software to install on your infrastructure is not the only basis for making a decision, it’s still a very important consideration. Whenever we’re putting together a plan to install a particular technology, one of the very first questions we ask is, how much is this going to cost? It can be difficult to calculate this cost because it’s not any one thing. For example, we may have an initial installation cost, and that installation cost may differ depending on the platform we’re installing it on.
Of course, once it’s installed, it will need to be maintained, and there are certainly costs associated with maintaining that software or resource. Those maintenance costs may go directly to the manufacturer of the technology, or they may be internal costs that we have to cover. And of course, if something breaks or needs to be replaced, there’s a cost for that as well. If you ask the accounting team how much all of this costs, the answer becomes even more complex.
We have to consider depreciation, since the purchase is a capital expense. We have to think about the operational costs associated with it, and there will most likely be tax implications for all of those.
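To see how those categories might combine, here’s a minimal sketch of an annual cost of ownership, with entirely hypothetical figures:

```python
# Hypothetical cost model: a capital purchase depreciated straight-line,
# plus recurring operational costs. All figures are made up for illustration.
purchase_price = 50_000        # capital expense (CapEx)
useful_life_years = 5          # straight-line depreciation schedule
annual_maintenance = 8_000     # operational expense (OpEx)

annual_depreciation = purchase_price / useful_life_years
annual_cost = annual_depreciation + annual_maintenance
print(f"Annual depreciation: ${annual_depreciation:,.0f}")   # $10,000
print(f"Annual cost of ownership: ${annual_cost:,.0f}")      # $18,000
```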
Another important infrastructure consideration is responsiveness. When we send a request to a service, we expect a response back as quickly as possible, and as humans, we tend to be very sensitive to any delay in that response. This is a common metric for interactive applications, where we send a request to a service and then wait to receive the response.
Responsiveness can also be challenging to quantify in some cases, because there may be multiple steps that occur for a single transaction. That means the responsiveness may vary widely depending on what is happening within that particular application. When we calculate responsiveness, we tend to look at the overall functions of an application.
Some of those functions may be occurring very quickly and are very responsive, whereas others may require additional work and it might take more time to receive a response from that app.
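If you want to see where the time is going, you can time each step of a transaction separately. Here’s a minimal Python sketch; the step names and delays are hypothetical stand-ins for real application work:

```python
import time

def timed(label, func, *args):
    """Run one step of a transaction and report how long it took."""
    start = time.perf_counter()
    result = func(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{label}: {elapsed_ms:.1f} ms")
    return result

# Hypothetical steps in a single transaction; each responds differently.
timed("authenticate", time.sleep, 0.05)
timed("query database", time.sleep, 0.20)
timed("render response", time.sleep, 0.03)
```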
Many of the applications we use may see different amounts of usage depending on the time of day or whether it’s the end of the quarter. If an application is heavily used, we might want to increase the capacity of that app so that everybody can use it without running into delays. We often use the term “elasticity” to describe how quickly we’re able to expand and contract the footprint of that application.
You might naturally think, why wouldn’t you build out the application to be the largest that it could possibly be? The quick answer is that it costs money to have that type of scalability. One way that many organizations deal with this is to build out an application instance that matches the current load and then, as more people need to use the app, extend or scale the application to be much larger.
Sometimes this can be done automatically behind the scenes and the end user has no idea that you’re increasing scalability as the load increases. From a security perspective, we need to make sure that we’re monitoring the entire application regardless of how it might scale. So, we might have security tools that are able to monitor the base configuration, but we also need to extend those tools if we happen to increase the scalability and add more resources to that application instance.
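The decision logic behind that automatic scaling can be as simple as a threshold rule. Here’s a minimal sketch; the CPU thresholds and instance limits are hypothetical, and real cloud autoscalers are more sophisticated:

```python
# A hypothetical threshold-based autoscaling rule.
def desired_instances(current: int, cpu_percent: float,
                      minimum: int = 2, maximum: int = 20) -> int:
    """Scale out under heavy load, scale back in when load drops."""
    if cpu_percent > 80:      # heavy load: add capacity
        current += 1
    elif cpu_percent < 20:    # light load: shed capacity (and cost)
        current -= 1
    return max(minimum, min(maximum, current))

print(desired_instances(current=4, cpu_percent=92))  # 5 (scale out)
print(desired_instances(current=4, cpu_percent=10))  # 3 (scale in)
```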
We often refer to applications as if they are one single thing, but in reality, there are many moving parts that make up a single application instance. For example, we might need a web server. That web server communicates with a database server. You’re able to increase performance by adding a caching server, and to protect all of this, you might need a firewall. And those might be just the base components of any application instance; this may be a relatively complex implementation, with dozens of different components all working together.
When deploying these applications, you not only need to consider the technical infrastructure, but also the hardware resources that are available. You might have certain budgets for building these out in the cloud, and of course, you always need to consider the process of change control. For many cloud-based deployments, this might be a relatively easy process.
This may be completely automated through orchestration, which describes building out a cloud-based infrastructure automatically and on demand. One of the significant advantages of a cloud-based infrastructure is that you can very easily and programmatically build out an entire application instance at a moment’s notice.
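Orchestration usually starts from a declarative description of the stack. Here’s a minimal, purely hypothetical sketch of the idea; a real orchestration tool would read a description like this and call the cloud provider’s APIs to build each piece:

```python
# A hypothetical, simplified orchestration sketch, reusing the example
# components from earlier: a firewall, web servers, a cache, and a database.
application_stack = [
    {"role": "firewall", "count": 1},
    {"role": "web",      "count": 2},
    {"role": "cache",    "count": 1},
    {"role": "database", "count": 1},
]

def provision_stack(stack):
    """Pretend to build each component; a real tool would call cloud APIs."""
    for component in stack:
        for n in range(component["count"]):
            print(f"provisioning {component['role']} instance {n + 1}")

provision_stack(application_stack)
```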
All of this is why project management and project planning are so important during the deployment phase. You have to consider where this particular application instance will be deployed, who will be deploying it, and any additional resources you might need. If any one of those happens to be missed, it’s possible that this could delay the entire implementation.
With the complexities and risk associated with today’s technologies, it would not be unusual for an organization to want to transfer that risk to a third party. One common way to do this in the world of information technology is to take out cybersecurity insurance. One common use of cybersecurity insurance is to protect against ransomware.
This obviously doesn’t stop ransomware from occurring, but if your organization does suffer a financial loss from a ransomware attack, you may be able to recoup some of that loss through cybersecurity insurance. The insurance might cover outages that occurred during that particular security event, and if you do have business downtime and financial losses, it may provide some recovery of those losses. It can also help an organization minimize the risk associated with what happens after a security event.
It’s not unusual for downtime that causes financial loss to customers to result in some type of legal proceeding, and cybersecurity insurance may be able to assist with the costs of hiring a lawyer and dealing with that legal process.
It’s certainly true that the longer it takes to perform a particular task, the more money it costs an organization. That’s why, if you’re trying to recover from some type of outage, you should already have a plan to make that recovery process as efficient as possible. For example, let’s take the situation of a malware infection.
We know that this system has malware on it, and we know that the best practice is to delete everything on that system and start over again. One way you could do this is by reloading the operating system from the original installation media, which might take as long as an hour. Or perhaps you’ve already made images of your systems, and you can easily recover from an image backup in approximately 10 minutes.
Although both of these recovery processes result in the same working system, one of them is much easier to perform and much less expensive. If you’re planning to implement a new application or service in your organization, you should also think about what it will take to get that system back up and running if it runs into a problem, and find the easiest way to recover from that particular situation.
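One way to compare recovery options like these is by what the downtime itself costs. Here’s a minimal sketch using the two recovery times from the example above and a hypothetical hourly downtime cost:

```python
# Comparing recovery options by downtime cost. The hourly figure is
# hypothetical; substitute what an hour of downtime actually costs you.
cost_per_hour = 10_000  # hypothetical cost of one hour of downtime

recovery_options = {
    "reinstall from original media": 60,  # minutes
    "restore from image backup": 10,      # minutes
}

for option, minutes in recovery_options.items():
    cost = cost_per_hour * minutes / 60
    print(f"{option}: {minutes} min, about ${cost:,.0f} in downtime")
```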
They say that the only constant is change, and that certainly applies to the process of patch availability. We spend a great deal of time going through the patching process to fix bugs, provide security updates, or make our systems more available. Usually, once we install an operating system or an application, one of the very first things we do is check to see if there are any updates before deploying it into production, and very often we will need to install additional patches that were released after the installation media was created.
In most organizations, this is a normal part of the IT process. We know that we will get monthly patches from Microsoft, and patches will arrive from other vendors throughout the month. Our organization will of course need to test all of these patches, make sure they’re not going to break something that’s currently working in our production environment, and then deploy them to production as quickly as possible.
If you’re working for an organization that doesn’t seem to place an emphasis on the patching process, then you might want to reconsider your security posture as it relates to those systems. If you’re not going to patch a system or an application, you leave it open and more susceptible to some type of exploit, and from a security perspective, we want to limit any instances of a security risk.
But in some cases, you might find that a manufacturer doesn’t release patches and there’s no process in place to patch that particular system. We often see this with embedded systems, such as the controls for heating, ventilation, and air conditioning (HVAC), or time clocks. These are purpose-built systems, often with no internet access or other connectivity to the device, so there’s no way to easily patch what’s inside of that embedded system.
This is relatively short-sighted these days, because attackers will find a way to exploit any technology they come across. There may certainly be security issues associated with the software running on these systems, and until a patch is available, they will remain vulnerable. It might be a good idea to provide additional security around these devices.
For example, if these devices are connected to the network, you might want to put a firewall between them and the rest of the world, if only to provide another layer of security. One thing that today’s technology simply cannot do without is power. We must have electricity to run all of these systems, whether they’re on premises or in the cloud.
Unfortunately, we often don’t think about our power infrastructure as being something that we might want to monitor. But in fact, it is one of the most important components of everything that we do in technology. It’s often a good idea to bring in a licensed electrician who can look at your current power usage and give you plans for extending this power in the future, and those requirements may be quite different depending on what you’re doing.
For example, if you have a data center, your requirements for power will probably be very different than the power requirements for an office building. Most organizations have a single primary power provider for their particular geography, but if you’re in a very populous area, you might have multiple options for bringing power into your facility.
But there may be times when primary power is simply not available, and you may need some type of backup. In those cases, you might want to look at a UPS, an uninterruptible power supply, or something like a generator that can provide some or all of your organization’s power requirements.
In today’s cloud-based environments, we tend to take the resources we need for an application instance and break them into their smallest components. One of the components that provides the heavy lifting and processing is the compute component. This is something that’s more than a single CPU, and in the cloud it’s probably called a Compute Engine.
This is the part of the process that does the actual thinking and processing of the data. It might be a single processor running in a server in our data center, or it may be multiple processors spread across multiple cloud-based systems, and we’re able to scale those systems to whatever size we might need for the computing requirements of our applications.