With the right process, you can fix anything. In this video, you’ll learn about a network troubleshooting methodology that will help you find the solution to all of your problems.
When something is broken on the network, it can be a challenge to find where the problem is and how to resolve it. In this video, we’ll look at a troubleshooting process that will help you solve any problems on your network. If you don’t know what’s happening, then you certainly can’t fix it. So your first step should be to gather as much information as possible.
You may even want to try to duplicate the problem, if you can. That way, it will be easier to test later whether the problem has been resolved. You may be able to identify one or even many different symptoms caused by this particular problem. Make sure you document all of them, because you may find that your fix only resolves some of the symptoms, which could mean you have multiple problems on the network.
You can always talk to your users. They are a very good source of information, and they'll tell you exactly what they're seeing and what they're experiencing while the problem is occurring. You also need to find out if anything has changed. If you have a formal change control process, it may be very easy to go back to your documentation and see the date, time, and details of the most recent change. Or you may want to find out if anybody is in the wiring closet right now, moving cables around.
You want to look at each one of these problems individually to try to determine where the problem is happening. If you can, break the problems into smaller pieces and try to resolve a piece of the problem at a time. Now that we’ve gathered all of this information, we should take a step back and think about all of the different things that might be causing this particular problem. We should start with the obvious things, or certainly the things that are most common on our network, but we should really consider every possible thing that’s out there. Even things that might not be completely obvious at first may still have something to do with the problem at hand.
We should then make a list of every possible cause for this particular problem. We should, of course, put the easiest things at the top, but we want to list as many causes as possible so that we can use this list to decide where to start and where to stop our troubleshooting. Now it's time to put our list of theories to the test. We'll start with the very first theory, which should be the easiest one, and see if it resolves our problem. We'll then need to evaluate the result: did this resolve the issue, or not?
And if it didn't, we need to try the next theory on our list, or we may need to go back to the drawing board and come up with a completely different list of theories. This might also be the point in the troubleshooting process where we bring in additional help. Since we've already collected all of this information, we can provide it to an expert who might give us some more ideas about how to resolve this issue.
If we've created a theory, tested it, and seen in our lab that it resolves the problem, then it's time to apply that fix to the actual issue. We want to build a plan that lets us implement this fix in the easiest way possible. In many cases, you can't change anything on the network or on any of your devices during production hours, so you might have to schedule the work through your change control process so that you can come in and apply the fix.
When you implement a plan like this, you need to have a plan A, a plan B, and perhaps even a plan C. Sometimes implementing the fix causes problems with something else, or it can't be implemented the way you expected. This is also a good time to create a backout plan. If you implement this fix and something else starts having a problem, you need some way to get everything back to the way it was before you started. When the change control window arrives, you'll have an exact plan of what needs to be done during that very short window.
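The plan A, plan B, and backout idea can also be sketched in Python. This is only an illustration under stated assumptions: the hypothetical `Plan` record bundles an `apply` step, a `verify` check, and a `backout` step as plain functions, and the MTU change in the usage lines is a made-up example standing in for whatever your real fix is.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Plan:
    # Hypothetical record for one way of implementing the fix.
    name: str
    apply: Callable[[], None]    # make the change
    verify: Callable[[], bool]   # True if the fix works and nothing else broke
    backout: Callable[[], None]  # restore the configuration from before the change


def run_change_window(plans: list[Plan]) -> Optional[str]:
    """Try plan A, then B, then C, backing out any plan that fails."""
    for plan in plans:
        try:
            plan.apply()
            ok = plan.verify()
        except Exception:
            ok = False  # treat an error during the change as a failed plan
        if ok:
            return plan.name
        plan.backout()  # get back to the original state before trying the next plan
    return None  # every plan was backed out


# Made-up example: the "network" is a dict and the fix is an MTU change.
config = {"mtu": 1500}


def set_mtu(value: int) -> Callable[[], None]:
    return lambda: config.update(mtu=value)


plans = [
    Plan("Plan A", set_mtu(9000), lambda: False, set_mtu(1500)),
    Plan("Plan B", set_mtu(1400), lambda: config["mtu"] == 1400, set_mtu(1500)),
]
print(run_change_window(plans), config)  # prints: Plan B {'mtu': 1400}
```

The key design choice is that a failed plan is always backed out before the next one is tried, so each plan starts from the same known configuration.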
Sometimes you may need third-party assistance to get everything done at once, and it's at this phase where you're really doing the work to implement the fix. We don't know if we really fixed anything until we've tested it, so testing should be a normal part of your implementation plan. You may even want to bring in the customer or end user to perform the test as well, so they can confirm whether the problem has really been resolved or still exists.
This might also be a good time to put preventive measures in place that can help keep this problem from happening again in the future. There's nothing more frustrating than running across a problem and realizing it's the exact same problem you had a year ago, except you can't remember how you fixed it. That's why documenting exactly what happened is so important. You can document what the problem was and how you fixed it, so you can refer back to it at any time in the future.
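Documenting each problem and its fix pays off when you can search it later. Here's a minimal Python sketch of that idea, assuming a hypothetical in-memory `KnowledgeBase` that matches a new problem description against past tickets by shared words. Real help desk software uses its own database and search features; the class, method names, and ticket IDs here are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Resolution:
    # Hypothetical record of a past problem and how it was fixed.
    ticket_id: str
    problem: str
    fix: str
    keywords: set[str] = field(default_factory=set)


class KnowledgeBase:
    """A hypothetical, minimal in-memory store of past problems and their fixes."""

    def __init__(self) -> None:
        self._entries: list[Resolution] = []

    def record(self, ticket_id: str, problem: str, fix: str) -> None:
        # Index the problem description by its lowercase words.
        keywords = set(problem.lower().split())
        self._entries.append(Resolution(ticket_id, problem, fix, keywords))

    def search(self, description: str) -> list[Resolution]:
        """Return past resolutions sharing words with the description, best match first."""
        words = set(description.lower().split())
        scored = [(len(words & entry.keywords), entry) for entry in self._entries]
        matches = [(score, entry) for score, entry in scored if score > 0]
        matches.sort(key=lambda pair: pair[0], reverse=True)
        return [entry for _, entry in matches]


kb = KnowledgeBase()
kb.record("T-1001", "users cannot reach file server after switch replacement",
          "re-enabled trunk on uplink port")
kb.record("T-1002", "printer offline on third floor", "replaced patch cable")
for r in kb.search("cannot reach file server"):
    print(r.ticket_id, "->", r.fix)  # prints: T-1001 -> re-enabled trunk on uplink port
```

Simple word overlap is a deliberately crude matching rule; the point is only that a recorded problem and fix can be found again the next time similar symptoms show up.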
You might also want to create a database for this information. Help desk software commonly includes ticketing and database features, so when somebody opens a ticket, you can see what has already been done to fix this problem in the past. For a good example of how documentation can help you, look at this paper from Google Research.
This is “Failure Trends in a Large Disk Drive Population.” Google documented every time a hard drive failed, the circumstances around that failure, how long the drive had been installed, and many other characteristics. From all of this documentation, they were able to identify trends showing which drive characteristics were associated with failure. So that's our network troubleshooting process. We'll identify the problem, create some theories about its probable cause, and then test those theories to see if they really resolve the issue.
And if they do, we can create a plan to implement the fix, put that plan into place, verify that everything is working properly, and document what we did. Now everything is back up and running. By following this standard set of troubleshooting tasks, we can resolve any type of issue we might find on our network.