Performance Issues – CompTIA Network+ N10-009 – 5.4

There’s never enough bandwidth. In this video, you’ll learn how to troubleshoot congestion, bottlenecks, latency, packet loss, and jitter.


Our networks run at a predefined speed. For example, a 1000BASE-T gigabit Ethernet network sends traffic at 1 gigabit per second, and it cannot pass traffic any faster than that.

But what if you have two 1-gig links plugged into a switch, and both of those connections are sending traffic at 1 gigabit per second to the same destination? Obviously, we can’t fit 2 gigabits per second into a 1 gigabit per second link. With both senders contending for the same output at the same time, the switch will queue some packets into a buffer, but eventually that buffer fills up, and we have congestion.

The buffers in a switch or a router are relatively small, and they’re not going to hold a lot of packets. So eventually, packets will need to be discarded to keep the system running, which means we lose some of the information being sent between one station and another. To resolve that, we either need to increase the size and speed of the network, or decrease the amount of traffic going over it.
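To get a feel for how quickly a buffer fills during congestion, here’s a back-of-the-envelope sketch in Python. The buffer size and link speeds are hypothetical, and real devices manage their queues in more sophisticated ways:

```python
# Rough estimate of how long a switch buffer lasts when two 1 Gbit/s
# senders converge on a single 1 Gbit/s egress link.
# All figures are hypothetical, for illustration only.

ingress_bps = 2 * 1_000_000_000    # two senders at 1 Gbit/s each
egress_bps = 1 * 1_000_000_000     # one 1 Gbit/s output link
buffer_bytes = 12 * 1024 * 1024    # assume a 12 MB packet buffer

# Excess traffic accumulates in the buffer at (ingress - egress).
overflow_bps = ingress_bps - egress_bps           # 1 Gbit/s of excess
fill_seconds = (buffer_bytes * 8) / overflow_bps  # bytes -> bits

print(f"Buffer fills in about {fill_seconds * 1000:.0f} ms")
# ~101 ms with these numbers; after that, packets are discarded.
```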

Often, people will say the network is slow. And what they’re really saying is that there’s some type of bottleneck on the network that is causing a slowdown. Unfortunately, this can be very difficult to troubleshoot because the problem might be with a number of different technologies.

It might be related to the bus of the system you’re using, or maybe the speed of a CPU inside of a switch or a router. Perhaps you’re using a hard drive rather than an SSD, and those two kinds of storage run at very different speeds. And, of course, traffic may cross different networks running at different speeds in different locations. We have to look at all of these different parameters between one device and another to really understand where the bottlenecks might be on a network.

Sometimes it might be very obvious what’s causing the bottleneck, but often, you need to drill down into the details of these systems to understand which resource is being consumed or slowing down and causing this problem for everyone else. Here’s an example of a web transaction response time. You can see transactions on this network were running somewhere around 1,500 to 1,750 milliseconds. That’s almost 2 seconds of delay when somebody requested data.

Notice that a lot of that time was being spent in the database. In this particular example, it seemed obvious that the problem was located somewhere in the database server itself. By making some configuration changes to that database server, we were able to eliminate the bottleneck, and the response times went down to around 500 milliseconds.
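One practical way to drill down like this is to time each phase of a transaction separately. Here’s a minimal sketch in Python that splits a web request into connection setup, server response, and data transfer; the hostname is a placeholder, and a real investigation would break things down further (DNS lookup, database time, and so on):

```python
import http.client
import time

# Split one web transaction into phases to see where the time goes.
# The hostname is a placeholder; substitute a server you manage.
host = "www.example.com"

t0 = time.perf_counter()
conn = http.client.HTTPSConnection(host, timeout=10)
conn.connect()                    # TCP and TLS setup
t1 = time.perf_counter()

conn.request("GET", "/")
response = conn.getresponse()     # wait for the server to respond
t2 = time.perf_counter()

response.read()                   # transfer the response body
t3 = time.perf_counter()
conn.close()

print(f"connect:  {(t1 - t0) * 1000:7.1f} ms")
print(f"response: {(t2 - t1) * 1000:7.1f} ms")
print(f"transfer: {(t3 - t2) * 1000:7.1f} ms")
```

If the response phase dominates, the bottleneck is likely on the server side, as it was in the database example above.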

If you want to know how much a network is being used, you might want to look at the bandwidth percentage. This is a measure of how much of the available bandwidth was in use over a particular time frame, usually presented as a percentage of the link’s total capacity.

We might also want to measure throughput, which tells us how much data we were actually able to move through that network during that time frame. There are different ways to monitor these bandwidth statistics. You may be able to gather them directly from a switch, or you may use SNMP or NetFlow to collect them over time. And if you’re looking at bandwidth usage over a number of different links between two devices, you’ll usually find that the slowest link is the one limiting throughput for the entire path.
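For example, if you poll an interface’s byte counter twice (such as SNMP’s ifHCInOctets), the throughput and the bandwidth percentage fall out of simple arithmetic. A minimal sketch with hypothetical counter values:

```python
# Throughput and bandwidth percentage from two readings of an interface
# byte counter (e.g., SNMP ifHCInOctets). Values are hypothetical, and
# this ignores counter wrap for simplicity.

link_speed_bps = 1_000_000_000     # 1 Gbit/s interface

octets_t0 = 4_200_000_000          # counter at the first poll
octets_t1 = 4_950_000_000          # counter 10 seconds later
interval_seconds = 10

throughput_bps = (octets_t1 - octets_t0) * 8 / interval_seconds
utilization_pct = throughput_bps / link_speed_bps * 100

print(f"Throughput:  {throughput_bps / 1e6:.0f} Mbit/s")
print(f"Utilization: {utilization_pct:.0f}%")
# 750,000,000 octets * 8 bits / 10 s = 600 Mbit/s, or 60% of the link
```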

Latency is calculated as the delay between the request and the response. Whenever we’re measuring latency, we want to know how fast or how slow a transaction is occurring. There will always be some type of latency on a connection because it takes time to move information from one device to another. But ideally, we want to measure these response times at every stop along the way. This allows us to understand just how quickly we can move data from one segment to another, and we can break the delay down into its smallest parts.

To get a true measurement of this, we would need some type of measuring tool on every single network link along that path, so it could be rather involved to set up all of those devices. But once you do, you’ll be able to capture the packets and determine what the true latency of that connection really is.
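Short of instrumenting every link, you can still sample end-to-end latency from a single endpoint. Here’s a minimal sketch in Python that times several TCP handshakes and reports the spread; the hostname and port are placeholders, and this measures only connection setup, not application response time:

```python
import socket
import statistics
import time

# Time several TCP handshakes to estimate end-to-end latency.
# Hostname and port are placeholders; test only hosts you're allowed to.
host, port = "www.example.com", 443
samples_ms = []

for _ in range(5):
    t0 = time.perf_counter()
    with socket.create_connection((host, port), timeout=5):
        samples_ms.append((time.perf_counter() - t0) * 1000)

print(f"min {min(samples_ms):.1f} ms / "
      f"avg {statistics.mean(samples_ms):.1f} ms / "
      f"max {max(samples_ms):.1f} ms")
```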

When you do capture packets on each segment, you have microsecond granularity, so you’re able to know exactly how long a packet stayed in a device, how long it took to traverse that network, and how long it took to forward to the next segment.

One of the problems a network administrator would like to avoid is packet loss. If we’re sending traffic across the network, the ideal situation is for all of that traffic to make it across the network all of the time. But there will be scenarios where that information simply can’t make it across the network for one reason or another.

A discard means that there weren’t any errors with the packet, but some other reason caused the device to drop it instead of sending it to its destination. This may be due to an outage on the network, or it could be contention: we simply don’t have enough bandwidth to send all of that traffic across the network.

Sometimes we’re on a bad wireless network. Maybe we have a bad cable and the information we’re sending across the network becomes corrupted. When that corruption occurs during the transmission of that data, it’s identified when it reaches the other side. And because that data is corrupted, it is completely discarded and we have to re-send that traffic across the network. This takes additional time and resources and could cause significant delays to your application.
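Whatever the cause, loss is usually expressed as a percentage of the traffic sent. A minimal sketch with hypothetical counts; in practice these numbers might come from a probe tool or from interface counters, which track discards and errors separately:

```python
# Express packet loss as a percentage of the traffic sent.
# The counts below are hypothetical.

packets_sent = 10_000
packets_received = 9_870

lost = packets_sent - packets_received
loss_pct = lost / packets_sent * 100

print(f"Lost {lost} of {packets_sent} packets ({loss_pct:.2f}% loss)")
# Even a loss rate this small can be noticeable on voice and video.
```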

Real-time applications such as Voice over IP phone calls and live video streams are very sensitive to any type of delay you might put on the network. We would like these packets to arrive at regular, predictable intervals. As long as that’s happening, we can continue our phone call or watch our live stream, and everything works as expected.

But if we have congestion on the network or the packet is corrupted, we have to discard that packet. We can’t rewind our conversation or rewind the live stream. We simply have to discard that packet and continue forward. This might cause a delay or a clicking noise on the phone call, or we might see a small stutter on our live stream.

To see whether we’re receiving frames at regular intervals, we can measure the jitter on the network. Jitter is the variation in the time between those frames, and we would like that value to stay small and consistent. In a healthy stream, there’s a little bit of variability between packets, but they are being received at regular, predictable intervals.

If we’re having high jitter values, then we might have three of those packets come through, and then a long delay, and then three more packets very quickly, and then another delay. It’s these high jitter values that give you problems hearing on a phone call or give you that stutter during a live video feed.
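To put a number on this, you can compute jitter from packet arrival timestamps. Here’s a minimal sketch using hypothetical timestamps for a stream sent at 20 ms intervals; real RTP implementations use a smoothed estimator of the same deviation (RFC 3550), but the idea is identical:

```python
import statistics

# Hypothetical arrival times (ms) for a stream sent every 20 ms.
# The burst in the middle mimics the three-packets-then-a-pause pattern.
arrivals_ms = [0.0, 20.1, 40.0, 60.3, 79.8, 100.2,
               145.0, 146.1, 147.0, 180.0]

# Gaps between consecutive packets.
gaps = [b - a for a, b in zip(arrivals_ms, arrivals_ms[1:])]

expected_ms = 20.0  # the interval the sender transmits at
jitter_ms = statistics.mean(abs(g - expected_ms) for g in gaps)

print("gaps (ms):", [round(g, 1) for g in gaps])
print(f"mean deviation from {expected_ms} ms: {jitter_ms:.1f} ms")
# Consistent gaps mean low jitter; the uneven run in the middle is
# exactly what causes clicks on a call or stutter in a live stream.
```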