Many network architectures will use multiple network interface cards in a single device. In this video, you’ll learn about NIC fault tolerance and link aggregation.
Let’s look at how we can provide fault tolerance for the network interface cards inside a server. This is called “load balancing failover,” or LBFO. It allows you to aggregate bandwidth so more data can move to and from that server, and at the same time it provides redundancy, so if you lose one of those connections the network stays up and available.
This is important in the physical world. It’s becoming even more important when we deal with virtual servers because inside of a single physical device we might have hundreds of servers that need a lot of bandwidth and a lot of resiliency if anything happens to the network.
We configure this fault tolerance by installing multiple interfaces in the server. This might be a single interface card with multiple Ethernet connections, or several individual Ethernet cards, each with one or more connections. Once they’re configured together in the operating system, they look and feel just like a single adapter. This way the server connects to external devices over several physical links, while the operating system sees one big Ethernet connection.
All of these interface cards keep track of whether they’re working by sending hello messages to each other. If one of the interfaces suddenly becomes unavailable, the team stops using it and continues operating over all of the other available interfaces. One way to provide this fault tolerance is link aggregation. In this case, we have a single device with multiple interfaces, and both of those interfaces connect to a single switch.
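To make the hello-message idea concrete, here is a minimal sketch of how a team might decide which members are still usable. The class names, method names, and the 3-second `hello_timeout` are hypothetical, chosen only for illustration; real teaming implementations use their own detection mechanisms and timers. The idea shown is simple: a member that hasn’t been heard from recently is left out, and the team keeps running on whatever remains.

```python
from dataclasses import dataclass


@dataclass
class TeamMember:
    """One physical interface in a hypothetical NIC team."""
    name: str
    last_hello: float = 0.0  # time (seconds) of the last hello heard from this member


class NicTeam:
    """Sketch of one logical adapter backed by several physical members."""

    def __init__(self, names, hello_timeout=3.0):
        self.members = {n: TeamMember(n) for n in names}
        self.hello_timeout = hello_timeout  # hypothetical timeout, in seconds

    def receive_hello(self, name, now):
        # Record that this member is still alive at time `now`.
        self.members[name].last_hello = now

    def healthy(self, now):
        # A member is usable only if its last hello is recent enough;
        # stale members are skipped and traffic uses the rest.
        return [m.name for m in self.members.values()
                if now - m.last_hello <= self.hello_timeout]
```

For example, if `eth0` and `eth1` both send hellos at time 10, but only `eth0` sends another at time 12, then at time 14.5 `eth1` has been silent for 4.5 seconds and `healthy()` returns only `["eth0"]`.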
We configure the switch to treat both of those interfaces as a single link aggregation. The switch then sends traffic across both links, generally load balancing between them. If either interface is disabled or its cable is unplugged, the connection stays up because another link in the aggregation is still available. And of course you’re not limited to two: you can combine many interfaces into one single logical link with link aggregation.
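One common way to load balance across an aggregation is to hash fields from each frame and use the result to pick a member link. All frames of one conversation then stay on the same link, while different conversations spread across the links. The sketch below assumes a CRC32 hash of the source and destination MAC pair. That choice, and the function names, are illustrative assumptions; real switches and operating systems let you configure which header fields feed the hash.

```python
import zlib


def pick_link(links, src_mac, dst_mac):
    """Choose one member link for a flow by hashing its MAC address pair.

    The same (src, dst) pair always maps to the same link while the set of
    links is unchanged, so frames in a flow are not reordered.
    """
    if not links:
        raise RuntimeError("no links left in the aggregation")
    key = f"{src_mac}-{dst_mac}".encode()
    # crc32 is deterministic across runs, unlike Python's built-in hash() for str.
    return links[zlib.crc32(key) % len(links)]


def distribute(flows, links):
    """Map each (src_mac, dst_mac) flow to the member link it would use."""
    return {flow: pick_link(links, *flow) for flow in flows}
```

If a cable is unplugged, its link is simply dropped from `links`, and calling `distribute` again moves every flow onto the surviving members. The aggregation stays up as long as at least one link remains.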
A similar configuration, with a slightly different architecture, is simple fault tolerance. We still have multiple interfaces inside the server, but instead of connecting them all to a single switch, we connect them to different switches. This provides even more fault tolerance, because if we lose an entire switch we still have a path to the rest of the network.
Many switches also let you configure the network to load balance across these connections. The server can then send and receive data over any of these links, and if one of them goes down, the server stays up and running by sending all of the data over the remaining connections.
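The difference from single-switch link aggregation is the failure domain: losing one switch removes every server link cabled to it, but not the links to the other switch. The following minimal sketch models that. The topology (`UPLINKS`, `switchA`, `switchB`) and the function names are hypothetical, and the sketch ignores how the switches coordinate with each other, which depends on the vendor and configuration.

```python
import zlib

# Hypothetical topology: each server interface is cabled to a different switch.
UPLINKS = {"nic1": "switchA", "nic2": "switchB"}


def usable_uplinks(uplinks, failed_switches):
    """Return the interfaces whose upstream switch is still running."""
    return [nic for nic, sw in uplinks.items() if sw not in failed_switches]


def send(flow_id, uplinks, failed_switches):
    """Pick a surviving interface for a flow, load balancing by hash."""
    alive = usable_uplinks(uplinks, failed_switches)
    if not alive:
        raise RuntimeError("server is isolated: every upstream switch is down")
    return alive[zlib.crc32(flow_id.encode()) % len(alive)]
```

With both switches up, flows are spread across `nic1` and `nic2`. If `switchA` fails, `send` places every flow on `nic2`, so the server stays reachable. Only when every upstream switch is down does the server lose connectivity.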