A wealth of monitoring information is contained in various device log files. In this video, you’ll learn how to consolidate and report on log files and how NetFlow can be used to supplement this data.
If you’re working with routers, switches, firewalls, or other infrastructure devices connected to the network, there are probably logs inside of that device that can tell you information such as the traffic flows and traffic summaries of the data traversing that device. If you’re looking at logs inside of a firewall, for instance, these logs are often very detailed. They will show you every single flow of traffic that’s traversing that firewall, and give you information about what’s contained within that traffic flow.
These logs are not only useful to use in real time so you can see the status of your network, but you can also store this information and go back in time to determine what may have happened before or after a particular event. Here, for example, is a set of logs from a firewall. Each one of the lines here on the left side is a separate flow of traffic.
On the top line, I have highlighted one particular flow that shows the protocol (in this case, UDP), a hostname, a username, a client field, a client port number, a server IP address, a server port number, and then the disposition of this traffic. You also get a detailed view of this along the right side, and as you can see, an extensive number of variables are stored for each individual traffic flow.
This means if you wanted to run a search to look for a particular IP address or a particular port number, you’d be able to view all of the traffic flows on your network that match that particular value. Obviously, different devices will have different types of logs. For example, if you have an Active Directory infrastructure, it’s probably going to have a series of audit logs that allow you to see who might have logged in or logged out of the network.
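The kind of search described above can be pictured with a minimal Python sketch. The `FlowRecord` fields, the `search_flows` helper, and the sample records are all hypothetical and only loosely mirror the columns mentioned for the firewall log; a real firewall or SIEM would expose its own schema and query language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    # Illustrative field names only, not any vendor's actual log schema.
    protocol: str
    client_ip: str
    client_port: int
    server_ip: str
    server_port: int
    disposition: str


def search_flows(flows, ip=None, port=None):
    """Return flows where either endpoint matches the given IP and/or port."""
    matches = []
    for flow in flows:
        if ip is not None and ip not in (flow.client_ip, flow.server_ip):
            continue
        if port is not None and port not in (flow.client_port, flow.server_port):
            continue
        matches.append(flow)
    return matches


# Made-up sample records that use documentation-only address ranges.
SAMPLE_FLOWS = [
    FlowRecord("UDP", "192.0.2.10", 51514, "198.51.100.53", 53, "allow"),
    FlowRecord("TCP", "192.0.2.11", 49200, "203.0.113.80", 443, "allow"),
    FlowRecord("TCP", "192.0.2.10", 49201, "203.0.113.80", 23, "deny"),
]

if __name__ == "__main__":
    # Show every flow that involves one particular client address.
    for flow in search_flows(SAMPLE_FLOWS, ip="192.0.2.10"):
        print(flow)
```

Passing both `ip` and `port` narrows the result to flows that match both values, which is roughly how you would pivot from a suspicious address to a specific service.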
For example, this audit log shows a message saying that a process has been created. It shows a process ID, and it shows the file name. In this case, it was cmd.exe that was used in this process. And you can see more information about the username and the login event. You can also see details on the time, the host name, event IDs, and other specifics. This allows us to keep a historical view of exactly who logged into the network, who logged out, and perhaps who may have tried to log in but used incorrect credentials.
When you start looking at the log files on switches, routers, firewalls, Windows servers, Linux servers, and other devices, you’ll see that each one of those log types is very, very different. But all of these diverse log files contain details that would allow us to correlate data flows together. Although all of the logs contain very different information, we can use a standardized process to retrieve those log files from every single one of these devices using a standard protocol called syslog.
We can configure all of these different devices to send information to a consolidated logging receiver using the standard syslog protocol. In many cases, we’re consolidating this information to a SIEM. This is a security information and event management system that allows us to bring all of that data back to one central database. When syslog receives this data from a particular device, it logs a facility code, which identifies the program that originally created the log, and it assigns a severity level to the information contained within that log.
This allows us as the network administrator to take action on the information we’re receiving from all of these different logs. Some of these logs will contain informational details that may not need any type of immediate action, whereas other logs may have critical details and alert information that require immediate action. We can use these severity levels as a filter. So we can go to a single screen and say that we’d like to see everything that is a warning level or higher or maybe we want to view all of the debug information sent in every log.
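The facility code and severity level described above travel together in a single number at the front of each syslog message, the priority value, which is the facility multiplied by 8 plus the severity. Here is a minimal sketch of splitting that number back apart. The `decode_pri` helper name and the sample messages are assumptions made for illustration.

```python
import re

# Standard syslog severity names, index 0 (most severe) through 7.
SEVERITIES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]

PRI_PATTERN = re.compile(r"^<(\d{1,3})>")


def decode_pri(message):
    """Split the leading <PRI> field into (facility, severity, severity name)."""
    match = PRI_PATTERN.match(message)
    if match is None:
        raise ValueError("message does not start with a <PRI> field")
    pri = int(match.group(1))
    if pri > 191:  # 23 * 8 + 7 is the largest valid priority value
        raise ValueError("priority value out of range")
    facility, severity = divmod(pri, 8)
    return facility, severity, SEVERITIES[severity]


if __name__ == "__main__":
    # Hypothetical messages; only the <PRI> prefix matters for this sketch.
    for line in ("<34>su: authentication failure", "<165>app: started"):
        print(decode_pri(line))
```

A priority of 34 breaks down to facility 4 and severity 2 (critical), and 165 breaks down to facility 20 and severity 5 (notice). Lower severity numbers are more urgent.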
You as the network administrator get to determine exactly what information is important to you at any particular time. You may decide during normal operation that you only want to see alert or emergency log information, but if you’re doing research on a previous event, you may want to expand this filter out to view debug information and higher. It might also be useful to perform ongoing analysis of individual interfaces on these devices to see if there may be any problems with the data traversing the network.
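The severity filter described above can be sketched in a few lines of Python. The event tuples, the host names, and the `filter_by_severity` helper are hypothetical; a real SIEM would provide this as a setting on its own console.

```python
# Standard syslog severity levels; a lower number is more severe.
SEVERITY_LEVELS = {
    "emergency": 0, "alert": 1, "critical": 2, "error": 3,
    "warning": 4, "notice": 5, "informational": 6, "debug": 7,
}


def filter_by_severity(events, minimum="warning"):
    """Keep events at the given severity level or anything more severe."""
    threshold = SEVERITY_LEVELS[minimum]
    return [event for event in events if SEVERITY_LEVELS[event[0]] <= threshold]


# Made-up events in the form (severity, host, message).
SAMPLE_EVENTS = [
    ("informational", "switch-1", "interface up"),
    ("warning", "router-1", "high CPU utilization"),
    ("alert", "firewall-1", "failover triggered"),
    ("debug", "switch-1", "spanning tree timer"),
    ("emergency", "router-2", "system unusable"),
]

if __name__ == "__main__":
    # Normal operation: only alert and emergency events.
    for event in filter_by_severity(SAMPLE_EVENTS, "alert"):
        print(event)
```

Setting the minimum to "debug" returns every event, which matches the idea of widening the filter when researching a previous event.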
For example, you may be looking at interfaces on a switch, and you may find that it has a large number of runts. The minimum size of an Ethernet frame is 64 bytes. So if you receive a frame that is smaller than 64 bytes, we classify that as a runt, and in many cases, that means that a collision has occurred on the network. On most of our networks today, we’re using full-duplex Ethernet, where collisions should not occur, so the identification of a runt and potentially a collision is something you probably want to be aware of.
If you’re not using jumbo frames, the maximum size of an Ethernet frame is going to be 1,518 bytes. If your networking device identifies a frame that’s larger than 1,518 bytes, it identifies that as a giant frame, and again, this may indicate that some type of communication problem is occurring on the network. If any of these Ethernet signals are corrupted as they’re being sent across the network, you may receive a cyclic redundancy check error, or CRC error.
You might also see this referred to as a frame check sequence error, or FCS error. These are commonly caused by a bad cable or bad interface, and replacing the bad cable or interface causes the CRC errors to cease. And there may be cases where encapsulation errors might occur, where one switch is expecting one type of frame, and the other switch is expecting a different type of frame.
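The runt, giant, and CRC checks described above can be illustrated with a short Python sketch. The function names are hypothetical, and a switch performs these checks in hardware. The sketch relies on `zlib.crc32`, which uses the same CRC-32 polynomial as the Ethernet frame check sequence; storing the four FCS bytes in little-endian order is an assumption made to keep the example self-consistent.

```python
import zlib

MIN_FRAME_BYTES = 64    # anything smaller is a runt
MAX_FRAME_BYTES = 1518  # anything larger is a giant when jumbo frames are off


def classify_frame_size(length, max_bytes=MAX_FRAME_BYTES):
    """Label a frame length as 'runt', 'giant', or 'ok'."""
    if length < MIN_FRAME_BYTES:
        return "runt"
    if length > max_bytes:
        return "giant"
    return "ok"


def append_fcs(frame_without_fcs):
    """Append a 4-byte CRC-32 checksum (byte order assumed for this sketch)."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + fcs.to_bytes(4, "little")


def fcs_is_valid(frame):
    """Recompute the CRC over the frame body and compare it to the trailer."""
    body, received = frame[:-4], frame[-4:]
    expected = (zlib.crc32(body) & 0xFFFFFFFF).to_bytes(4, "little")
    return expected == received


if __name__ == "__main__":
    frame = append_fcs(bytes(60))                 # 60 bytes plus 4-byte FCS
    damaged = bytes([frame[0] ^ 0x01]) + frame[1:]  # flip a single bit
    print(classify_frame_size(len(frame)), fcs_is_valid(frame))
    print(classify_frame_size(len(damaged)), fcs_is_valid(damaged))
```

Flipping even one bit makes the recomputed checksum disagree with the stored one, which is the condition an interface counts as a CRC or FCS error.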
On older switches, for example, you may see trunk links that are using Inter-Switch Link protocol, or ISL. These days, we tend to use the 802.1Q standard, but if one side is set for ISL and the other side is set for 802.1Q, then you’ll have an encapsulation error. If you were to log in to a switch and view the statistics on an individual interface, you might get a message very similar to this. And although it seems like there’s a lot of information on the screen, you tend to start focusing on very specific areas to get the information you need.
For example, you may want to look at the very first line that shows that Fast Ethernet interface 0/1 is up and the line protocol is up and connected, which means that this interface should be operating normally on the network. We can look at another line that shows this is configured for full duplex and 100 megabits per second, and the media type is a 10/100 interface. We can also see that there have been 5,701 packets.
We can see 1,157,662 bytes and 0 no-buffer frames. We can also see there have been 0 CRC errors, which means that this particular interface should be performing normally. However, we can also see a large number of broadcasts: 5,130 broadcasts and 2,269 multicasts.
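If you want to track these counters over time rather than read them by eye, the values can be pulled out of the text with a few regular expressions. The sketch below is only an approximation: the sample text reuses the numbers mentioned above in a Cisco-like layout, but it is a reconstruction, not the exact screen from the video, and the pattern names are hypothetical.

```python
import re

# Reconstructed sample text; the layout is an assumption, the numbers are
# the ones read out above.
SAMPLE_OUTPUT = """\
FastEthernet0/1 is up, line protocol is up (connected)
  Full-duplex, 100Mb/s, media type is 10/100BaseTX
     5701 packets input, 1157662 bytes, 0 no buffer
     Received 5130 broadcasts (2269 multicasts)
     0 input errors, 0 CRC
"""

# One capture group per counter of interest.
PATTERNS = {
    "packets_input": r"(\d+) packets input",
    "bytes_input": r"packets input, (\d+) bytes",
    "no_buffer": r"(\d+) no buffer",
    "broadcasts": r"Received (\d+) broadcasts",
    "multicasts": r"\((\d+) multicasts\)",
    "crc": r"(\d+) CRC",
}


def parse_interface_counters(text):
    """Return a dict of the counters found in the interface statistics."""
    counters = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            counters[name] = int(match.group(1))
    return counters


def interface_is_up(text):
    """Check the first line for both the interface and line protocol state."""
    return "is up, line protocol is up" in text.splitlines()[0]


if __name__ == "__main__":
    print(interface_is_up(SAMPLE_OUTPUT))
    print(parse_interface_counters(SAMPLE_OUTPUT))
```

Collecting the same counters at regular intervals lets you see whether CRC errors or broadcasts are climbing on a specific port.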
By looking at the individual interfaces on a switch, you can start putting together a view of exactly how each interface is performing and if there might be any errors or problems with individual ports. When you’re working with a data center or a computer room, you also have to be concerned about the environment. There are many different variables that can affect the overall health and availability of the network, such as the temperature in the room. These devices can get very hot and need constant cooling, so we want to be sure that the air conditioning that we have in the room is always working properly, so we’ll want to monitor that temperature.
We might also want to monitor the humidity level. If we have a lot of humidity in the air, then we may have some condensation, and we certainly want to keep that water away from our electrical equipment. If we have a low amount of humidity, we may have an excessive amount of static discharge, which is also bad for the silicon. Since we are connecting all of these powered devices to an electrical system, it would also be useful to know if we’re receiving the proper amount of voltage and that everything is working as expected, so constant monitoring of the electrical system can be a very valuable metric.
In very large data centers, you may be using water as part of your cooling system. This means we may want to have flood monitors in different parts of the room to make sure that that water never gets close to our electrical equipment. If you start collecting this information over time, you can begin to create a baseline of what may be expected.
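One simple way to picture a baseline is as an average and a typical spread computed from past readings, with new readings compared against that range. The sketch below assumes made-up hourly event counts and a three-standard-deviation tolerance; both are placeholders, and a monitoring product may build its baselines very differently.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Summarize past readings as (average, standard deviation)."""
    return mean(history), stdev(history)


def is_anomalous(value, baseline, tolerance=3.0):
    """Flag a reading that is more than `tolerance` deviations from average."""
    average, spread = baseline
    return abs(value - average) > tolerance * spread


# Hypothetical hourly counts of informational events collected over time.
HOURLY_EVENT_COUNTS = [100, 104, 98, 102, 96, 101, 99, 100]

if __name__ == "__main__":
    baseline = build_baseline(HOURLY_EVENT_COUNTS)
    for reading in (103, 250):
        print(reading, is_anomalous(reading, baseline))
```

The same comparison works for temperature, humidity, or voltage readings, as long as the history reflects normal operation.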
For example, you may create an all-events baseline that shows the number of information events that you have coming through during the day versus the debug or notice events, and this can be used to look at event categories, security events, important events, or other types of metrics on your network. SNMP and log files can tell you a lot about what’s happening on the network, but you occasionally would like to have a lot more detail about the information contained within the Ethernet packets themselves. In those cases, you may need additional collection tools, such as NetFlow.
NetFlow allows you to gather statistics and details from the raw traffic traversing the network. There are many different ways to collect this data. It may require a separate individual NetFlow probe, or there may be NetFlow capabilities built into the equipment that you’re already using. Generally, NetFlow will have a NetFlow probe and a NetFlow collector.
The probe will sit out on the network, sometimes as part of a tapped connection, or it may be receiving traffic from a switched port analyzer, or SPAN, connection, and it’s watching all of the packets traverse the network. It’s gathering those details and exporting all of that NetFlow traffic back to a central NetFlow collector. From the NetFlow collector, you can then create detailed reports of everything that’s been seen over time.
This allows you to get extensive information about what may be occurring on the network, such as the type of conversations, the endpoints communicating, the applications in use, and much more. For example, you can get a distribution showing the amount of TLS traffic versus port 0, UDP, in-the-clear HTTP, and Microsoft SQL Server traffic on one view. This will give you some insight into exactly what might be traversing the network and might also help you interact with your security devices to make sure that only the proper information is being sent over the network.
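The traffic distribution described above is essentially a sum of bytes per application across all of the exported flow records. Here is a minimal sketch of that aggregation; the record layout, the application labels, and the byte counts are invented for illustration and do not represent the actual NetFlow export format.

```python
from collections import Counter

# Made-up records in the form (source IP, destination IP, application, bytes).
SAMPLE_EXPORT = [
    ("192.0.2.10", "203.0.113.80", "TLS", 6000),
    ("192.0.2.11", "203.0.113.80", "TLS", 1500),
    ("192.0.2.10", "198.51.100.53", "UDP", 500),
    ("192.0.2.12", "203.0.113.81", "HTTP", 2000),
]


def bytes_by_application(flows):
    """Total the bytes seen for each application label."""
    totals = Counter()
    for _src, _dst, application, byte_count in flows:
        totals[application] += byte_count
    return totals


def percentage_share(totals):
    """Convert byte totals into each application's share of all traffic."""
    grand_total = sum(totals.values())
    return {app: 100 * count / grand_total for app, count in totals.items()}


if __name__ == "__main__":
    totals = bytes_by_application(SAMPLE_EXPORT)
    for app, share in percentage_share(totals).items():
        print(f"{app}: {share:.1f}%")
```

Grouping by source or destination address instead of application would give a top-talkers view from the same records.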
We’ve talked about a lot of different types of metrics in this video, but sometimes, all you want to know is whether a service is up or down. Fortunately, many third-party services, especially services in the cloud, can provide you with a list of all of those services and a status of how each service is performing. So if you happen to connect to the status page and you see that a particular problem was recently resolved, you may be able to track that back to your NetFlow or log files to determine if that, indeed, had an impact on your network.