The devices on our networks can generate a massive amount of metrics and log data, and it’s the responsibility of the network team to manage all of those logs and create performance reports. In this video, you’ll learn about the management of logs and how all of that raw information can be converted into useful graphs and reports.
The concept of log management is almost a science unto itself. It seems easy enough. You collect some information. You create some charts and some graphs, and you’re done. But the reality of this is so very different.
One of the challenges is that the data is coming from so many different sources. And we’re gathering the information in so many different ways. You’ve got logs that are in your web servers, your database servers, and your file servers. And all of those log formats are different. You’ve also got logs in your routers, in your switches, in your firewalls, and your security devices. And all of those are providing you with different kinds of information.
But the information itself is somewhat correlated, and that’s the challenge: you first have to get all of the data into one place, and then you have to find a way to present that data in a way that makes sense.
Information from our infrastructure devices, such as our routers, our switches, and our firewalls, is usually sent via syslog to a central syslog consolidation server. It’s usually stored on a very large drive array, a series of drives, or some other type of storage media that has a lot of space available.
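As a rough illustration of that forwarding process, here’s a minimal Python sketch that sends log messages to a central syslog collector over UDP. The hostname logserver.example.com, the logger name, and the sample message are placeholders, not anything specific to a particular product.

```python
# Minimal sketch: forward log messages to a central syslog collector.
# The hostname, port, and message below are placeholders for your environment.
import logging
import logging.handlers

logger = logging.getLogger("edge-router-monitor")
logger.setLevel(logging.INFO)

# Send messages over UDP (the SysLogHandler default) to the consolidation server
syslog = logging.handlers.SysLogHandler(address=("logserver.example.com", 514))
syslog.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
logger.addHandler(syslog)

logger.info("Interface Gi0/1 utilization at 72 percent")
```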
You’re going to be grabbing log files from so many different devices, and you want to be able to store this information over time so that you can identify interesting trends in the data. In some cases, you are mandated by federal, state, or local requirements to keep that data for a certain amount of time. In those cases, you really do have to make sure that you have enough disk space to go back 30, 60, 90 days, or even longer.
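To make that storage requirement concrete, here’s a back-of-the-envelope calculation. The message rate and average message size are made-up numbers that you would replace with measurements from your own environment.

```python
# Rough storage estimate for a syslog archive (illustrative numbers only)
messages_per_second = 2_000      # assumed average rate across all devices
avg_message_bytes = 300          # assumed average syslog message size
retention_days = 90              # e.g., a 90-day retention requirement

bytes_total = messages_per_second * avg_message_bytes * 86_400 * retention_days
print(f"~{bytes_total / 1e12:.1f} TB of raw log storage for {retention_days} days")
# -> ~4.7 TB with these example numbers
```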
There are also things you can do with the data to make it a little bit easier to store. For instance, you may want real-time, or relatively fast, information if you’re looking at it over the short term. So you may be taking samples of CPU utilization from a server every minute, and you can pull up an entire day and see what the CPU utilization was for every minute of that day.
When you start examining the CPU utilization over a longer time frame, maybe an entire month, you don’t necessarily need one-minute granularity. Instead, you could do something that we call rolling up the data. You would take these one-minute samples and convert them into five-minute samples, where each value is an average over that five-minute period, and then store the five-minute samples for a 30-day time frame. You’re effectively decreasing the amount of space that you need by a factor of five. This way, you can still get an idea of what the baseline was, but you’re not using up a lot of storage space to do it.
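As a sketch of that roll-up idea, the snippet below averages each group of five one-minute samples into a single five-minute value. The sample data is made up purely for illustration.

```python
from statistics import mean

# One-minute CPU utilization samples (percent); made-up values for illustration
one_minute = [12, 15, 11, 40, 38, 35, 37, 36, 14, 13, 12, 11, 10, 55, 60]

# Roll up into five-minute averages: one value per group of five samples
five_minute = [mean(one_minute[i:i + 5]) for i in range(0, len(one_minute), 5)]

print(five_minute)   # three 5-minute averages instead of fifteen 1-minute samples
```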
After 30 days, maybe you also have a rule that rolls up the data to one-hour sample intervals, so that if you’re looking over the last three months, you’d be able to get a feel for utilization from one-hour averages. The exact metrics that you’re rolling up, and the exact time frames and intervals of those roll-up policies, will vary depending on how important that information is to you. And you’ll generally change and tweak these as you decide how you want to report on those metrics.
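Extending that idea, a tiered roll-up policy might look something like the hypothetical sketch below, where resolution decreases as the data ages. The age thresholds and intervals simply mirror the examples above and would be tuned to your own reporting needs.

```python
# Hypothetical tiered roll-up policy: resolution decreases as data ages
ROLLUP_POLICY = [
    # (data at least N days old, interval in minutes to keep)
    (90, None),   # older than 90 days: delete or archive elsewhere
    (30, 60),     # 30-90 days old: one-hour averages
    (1, 5),       # 1-30 days old: five-minute averages
    (0, 1),       # today: keep the raw one-minute samples
]

def target_interval(age_days: int):
    """Return the sample interval (minutes) to keep for data of a given age."""
    for min_age, interval in ROLLUP_POLICY:
        if age_days >= min_age:
            return interval
    return 1

print(target_interval(2))    # 5    -> five-minute averages
print(target_interval(45))   # 60   -> one-hour averages
print(target_interval(120))  # None -> eligible for deletion or archival
```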
Now that you have this data in one central repository, you can start thinking about the reports that you’d like to generate. You first have to examine the type of data that you have, and generally we store data in two separate formats. One is the raw log files themselves. They haven’t been processed, and they haven’t been summarized, so you would have to go through the entire log file to gather information from it.
Another type of data may be a summarized set of metadata, where you’ve already gone through the log file and created some roll-ups yourself of the type of data contained in the log file. Usually, reporting on the metadata is much faster than reporting on the raw log file information. But the metadata doesn’t always contain everything you need, and if there’s something very specific you’re after, you may have to go through the entire log file to report on that information.
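As a simple illustration of that difference, the sketch below walks a few made-up raw syslog lines once and builds a small per-device summary. Later reports can read the compact summary instead of re-reading the full raw archive.

```python
from collections import Counter

# A few made-up raw syslog lines; in practice you'd read these from the archive
raw_log = [
    "Jan 10 08:01:12 fw01 %ASA-4-106023: Deny tcp src outside:10.1.1.5",
    "Jan 10 08:01:15 sw02 %LINK-3-UPDOWN: Interface Gi0/4, changed state to down",
    "Jan 10 08:01:18 fw01 %ASA-4-106023: Deny tcp src outside:10.1.1.9",
]

# One pass over the raw data produces summarized metadata...
summary = Counter(line.split()[3] for line in raw_log)   # count events per device

# ...and later reports read the small summary instead of the full log
print(summary)   # Counter({'fw01': 2, 'sw02': 1})
```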
Now we have to think about how we’re going to create those graphs, reports, and tables. Many times, we’re storing this information in a SIEM, a Security Information and Event Management system. And the SIEM is not only a storage repository; it often includes software that’s able to gather information and details from that data and create reports and graphs for us.
An important thing to consider is that, depending on the graph, chart, or table that you’d like to create, it may require extensive computing resources. If you need to go back over a three-month time frame, and the only way to gather that information is to go through raw log files, it might take quite a bit of time to create that report. If you’re accessing metadata and you just want to know the utilization on a single device for the last day, that report may only take a few seconds to generate. So you need to think about how you’re going to gather this information and where it’s coming from, so that you understand exactly what it’s going to take to create that report.
Sometimes the job is halfway done for you, because a number of these SIEMs and reporting tools already have built-in reports ready to go. You can click on a report, choose a time frame, click Go, and it will create the report for you. Occasionally, your manager will come to you and want something very specific, something that perhaps isn’t in the built-in reports. A number of these SIEMs include report writers as well, where you can drag and drop or create some advanced queries to build a report that’s specific to what you’re looking for.