Web Filtering – CompTIA Security+ SY0-701 – 4.5

Many filtering methods are available to protect against attacks. In this video, you’ll learn about content filtering, URL scanning, proxies, DNS filtering, and more.


Many organizations have a firewall that provides them a way to allow or disallow access from certain applications. But what if you wanted to filter on the data inside of those web pages? You can do that by using a content filter. Sometimes you may hear these content filters referred to as a URL filter or simply website category filtering. Sometimes these web filters are designed to control what data is going out and what data is coming in. And this is especially important if your organization deals with a lot of sensitive types of data.

Most organizations will implement some form of content filtering to restrict what type of information is seen in the browser on users’ desktops. If you’re at home performing the same function, we sometimes call this parental controls because we’re filtering what information might be seen by others in your home. And some content filters are designed to block access to known-bad sites. These types of content filters might stop you from visiting a site that is known to host viruses, malware, and other types of malicious code.

One type of content filter is one that filters based on a Uniform Resource Locator, or URL. Sometimes you’ll hear this referred to as a Uniform Resource Identifier, or URI. If you would like your users to be able to access a particular website, you can add that URL to an allow list. And if you want to block that site, you would add it to a block list. As you can imagine, adding individual fully qualified domain names to these lists can be somewhat difficult to manage. Instead, many of these filtering technologies will group together like URLs.
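
To make the allow list and block list idea concrete, here is a minimal sketch in Python. The host names and the check_url function are hypothetical examples for illustration, not the behavior of any particular product, and a real filter would load its lists from a management console.

```python
from urllib.parse import urlsplit

# Hypothetical lists of fully qualified domain names (illustration only).
ALLOW_LIST = {"www.professormesser.com"}
BLOCK_LIST = {"www.malicioussite.org"}


def check_url(url: str, default: str = "allow") -> str:
    """Return 'allow' or 'block' for a URL based on its host name."""
    host = (urlsplit(url).hostname or "").lower()
    if host in BLOCK_LIST:
        return "block"
    if host in ALLOW_LIST:
        return "allow"
    # Hosts on neither list fall back to the configured default action.
    return default
```

Notice that every host name has to be listed individually, which is exactly why these lists become difficult to manage as they grow.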

For example, you can have URLs managed by a category for auction, hacking, malware, travel, recreation, and many others. URL filters are very good at controlling the information that you see inside of a browser window. But obviously, there are many different ways to access data on the internet. In this video, we’ll not only talk about controlling information in a browser, but we’ll talk about other ways to provide content filtering as well.

There used to be a big market for standalone URL filters. These days, this URL filtering capability is commonly built into next-generation firewalls, so you need only a single device to manage all of your firewall rules, IPS, and URL filtering. A URL filter built into a firewall assumes that the users are going to be in a place where the firewall is managing that traffic. And in today’s networks, where people are very mobile and there are many people working from home, you may not have that luxury. Instead, you may want to put the control of those URLs on the client itself.

These would be agent-based content filters that are installed on the users’ desktops and other devices. All of these are, of course, managed through a central console. But the decision process occurs on the user’s device directly. This means we don’t have to be behind a particular firewall or be located on a particular network to have this filtering work properly. Instead, the user can travel and connect to any network they’d like. And the agent that’s on their system will manage the control of the content.

With agent-based systems, we would also need to make sure that the agents are constantly updated with the latest list of URL categories. We would need to push out updates to all of these devices on a regular basis so that we always have the latest list of URLs on those agents. Instead of managing the control of this content from a standalone URL filter or next-generation firewall, some organizations use proxies.

A proxy is a device that sits between users and an external network and allows you to control the flow of traffic through that proxy. With a traditional firewall, the users communicate directly to the websites that may be located on the internet. But a proxy sits in the middle of this conversation and makes those requests on behalf of the user.

So the user makes a request to view a particular web page. The proxy then makes that request to the web page directly from the proxy and receives the response from that website to the proxy. The proxy can then make a decision based on what was received on whether that traffic should be forwarded to the user. And if everything in that response looks OK, it will send that down to the user’s computer.

Since this proxy is sitting in the middle of the conversation, we can have it do a lot more than simply provide URL filtering. For example, we could have this proxy act as a cache. If somebody makes a request to an external server, that information can be saved locally on the proxy. If someone else requests the same web page, the proxy can simply respond with the information that’s in the cache instead of going out to the internet and making a second request.
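
The caching behavior can be sketched in a few lines of Python. The CachingProxy class and the fake_fetch function are hypothetical names used only for this example, and the sketch leaves out details a real proxy handles, such as expiring old entries.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy proxy cache: remembers each response by URL (no expiry)."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                  # function that contacts the real server
        self._cache: Dict[str, bytes] = {}   # URL -> saved response body
        self.origin_requests = 0             # how many times we went to the internet

    def get(self, url: str) -> bytes:
        if url in self._cache:
            # Someone already requested this page, so answer from the cache.
            return self._cache[url]
        self.origin_requests += 1
        body = self._fetch(url)
        self._cache[url] = body
        return body


def fake_fetch(url: str) -> bytes:
    """Stand-in for a real HTTP request so the sketch runs offline."""
    return b"<html>" + url.encode() + b"</html>"


proxy = CachingProxy(fake_fetch)
first = proxy.get("http://www.professormesser.com/")
second = proxy.get("http://www.professormesser.com/")
```

After both calls, proxy.origin_requests is still 1 because the second request was answered from the cache.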

This proxy can also provide access control, which means it limits which devices are able to communicate to the internet. This control can be based on a username and password provided by the end user, or it may be based on an IP address. With some proxies, we have to tell our application to use a proxy for communication rather than communicating directly to a server. We refer to this as an explicit proxy because we are explicitly configuring that proxy in the application config. There are also proxies that don’t require that type of configuration and are simply able to work without any special configuration on the client.
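
As one example of an explicit proxy configuration, Python’s standard urllib.request module lets an application name the proxy it should use. The proxy address below is a hypothetical placeholder, and this is only a sketch of the idea rather than a recommended deployment.

```python
import urllib.request

# Hypothetical proxy address; use whatever your organization publishes.
PROXY_URL = "http://proxy.example.internal:3128"


def build_explicit_proxy_opener(
    proxy_url: str = PROXY_URL,
) -> urllib.request.OpenerDirector:
    """Build an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_explicit_proxy_opener()
# A call such as opener.open("http://www.professormesser.com/") would now be
# sent to the proxy instead of directly to the web server.
```

The application had to be told about the proxy, and that is what makes this an explicit proxy.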

Since this proxy is able to work without the end user even realizing that it’s there, we refer to this proxy as a transparent proxy. This is a configuration of a proxy that we would install specifically for users to gain access to the internet. We often refer to this as a forward proxy. Sometimes you’ll see it referenced as an internal proxy. With a forward proxy, the user and the proxy are in the internal network of the organization. And generally, the organization has control over the configuration of that proxy.

The user makes a request to the proxy, and then the proxy makes its own request to that website on the internet. The proxy receives a response from that website where it can then provide additional security, such as URL filtering and checking for any type of malware. Once the proxy has checked this data and it knows that all of the information is safe, it can send that response down to the user.
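
That request, check, and forward sequence might look something like this in Python. Every function name here is a hypothetical stand-in: fetch represents the proxy’s own request to the website, url_allowed represents the URL filter, and looks_malicious represents a malware scan.

```python
from typing import Callable, Optional


def forward_proxy_request(
    url: str,
    fetch: Callable[[str], bytes],
    url_allowed: Callable[[str], bool],
    looks_malicious: Callable[[bytes], bool],
) -> Optional[bytes]:
    """Return the response to send to the user, or None if it was blocked."""
    if not url_allowed(url):
        return None          # URL filter says no, so never contact the site
    body = fetch(url)        # the proxy makes its own request to the website
    if looks_malicious(body):
        return None          # the response failed the malware check
    return body              # everything looks OK, forward it to the user
```

The user only ever sees a response that has passed both checks.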

These content filters and URL filters are designed to block based on a fully qualified domain name. And you could configure a block filter with a specific fully qualified domain name, such as professormesser.com. You can also set these rules based on a category of website. Most URL filters have over 50 different categories of sites, including adult, educational, gambling, government, home and garden, and many more. This allows you to set some very granular controls over what types of sites might be allowed and what sites might be blocked.

For example, in your organization, educational sites might be allowed. Home and garden sites might be allowed, but a message is put into a log or an alert is sent when someone visits one of those pages. And if someone tries to visit a page that’s categorized as gambling, it may be blocked by your URL filter.
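
Here is a small Python sketch of that allow, alert, and block policy. The site names, the category assignments, and the filter_by_category function are all made up for illustration. A commercial URL filter would supply its own category database.

```python
import logging
from urllib.parse import urlsplit

logger = logging.getLogger("url_filter")

# Hypothetical category database: host name -> category.
SITE_CATEGORIES = {
    "university.example": "educational",
    "plants.example": "home and garden",
    "casino.example": "gambling",
}

# Policy for this organization: category -> action.
CATEGORY_ACTIONS = {
    "educational": "allow",
    "home and garden": "alert",   # allowed, but logged
    "gambling": "block",
}


def filter_by_category(url: str, default_action: str = "allow") -> str:
    """Return 'allow', 'alert', or 'block' for a URL based on its category."""
    host = urlsplit(url).hostname or ""
    category = SITE_CATEGORIES.get(host, "uncategorized")
    action = CATEGORY_ACTIONS.get(category, default_action)
    if action == "alert":
        logger.warning("Visit to %s site: %s", category, host)
    return action
```

Changing the policy for an entire category only takes a change to one entry in CATEGORY_ACTIONS, which is much easier than maintaining a list of individual sites.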

Some content filters and URL filters look at more than just a fully qualified domain name. They might evaluate the reputation of that site and be able to allow or block based on the perceived risk of the data on that site. Websites with a good reputation would be allowed through, and anything with a bad reputation would be blocked by the URL filter.

There are also different levels of reputation. It would not be unusual to see a URL filter with trustworthy, low risk, medium risk, suspicious, or high risk as categories you can associate with a website’s reputation. With millions and millions of websites, it’s not possible to manually look at every site and provide a reputation for every one. This process is often automated. A scan will look at a website, evaluate the information that was received from that site, and make a determination on what reputation should be associated with that URL.

Of course, you can manually assign these reputations as well. If there is a particular site where you may not agree with the automated reputation, you can manually set the reputation yourself. This allows you to set some granularity with your filtering. If anything is categorized as high risk, you may decide to block that traffic. And anything that has been clearly marked with a reputation of trustworthy would be allowed.
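
The combination of automated reputations, manual overrides, and a blocking threshold can be sketched like this in Python. The site names are invented, and treating the five levels as an ordered scale from trustworthy up to high risk is an assumption made for this example.

```python
from enum import IntEnum


class Reputation(IntEnum):
    # Assumed ordering: a higher number means more risk.
    TRUSTWORTHY = 1
    LOW_RISK = 2
    MEDIUM_RISK = 3
    SUSPICIOUS = 4
    HIGH_RISK = 5


# Hypothetical results from an automated scan.
AUTOMATED_REPUTATION = {
    "www.professormesser.com": Reputation.TRUSTWORTHY,
    "www.malicioussite.org": Reputation.HIGH_RISK,
    "new-site.example": Reputation.SUSPICIOUS,
}

# Hypothetical manual overrides, set by an administrator who disagrees.
MANUAL_OVERRIDES = {
    "new-site.example": Reputation.LOW_RISK,
}


def reputation_for(host: str) -> Reputation:
    """A manual setting wins over the automated one; unknown sites get medium risk."""
    if host in MANUAL_OVERRIDES:
        return MANUAL_OVERRIDES[host]
    return AUTOMATED_REPUTATION.get(host, Reputation.MEDIUM_RISK)


def is_allowed(host: str, block_at: Reputation = Reputation.HIGH_RISK) -> bool:
    """Allow anything with a reputation below the blocking threshold."""
    return reputation_for(host) < block_at
```

Lowering block_at to Reputation.SUSPICIOUS or Reputation.MEDIUM_RISK would make the filter more restrictive without changing any individual site’s reputation.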

There’s also a way to provide content filtering without a next-generation firewall, a proxy, or a URL filter. Instead, you can use DNS filtering. DNS is the Domain Name System. And every time you connect to a website, a DNS server is the device that provides an IP address when you give it a fully qualified domain name. There are many domain names that are known to have questionable content or may simply contain malicious code. In those cases, we can configure the DNS server to not provide the user with the IP address of that site.

All of this information is automatically updated in the DNS server using real-time threat intelligence. And there are both commercial lists and publicly available lists that you can use for DNS filtering. This means that a user who makes a request to visit www.malicioussite.org will not receive the IP address of that malicious site from the DNS server. Instead, they are either provided with a default IP address or no IP address at all, and the connection is simply not made.
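
The lookup decision can be sketched in Python. The block list, the sinkhole address, and the function names are assumptions for illustration. A real DNS filter would pull its list from a threat intelligence feed and would answer actual DNS queries rather than a function call.

```python
from typing import Callable, Optional

# Hypothetical block list of domains known to be malicious.
BLOCKED_DOMAINS = {"malicioussite.org"}

# Assumed "default" address handed out instead of the real one.
SINKHOLE_IP = "0.0.0.0"


def is_blocked(name: str) -> bool:
    """True if the name, or any parent domain of it, is on the block list."""
    labels = name.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in BLOCKED_DOMAINS for i in range(len(labels)))


def filtered_lookup(
    name: str,
    upstream: Callable[[str], str],
    sinkhole: Optional[str] = SINKHOLE_IP,
) -> Optional[str]:
    """Resolve a name, but never return the real address of a blocked domain.

    Pass sinkhole=None to model a filter that returns no IP address at all.
    """
    if is_blocked(name):
        return sinkhole
    return upstream(name)


def fake_upstream(name: str) -> str:
    """Stand-in for a real resolver so the sketch runs offline."""
    return "203.0.113.10"   # address from a range reserved for documentation
```

With this sketch, a lookup for www.malicioussite.org returns the sinkhole address, or None if no sinkhole is configured, and every other name is passed to the upstream resolver as usual.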

One nice feature of this DNS filter is it works on more than just web pages. If someone has installed malicious software that is trying to communicate to a command and control server, it may create a DNS request to receive the latest IP for that command and control. And if you have DNS filtering installed, that lookup for the malicious site will fail and hopefully will restrict the capabilities of that malware.