Web Content Filtering and Censorship
Traffic filtering might be a legal or compliance requirement for some networks; it might be deployed to prevent the network from being used to access illegal content, or to protect against browsers downloading harmful data from particular servers. Conversely, individuals might set up something like Privoxy to protect their privacy and gain some control over the data they share with third parties.
Of course, sometimes traffic filtering is done for political or ideological reasons.
It's important to understand roughly how the different types of traffic filtering work, and how these types of filtering are often combined in traffic blocking systems.
In a TCP/IP packet, the Internet Protocol (IP) header deals with how the packet is routed - it includes the source address, destination address and TTL fields. The TCP header manages the connection itself, with fields for the source and destination ports, sequence numbers and session control flags, and it is followed by the payload carrying the actual data being communicated.
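The fields described above are what a packet filter actually reads. As a rough sketch (assuming an IPv4 header with no options, followed immediately by a TCP header - real parsers must handle variable header lengths), the inspection looks something like this:

```python
import socket
import struct

def parse_ipv4_tcp(packet: bytes) -> dict:
    """Extract the fields a filter typically inspects from a raw IPv4/TCP packet."""
    # IPv4 header (20 bytes, no options): version/IHL, TOS, total length,
    # ID, flags/fragment offset, TTL, protocol, checksum, source, destination.
    version_ihl, _, _, _, _, ttl, proto, _, src, dst = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20]
    )
    # The TCP header follows: source port and destination port come first.
    src_port, dst_port = struct.unpack("!HH", packet[20:24])
    return {
        "ttl": ttl,
        "protocol": proto,  # 6 = TCP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "src_port": src_port,
        "dst_port": dst_port,
    }

# A hand-built example header: 10.0.0.1 -> 93.184.216.34, destination port 443.
header = struct.pack(
    "!BBHHHBBH4s4sHH",
    0x45, 0, 40, 1, 0, 64, 6, 0,
    socket.inet_aton("10.0.0.1"),
    socket.inet_aton("93.184.216.34"),
    51000, 443,
)
print(parse_ipv4_tcp(header))
```

The destination address parsed here is what IP address filtering checks; the bytes after the TCP header are what content filtering scans.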
Additionally, before the TCP/IP packet is even sent, the client will usually send a DNS request to determine the IP address for the domain name in the URL. There are methods of preventing communication by blocking the lookups for certain domain names.
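A filtering resolver sits at exactly this step: the check happens before any packet to the site is ever sent. A minimal sketch, with a hypothetical blacklist (a real filter would more often return a bogus address than refuse outright):

```python
import socket
from typing import Optional

BLOCKED_DOMAINS = {"blocked.example"}  # hypothetical blacklist entry

def resolve_if_allowed(hostname: str) -> Optional[str]:
    """Refuse to resolve blacklisted hostnames; otherwise look them up.

    Mirrors what a filtering DNS resolver does: the site is blocked
    before the client ever learns its IP address.
    """
    if hostname.lower() in BLOCKED_DOMAINS:
        return None
    return socket.gethostbyname(hostname)

print(resolve_if_allowed("blocked.example"))  # blocked before any lookup happens
```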
A paper called The Great Firewall Revealed (Global Internet Freedom Consortium) discusses the three methods in detail, but the following are basic descriptions of how they work, and how each can be countered individually.
IP Address Filtering
The destination IP address is checked against a blacklist of known addresses, whether they belong to blacklisted sites or to known proxy servers. In larger networks this is more commonly deployed at the gateway level, as TCP inspection can be resource-intensive where high loads are involved. Even in relatively advanced systems, sites must be reviewed and added to the blacklist manually, which makes this ineffective against new proxy services that haven't yet been discovered.
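The check itself is a simple membership test against address ranges. A sketch, using hypothetical blacklist entries (the ranges below are documentation addresses standing in for real ones):

```python
import ipaddress

# Hypothetical blacklist; real gateways hold these in longest-prefix-match tables.
BLACKLIST = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.42/32"),
]

def is_blocked(dst: str) -> bool:
    """Drop the packet if its destination falls inside any blacklisted range."""
    addr = ipaddress.ip_address(dst)
    return any(addr in net for net in BLACKLIST)

print(is_blocked("203.0.113.7"))    # inside a blacklisted range: blocked
print(is_blocked("93.184.216.34"))  # an unlisted address slips through
```

The second call illustrates the weakness described above: a proxy at an address nobody has reviewed yet passes the check.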
TCP Content Filtering
This works by inspecting the packet being routed to determine whether given keywords exist in the payload. The filtering system uses this to decide whether to block or allow a site based on the content of its web pages, or whether the user has submitted certain information or queries. It can usually be defeated by encrypting the payload, typically over SSL/HTTPS by changing the URL from 'http://' to 'https://' - assuming the filtering organisation hasn't installed its own trusted certificate in the browser to intercept TLS connections.
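The scan is essentially a substring search over the payload bytes, which is why encryption defeats it. A sketch with hypothetical keyword rules:

```python
BLOCKED_KEYWORDS = [b"forbidden-topic", b"banned-query"]  # hypothetical terms

def payload_triggers_filter(payload: bytes) -> bool:
    """Scan a packet payload for blacklisted keywords.

    Only works on cleartext: once the payload is TLS-encrypted, the
    same scan sees opaque bytes and matches nothing.
    """
    lowered = payload.lower()
    return any(keyword in lowered for keyword in BLOCKED_KEYWORDS)

# A cleartext HTTP request exposes the query to the filter...
print(payload_triggers_filter(b"GET /search?q=forbidden-topic HTTP/1.1"))
# ...but a TLS record carrying the same request reveals nothing.
print(payload_triggers_filter(b"\x17\x03\x03\x00\x45\x8f\x2a\xbc\x11"))
```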
DNS and URL Blocking
These two methods are slightly different - one works on the TCP/IP packet and the other on DNS lookup requests. In both cases the URL is scanned for keywords, or compared against a domain blacklist, before the request is resolved; matching requests are redirected or dropped. Many otherwise decent proxy services become unavailable on certain networks simply because their URLs contain the word 'proxy'.
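A sketch of the URL-side check, with hypothetical rules - note how crude the keyword match is, which is exactly why a giveaway name gets a service blocked everywhere at once:

```python
from urllib.parse import urlsplit

DOMAIN_BLACKLIST = {"blocked.example"}  # hypothetical blacklist entry
URL_KEYWORDS = ("proxy", "anonymizer")  # the crude keyword rules described above

def url_is_blocked(url: str) -> bool:
    """Block a request by its URL before any connection is attempted."""
    host = urlsplit(url).hostname or ""
    if host in DOMAIN_BLACKLIST:
        return True
    # Keyword match anywhere in the URL - this is why services with
    # 'proxy' in their name get caught regardless of their IP address.
    return any(word in url.lower() for word in URL_KEYWORDS)

print(url_is_blocked("https://myproxy.example/browse"))  # caught by keyword
print(url_is_blocked("https://unrelated.example/page"))  # passes
```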
An Overview of Proxy Servers
A basic proxy server simply relays traffic between the client and the server, effectively enabling the client to access a blocked server through a different IP address and URL that the filtering system doesn't recognise. The commonly available web proxies are servers running PHP-based forms into which users enter the URL of whatever site they wish to visit.
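From the client's side, using such a form-based proxy amounts to wrapping the blocked URL inside a request to the proxy's own address. A sketch, where the proxy endpoint and its `q=` parameter are hypothetical:

```python
from urllib.parse import quote

# Hypothetical form-based web proxy: the browser talks only to
# webproxy.example, which fetches the target site on its behalf. The
# filtering system never sees the blocked site's address or URL.
PROXY_ENDPOINT = "https://webproxy.example/index.php?q="

def via_proxy(target_url: str) -> str:
    """Build the URL a user would request through the web proxy form."""
    return PROXY_ENDPOINT + quote(target_url, safe="")

print(via_proxy("http://blocked.example/page"))
```

Note that the target URL travels inside the request to the proxy, so the proxy operator sees everything - which is the trust problem described next.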
Most proxies themselves shouldn't be trusted to relay sensitive data, such as bank account information or login details. And if a law enforcement agency has a compelling reason to determine your identity, it can more than likely recover your IP address by acquiring the server logs from whoever operates the proxy.
How Web Content Filtering Can Be Defeated
Each of the countermeasures above defeats one filtering method individually, and they can be combined to get around reasonably advanced traffic filtering. The objective is to find a proxy service that uses SSL/HTTPS, sits at an IP address the filter doesn't recognise, and has a URL that contains no keywords suggesting its function.
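The combined criteria can be sketched as a single check against hypothetical snapshots of a filter's rules, one test per filtering method described above:

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical rule sets mirroring the three filtering methods.
IP_BLACKLIST = [ipaddress.ip_network("203.0.113.0/24")]
DOMAIN_BLACKLIST = {"blocked.example"}
URL_KEYWORDS = ("proxy", "anonymizer")

def likely_to_evade(url: str, resolved_ip: str) -> bool:
    """Check whether a candidate proxy service defeats all three methods.

    It should use HTTPS (defeats payload keyword scanning), carry no
    giveaway keyword in its URL or domain (defeats DNS/URL blocking),
    and sit at an unlisted address (defeats IP address filtering).
    """
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False  # cleartext payload is open to keyword scanning
    if (parts.hostname or "") in DOMAIN_BLACKLIST:
        return False  # caught by the domain blacklist
    if any(word in url.lower() for word in URL_KEYWORDS):
        return False  # caught by URL keyword matching
    addr = ipaddress.ip_address(resolved_ip)
    return not any(addr in net for net in IP_BLACKLIST)

print(likely_to_evade("https://mirror.example/browse", "198.51.100.9"))  # passes all three
print(likely_to_evade("https://myproxy.example/", "198.51.100.9"))       # URL keyword
```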