Tag Archive | squid

Update: SQUID transparent SSL interception : Squid v3.2

In order to keep this blog post relevant, note that there have been some improvements since it was written. Squid v3.2 was released earlier this year, making SSL interception more seamless and easier. The new features for HTTPS interception can be found by reading through the man page for http_port:


More specifically:

1. The "transparent" keyword has been changed to "intercept":

           intercept    Rename of old 'transparent' option to indicate proper functionality.

INTERCEPT is now better described as:

intercept	Support for IP-Layer interception of
			outgoing requests without browser settings.
			NP: disables authentication and IPv6 on the port.

2. In order to avoid more certificate errors when intercepting HTTPS sites, squid can now dynamically generate SSL certificates, using generate-host-certificates. This means the CN of the certificate should now match that of the origin server, though the certificate will still be generated using SQUID's private key:

SSL Bump Mode Options:
	    In addition to these options ssl-bump requires TLS/SSL options.

	   generate-host-certificates[=<on|off>]
			Dynamically create SSL server certificates for the
			destination hosts of bumped CONNECT requests. When
			enabled, the cert and key options are used to sign
			generated certificates. Otherwise generated
			certificate will be selfsigned.
			If there is a CA certificate lifetime of the generated
			certificate equals lifetime of the CA certificate. If
			generated certificate is selfsigned lifetime is three years.
			This option is enabled by default when ssl-bump is used.
			See the ssl-bump option above for more information.

Looks like the above is an offshoot of the excellent work here: http://wiki.squid-cache.org/Features/DynamicSslCert
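Putting the two features together, an intercepting HTTPS port in a v3.2 squid.conf might look like the following sketch (the port numbers, CA paths and cache size here are hypothetical, not from the original post):

```
# squid.conf sketch for v3.2 -- hypothetical ports and CA paths
http_port 3128 intercept
https_port 3129 intercept ssl-bump generate-host-certificates=on \
           dynamic_cert_mem_cache_size=4MB \
           cert=/etc/squid/ssl_cert/myCA.pem \
           key=/etc/squid/ssl_cert/private/myCA.pem
ssl_bump allow all
```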

Make sure to use the above two features for smoother HTTPS interception – though remember: always warn users that SSL traffic is being decrypted; privacy is a highly valued right…


Dansguardian : lessons learned

To disallow users from connecting to a site via IP rather than URL name (thus bypassing filtering, unless you use the time-consuming forward/reverse lookup feature), uncomment the following line in the bannedsitelist:


To enable syslog, the default dansguardian.conf uses:

# Syslog logging
# Use syslog for access logging instead of logging to the file
# at the defined or built-in "loglocation"
#syslog = on

The line "syslog = on" is incorrect and should be changed to:

logsyslog = on

The facility and priority used by dansguardian is:

In order for dansguardian to display the category when blocking a site, insert the following line at the beginning of each domain blacklist file:
#listcategory: "name_of_category_here"

A quick script to insert the above-mentioned line into each enabled blacklist (note: be careful, some of these statements are long one-liners):


#! /bin/bash

# Extract the enabled category names from the Include lines of bannedsitelist
categories=`cat /usr/local/etc/dansguardian/lists/bannedsitelist | grep -v "#" | grep "Include" | cut -d "/" -f 8`

for category in $categories
do
    # Prepend the category marker, then append the existing (non-comment) domains
    echo '#listcategory: "'$category'"' > /usr/local/etc/dansguardian/lists/blacklists/$category/domain.new
    cat /usr/local/etc/dansguardian/lists/blacklists/$category/domains | grep -v "#" >> /usr/local/etc/dansguardian/lists/blacklists/$category/domain.new
    rm -f /usr/local/etc/dansguardian/lists/blacklists/$category/domains
    mv /usr/local/etc/dansguardian/lists/blacklists/$category/domain.new /usr/local/etc/dansguardian/lists/blacklists/$category/domains
done


In order to modify the blocked page displayed, change the following file:


Analyzing SQUID access logs

There are loads of squid log analyzers on the internet. While this article addresses the same task, it's presented more with an eye to how you can use standard linux bash tools to obtain almost any output you want from log files. In the following script I use no python, perl or other high-level language. While this means the resulting log analyzer is probably not as fast and efficient as the ones that do, the exercise is more to show how a non-programming-savvy admin can still make his life easier right from the linux shell. To skip all explanation and go straight to the final script, click here.

Also, the script is highly customizable so that it can parse other logs in other formats– it’s simply an iterative process where you identify which parts of the original log files you need.

In this particular script, I want to parse the access.log files to see which sites are the most visited, and which have the most cache hits. By default, entries in the squid access logs look like this (each entry is on a single line):

1305800355.376   5030 TCP_MISS/200 32491 CONNECT m.google.com:443 - DIRECT/ -

1304761104.471   3189 TCP_MISS/204 252 GET http://www.google.co.uk/gen_204? - DIRECT/ text/html

First order of the day is to extract the site. For a human it’s very easy to see it’s the 7th field in the logs. Linux can extract the 7th field by using the “cut” command. So my first instinct was to run:

cat access.log | cut -d " " -f 7

which means using the space character as a delimiter and extracting the 7th field. However, that works for some but not all of the log entries. The reason is that the 2nd field varies in width, leaving a variable number of spaces before it… so we need to eliminate that variable. The solution is to "cut" the output twice: the first "cut" removes the variable spacing, the second extracts the field we need.

For the CONNECT entry I used:

cat access.log | cut -d "/" -f 2 | cut -d " " -f 4

The first cut uses the "/" character as a delimiter and takes the second field, up to the next "/" character. So we're left with:

200 32491 CONNECT m.google.com:443 - DIRECT

The second cut just extracts the 4th field using space as a delimiter – and we're done. But it won't work for the second HTTP entry due to the initial "http://". This actually makes things simpler for us: if you look at the entry, the host is in the 4th field when delimited by "/". So we can use:

cat access.log | cut -d "/" -f 4
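Both extractions can be tried without touching a real log file; here is a quick standalone sketch using the two sample entries shown earlier:

```shell
# Hypothetical standalone demo of the two pipelines; the sample lines
# are the two access.log entries shown earlier in the article.
connect='1305800355.376   5030 TCP_MISS/200 32491 CONNECT m.google.com:443 - DIRECT/ -'
http='1304761104.471   3189 TCP_MISS/204 252 GET http://www.google.co.uk/gen_204? - DIRECT/ text/html'

# CONNECT entries: cut away the variable-width prefix, then take field 4
printf '%s\n' "$connect" | cut -d '/' -f 2 | cut -d ' ' -f 4   # m.google.com:443

# HTTP entries: the host is simply the 4th "/"-delimited field
printf '%s\n' "$http" | cut -d '/' -f 4                        # www.google.co.uk
```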

We now have a way of extracting the hosts. Next we need a way to count the number of times they appear in the logs to get our hit count. The linux command "uniq -c" will count the number of times a string appears consecutively in a file. Note the "consecutively": in order to count all instances of a string in a file, these instances must be under each other, for example:

site-a.com
site-a.com
site-b.com
and not:

site-a.com
site-b.com
site-a.com
This means we have to sort the file first, which is easily done with the “sort” linux command. One nice feature of the sort command is if you use:

sort file1 file2

It will sort the contents of both files together and output the result. This will be useful to us later on since we’re running different “cut” commands for HTTP and CONNECT log entries.

Finally, to keep the report brief, we only want to see the sites with the highest number of hits… meaning we only want the first few lines of the file – again easy to do with the linux "head" command.
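These commands combine into the counting idiom at the heart of the final script. A small sketch on hypothetical data (a numeric reverse sort, -rn, is used here so counts are ordered by value rather than lexically):

```shell
# Count occurrences of each host in a hypothetical list, highest count first.
# sort groups identical hosts so uniq -c can count them; sort -rn orders the
# counts numerically, descending; head keeps only the top of the report.
printf '%s\n' a.com b.com a.com c.com a.com b.com \
    | sort | uniq -c | sort -rn | head -2
```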

The final script (see inline comments; note some lines wrap here and should be on a single line):


#use GREP to extract the host using CUT from log entries containing CONNECT
cat /usr/local/squid/var/logs/access.log.0 | grep CONNECT | cut -d "/" -f 2 | cut -d " " -f 4 > squid_ssl.log

#sort the file and report the unique entries, storing the output in a temp file
sort squid_ssl.log > squid_ssl2.log
uniq -c squid_ssl2.log | sort -r > squid_ssl.log

#use GREP to extract the host using CUT from log entries containing HTTP
cat /usr/local/squid/var/logs/access.log.0 | grep http | cut -d “/” -f 4 > squid_http.log

#sort the file and report the unique entries, storing the output in a temp file
sort squid_http.log > squid_http2.log
uniq -c squid_http2.log | sort -r > squid_http.log

#merge the two temp files together in a descending order
sort -r squid_http.log squid_ssl.log > squid_compiled.log

#insert pretty headers
echo "---------------------------------------------" > squid-final.log
echo "Most heavily visited sites" >> squid-final.log
echo "---------------------------------------------" >> squid-final.log

#use head to show only the sites with highest hit count
head squid_compiled.log >> squid-final.log

#do the whole process for cache HITS
cat /usr/local/squid/var/logs/access.log.0 | grep http | grep HIT | cut -d “/” -f 4 > squid_hits.log
sort squid_hits.log > squid_hits2.log
uniq -c squid_hits2.log | sort -r > squid_hits.log

echo "---------------------------------------------" >> squid-final.log
echo "Sites with highest cache hit count" >> squid-final.log
echo "---------------------------------------------" >> squid-final.log

head squid_hits.log >> squid-final.log

#cleanup – comment this out for debugging
rm squid_*

The above will give an output like so:

Most heavily visited sites
5734 img100.xvideos.com
4661 stork48.dropbox.com
4378 m.google.com:443
2484 profile.ak.fbcdn.net
1778 www.facebook.com
1716 www.google.co.uk
1318 0.59.channel.facebook.com
1297 www.google-analytics.com
1249 www.google.com
Sites with highest cache hit count
335 img100.xvideos.com
192 s7.addthis.com
125 www.cisco.com
125 static.ak.fbcdn.net
109 r.mzstatic.com
109 mmv.admob.com
97 ebooks-it.org
92 profile.ak.fbcdn.net
84 pagead2.googlesyndication.com
80 cachend.fling.com

I don't expect this to be placed in anyone's production environment, but if you're considering it, be aware that it's probably better to integrate the script with a MySQL database if there is significant traffic. That will make it much more robust, archivable and reportable. If you go down this path, research the following for a start:

mysql -u[user] -p[pass] -e "[mysql commands]"

Use your imagination to extend and write your own scripts that can make monitoring and troubleshooting so much easier…

SQUID transparent SSL interception

July 2012: Small update on new versions of squid (squid v 3.2) here

There seems to be a bit of confusion about configuring SQUID to transparently intercept SSL (read: HTTPS) connections. Some sites say it's plainly not possible:


Recent developments in SQUID have made this possible. This article explores how to set it up at a basic level. The SQUID proxy will essentially act as a man in the middle. The motivation behind this setup is to decrypt HTTPS connections in order to apply content filtering and so on.

There are some concerns that transparently intercepting HTTPS traffic is unethical and can cause legal issues. True – and I agree that monitoring HTTPS connections without properly and explicitly notifying the user is bad – but we can use technical means to ensure that the user is properly notified and even gets prompted to accept monitoring or back out. More on this towards the end of the article.

So, on to the technical details of setting up the proxy. First, install the dependencies. We will need to compile SQUID from scratch since by default it's not compiled with the necessary switches. I recommend downloading the latest 3.1 version, especially if you want to notify users about the monitoring. In ubuntu:

apt-get install build-essential libssl-dev

Note : for CentOS users, use openssl-devel rather than libssl-dev

Build-essential provides the compilers, while libssl-dev provides the SSL libraries that enable SQUID to intercept the encrypted traffic. The libssl package is needed during compilation; without it, when running make you will see errors similar to the following in the console:

error: ‘SSL’ was not declared in this scope

Download and extract the SQUID source code from their site. Next, configure, compile and install the source code using:

./configure --enable-icap-client --enable-ssl
make
make install

Note the switches I included in the configure command:

* --enable-icap-client : we'll need this to use ICAP to provide a notification page to clients that they are being monitored.

* --enable-ssl : this is a prerequisite for SslBump, which squid uses to intercept SSL traffic transparently.

Once SQUID has been installed, a very important step is to create the certificate that SQUID will present to the end client. In a test environment, you can easily create a self-signed certificate using OpenSSL by using the following:

openssl req -new -newkey rsa:1024 -days 365 -nodes -x509 -keyout www.sample.com.pem -out www.sample.com.pem

This will of course cause the client browser to display an error:


In an enterprise environment you'll probably want to generate the certificate using a CA that the clients already trust. For example, you could generate the certificate using Microsoft's CA and use certificate auto-enrolment to push the certificate out to all the clients in your domain.
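Whichever route you take, it's worth sanity-checking the subject the proxy will present to clients. A sketch, generating a throwaway self-signed certificate non-interactively (the CN and file name here mirror the example above and are purely illustrative):

```shell
# Generate a throwaway self-signed cert with a hypothetical CN,
# then print the subject the browser will see.
openssl req -new -newkey rsa:1024 -days 365 -nodes -x509 \
    -subj "/CN=www.sample.com" \
    -keyout www.sample.com.pem -out www.sample.com.pem 2>/dev/null
openssl x509 -in www.sample.com.pem -noout -subject
```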

Onto the actual SQUID configuration. Edit the /etc/squid.conf file to show the following:

always_direct allow all
ssl_bump allow all

http_port transparent

#the below should be placed on a single line
https_port transparent ssl-bump cert=/etc/squid/ssl_cert/www.sample.com.pem key=/etc/squid/ssl_cert/private/www.sample.com.pem

Note you may need to change the "cert=" and "key=" values to point to the correct files in your environment. You will of course also need to change the IP address.

The first directive (always_direct) is needed because of SslBump. By default ssl_bump operates in accelerator mode; in the debug log (cache.log) you'd see "failed to select source for". In accelerator mode, the proxy does not know which backend server to use to retrieve the file, so this directive instructs the proxy to ignore accelerator mode. More details on this here:


The second directive (ssl_bump) instructs the proxy to allow all SSL connections, but this can be modified to restrict access. You can also use "sslproxy_cert_error" to deny access to sites with invalid certificates. More details on this here:


Start squid and check for any errors. If no errors are reported, run:

netstat -nap | grep 3129

to make sure the proxy is up and running. Next, configure iptables to perform destination NAT, basically to redirect the traffic to the proxy:

iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j DNAT --to-destination
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 443 -j DNAT --to-destination
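Assuming a hypothetical proxy address of 192.0.2.10 listening on the ports configured above, the complete rules would look roughly like this (address and ports are placeholders, not from the original post):

```
# Gateway NAT sketch -- 192.0.2.10:3128/3129 are hypothetical
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80  -j DNAT --to-destination 192.0.2.10:3128
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 443 -j DNAT --to-destination 192.0.2.10:3129
```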

The last thing to be done is to either place the proxy physically inline with the traffic or redirect the traffic to the proxy using a router. Keep in mind that the proxy will change the source IP address of the requests to its own IP; in other words, by default it does not reflect the client IP.

That was it in my case. I also tried to implement something similar to the above, but in explicit mode. This was my squid.conf file; note only one port is needed for both HTTP and HTTPS, since HTTPS is tunneled over HTTP using the CONNECT method:

always_direct allow all
ssl_bump allow all

#the below should be placed on a single line

http_port 8080 ssl-bump cert=/etc/squid/ssl_cert/proxy.testdomain.deCert.pem key=/etc/squid/ssl_cert/private/proxy.testdomain.deKey_without_Pp.pem

As regards my previous discussion of notifying users that they are being monitored, consider using greasyspoon:


With this in place, you can instruct greasyspoon to send a notify page to the clients. If they accept this notify page, a cookie (let's say it's called "NotifySSL") is set. GreasySpoon can then check for the presence of this cookie in subsequent requests and, if present, allow the connection. If the cookie is not present, users again get the notify page. Due to security considerations, cookies are usually only valid for one domain, so you may end up with users having to accept the notify page for each different domain they visit. But you can use greasyspoon in conjunction with a backend MySQL database or similar to record source IP addresses that have been notified, and perform IP-based notifications. Anything is possible :)
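The cookie check itself can be sketched in plain ECMAScript, the same language used by the GreasySpoon script shown further down. Everything here is a hypothetical helper for illustration – the NotifySSL cookie name and header strings are assumptions, not GreasySpoon API:

```javascript
// Sketch only: decide whether a request already carries the hypothetical
// NotifySSL consent cookie. In a real GreasySpoon response script the raw
// request header is exposed as the global "requestheader" string.
function hasConsentCookie(requestheader) {
    var marker = "Cookie: ";
    var start = requestheader.indexOf(marker);
    if (start < 0) return false;                 // no Cookie header at all
    var end = requestheader.indexOf("\r\n", start);
    var cookies = requestheader.substring(start + marker.length, end);
    return cookies.indexOf("NotifySSL=accepted") >= 0;
}

// Hypothetical request headers for illustration:
var accepted = "GET / HTTP/1.1\r\nCookie: NotifySSL=accepted\r\n\r\n";
var fresh    = "GET / HTTP/1.1\r\nHost: example.com\r\n\r\n";
console.log(hasConsentCookie(accepted));  // true  -> allow the connection
console.log(hasConsentCookie(fresh));     // false -> serve the notify page
```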

SQUID + GreasySpoon : enhancing your proxy deployment with content adaptation

When comparing the two proxy solutions I am most familiar with, BlueCoat ProxySG and SQUID, the most striking difference is the capability of the BlueCoat to easily change and modify the traffic passing through it. For the BlueCoat-savvy among you, adding a "Web Access" and "Web Content" layer in policy allows you to modify traffic, such as adding headers, cookies, notify pages, and so on. This sort of modification is known as "Content Adaptation". A SQUID article explains the various options available to SQUID users for doing this:


It definitely doesn't look very easy to do this. The easiest way I've found is to use an Internet Content Adaptation Protocol (ICAP) server to modify the traffic for SQUID. I won't go into much detail on ICAP; in a nutshell, the SQUID proxy sends traffic of interest (such as HTTP) over to the ICAP server, which parses it, modifies it, and sends it back to the SQUID server. This opens up a lot of opportunities for achieving the same sort of BlueCoat functionality I mentioned previously… such as adding headers, cookies, inserting company headers within a website, and much more.

The easiest and most flexible open source ICAP server I've come across is GreasySpoon:


It requires some programming knowledge, so it's not as easy for first-timers, but the upside is that the possibilities are endless… apart from it performing well and being cross-platform. If you are going to go through with setting this up, I advise reading through the website; they have some good documentation and script samples.

In this article I’ll be logging my test setup where I’ve used a CentOS 5 machine to host a SQUID proxy and a GreasySpoon server. As a test case, I wanted to instruct GreasySpoon to insert a header into YouTube server responses to force clients to use the HTML5 version of the YouTube site.

- Setting up SQUID proxy server

The first step is installing a SQUID proxy on the server. In order to include ICAP functionality you need a later SQUID version (3.x). The SQUID versions I found in the CentOS repositories were v2.x, so this necessitated building SQUID from source. The only two prerequisite packages I needed to download were gcc and gcc-c++. From there, the process is quite normal:

  • Download latest package (v3.1 in my case)
  • Run ./configure --enable-icap-client to enable ICAP functionality on the SQUID proxy
  • Run make and make install
  • It should install successfully. Modify the squid conf file to your needs and start the proxy
  • Test the proxy by using a browser pointing to the proxy IP

- Setting up the GreasySpoon ICAP server

The installation of GreasySpoon is a breeze. The only issue to point out is to make sure you have the correct Java version installed, or else the built-in JavaScript engine will not be enabled, leaving you with no language to use to modify the traffic.

In my case, CentOS already had OpenJDK installed, but I ran into problems anyway because the JavaScript language was not selectable. I needed to download the Sun Java package and install that.

Download the tar.gz package from greasyspoon, extract this, and modify the file “greasyspoon” to point the JAVA_HOME variable to the java home directory, in my case /usr/java/jre1.6.0/.

To start greasyspoon, give the greasyspoon file executable permission: chmod +x greasyspoon. Then start the server: ./greasyspoon start

You should now be able to reach the admin interface of the server by visiting http://localhost:8088 on the server.

To double check, run netstat to make sure the server is listening on port 1344.

- Set up the SQUID + GreasySpoon interaction

This part of the setup informs SQUID to send traffic over to the GreasySpoon server. The basic instructions to do this are already explained here:


In a production environment you may want to modify this configuration a bit so not all traffic is sent to the ICAP server, usually only a subset of traffic should be sent.
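For squid 3.1, the relevant squid.conf directives are roughly the following sketch (the service name is arbitrary, and icap://127.0.0.1:1344/response assumes GreasySpoon is running with its default response-script endpoint on the same host):

```
# squid.conf ICAP sketch -- hypothetical service name, local GreasySpoon
icap_enable on
icap_service gs_resp respmod_precache bypass=0 icap://127.0.0.1:1344/response
adaptation_access gs_resp allow all
```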

- Writing a GreasySpoon script to modify HTTP traffic to youtube.com

In my case, I wanted GreasySpoon to check for the presence of a certain Cookie (pref=f2) and if not there, instruct the client to set this cookie using the Set-Cookie HTTP header. This cookie controls if youtube should be seen in HTML5 or not.

In the greasyspoon admin interface, navigate to the tab “greasyspoon scripts” > responses scripts > new script. Add a name and leave the language as ECMAScript. Here’s the script itself:

// This is a GreasySpoon script.
// --------------------------------------------------------------------
// WHAT IT DOES: force HTML5 version of youtube
// --------------------------------------------------------------------
// ==ServerScript==
// @name           youtube_HTML_5
// @status on
// @description    force browser to request the HTML5 version of youtube
// @include        .*youtube.*
// @exclude       
// @responsecode    200
// ==/ServerScript==
// --------------------------------------------------------------------
// Available elements provided through ICAP server:
// —————
// requestedurl  :   (String) Requested URL
// requestheader  :  (String)HTTP request header
// responseheader :  (String)HTTP response header
// httpresponse   :  (String)HTTP response body
// user_id        :  (String)user id (login or user ip address)
// user_group     :  (String)user group or user fqdn
// sharedcache    :  (hashtable<String, Object>) shared table between all scripts
// trace          :  (String) variable for debug output – requires to set log level to FINE
// —————
    headerstring = "Cookie: ";
    c = requestheader.indexOf(headerstring) + headerstring.length;
    c1 = requestheader.indexOf("\r\n", c);
    var Cookiestring = requestheader.substring(c,c1);

            if (Cookiestring.indexOf("f2")<0){
                responseheader = responseheader + "Set-Cookie: PREF=f2=40000000; path=/; domain=.youtube.com;\r\n";
            }

Most of it is just comments but pay attention to the include and exclude comments since they control which sites the script will be applied to.

The rest of the comments describe which variables are available to you the programmer to use.

The actual script (starting at headerstring = "Cookie: ") is an adaptation of a sample script on the site which basically just checks for the presence of the cookie and, if not found, sends the Set-Cookie header to the client.

Save and enable the script.

That's about it. You can see if the script is being applied from the "Data" tab > logs > access logs section of the greasyspoon interface.

This was just an example, to show that with a bit of persistence and programming, a free, open-source solution can match the functionality and flexibility of much more expensive commercial solutions. Of course, it's not as easy to set up, use and maintain, but I still think this is a fantastic tool and setup that gives any network admin great granularity of control over his proxy traffic :)

PS: greasyspoon can serve as a very flexible ICAP server for BlueCoat too… all that's needed is a web content rule that forwards traffic via ICAP to the GreasySpoon server.

