The Why question
Web servers send a bit of metadata with each page they serve. Some of this
metadata is useful and some of it is not; the details follow below. For
small webpages, it isn't uncommon for this metadata to be bigger than the
actual page. If your webpage is mostly intended to be used by GPRS phones
(where the users pay per kilobyte), that overhead is a waste of your users'
money. An example of this would be traffic information that is provided to
navigation tools such as the TomTom GO navigator.
The Where question
This "GPRS optimization" needs to be done at the server that provides the
content.
The What question
Let's start by showing what a typical HTTP response looks like, in this case
from a very simple PHP script that just returns the text "You are in a
traffic jam".
What the server sends back looks like this:
Date: Mon, 23 Jan 2006 09:07:00 GMT
Server: Apache/2.2.0 (Fedora)
X-Powered-By: PHP/5.1.2
Content-Length: 25
Connection: close
Content-Type: text/html; charset=UTF-8
You are in a traffic jam
As you can see, the actual "payload" is only a fraction of the data that is
sent here.
Let's go over the headers line by line to explain what they mean and what
they are used for:
- The "Date" header: The Date header is mandatory in the HTTP
specification, and is there for various reasons. For example, it allows
proxy servers in the middle to estimate how long to cache this document
(the proxy server needs to know any time skew between its clock and the
server's clock for that, and it can use the Date: field to calculate this).
- The "Server" header: this header is like an advertisement for the
webserver program. It is generally not used, except by websites like
www.netcraft.net that collect statistics on web server popularity. Removing
this header has no real downsides.
- The X-Powered-By: header is another advertisement, this time for PHP,
the script language used here.
- The Content-Length: header tells the client how long the payload
actually is. This is important, because the client needs to know when it is
done downloading the page. (It is also used for things like the progress
bar for long downloads.)
- The Connection: header tells the client that after this file is done,
the connection will be closed. For complex webpages with many pictures it's
normal to keep the connection open for further requests; after all, when you
grab a page with many pictures, it's highly likely that you'll want to get
the pictures inside that page as well.
- The Content-Type: header gives the type of the document, so that the
browser knows how to display this. If you have a specialist client, such as
an embedded application, this header can in principle be omitted.
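To illustrate how a specialist client can get by with only a few of these
headers, here is a minimal Python sketch. It is not taken from any real
navigation device; the function name split_response and the choice to ignore
everything except Content-Length are assumptions made for illustration. It
splits a raw response into headers and payload and uses Content-Length to
decide whether the whole payload has arrived.

```python
def split_response(raw: bytes):
    """Split a raw HTTP response into (headers, body, complete).

    Hypothetical helper for illustration: it only relies on Content-Length
    and does not need Content-Type, Server or X-Powered-By at all.
    """
    # Headers and payload are separated by an empty line (CRLF CRLF).
    head, _, rest = raw.partition(b"\r\n\r\n")

    headers = {}
    for line in head.decode("ascii").split("\r\n"):
        if ":" not in line:
            continue  # skips a status line such as "HTTP/1.1 200 OK"
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # Content-Length tells the client how many payload bytes to expect.
    expected = int(headers["content-length"])
    body = rest[:expected]
    complete = len(body) == expected
    return headers, body, complete
```

A client built this way never looks at Content-Type, which is why that
header can in principle be dropped for such a device.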
So let's assume this response is meant for a small navigation device that
gets its data via a GPRS link. A leaner response would look like this:
Date: Mon, 23 Jan 2006 09:07:00 GMT
Content-Length: 25
Connection: close
You are in a traffic jam
As you can see, this is roughly half the data of the original message, and
thus roughly half your GPRS bill for this response!
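The "roughly half" figure can be checked with a few lines of Python. This is
only a sketch under some assumptions: each header line ends in CRLF, an
empty line separates headers from payload, the payload ends in a single
newline (which is what makes it 25 bytes), and the status line is left out
because it is not shown in the examples. The helper name response_size is
made up for this illustration.

```python
# Header lines copied from the two example responses in the text.
ORIGINAL_HEADERS = [
    "Date: Mon, 23 Jan 2006 09:07:00 GMT",
    "Server: Apache/2.2.0 (Fedora)",
    "X-Powered-By: PHP/5.1.2",
    "Content-Length: 25",
    "Connection: close",
    "Content-Type: text/html; charset=UTF-8",
]
TRIMMED_HEADERS = [
    "Date: Mon, 23 Jan 2006 09:07:00 GMT",
    "Content-Length: 25",
    "Connection: close",
]
# Assumption: the payload ends in one newline, matching Content-Length: 25.
BODY = b"You are in a traffic jam\n"


def response_size(header_lines, body):
    """Bytes sent for the header block plus payload (status line not counted)."""
    # Every header line is terminated by CRLF; an empty line ends the headers.
    head = "".join(line + "\r\n" for line in header_lines) + "\r\n"
    return len(head.encode("ascii")) + len(body)


if __name__ == "__main__":
    full = response_size(ORIGINAL_HEADERS, BODY)
    trimmed = response_size(TRIMMED_HEADERS, BODY)
    print(f"original: {full} bytes, trimmed: {trimmed} bytes, "
          f"ratio: {trimmed / full:.2f}")
```

Note that this only counts the HTTP response itself; the request and the
TCP/IP overhead are not included, so the saving on the actual bill will be
somewhat smaller.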
The How question
This section focuses on the Apache webserver, which runs roughly 70% of the
websites on the internet. It also assumes a Unix/Linux-like server, which is
a common setup as well, although I expect the same tricks to apply to the
Windows version of the Apache webserver.
On Fedora and similar distributions, Apache has its config file in the
following location (other distributions may use a different path):
/etc/httpd/conf/httpd.conf
I'll do one simple example: the X-Powered-By: header.
To get rid of this header, adding the following to the end of httpd.conf is
enough (the Header directive is provided by the mod_headers module, so that
module needs to be loaded):
Header unset X-Powered-By
To activate this change (like any config file change), the server needs to
be restarted.
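After the restart it is worth checking that the header is really gone. The
following Python sketch shows one way to do that with the standard
http.client module. The function names, the default list of unwanted
headers, and any host or path you pass in are assumptions for illustration,
not part of Apache or PHP.

```python
import http.client

# Headers we do not want to see any more (compared case-insensitively).
UNWANTED = ("x-powered-by",)


def fetch_header_names(host, path="/", port=80):
    """Request a page and return the names of the response headers."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()  # drain the payload before closing
        return [name for name, _ in response.getheaders()]
    finally:
        conn.close()


def remaining_unwanted(header_names, unwanted=UNWANTED):
    """Return the unwanted headers that are still present in the response."""
    return [name for name in header_names if name.lower() in unwanted]
```

For example, remaining_unwanted(fetch_header_names("localhost",
"/traffic.php")) should return an empty list once the change is active;
"localhost" and "/traffic.php" are placeholders for your own server and
script.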
Questions
If you have any questions or feedback about these pages, please contact me at
arjan@infradead.org.