The Well-Formed Web

Exploring the limits of XML and HTTP

There's gold in those HTTP headers

There is a vast amount of power in the HTTP headers that we currently leave untapped. The latest release of the server side of RESTLog demonstrates how easy it is to tap into that power. In this case the savings come in bandwidth. The RSS file provided for this site contains the last 15 news items posted. On average the RSS file is 10K, and many aggregators hit the feed once an hour, 24 hours a day. With approximately 100 people subscribed to the WellFormedWeb newsfeed, that comes to about 100 x 24 x 10K, or 24MB a day. That won't chew up my monthly bandwidth allotment, but it can be trimmed by adding support for gzip compression.

The first thing we need to do is generate and store a gzip'd version of index.rss every time we change index.rss. That is a three-line addition to 'rebuildIndexFiles.py':


	output = file("index.html", "w")
	templating.transformItemsToHtml(template_file_name, itemFileNames, output)
	# New: also store a gzip'd copy of the feed, at the highest compression level (9)
	output = gzip.open("index.rss.z", "w", 9)
	output.write(rootNode.toxml())
	output.close()
	rootNode.unlink()
 
if __name__ == "__main__":	
	rebuild("./data/")
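
For readers who want to try the same idea outside RESTLog, here is a minimal sketch of the "store a gzip'd copy next to the original" step. It is written for modern Python 3 rather than the Python 2 of the listing above, and the function name write_gzip_copy and its return value are my own assumptions, not part of RESTLog:

```python
import gzip
import os


def write_gzip_copy(source_path, gzip_path, level=9):
    """Write a gzip-compressed copy of source_path to gzip_path.

    Hypothetical helper for illustration only. Returns a tuple of
    (uncompressed size, compressed size) in bytes so the saving can
    be checked.
    """
    # Read the original as bytes so no text decoding is involved.
    with open(source_path, "rb") as src:
        data = src.read()
    # compresslevel=9 trades a little CPU time for the smallest file.
    with gzip.open(gzip_path, "wb", compresslevel=level) as dst:
        dst.write(data)
    return len(data), os.path.getsize(gzip_path)
```

Calling write_gzip_copy("index.rss", "index.rss.z") after each rebuild would keep the two files in step, and the returned sizes give a quick way to see how much a particular feed actually shrinks.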
 

Now we just need to detect whether the client can accept gzip-encoded content. That is signalled by the Accept-Encoding request header. If 'gzip' is one of its values we can return the gzip'd version of index.rss; otherwise we fall back to the old behaviour of returning the original uncompressed file. In RESTLogImpl.py, in class RootDispatch, the member function GET_rss was modified as follows:


  def GET_rss(self):
    if os.environ.has_key("HTTP_ACCEPT_ENCODING") and -1 != os.environ["HTTP_ACCEPT_ENCODING"].find("gzip"):
      print "Content-Encoding: gzip"
      try:
        import msvcrt
        msvcrt.setmode(sys.stdout.fileno(),os.O_BINARY)
      except:
        pass
      dispatch.returnFileAsContent("index.rss.z", "text/xml")
    else:
      dispatch.returnFileAsContent("index.rss", "text/xml")

Note that the whole try: except: block is really just a fix for Windows, where we need to force stdout to be in binary mode to handle the gzip'd file. If you ignore that bit of detritus we really only added another three lines of code.
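
The substring test in GET_rss is the simplest thing that works, but it would also match a header such as "gzip;q=0", which a client can send to say it does not want gzip. As a sketch of a stricter check (again in modern Python 3, with accepts_gzip being a hypothetical name of mine, and with no handling of the "*" wildcard), something like this could be used instead:

```python
def accepts_gzip(header_value):
    """Return True if an Accept-Encoding header value allows gzip.

    Hypothetical helper for illustration only. It looks for a 'gzip'
    entry and honours an explicit q-value; it ignores the '*' wildcard.
    """
    if not header_value:
        return False
    # The header is a comma-separated list such as "gzip;q=0.8, deflate".
    for part in header_value.split(","):
        fields = [field.strip() for field in part.split(";")]
        if fields[0].lower() != "gzip":
            continue
        quality = 1.0  # no q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value.strip())
                except ValueError:
                    quality = 0.0  # malformed q-value: play safe, no gzip
        return quality > 0
    return False
```

In a CGI script it would be called as accepts_gzip(os.environ.get("HTTP_ACCEPT_ENCODING")), which also covers the case where the header is missing altogether.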

There were two other small changes that did not add any more lines of code but are worth mentioning. The first is the additional import of the gzip library into rebuildIndexFiles.py. The other change was in dispatch.py, where returnFileAsContent() now opens the file in binary mode.

So six lines of code, and now for aggregators that support gzip encoding the file transfer will be about a third of the size. Not a bad return on investment. Similar gains can be had by applying gzip encoding to the main HTML file. Other powerful HTTP headers are If-Match and If-Modified-Since, which I will be adding support for in a future release. A nice aspect of all these headers is that they apply to any content, regardless of format.
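
To give a flavour of what If-Modified-Since handling involves, here is a rough sketch in modern Python 3. It is not RESTLog code and not necessarily how a future release will do it; the names file_mtime and not_modified are assumptions of mine. The idea is that when the client's copy is at least as new as the file, the server can answer with a 304 Not Modified status and no body at all:

```python
import os
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def file_mtime(path):
    """Return the file's last-modified time as a UTC datetime (hypothetical helper)."""
    return datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)


def not_modified(if_modified_since, last_modified):
    """Return True if a 304 response would be appropriate.

    if_modified_since: raw header value from the request, or None.
    last_modified: timezone-aware datetime of the resource's last change.
    """
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable date: just send the full response
    if since.tzinfo is None:
        # Treat a date without a usable zone as UTC.
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    return last_modified.replace(microsecond=0) <= since
```

A handler could then check not_modified(os.environ.get("HTTP_IF_MODIFIED_SINCE"), file_mtime("index.rss")) before deciding whether to send the file.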

2002-12-23 00:23