<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw='http://wellformedweb.org/CommentAPI/' xmlns:dc='http://purl.org/dc/elements/1.1/' xmlns:rl='http://www.purl.org/RESTLog/'>
  <channel>
    <title>There's gold in those HTTP headers</title>
    <link>http://WellFormedWeb.org/news/21</link>
    <description>
&lt;p&gt;There is a vast array of power in the HTTP headers that 
we currently leave untapped. The latest release of the server side
of RESTLog demonstrates how easy it is to tap into that power.
In this case the savings come in bandwidth. The RSS file provided 
for this site contains the last 15 news items posted. On average the 
RSS file is 10K. With approximately 100 people subscribed to the WellFormedWeb
newsfeed, and many aggregators hitting the feed once an hour, 24 hours a day,
that comes to about 100x24x10K or 24MB a day. Now that won't
chew up my monthly bandwidth allotment, but it can potentially be trimmed
by adding support for &lt;a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.5"&gt;gzip compression&lt;/a&gt;.
The first thing we need to do is generate and store a gzip'd version
of &lt;code&gt;index.rss&lt;/code&gt; every time we change &lt;code&gt;index.rss&lt;/code&gt;.
That is a three line addition to 'rebuildIndexFiles.py':&lt;/p&gt;
&lt;div class="example"&gt;&lt;pre&gt;&lt;code&gt;
	output = file("index.html", "w")
	templating.transformItemsToHtml(template_file_name, itemFileNames, output)
	&lt;ins&gt;output = gzip.open("index.rss.z", "w", 9)&lt;/ins&gt;
	&lt;ins&gt;output.write(rootNode.toxml())&lt;/ins&gt;
	&lt;ins&gt;output.close()&lt;/ins&gt;
	rootNode.unlink()
&amp;nbsp;
if __name__ == "__main__":	
	rebuild("./data/")
&amp;nbsp;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
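&lt;p&gt;The same step can be sketched on its own, outside of RESTLog. This is a minimal Python 3 version (the listings in this post are Python 2). The function name &lt;code&gt;write_gzip_copy&lt;/code&gt; is hypothetical and is not part of &lt;code&gt;rebuildIndexFiles.py&lt;/code&gt;; the only thing assumed is the standard &lt;code&gt;gzip&lt;/code&gt; module.&lt;/p&gt;
&lt;div class="example"&gt;&lt;pre&gt;&lt;code&gt;
```python
import gzip
import os


def write_gzip_copy(source_path, gzip_path, level=9):
    """Write a gzip'd copy of source_path; return (original, compressed) sizes in bytes."""
    with open(source_path, "rb") as source:
        data = source.read()
    # Level 9 is the slowest but smallest setting, a fair trade for a file
    # that is rebuilt rarely and downloaded often.
    with gzip.open(gzip_path, "wb", compresslevel=level) as target:
        target.write(data)
    return len(data), os.path.getsize(gzip_path)
```
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Calling &lt;code&gt;write_gzip_copy("index.rss", "index.rss.z")&lt;/code&gt; returns both sizes, which makes it easy to check the compression ratio for a particular feed.&lt;/p&gt;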
&lt;p&gt;Now we just need to detect if the browser can accept gzip encoded
content. That is determined by the presence of the &lt;a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.3"&gt;Accept-Encoding&lt;/a&gt;
header. If one of its values is 'gzip' then we can return the gzip'd version 
of &lt;code&gt;index.rss&lt;/code&gt;, otherwise we drop back to the old behaviour
of returning the original uncompressed file. In RESTLogImpl.py in class RootDispatch
the member function GET_rss was modified as follows:&lt;/p&gt;
&lt;div class="example"&gt;&lt;pre&gt;&lt;code&gt;
  def GET_rss(self):
    &lt;ins&gt;if os.environ.has_key("HTTP_ACCEPT_ENCODING") and -1 != os.environ["HTTP_ACCEPT_ENCODING"].find("gzip"):&lt;/ins&gt;
      &lt;ins&gt;print "Content-Encoding: gzip"&lt;/ins&gt;
      &lt;ins&gt;try:&lt;/ins&gt;
        &lt;ins&gt;import msvcrt&lt;/ins&gt;
        &lt;ins&gt;msvcrt.setmode(sys.stdout.fileno(),os.O_BINARY)&lt;/ins&gt;
      &lt;ins&gt;except ImportError:  # msvcrt only exists on Windows&lt;/ins&gt;
        &lt;ins&gt;pass&lt;/ins&gt;
      &lt;ins&gt;dispatch.returnFileAsContent("index.rss.z", "text/xml")&lt;/ins&gt;
    &lt;ins&gt;else:&lt;/ins&gt;
      dispatch.returnFileAsContent("index.rss", "text/xml")
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
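&lt;p&gt;A plain substring search for 'gzip' is the simplest possible check, but it also matches a header such as &lt;code&gt;gzip;q=0&lt;/code&gt;, which RFC 2616 defines as "not acceptable". A stricter check might look like the following Python 3 sketch. The function name &lt;code&gt;accepts_gzip&lt;/code&gt; is hypothetical and not part of RESTLog, and it deliberately handles only the gzip coding and the &lt;code&gt;*&lt;/code&gt; wildcard.&lt;/p&gt;
&lt;div class="example"&gt;&lt;pre&gt;&lt;code&gt;
```python
def accepts_gzip(accept_encoding):
    """Return True if an Accept-Encoding header value allows a gzip response."""
    if not accept_encoding:
        return False
    wildcard_quality = None
    for entry in accept_encoding.split(","):
        parts = [part.strip() for part in entry.split(";")]
        coding = parts[0].lower()
        quality = 1.0  # a coding listed without a q parameter defaults to q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value.strip())
                except ValueError:
                    quality = 0.0  # treat a malformed q value as a refusal
        if coding in ("gzip", "x-gzip"):
            # An explicit gzip entry wins over any wildcard; q=0 means "never".
            return quality > 0
        if coding == "*":
            wildcard_quality = quality
    # No explicit gzip entry: fall back to the wildcard, if there was one.
    return wildcard_quality is not None and wildcard_quality > 0
```
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In a CGI script the argument would come from &lt;code&gt;os.environ.get("HTTP_ACCEPT_ENCODING")&lt;/code&gt;, so a missing header simply arrives as &lt;code&gt;None&lt;/code&gt;.&lt;/p&gt;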
&lt;p&gt;Note that the whole try: except: block is really just a fix for Windows, where we need
to force &lt;code&gt;stdout&lt;/code&gt; to be in binary mode to handle the gzip'd file. If you ignore
that bit of detritus we really only added another three lines of code.
&lt;/p&gt;
&lt;p&gt;There were two other small changes that did not add any more lines of code
but are worth mentioning. The first is the additional import of the gzip library 
into &lt;code&gt;rebuildIndexFiles.py&lt;/code&gt;. The other change was in &lt;code&gt;dispatch.py&lt;/code&gt;
where &lt;code&gt;returnFileAsContent()&lt;/code&gt; now opens the file in binary mode.
&lt;/p&gt;
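&lt;p&gt;To show why binary mode matters on both ends, here is a minimal Python 3 sketch of a helper in the spirit of &lt;code&gt;returnFileAsContent()&lt;/code&gt;. The real function in &lt;code&gt;dispatch.py&lt;/code&gt; is not shown in this post, so the name &lt;code&gt;return_file_as_content&lt;/code&gt;, its parameters, and the explicit output stream are all assumptions made for illustration.&lt;/p&gt;
&lt;div class="example"&gt;&lt;pre&gt;&lt;code&gt;
```python
import shutil


def return_file_as_content(path, content_type, out, content_encoding=None):
    """Write CGI response headers to out, followed by the raw bytes of the file."""
    headers = ["Content-Type: " + content_type]
    if content_encoding:
        headers.append("Content-Encoding: " + content_encoding)
    # A blank line separates the headers from the body.
    out.write(("\r\n".join(headers) + "\r\n\r\n").encode("ascii"))
    # Binary mode ("rb") copies the gzip bytes untouched; text mode could
    # translate newline bytes on Windows and corrupt the compressed stream.
    with open(path, "rb") as source:
        shutil.copyfileobj(source, out)
```
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In a Python 3 CGI script the &lt;code&gt;out&lt;/code&gt; argument would be &lt;code&gt;sys.stdout.buffer&lt;/code&gt;, which is already a binary stream.&lt;/p&gt;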
&lt;p&gt;So six lines of code and now for aggregators that support gzip encoding the file
transfer will be 1/3 the size. Not a bad return on investment. Similar gains can be had by applying gzip encoding to 
the main HTML file. Other powerful HTTP headers are 
&lt;a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.24"&gt;If-Match&lt;/a&gt;
and &lt;a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.25"&gt;If-Modified-Since&lt;/a&gt;
which I will be adding support for in a future release. A nice aspect 
of all these headers is that they apply to any content, regardless of the format.
&lt;/p&gt;
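&lt;p&gt;As a rough idea of what If-Modified-Since handling involves, here is a minimal Python 3 sketch. It is not the RESTLog implementation, which does not exist yet; the function names &lt;code&gt;last_modified_header&lt;/code&gt; and &lt;code&gt;not_modified&lt;/code&gt; are hypothetical, and using the file's modification time as the resource's date is an assumption.&lt;/p&gt;
&lt;div class="example"&gt;&lt;pre&gt;&lt;code&gt;
```python
import os
from email.utils import formatdate, parsedate_to_datetime


def last_modified_header(path):
    """Format the file's modification time as an HTTP date (always GMT)."""
    return formatdate(os.path.getmtime(path), usegmt=True)


def not_modified(if_modified_since, path):
    """Return True when the file is no newer than the If-Modified-Since date."""
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since).timestamp()
    except (TypeError, ValueError):
        # An unparseable date is ignored and the full response is sent.
        return False
    # HTTP dates have one-second resolution, so compare whole seconds.
    return since >= int(os.path.getmtime(path))
```
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When &lt;code&gt;not_modified()&lt;/code&gt; returns true, a CGI script could answer with &lt;code&gt;Status: 304 Not Modified&lt;/code&gt; and no body, so an aggregator polling an unchanged feed transfers almost nothing.&lt;/p&gt;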

</description>
    <dc:creator>BitWorking, Inc</dc:creator>
  </channel>
</rss>



