The Well-Formed Web is now cool.
That is to say that all the URLs on this site are cool, with cool being defined by Tim Berners-Lee in his article Cool URIs don't change. In the article he argues that URLs should never change and the best way to achieve that is to do some up front design of your URLs so that you won't need to change them in the future.
The URLs for the weblog entries on this site used to be of the form:
/RESTLog.cgi/1
In the article Tim Berners-Lee gives a list of things to leave out of your cool URL. My URLs were mostly cool because they didn't include such information as the author's name, the subject, the status, or the file name extension. The only information embedded in that URL is the software mechanism: .cgi. It has to go. If a new and better method comes along to serve up my content and I deploy it, then I end up breaking my URLs, and that's not cool. So I need a way to remove any reference to .cgi and also allow the old URLs, which are already linked to out in the wild of the internet, to keep working. The server I am running on requires the use of the .cgi extension, so just renaming the file won't work.
Apache to the rescue
The Apache module mod_rewrite comes to the rescue. This powerful module allows rewriting of URLs on the fly. So I have two ugly URLs to contend with, /RESTLog.cgi and /stories/RESTLog.cgi. The first I want mapped to /news and the second I want mapped to /story. Here is the section of my .htaccess file that accomplishes that rewriting:
RewriteEngine on
RewriteBase /
RewriteRule ^news(.*) /RESTLog.cgi$1 [L]
RewriteRule ^story(.*) /stories/RESTLog.cgi$1 [L]
Note that these rules only modify a URI if it starts with "news" or "story", so the old URIs will still work even after I switch to using the new URLs. Which means no broken links. Now that's cool.
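To make the pass-through behaviour concrete, here is a minimal Python sketch that imitates the two rules with the standard re module. It is only an approximation of what mod_rewrite does: the function name rewrite and the RULES table are made up for illustration, and the sketch assumes the .htaccess file sits in the document root, so the leading slash is stripped before matching.

```python
import re

# Patterns and substitutions mirrored from the two RewriteRule lines.
# RULES and rewrite() are hypothetical names used only for this sketch.
RULES = [
    (re.compile(r"^news(.*)"), r"/RESTLog.cgi\g<1>"),
    (re.compile(r"^story(.*)"), r"/stories/RESTLog.cgi\g<1>"),
]


def rewrite(url_path):
    """Return the rewritten path, or url_path unchanged if no rule matches."""
    # Assumption: per-directory rules see the path without its leading slash.
    relative = url_path[1:] if url_path.startswith("/") else url_path
    for pattern, substitution in RULES:
        match = pattern.match(relative)
        if match:
            # Like the [L] flag, stop at the first rule that matches.
            return match.expand(substitution)
    return url_path


if __name__ == "__main__":
    for path in ("/news/1", "/story/3", "/RESTLog.cgi/1", "/stories/RESTLog.cgi/3"):
        print(path, "->", rewrite(path))
```

The old-style paths fall through both patterns untouched, which is why existing links keep working. One thing the sketch also makes visible is that ^news is a prefix match, so any path beginning with "news" would be rewritten too.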
This change also required some changes to the server-side code.
The code was extended by adding a base_uri__ variable to RESTLogImpl.py that is used as the base URI for all URLs generated. This fixes a problem when accessing the web site while ModRewrite is in use. SCRIPT_NAME used to be used to generate the URLs, which was easy to do but is not robust. For example, I want all the URLs on WellFormedWeb to be of the form:
/news/N
but the server is only configured to execute scripts if the filename ends in .cgi so I am stuck with:
/RESTLog.cgi/N
I can use ModRewrite to accept /news:
RewriteEngine on
RewriteBase /cgi-bin/
RewriteRule ^cgi-bin/news(.*) /cgi-bin/RESTLog.cgi$1 [L]
But even with this rewrite in place, SCRIPT_NAME still points to RESTLog.cgi, and the permalinks generated have RESTLog.cgi in them and not news. So the answer I came up with is to set the base URI in the main script, RESTLog.cgi. The alternative was to look for some other, potentially missing, CGI environment variables that are present if a rewrite was done, but then I realized that created a whole new problem: if I had two installs of pamphlet and they were configured differently, one for /RESTLog.cgi and the other for /news, then posts from each install would have a different form of the permalink. Yuk.
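The difference between the two approaches can be sketched in a few lines of Python. This is not the actual RESTLogImpl.py code: BASE_URI, permalink_from_script_name and permalink_from_base_uri are hypothetical names, and the SCRIPT_NAME value is an assumed example of what the script would see after a rewrite.

```python
# Hypothetical stand-in for the base URI that gets set in RESTLog.cgi.
BASE_URI = "/news"


def permalink_from_script_name(environ, entry_id):
    """Old approach: derive the link from the CGI SCRIPT_NAME variable."""
    return "%s/%s" % (environ["SCRIPT_NAME"], entry_id)


def permalink_from_base_uri(entry_id, base_uri=BASE_URI):
    """New approach: derive the link from an explicitly configured base URI."""
    return "%s/%s" % (base_uri.rstrip("/"), entry_id)


if __name__ == "__main__":
    # Assumed environment: the request came in as /news/1, but the script
    # still sees its real file name in SCRIPT_NAME after the rewrite.
    environ = {"SCRIPT_NAME": "/RESTLog.cgi"}
    print(permalink_from_script_name(environ, 1))
    print(permalink_from_base_uri(1))
```

Because the base URI is a plain setting rather than something inferred from the request, every install configured with the same value produces the same form of permalink, whichever URL the reader arrived by.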
Postscript: Much thanks to Mark Pilgrim for sending me some of his .htaccess files as examples and pointing me to A Users Guide to URL Rewriting with the Apache Webserver which is also loaded with examples.