The Well-Formed Web

Exploring the limits of XML and HTTP

RFC822

Here's the raw data of an HTTP GET, an e-mail, and a MIME encoded picture. Tell me if you see any patterns.

First, here is the raw source of a short e-mail. Note that this is also the format in which it is kept on my hard drive, as the Mozilla mail client uses the mbox format.

From - Tue Apr 15 21:22:51 2003
X-UIDL: 3e9bf91300000055
X-Mozilla-Status: 0001
X-Mozilla-Status2: 02000000
Return-Path: <jo.....working.org>
Message-ID: <3E9C665A.603030....working.org>
Date: Tue, 15 Apr 2003 16:06:50 -0400
From: Joe Gregorio <jo.....working.org>
User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:1.3) Gecko/20030312
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Joe Gregorio <jo.....working.org>
Subject: Test
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Status: RO

This is a test.
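As a rough illustration of the mbox point, here is a minimal sketch that reads such a file with the `mailbox` module from Python's standard library. The sample text is abridged from the mails in this post (the second one is cut down to a plain body), and the `read_subjects` helper and the `Inbox` file name are my own assumptions, not anything Mozilla prescribes.

```python
import mailbox
import os
import tempfile

# Two abridged messages in mbox form. Each one starts with a "From " line,
# which is the separator mbox uses between messages.
MBOX_TEXT = (
    "From - Tue Apr 15 21:22:51 2003\n"
    "X-Mozilla-Status: 0001\n"
    "Date: Tue, 15 Apr 2003 16:06:50 -0400\n"
    "Subject: Test\n"
    "Content-Type: text/plain; charset=us-ascii; format=flowed\n"
    "\n"
    "This is a test.\n"
    "\n"
    "From - Tue Apr 15 21:22:50 2003\n"
    "Date: Tue, 15 Apr 2003 16:01:50 -0400\n"
    "Subject: Picture\n"
    "\n"
    "Here is a picture.\n"
)


def read_subjects(mbox_text):
    """Write the text to a temporary mbox file and list each Subject header."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "Inbox")  # hypothetical file name
        with open(path, "w", newline="\n") as handle:
            handle.write(mbox_text)
        box = mailbox.mbox(path, create=False)
        try:
            # Iterating over the mailbox yields one message object per mail.
            return [message["Subject"] for message in box]
        finally:
            box.close()


subjects = read_subjects(MBOX_TEXT)
print(subjects)
```

The file on disk really is just the messages concatenated, headers and all, so the library needs nothing more than the separator lines to find them.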

Now here is the response to a simple HTTP GET on the URL /news/1?xml.

HTTP/1.1 200 OK
Date: Thu, 17 Apr 2003 03:37:56 GMT
Server: Apache/1.3.27 (Unix)  (Red-Hat/Linux) mod_throttle/3.1.2 PHP/4.1.2 DAV/1.0.2 mod_ssl/2.8.12 OpenSSL/0.9.6
Transfer-Encoding: chunked
Content-Type: text/xml
Connection: close
Proxy-Connection: close

<?xml version="1.0" ?>
...

Here is the source of another mail message, this time with an attachment. (To keep things short I have removed some of the SMTP headers.)

From - Tue Apr 15 21:22:50 2003
Return-Path: <joe....working.org>
Message-ID: <3E9C652E.403060...working.org>
Date: Tue, 15 Apr 2003 16:01:50 -0400
From: Joe Gregorio <jo....working.org>
User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:1.3) Gecko/20030312
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Joe Gregorio <jo....working.org>
Subject: Picture
Content-Type: multipart/mixed;
 boundary="------------030605020005060907080306"
Status: RO

This is a multi-part message in MIME format.
--------------030605020005060907080306
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Here is a picture.

	-joe

--------------030605020005060907080306
Content-Type: image/gif;
 name="Picture1.gif"
Content-Transfer-Encoding: base64
Content-Disposition: inline;
 filename="Picture1.gif"

R0lGODlhCgAKAPcAAP//////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////
/////////////////////////////////yH5BAEAAAEALAAAAAAKAAoAAAgSAAMIHEiwoMGD
CBMqXMiwYcKAADs=
--------------030605020005060907080306--
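Notice that each part of the message above carries its own little block of headers. A minimal sketch of pulling the parts apart with Python's standard `email` package follows. The message is abridged from the one above, and the base64 body is deliberately truncated to its first eight characters (the first six bytes of the GIF), so treat the decoded result as a stub rather than a usable image.

```python
import email

# The boundary string declared in the top-level Content-Type header.
BOUNDARY = "------------030605020005060907080306"

RAW = "\n".join([
    "MIME-Version: 1.0",
    "Subject: Picture",
    "Content-Type: multipart/mixed;",
    f' boundary="{BOUNDARY}"',
    "",
    "This is a multi-part message in MIME format.",
    f"--{BOUNDARY}",  # a delimiter line is "--" plus the boundary
    "Content-Type: text/plain; charset=us-ascii; format=flowed",
    "Content-Transfer-Encoding: 7bit",
    "",
    "Here is a picture.",
    "",
    f"--{BOUNDARY}",
    "Content-Type: image/gif;",
    ' name="Picture1.gif"',
    "Content-Transfer-Encoding: base64",
    "Content-Disposition: inline;",
    ' filename="Picture1.gif"',
    "",
    "R0lGODlh",  # truncated on purpose: only the start of the image data
    f"--{BOUNDARY}--",  # the closing delimiter has a trailing "--"
    "",
])

message = email.message_from_string(RAW)
parts = message.get_payload()  # a list of sub-messages for multipart mail
text_part, image_part = parts

# decode=True undoes the Content-Transfer-Encoding (base64 here).
image_bytes = image_part.get_payload(decode=True)

for part in parts:
    print(part.get_content_type(), part.get_filename())
```

The parts are themselves messages, with the same kind of headers as the mail that contains them, which is why one parser can handle both levels.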

And of course, who can forget Mark Pilgrim:

C:\>curl --include
HTTP/1.1 200 OK
Date: Thu, 17 Apr 2003 03:58:23 GMT
Server: Apache/1.3.27 (Unix)  (Red-Hat/Linux) PHP/4.1.2 mod_gzip/1.3.26.1a DAV/1.0.3 mod_ssl/2.8.12 OpenSSL/0.9.6b
Vary: Accept-Encoding
X-Clerks: I'm not even supposed to BE here today!
Last-Modified: Thu, 17 Apr 2003 03:25:43 GMT
Transfer-Encoding: chunked
Content-Type: text/html
Connection: close
Proxy-Connection: close

<!DOCTYPE HTML >
...

Hat tip to Curioso for pointing out another example in Usenet news messages:

     Relay-Version: version B 2.10 2/13/83; site cbosgd.UUCP
     Posting-Version: version B 2.10 2/13/83; site eagle.UUCP
     Path: cbosgd!mhuxj!mhuxt!eagle!jerry
     From: [email protected] (Jerry Schwarz)
     Newsgroups: net.general
     Subject: Usenet Etiquette -- Please Read
     Message-ID: <[email protected]>
     Date: Friday, 19-Nov-82 16:14:55 EST
     Followup-To: net.news
     Expires: Saturday, 1-Jan-83 00:00:00 EST
     Date-Received: Friday, 19-Nov-82 16:59:30 EST
     Organization: Bell Labs, Murray Hill

     The body of the article comes here, after a blank line.

The pattern, if you missed it, is all those headers of the form:

Header: Value
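To make the pattern concrete, here is a minimal sketch in which one parser, `HeaderParser` from Python's standard `email` package, reads both a mail header block and an HTTP response header block. The samples are abridged from the examples above and use bare newlines for brevity (HTTP on the wire uses CRLF). The helper names `parse_headers` and `parse_http_headers` are my own, and the HTTP status line is dropped first because it is not a header.

```python
from email.parser import HeaderParser

# Abridged copies of two of the examples in this post.
MAIL = (
    "Date: Tue, 15 Apr 2003 16:06:50 -0400\n"
    "Subject: Test\n"
    "Content-Type: text/plain; charset=us-ascii; format=flowed\n"
    "\n"
    "This is a test.\n"
)

HTTP_RESPONSE = (
    "HTTP/1.1 200 OK\n"
    "Date: Thu, 17 Apr 2003 03:37:56 GMT\n"
    "Transfer-Encoding: chunked\n"
    "Content-Type: text/xml\n"
    "Connection: close\n"
    "\n"
    '<?xml version="1.0" ?>\n'
)


def parse_headers(text):
    """Return the Header: Value pairs at the top of `text` as a dict."""
    return dict(HeaderParser().parsestr(text).items())


def parse_http_headers(response):
    """Drop the status line, which is not a header, then parse the rest."""
    _status_line, rest = response.split("\n", 1)
    return parse_headers(rest)


mail_headers = parse_headers(MAIL)
http_headers = parse_http_headers(HTTP_RESPONSE)

print(mail_headers["Subject"])
print(http_headers["Content-Type"])
```

The same code reads both because, past the first line of the HTTP response, the two formats are the same thing: names, colons, values, and a blank line before the body.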

Now those headers had their start in RFC 822, which is, and this is my point, one of the unsung pillars of the internet. Like HTML, it is theoretically the worst of all possible formats. It is 7-bit ASCII. Fixed line length. No centrally controlled way to add custom headers. But here it is today, the meta-data transport of choice for HTTP, SMTP and MIME. Now it has been updated from its humble 7-bit ASCII roots with RFC 2822, and MIME has its own cleaned-up version, but they all owe their roots to 822.

2003-04-17 00:01 Comments (3)


You forget NNTP, also based on these headers. And RSS 3.0 of course ;-) See http://www.offback.com/IMPblog/index.php for blogging software based on storage of blog entries as e-mail messages. And then there is http://www.fettig.net/projects/hep/ for handling blog entries as NNTP messages. There is a lot to say for unifying these alternatives....

Posted by Curioso on 2003-04-18 12:32

Thanks, I updated the post with a Usenet news example.

Posted by Joe on 2003-04-18 14:53

Then you might also add e.g.

title: raelity bytes [--]
description: The raelity bytes weblog.
language: en
creator: [email protected] Rael Dornfest
generator: http://www.raelity.org/apps/blosxom/
errorsto: [email protected]

Thu, 17 Apr 2003

title: Google Phonebook Search Gives Some the Willies
subject: /computers/internet/www/search_engines/google
date: 2003-04-17

Posted by Curioso on 2003-04-18 15:12