hypermail – creating a www interface to a mailing list archive

hypermail is a program that takes a file of mail messages in UNIX mailbox
format and generates a set of cross-referenced HTML documents.  It allows for an
online archive of mail messages.  It’s ideal for providing a www interface to a
mailing list archive, which is what I’m going to do with it.

The hypermail homepage is http://www.landfield.com/hypermail/ and contains a
few examples of the interface.

The background

Regular readers will know that I’ve recently been Creating a digest and archive
for a majordomo mailing list.  This article documents the next step.  I wish to
provide an online archive of the mailing list with a www interface.  I liked the
archive used by ipfilter and noticed that it was produced by hypermail.  Then I
noticed FreeBSD had a port for it.  Great start!

The installation

The first step was to install hypermail from the ports.  I followed the
instructions found in the FreeBSD handbook for compiling ports from the
internet.  You may want to see compiling ports from CDROM instead.

I had a problem in that my ports were out of date.  The port was installing
version 1.something and I noticed on the hypermail home page that the latest
version was 2.something.  So I referred to my article on Updating the ports
collection to refresh the ports tree.  It had been several months since I last
did this.

After the refresh, here’s what I did:

cd /usr/ports/www/hypermail
make
make install  

The first run

I found /usr/ports/www/hypermail/work/hypermail-20b3/tests to be useful for
testing.  I suggest you do that test right after installing.  Note that there
are several tests within the file which are commented out.  You might want to invoke
them.

I took /usr/ports/www/hypermail/work/hypermail-20b3/configs/hmrc.example
as my starting config file.  Here are the changes I made to this file:

# diff -urN hmrc.example /usr/local/hypermail/adsl/hmrc.adsl 
--- hmrc.example        Sun Nov  7 00:15:52 1999
+++ /usr/local/hypermail/adsl/hmrc.adsl Sun Nov  7 00:15:25 1999
@@ -18,7 +18,7 @@
 # This is the default title you want to call your archives.
 # Set this to NONE to use the name of the input mailbox.
 
-hm_label = Hypermail Development List
+hm_label = ADSL Mailing list
 
 # hm_archives = [ URL | NONE ]
 #
@@ -200,7 +200,7 @@
 # The <link...> header can be disabled by default by setting
 # mailto to NONE.
 
-hm_mailto = webmaster@landfield.com
+hm_mailto = webmaster@freebsddiary.org
 
 # hm_domainaddr = [ domainname | NONE ]
 #
@@ -210,7 +210,7 @@
 # to domain-ize these addresses for delivery. In such cases, 
 # hypermail will add the DOMAINADDR to the email address.
 
-hm_domainaddr = landfield.com
+hm_domainaddr = freebsddiary.org
 
 # hm_body = [ HTML <BODY> statement | NONE ]
 #
@@ -225,7 +225,7 @@
 # used to submit a new message to the list served by the 
 # hypermail archive.
 # "NONE" means don't use it.
 
-hm_hmail = hypermail@landfield.com
+hm_hmail = adsl@freebsddiary.cx
 
 # hm_ihtmlheader = [ path to index header template file | NONE ]
 #

So now I was ready for my first test run.  I took one of the existing archive
files from within /usr/local/majordomo/lists/adsl.archive and used it as the
input.

hypermail -p -m "/usr/local/majordomo/lists/adsl.archive.9911" \
               -c "hmrc.adsl" \
               -d "/usr/local/www/data/freebsddiary/adsl"

This directs the output to /usr/local/www/data/freebsddiary/adsl.  
Well, that worked just fine.  I hope yours does too.
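
If you have several monthly archive files, the same command can be run in a
loop.  What follows is only a sketch.  It assumes the majordomo archive files
are named adsl.archive.YYMM, as in the example above, and that each month gets
its own YYYYMM output directory.  The function names month_dir and convert_all
are my own and are not part of hypermail.  Two-digit years below 70 are assumed
to be 20xx.

```sh
#!/bin/sh
# Sketch: run hypermail once per monthly archive file.
# Assumptions (not from hypermail itself): archives are named
# adsl.archive.YYMM and each month gets its own YYYYMM directory.

ARCHIVE_DIR="/usr/local/majordomo/lists"
WWW_DIR="/usr/local/www/data/freebsddiary/adsl"
CONFIG="hmrc.adsl"

# Turn a YYMM suffix (e.g. 9911) into YYYYMM (e.g. 199911).
# Years 00-69 are assumed to be 20xx, everything else 19xx.
month_dir() {
    yy=${1%??}          # first two digits
    mm=${1#??}          # last two digits
    case $yy in
        [0-6][0-9]) echo "20${yy}${mm}" ;;
        *)          echo "19${yy}${mm}" ;;
    esac
}

# Process every archive file found; uses the same hypermail
# options as the single-file run shown earlier.
convert_all() {
    for f in "$ARCHIVE_DIR"/adsl.archive.[0-9][0-9][0-9][0-9]; do
        [ -f "$f" ] || continue
        out="$WWW_DIR/$(month_dir "${f##*.}")"
        mkdir -p "$out"
        hypermail -p -m "$f" -c "$CONFIG" -d "$out"
    done
}
```

Nothing runs until you call convert_all, so you can check the paths at the top
before letting it loose on your archives.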

Data conversion

I strongly urge you to make digests/archives of every list you create.  It
doesn’t take much time and it requires very little additional system resources.  I
wish I’d done that when I started my ADSL mailing list.  But then, if I had, I
wouldn’t have written the "Creating a digest and archive
for a majordomo mailing list" article.

But I didn’t.  So now I’m paying the price for that omission.  Luckily, I was
able to enlist the assistance and expertise of other list members.  I had saved
all of the list messages within my email client (Pegasus, a great Windows client;
see http://www.pmail.gen.nz/ for details).  I sorted the messages according to
year/month and saved each group to a file.  Then I ran an awk script over the
files to convert them to the required format.  There wasn’t actually much of a
change required.  Here’s a before and after:

Return-Path: owner-adsl
Received: (from majordom@localhost)
        by ducky.freebsddiary.cx (8.9.3/8.9.3) id WAA14705
        for adsl-outgoing; Wed, 30 Jun 1999 22:37:54 +1200 (NZST)
Received: from metis.host4u.net (metis.host4u.net [209.150.128.22])
        by ns.freebsddiary.cx (8.9.3/8.9.3) with ESMTP id WAA14519
        for <adsl@freebsddiary.cx>; Wed, 30 Jun 1999 22:35:35 +1200 (NZST)
Received: from wocker (210-55-152-83.ipnets.xtra.co.nz 
                                                    [210.55.152.83])
        by metis.host4u.net (8.8.5/8.8.5) with SMTP id FAA17866
        for <adsl@freebsddiary.cx>; Wed, 30 Jun 1999 05:34:56 -0500
Message-Id: <199906301034.FAA17866@metis.host4u.net>
From: "Dan Langille" <dan.langille@dvl-software.com>
Organization: DVL Software Limited
To: adsl@freebsddiary.cx
Date: Wed, 30 Jun 1999 22:35:13 +1200

The above had to be converted to this:

From dan.langille@dvl-software.com  Wed Jun 30 22:35:13 1999
Received: (from majordom@localhost)
        by ducky.freebsddiary.cx (8.9.3/8.9.3) id WAA14705
        for adsl-outgoing; Wed, 30 Jun 1999 22:37:54 +1200 (NZST)
Received: from metis.host4u.net (metis.host4u.net [209.150.128.22])
        by ns.freebsddiary.cx (8.9.3/8.9.3) with ESMTP id WAA14519
        for <adsl@freebsddiary.cx>; Wed, 30 Jun 1999 22:35:35 
                                                       +1200 (NZST)
Received: from wocker (210-55-152-83.ipnets.xtra.co.nz 
                                                    [210.55.152.83])
        by metis.host4u.net (8.8.5/8.8.5) with SMTP id FAA17866
        for <adsl@freebsddiary.cx>; Wed, 30 Jun 1999 05:34:56 -0500
Message-Id: <199906301034.FAA17866@metis.host4u.net>
From: "Dan Langille" <dan.langille@dvl-software.com>
Organization: DVL Software Limited
To: adsl@freebsddiary.cx
Date: Wed, 30 Jun 1999 22:35:13 +1200

As you can see, it’s the first line of the message headers which has to be
changed: the Return-Path: line is replaced by a Unix "From " line built from the
From: and Date: headers.  Not much, but it had to be modified.  I didn’t want to
write the code, so I had a friend do it for me.  The code appears below.  It
worked for my needs and is particular to the delimiters my mailer was using, but
perhaps you can use it as a starting point for your situation.

My thanks to Don Stokes <don@daedalus.co.nz>
for writing this code.

#! /usr/bin/awk -f
#
# Convert Pegasus file of saved messages to Unix mailbox format.
#
#   Don Stokes, <don@daedalus.co.nz>    7 November 1999
#
BEGIN {
   for(;;) {                  # For each message in the file:
      if((getline) <= 0) exit # Skip the Return-Path: line
      hdr = ""                # EOF now means we're done.
      h = ""
      date = ""
      from = "huh"

      #
      # Parse the header, looking for the Date: and From: lines.
      # Deal with header continuation correctly.
      #
      while((getline) > 0) {
         if(!$1) break       # Blank line indicates end of
                             # headers
         hdr = hdr $0 "\n"   # Add header to saved headers
         if(substr($0,1,1) > " ") { # If new header...
            h = tolower($1)
            if(h == "date:") date = substr($0,length($1)+2)
            if(h == "from:") from = substr($0,length($1)+2)
         } else {                   # else continuation from previous line
            if(h == "date:") date = date $0
            if(h == "from:") from = from $0
         }
      }

      #
      # Parse From: address
      # Remove whitespace and RFC 822 comments in ()s
      # If an address in <>s is found, use that, otherwise
      # what is left after removing whitespace and comments is
      # the address
      #
      gsub("[\t ]", "", from)      # Kill whitespace
      gsub("\\(.*\\)", "", from)   # Remove (comment)s
      if(i = index(from, "<")) {   # Extract <u@h> if present
         from = substr(from, i+1)
         from = substr(from, 1, index(from, ">") - 1)
      }

      #
      # Parse the date
      # If no day ("Day,") found, assume it was Monday.  
      # Deal with 2-digit years
      #
      $0 = date
      if(!gsub(",","",$1)) $0 = "Mon " $0
      if($4 < 70) $4 += 2000      # <70 = 20xx
      if($4 < 1900) $4 += 1900   # <1900 = 19xx

      #
      # Output Unix From line:
      # From user@host  Day Mon DD HH:MM:SS YYYY
      # Follow that with the saved header.  Note that header 
      # terminates with a LF, so we don't need to add another.
      #
      printf "From %s  %s %s %2d %s %s\n", 
         from, $1, $3, $2, $5, $4
      print hdr

      #
      # Read the body and put it to the file, until we hit EOF
      # or the Pegasus delimiter "-- End --".  Quote any lines
      # starting with a naked From with a ">" in the canonical
      # (broken) Unix mail way.
      #
      while((getline) > 0) {
         if($0 == "-- End --") break
         if(substr($0,1,5) == "From ") printf ">"
         print
      }
   }
}
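
After running the script above over a saved-message file, it’s worth confirming
that no messages went missing.  Here is a rough sketch of such a check.  It
assumes every message in the Pegasus file ends with the "-- End --" delimiter
the script looks for; if the last message lacks it, the counts will differ by
one.  The names count_saved, count_mbox and check_conversion are made up for
this example.

```sh
#!/bin/sh
# Count messages in a Pegasus file of saved messages: one "-- End --"
# line per message (assumes every message, including the last, has one).
count_saved() {
    awk '$0 == "-- End --" { n++ } END { print n + 0 }' "$1"
}

# Count messages in a Unix mailbox: one unquoted "From " line per
# message.  Body lines were quoted as ">From " so they are not counted.
count_mbox() {
    awk '/^From / { n++ } END { print n + 0 }' "$1"
}

# Compare the two counts and complain if they differ.
check_conversion() {
    before=$(count_saved "$1")
    after=$(count_mbox "$2")
    if [ "$before" -eq "$after" ]; then
        echo "ok: $after messages"
    else
        echo "mismatch: $before saved, $after converted" >&2
        return 1
    fi
}
```

Call it with the input and output file names, for example
check_conversion may99.txt may99.txt.out.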

Hooking it all together

I ran the above code over the message files and then ran the results through
hypermail, like this:

awk -f pegasus.conversion.awk may99.txt > may99.txt.out
hypermail -p -m "may99.txt.out" -c "hmrc.adsl" \
               -d "/usr/local/www/data/freebsddiary/adsl/199905"

Then all I had to do was create an index.html for the top directory and I was up and
running.
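
The top-level index.html can be written by hand, but it could also be
generated.  Here is a minimal sketch, assuming each month lives in its own
YYYYMM subdirectory as in the command above.  The make_index function and the
page layout are my own invention; hypermail itself plays no part in it.

```sh
#!/bin/sh
# Sketch: write a top-level index.html linking to each monthly archive.
# Assumes subdirectories are named YYYYMM (e.g. 199905); anything else
# in the directory is ignored.
make_index() {
    top=$1
    {
        echo '<html><head><title>ADSL Mailing list archive</title></head>'
        echo '<body><h1>ADSL Mailing list archive</h1><ul>'
        for d in "$top"/[0-9][0-9][0-9][0-9][0-9][0-9]; do
            [ -d "$d" ] || continue
            m=${d##*/}                  # directory name, e.g. 199905
            # Link text is YYYY-MM, built from the directory name.
            echo "<li><a href=\"$m/\">${m%??}-${m#????}</a></li>"
        done
        echo '</ul></body></html>'
    } > "$top/index.html"
}
```

Run it as make_index /usr/local/www/data/freebsddiary/adsl whenever a new month
is added.  Note that it overwrites any existing index.html in that directory.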

What’s left to do?

See periodic – using it to run shell scripts for details on how to automagically
update the online archives at the end of each day.

I also wrote up how I capture these messages in redirecting majordomo mailing
lists.
