Stopping Spam and Trojan Horses with BSD

Brett Glass
P.O. Box 1588
Laramie, WY  82073-1588
http://www.brettglass.com/mailbrett.html

Presented at BSDCon on October 19, 2000
(Slides at http://www.brettglass.com/spam/index.html)

Abstract: A properly configured BSD mail server can protect users (including those running other operating systems on client machines) from spam and Trojan horses while rejecting virtually no legitimate content. This tutorial describes how to configure BSD systems to use DNS blacklists, procmail, mail "sanitizing" scripts, daemons that watch logs for evidence of spamming and "mail bombing," and similar utilities. Prevention of unauthorized relaying and detection and blocking of outbound spam are also discussed. Countermeasures against address harvesting and privacy invasion techniques such as "Rumplestiltskin" attacks, fingerd scans, tracking via identd, e-mail cookies, and malicious image tags in HTML mail are covered in detail. Links to source materials and relevant software tools are provided.

[Note: This paper is formatted for viewing via a Web browser. References to additional information and helpful source material are presented within this paper in the form of HTML links, rather than as footnotes, for easy online access. If you receive this paper in printed form and wish to follow up the references, access the master copy of this document online at http://www.brettglass.com/spam/paper.html and follow the links, which the author will endeavor to keep current.]


Introduction

Of all the services provided via the Internet, e-mail is the one that most of us can least afford to see disrupted. Yet, most network and system administrators have only partial knowledge of the many ways in which their mail systems can be abused, or of how to prevent such abuses. As a result, mail servers are often co-opted to deliver spam and other undesirable material. Spammers have learned how to probe servers for user account information, "harvesting" addresses not only from Web pages but from utilities which are enabled by default in many operating system configurations. (In some cases, the mail transfer agent itself can be subverted to provide useful information.) Self-propagating malware (malicious software), in the form of Trojan horses such as ILOVEYOU, can burden or even halt e-mail servers, as it did the servers of the British Parliament in the early part of the year 2000. This paper will describe some common threats against which many mail servers are defenseless by default, and explain some of the ways in which one can harden a mail server running BSD and sendmail against them. The techniques mentioned here implement most of the recommendations mentioned in RFC 2505 (a "Best Current Practice" RFC regarding spam) plus some others not contemplated by that document. While the experienced system administrator will already have implemented some of these measures, few are aware of all of them. Many of the ideas described in this paper may also be applicable to mail servers running on other operating system platforms and/or employing other mail transfer agents.


sendmail and BSD

Many operating systems and mail transfer agents (MTAs) can be used to implement robust, abuse-resistant mail systems. However, by far the most popular OS/MTA configuration uses sendmail, originally written by Eric Allman at UC Berkeley, on a Unix-like operating system platform. In June 1999, a ZDNet article (http://www.zdnet.com/sp/stories/news/0,4538,2268098,00.html) estimated the market penetration of sendmail to be in excess of 75%, while the introduction of O'Reilly's Open Sources: Voices from the Software Revolution, edited by Chris DiBona, Sam Ockman, and Mark Stone, places this figure at a yet more optimistic 80%.

sendmail was originally distributed under the Berkeley (or "BSD") license, but is no longer. It now comes in two versions: a free version with source (http://www.sendmail.org/), released under a license called the Sendmail License, and a commercial version with Web-based configuration utilities (http://www.sendmail.com/).

The status of Unix-like operating systems as the most popular platforms for mail servers follows directly from the choice of sendmail as the MTA. As sendmail author Eric Allman notes in the Sendmail FAQ:

"Generally speaking, I adhere to the old axiom that you should choose what software you want to run first, then choose the platform (hardware and OS) that best runs this software. By this token, if sendmail is the software, then a recent version of BSD Unix would probably be best, since sendmail was developed at UC Berkeley on BSD Unix."

The BSDs are thus the recommended platforms for sendmail MTAs. Solaris (originally BSD-derived, though it now incorporates Unix System V) is also common, especially when e-mail services for a large organization are concentrated in a single multiprocessor server. Linux, which does not share BSD's pedigree but works similarly, is often used as well. Sendmail, Inc. offers a commercial port of sendmail for Microsoft Windows NT/2000 called Sendmail for NT Mail Routing. However, due to the greater demands of Microsoft's operating system platforms, a Windows NT/2000 mail server requires substantially more memory and computing power than those which run Unix-like OSes. (Sendmail, Inc. recommends a minimum of 256 MB of RAM and a 300+ MHz CPU for Windows 2000 servers that run its port of sendmail.)

Because sendmail and a BSD-derived operating system are the most robust, economical, and popular choice for industrial strength mail servers, this presentation will assume the use of this software configuration. As mentioned above, many of the techniques described in this paper may be useful with other configurations as well.


Blocking Incoming Spam

Mathematically speaking, the question of whether a message is "spam" -- or unsolicited commercial e-mail -- is generally not decidable. No computer can tell for certain with whom an arbitrary user has had a legitimate personal or business relationship, or what sort of unexpected message he or she will consider to be appropriate. Nonetheless, the overwhelming majority of spam carries one or more telltale signs that it was broadcast to a large and unwilling audience, often by abusing the computing resources and bandwidth of innocent third parties.

Mail transfer agents and filtering programs which correctly read these telltale signs are very successful at blocking spam. Commercial services such as Spamcop, Brightmail, and stop.mail-abuse.org offer varying degrees of spam filtering (and, in the case of Spamcop, automatic complaint generation) for a fee. Despammed.Com, a service provided by a company called Nextek Innovations, does not charge for its basic service but may charge for premium services in the future. All of these services are handy; however, administrators can retain more control of what is blocked if they configure their own site-specific spam blocking mechanisms.

Anti-Abuse Features in sendmail

sendmail has many anti-spam and security features. Recent versions do not relay mail unless instructed to do so, and reject by default messages whose "envelope From" address  (the address mentioned in the SMTP "MAIL FROM:" command) contains a host name that does not resolve. However, other useful features and configuration options are not always enabled, and many are not well known. This section covers some of the most important and useful ones. I have included a brief overview of how to configure sendmail so that the reader can add the recommended options and features to an existing system; however, this is not intended to be a complete tutorial on sendmail.

Configuring sendmail: A Brief Overview

The sendmail configuration file, sendmail.cf, has an extremely terse and arcane syntax; it is notoriously difficult for humans to edit without introducing errors. Therefore, it is generally best to edit a .mc file, which contains somewhat more readable macros, and pass it through the m4 macro processor to generate a new sendmail.cf.

You may have to hunt a bit to find the directory where the original .mc file for your operating system distribution is hidden. (Looking at the top of the default sendmail.cf is often helpful, since it will tell you the location of the .mc file on the machine where sendmail.cf was created.) You may even find that the default .mc file is not present at all! For example, releases of FreeBSD prior to 4.1.1 did not install the default .mc file (freebsd.mc), nor the components required to rebuild sendmail.cf from it, unless one installed the full source distribution.

If you can find the original .mc file on your system, a Makefile that will generate a .cf file from it is usually present in the same directory. Make a copy of the default file (e.g. cp default.mc localhost.mc) and edit the copy. Typically, you can use the command make localhost.cf to build a new .cf file. If all goes well, you can often install the new .cf file with the command make localhost.install. Finally, issue the command kill -HUP `head -1 /var/run/sendmail.pid` to restart the main sendmail process.

More detailed instructions describing how to build and install sendmail.cf are available at the sendmail.org Web site or in the sendmail book by Bryan Costales and Eric Allman.

Comments in sendmail .mc files

Perhaps the most confusing thing about writing or editing a sendmail .mc file is, ironically, not the functional content but the comments. By default, the m4 macro processor uses a convention for comments which is similar to that of other languages: a "#" character starts a comment and a "newline" character ends it. But the macro files that build sendmail.cf override this default so that a "#" does not set off a comment; it is treated as ordinary text. Instead, the "dnl" and "divert" macro commands are used to set off comments.

The divert(-1)  macro command you'll find near the top of many .mc files turns off all output from the macro processor until the following divert(0) command. This pair of commands is sometimes used to comment out a large block of text such as a copyright notice. However, it is really not wise to do this. Even though output is disabled, there could be untoward side effects if any of the text in between is misinterpreted as a macro command. The author therefore uses the "dnl" macro command (see below) for all comments in a .mc file.

The dnl you'll see at the end of nearly every line of a .mc file stands for "delete to newline," and is roughly equivalent to a double forward slash (//) in a C++ program. Used after a command, it prevents the m4 macro processor from being affected by comments at the end of the line or from copying extraneous trailing whitespace into sendmail.cf. Placing "dnl " (including the space) at the beginning of a line turns the entire line into a comment. A common mistake among system administrators is to place "#" at the beginnings of lines in a .mc file, believing that m4 will treat them as comments. But because of sendmail's macro files, m4 sees these as ordinary input! At best, m4 will just graft the lines verbatim into sendmail.cf, where they will be treated as comments. At worst, parts of these lines will be interpreted as macro commands, with undesirable results. So, it is important to set off lines that are intended to go no farther than the .mc file with "dnl ".

Controlling Relaying in sendmail

In the early, halcyon days of the Internet, any mail server would relay mail from any other to a requested destination. Alas, spammers began to abuse this courtesy, hijacking servers to distribute their unwanted mail. "Open relays," as they are now called, quickly devolved from a useful resource to an attractive nuisance.

By default, a host running sendmail 8.9 or later will not relay mail from a second host to a third host; exceptions must be specified explicitly when relaying is desired. For example, to allow relaying to or from a group of hosts that share the same domain name suffix, one can add that suffix to the file /etc/mail/relay-domains. The FEATURE commands in Table 1, which can be placed in a .mc file, change the criteria which sendmail uses to determine whether a message should be relayed.
 

Feature: FEATURE(`relay_entire_domain')dnl
Effect: Enables relaying to or from any host in the same domain as the server. Usually safe so long as sendmail is configured properly and network and DNS are secure. Can create an open relay if server is misconfigured.

Feature: FEATURE(`relay_based_on_MX')dnl
Effect: Enables relaying to (but not from) any host for which this server is a mail exchanger. Convenient, but somewhat risky in that anyone can make you an MX for a host in a domain he or she owns. Sometimes fails when source routing (e.g. "%", "@host:", or "!") is used. Use of /etc/mail/relay-domains or access_db is preferable.

Feature: FEATURE(`access_db')dnl
Effect: Enables fine-grained control of relaying (as well as blocking by user and domain) via an access control database. (By default, the database is /etc/mail/access.db; it's built from the text file /etc/mail/access by a Makefile in /etc/mail.) This is the most powerful and flexible option.

Feature: FEATURE(`relay_hosts_only')dnl
Effect: Normally, domains listed in /etc/mail/relay-domains (and also in RELAY entries in an access control database) are treated as domain name suffixes; that is, bar.com also matches foo.bar.com, etc. Enabling this feature "tightens" the interpretation of these entries so that domain names are always treated as individual host names only. Use of this feature may make maintenance of the server more labor-intensive, since new hosts must be listed individually before they will be allowed relaying privileges. This feature is a somewhat awkward attempt to compensate for the fact that neither access control mechanism uses wildcards (e.g. *.domain.com) to distinguish between entries that indicate a single host and those that indicate a domain suffix.

Table 1: Some "FEATURE" commands that control relaying in sendmail

For more information on commands that affect relaying and what can go wrong with them, see the sendmail.org Web site.

Note that relaying is not the same as forwarding, which is done at the request of the recipient or system administrator rather than at the sender's request. Forwarding addresses may be specified in a user's .forward file, or in the global /etc/mail/aliases or /etc/mail/virtusertable file, regardless of the system's relaying policy.

Use of sendmail's Access Control Database

In recent versions of sendmail, the access_db feature allows fine-grained control over the acceptance and relaying of mail. A specially formatted text file at /etc/mail/access is converted by the makemap utility into a hashed database normally stored at /etc/mail/access.db. Each record in the database consists of a key containing a pattern and a value indicating a desired action. The key consists of an optional tag ("Connect:", "From:", or "To:") followed by a user name and/or a domain name. The action may be RELAY, OK, REJECT, DISCARD, or an SMTP error code and a message.  (In the last case, the server rejects the mail and returns the code and message to the client.)

The sample /etc/mail/access file shown in Listing 1 demonstrates many access control database features, including the To:, From:, and Connect: tags. Note that more specific rules win out over more general ones. For example, a line accepting mail from innocent.bystander.spamhaven.net will take priority over one rejecting mail from spamhaven.net. Also, three separate checks, triggered by different rules in sendmail.cf,  are normally done on each message. "Connect:" tags are checked after the HELO command, "From:" tags after the MAIL FROM: command, and "To:" tags after each RCPT TO: command. Untagged records are checked at all three stages. The delay_checks feature macro causes the other checks to be delayed until after recipients are checked. (The "Connect:" check is skipped altogether if the connecting host has authenticated itself to the server by this time.) If a message is rejected at any stage, it won't proceed to the next.
 

# Access control database. This database overrides the policies set by
# FEATURE(`relay_entire_domain'), FEATURE(`relay_based_on_MX'), and
# blacklists. Records match connecting host, envelope MAIL FROM:,
# and/or envelope RCPT TO: (if FEATURE(`blacklist_recipients') is on). 
#
# Domain names in this file refer to hosts if FEATURE(`relay_hosts_only')
# is activated and entire domains or subdomains otherwise.
#
# Block most traffic to/from spamhaven.net but exempt innocent customer.
# The "OK" overrides any DNS blacklists that might be in use.
spamhaven.net                     REJECT
innocent.bystander.spamhaven.net  OK
#
# Customized rejection messages for specific situations
cyberpromo.com                    550 Nice try, Spamford
#
# Block messages from this user name in any domain. With
# FEATURE(`blacklist_recipients') enabled, block mail to it as well.
FREE.STEALTH.MAILER@              550 Stealth mailer detected by radar
#
# Relay mail from internal workstations using reserved IPs. (Do not
# relay to them, however.) "Connect:" tag, not "From:" tag, is used
# to control relaying of mail coming from a host or domain. To: tag 
# controls relaying to a host or domain.
Connect:192.168.0                 RELAY
#
# Silently discard mail from an annoying user. "From:" tag
# filters on "envelope From:" (RFC 821 MAIL FROM:). This tag
# should be used only to reject or discard mail, since the
# RFC 821 MAIL FROM: can be spoofed.
From:kvetch@aol.com               DISCARD
#
# Discard mail to local user joe who is no longer with the company. If
# there are other local recipients, they still get the mail! "To:" tag
# requires FEATURE(`blacklist_recipients') to work.
To:joe@ourdomain.com              DISCARD
Listing 1: A sample /etc/mail/access file for sendmail

The syntax and semantics of records in the access control database, and the subtly different criteria used for each of the three checks, create many pitfalls for the unwary. For example, RELAY -- counterintuitively -- is a superset of OK and is more permissive. RELAY allows mail to be relayed or received while OK only allows it to be received.

The behavior of DISCARD is likewise subtle. DISCARD causes the server to feign acceptance of a message and then not deliver it to some or all of the intended recipients. When a DISCARD record matches the sender (MAIL FROM:), the message will not reach any recipient. However, when it does not match the sender but does match a recipient (this will only occur if FEATURE(`blacklist_recipients') is present in the .mc file), it prevents only the specified recipient from getting the message. Others will receive it unless they too are "blacklisted."

One cannot use wildcards in domain names, so every domain name in the file must be interpreted either as a suffix or as a complete host name. (The interpretation is controlled by the presence or absence of the relay_hosts_only feature.) Adding or removing the relay_hosts_only feature without a careful review of the database can cause unexpected and very undesirable results. Finally, despite the use of hashing, pattern matching is not as efficient as it might be. Multiple lookups must be done, so searches of a large database can be time-consuming.
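
The suffix matching and repeated lookups described above can be illustrated with a short Python sketch. This is a simplified model, not sendmail's actual rule sets: it assumes relay_hosts_only is not enabled, considers only untagged domain keys (no user names, tags, or IP addresses), and the names ACCESS and lookup_access are hypothetical. It shows why a more specific entry takes priority and why one check can cost several database lookups.

```python
# Simplified model of suffix matching in an access control database.
# Assumptions: relay_hosts_only is NOT enabled, and only untagged
# domain keys are modeled. All names here are illustrative.

ACCESS = {
    "spamhaven.net": "REJECT",
    "innocent.bystander.spamhaven.net": "OK",
}

def lookup_access(host, table=ACCESS):
    """Return (action, lookups) for host.

    The full host name is tried first, then each shorter domain
    suffix in turn, so the most specific matching entry wins.
    """
    labels = host.lower().rstrip(".").split(".")
    lookups = 0
    for i in range(len(labels)):
        key = ".".join(labels[i:])
        lookups += 1
        if key in table:
            return table[key], lookups
    return None, lookups
```

In this model, innocent.bystander.spamhaven.net is accepted on the first lookup, while mail.spamhaven.net falls through to the spamhaven.net entry on the second. A host with no matching entry costs one lookup per label in its name, which is one reason large databases can be slow to search.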

For more information about the use of access control databases, see the readme file in the /cf directory of the most recent sendmail distribution.

Other Useful sendmail Features and Options

Table 2 lists some additional features and options which can help to secure a mail server against spam or other types of abuse.
 
Feature or Option: FEATURE(smrsh)dnl
Effect: Limits programs into which sendmail pipes mail (for example, as a result of an entry in a .forward file or /etc/mail/aliases) to those in a specific directory, usually /usr/libexec/sm.bin. The administrator usually creates symbolic links in this directory to programs such as vacation and majordomo. smrsh, the "sendmail restricted shell," also rejects commands with certain metacharacters and strips the directory path from the front of the program name. This feature is not useful if procmail is installed and users can supply their own .procmailrc files, since other applications can then be invoked through procmail.

Feature or Option: define(`confPRIVACY_FLAGS',`goaway')dnl
Effect: Use of "goaway" privacy setting turns off SMTP commands, such as EXPN or VRFY, that may reveal local user names to spammers.

Feature or Option: define(`confMAX_HEADERS_LENGTH',16384)dnl
Effect: Limits the combined size of all RFC 822 headers. Some spamming programs try to prevent MTAs from reporting accurate transaction information by adding very long headers that "push" legitimate information out of the buffer used to hold headers. (While this may not cause a destructive overflow, it does hide information.) Limiting header sizes also makes it more difficult to exploit recently discovered buffer overflows in UW IMAP, Outlook and Outlook Express.

Feature or Option: define(`confCONNECTION_RATE_THROTTLE',3)dnl
Effect: Limits the number of new connections per second. This caps the overhead incurred due to forking new sendmail processes. May be useful against DoS attacks or barrages of spam. (As mentioned below, a per-IP address limit would be useful but is not available as an option at this writing.)

Feature or Option: define(`confSMTP_LOGIN_MSG', `$j server ready at $b')dnl
Effect: Changes the sendmail welcome message. MTA name and version number can be removed or forged, making it more difficult for would-be intruders to probe for vulnerabilities. (This is security by obscurity, but is nonetheless very effective against scripted probes.) "ESMTP" will be inserted in the first whitespace in the welcome message to signal ESMTP capability.

Feature or Option: define(`confMAX_RCPTS_PER_MESSAGE',25)dnl
Effect: Limits the number of recipients per message. Some spamming software does not know how to respond to the SMTP 452 error code (which asks to defer the remaining recipients to another session) and gives up when the limit is reached.

Table 2: Some other useful sendmail features and options

Eliminating Version Information from sendmail's HELP Response

If one wishes to conceal the version of sendmail that is being run, changing the SMTP_LOGIN_MSG configuration option is necessary but not sufficient. The response to the SMTP HELP command must also be altered. This can be done by replacing the help file (found at /etc/mail/helpfile in recent versions) with one containing only the following two lines:
#vers   2
smtp    I'm sorry, Dave, I'm afraid I can't do that.
Note that the first area of whitespace on each line above must be a tab character, not spaces. The message may be altered to suit the administrator's taste.
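
Because tabs and spaces look alike in most editors, one way to be sure of the separator is to generate the file contents programmatically. The following Python sketch is only an illustration; the function name helpfile_text is hypothetical, and the file layout is taken from the two lines shown above.

```python
def helpfile_text(message="I'm sorry, Dave, I'm afraid I can't do that."):
    """Build the two-line help file shown above.

    The first run of whitespace on each line is a single tab
    character ("\t"), never spaces.
    """
    return "#vers\t2\n" + "smtp\t" + message + "\n"
```

The result can be written to a scratch file and inspected before it replaces the real help file.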

Matching RFC 822 Headers in sendmail

Recent versions of sendmail add the ability to match patterns in, and act on, RFC 822 headers. These headers typically include To:, From:, Subject:, and Message-Id:. They precede the body of  the message and are separated from it by a blank line.

This capability allows sendmail to reject messages that claim, for example, to be from all-numeric America Online screen names. (AOL does not allow such names.) sendmail is not the optimal tool for this sort of filtering, since one must be a highly competent sendmail.cf hacker to draft the rules properly. Nonetheless, this facility can be used to reject messages bearing certain obvious signs of abuse. Listing 2 contains several commonly used snippets that do useful things with RFC 822 headers.
 

LOCAL_CONFIG
#
#  Regular expression to reject:
#    * numeric-only localparts from aol.com and msn.com
#    * localparts starting with a digit from juno.com
#
Kcheckaddress regex -a@MATCH
   ^([0-9]+<@(aol|msn)\.com|[0-9][^<]*<@juno\.com)\.?>
#
#  Names that won't be allowed in a To: line (local-part and domains)
#
C{RejectToLocalparts}   friend you
C{RejectToDomains}      public.com

LOCAL_RULESETS
HTo: $>CheckTo

SCheckTo
R$={RejectToLocalparts}@$*      $#error $: "553 Header error"
R$*@$={RejectToDomains}         $#error $: "553 Header error"

HMessage-Id: $>CheckMessageId
# make sure message ID has two parts separated by an @
SCheckMessageId
R< $+ @ $+ >                    $@ OK
R$*                             $#error $: "553 Header error"

LOCAL_RULESETS
SLocal_check_mail
# check address against various regex checks
R$*                             $: $>Parse0 $>3 $1
R$+                             $: $(checkaddress $1 $)
R@MATCH                         $#error $: "553 Header error"

LOCAL_RULESETS
HSubject: $>Check_Subject
# crude check for Melissa virus
D{MPat}Important Message From
D{MMsg}This message may contain the Melissa virus.

SCheck_Subject
R${MPat} $*                     $#error $: 553 ${MMsg}
RRe: ${MPat} $*                 $#error $: 553 ${MMsg}
Listing 2: Examples of sendmail rules that match RFC 822 headers
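
To see what the checkaddress regular expression in Listing 2 accepts and rejects, it can be tried outside sendmail. The Python sketch below reuses the same pattern. It assumes that, by the time the pattern is applied, the envelope address has been rewritten into sendmail's focused form (for example, user<@aol.com.>). The function name is_suspect is hypothetical, and any difference in case handling between sendmail's regex map and Python's re module is not modeled.

```python
import re

# Same pattern as the Kcheckaddress map in Listing 2.
CHECKADDRESS = re.compile(
    r"^([0-9]+<@(aol|msn)\.com|[0-9][^<]*<@juno\.com)\.?>"
)

def is_suspect(focused_address):
    """Return True if the address would produce @MATCH in Listing 2.

    focused_address is assumed to be in the form local<@domain.>
    (the trailing dot before ">" is optional).
    """
    return CHECKADDRESS.search(focused_address) is not None
```

With this pattern, 12345<@aol.com.> and 9lives<@juno.com.> are flagged, while joe123<@aol.com.> and 1abc<@aol.com.> are not: the aol.com and msn.com branch requires a local part made up entirely of digits, whereas the juno.com branch only requires that it begin with a digit.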

sendmail Mail Filter ("Milter") API

Because drafting rules for sendmail itself is difficult, a better option if one wishes to do much filtering in sendmail is to use the sendmail mail filter, or "Milter," API. Filters can be written in any language and can examine and edit message bodies as well as headers.

sendmail communicates with the filter process via sockets, allowing the filter to run on a different host if desired. The filter process is persistent and can therefore watch for patterns in message traffic.

sendmail can be instructed to pass messages through filters via commands such as

INPUT_MAIL_FILTER(`archive', `S=local:/var/run/archivesock, F=R')dnl
INPUT_MAIL_FILTER(`spamcheck', `S=inet:2525@localhost, F=T')dnl
in the .mc file. For documentation of the INPUT_MAIL_FILTER macro, see /libmilter/README in the most recent sendmail distribution.

Examples in the "libmilter" directory of the sendmail distribution show how to create sendmail filters in C. However, the easiest and fastest way to get started is to use the Sendmail::Milter Perl module. This module can be found on SourceForge.

At this writing, the "Milter" interface is labeled "for future release" and may change before it is finalized and officially supported.

Allowing Relaying by Roving Users

If users of your mail server rove to places off the local network but still want to use the server for both inbound and outbound mail, anti-relaying features can get in their way by preventing them from sending outbound mail through the server. The best solution is to use SSH port redirection, SMTP authentication, SSL, or VPN software to authenticate the user and/or allow him or her to "tunnel" back into the home network. (The author favors SSH port redirection, because it provides encryption and compression as well as authentication and is available on most platforms.) Unfortunately, configuring an e-mail client to use any of these techniques is beyond the ken of most users. Therefore, two techniques have been developed that allow a remote user to use a server for outbound mail once he or she has been recognized by one of the authentication mechanisms of POP3 (Post Office Protocol Version 3, RFC 1939, which superseded the original RFC 1225).

POP Before SMTP

The most common of these techniques is called POP before SMTP, which uses the authentication features of POP3 to provide authentication for SMTP relaying. Before attempting to send outbound mail through the server, the user first checks his or her incoming mail via POP on the same server. (If APOP is used, one can avoid sending passwords across the Internet in the clear.) After a successful POP transaction, the server allows relaying of messages from the client's IP address for a limited period of time. Anti-spam activist Scott Hazen Mueller's Web site provides instructions for enabling POP before SMTP using qpopper, sendmail, and perl (all of which are readily available for all of the BSDs).
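
The bookkeeping behind POP before SMTP can be sketched in a few lines of Python. This is a conceptual model only and is not the qpopper/sendmail/perl recipe referred to above. The class name PopBeforeSmtp, its methods, and the 15-minute default window are all assumptions chosen for illustration.

```python
import time

class PopBeforeSmtp:
    """Track client IP addresses that recently authenticated via POP."""

    def __init__(self, window_seconds=900, clock=time.time):
        self.window = window_seconds   # how long relaying stays enabled
        self.clock = clock             # injectable for testing
        self._expires = {}             # IP address -> expiry timestamp

    def pop_login_ok(self, ip):
        """Record a successful POP login from this IP address."""
        self._expires[ip] = self.clock() + self.window

    def may_relay(self, ip):
        """Return True if this IP address may currently relay mail."""
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expires[ip]      # authorization has lapsed
            return False
        return True
```

A shorter window narrows the opportunity for someone else to reuse the authorized address, at the cost of requiring users to check their mail more often before sending.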

POP before SMTP creates a minor security risk in that someone who is "sniffing" network traffic may be able to use the temporarily authorized IP address to send spam. (This could occur, for example, on a network which a hotel provides to guests, or at an Internet café where users plug their laptops into a common hub.) However, the chance of exploitation is slight, and the amount of damage that can be done is limited -- especially if the mail server is monitored for outgoing spam and mail bombing. (See Detecting Outgoing Spam and Mail Bombing below.)

XTND XMIT

An even more straightforward (though little known) way to enable roving users to send mail via their "home" server is via POP's optional XTND XMIT command. As the name implies, the XTND XMIT command is an extension to the Post Office Protocol (POP) that allows a client to send outbound mail through a POP server. The POP server software accepts the message, then submits it to an MTA running on the same host for delivery. Like POP before SMTP, XTND XMIT provides security against unauthorized relaying because it requires authentication. But unlike POP before SMTP, it works without creating a temporary SMTP relay. In fact, the client need not "speak" SMTP at all to send and receive mail.

If all POP clients and servers supported XTND XMIT, it would be possible to reserve SMTP for communication between mail servers and prohibit SMTP traffic from dial-up Internet connections and/or from hosts without fixed IP addresses. This would make it much easier to control spam.

Berkeley's popper and Qualcomm's qpopper both support XTND XMIT. Unfortunately, not all mail user agents support it, and in some the feature is present but well hidden. In Eudora, for example, use of XTND XMIT cannot be enabled via the graphical user interface. One must edit the EUDORA.INI file manually, adding the line

UsePOPSend=1
to the [Settings] section. (If one is using more than one "persona," the line shown above must be added to the section describing each "persona" for which XTND XMIT will be used.)

Perhaps the most important advantage of  XTND XMIT is that it works even if a user's ISP has blocked outgoing SMTP connections (a restriction which will cause POP before SMTP to fail). Also, if one has a limited number of roving clients, it may require much less effort to reconfigure them for XTND XMIT (or write a simple script to do so) than to reconfigure the server for POP before SMTP.

XTND XMIT does have one minor disadvantage. Because the POP server does not invoke an MTA until it has received an outgoing message in its entirety, it does not validate recipient addresses as they are submitted. Some POP servers will notify the sender of invalid addresses after the message has finished uploading, while others silently drop invalid addresses and send the message to the rest.

DNS Blacklists

Among the most common and popular spam fighting tools are DNS blacklists, which help a mail server to reject SMTP sessions from hosts which have been known to send or relay spam. The most popular of these are three databases maintained by Paul Vixie's MAPS (Mail Abuse Protection System), a not-for-profit California company whose purpose is to fight spam.

ORBS (Open Relay Behaviour-modification System), another blacklist maintainer, is known to take a particularly activist stance. ORBS describes itself as a "small group of (grumpy) volunteers who are sick and tired of receiving junk email via open relays." The group's somewhat controversial policies are consistent with the degree of annoyance expressed in this description. Unlike the MAPS Relay Spam Stopper (RSS), which tests a server only after spam is reported, ORBS actively probes Internet mail servers to determine whether they are open relays. Servers which block the probes are assumed to be open relays and are blacklisted. Unfortunately, ORBS' probes sometimes trigger security alerts and automatic blocking at highly sensitive sites, causing these sites to be blacklisted when in fact they are taking extra precautions against abuse. The so-called "ORBS Lite" subset of the database avoids this problem but may fail to block some spam (see Table 3).

A mail server checks an IP address against a DNS blacklist by attempting to resolve a specially constructed dummy host name. By convention, the name consists of the IP address in dotted decimal format, reversed and prepended to a domain name suffix. For example, when querying ORBS to see whether the host at 1.2.3.4 is in the database, a mail server would attempt to resolve 4.3.2.1.relays.orbs.org. If the name resolves, the address is blacklisted. In some cases, the address to which it resolves is a sentinel value that provides additional information about why the host was listed. ORBS, for example, returns 127.0.0.2 if the host was blacklisted due to a relaying test. It returns 127.0.0.3 if the host was entered into the database manually, 127.0.0.4 if the host was untestable, and 127.0.0.5 if the address is within a block of IP addresses controlled by a known spammer.
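
The name construction described above can be sketched in a few lines of Python. This is only an illustration, not part of any MTA: the function names and the ORBS_REASONS table are hypothetical, while the suffix and the sentinel values are those given in the text. The lookup function simply asks the system resolver, so "not listed" and "lookup failed" are indistinguishable here.

```python
import socket

# Sentinel values returned by ORBS, as listed in the text above.
ORBS_REASONS = {
    "127.0.0.2": "failed a relaying test",
    "127.0.0.3": "entered manually",
    "127.0.0.4": "untestable",
    "127.0.0.5": "in a known spammer's address block",
}

def dnsbl_query_name(ip, suffix):
    """Reverse the dotted-decimal octets and prepend them to the suffix."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("not a dotted-decimal IPv4 address: %r" % ip)
    return ".".join(reversed(octets)) + "." + suffix

def dnsbl_lookup(ip, suffix):
    """Return the sentinel address if the host is listed, or None if not."""
    try:
        return socket.gethostbyname(dnsbl_query_name(ip, suffix))
    except socket.gaierror:
        return None  # name did not resolve: not listed (or the lookup failed)

print(dnsbl_query_name("1.2.3.4", "relays.orbs.org"))  # 4.3.2.1.relays.orbs.org
```

A caller would pass the result of dnsbl_lookup() through ORBS_REASONS to obtain a human-readable reason for the listing.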

At this writing, anyone on the Internet can configure his or her server to query most of these databases at no charge. (MAPS' RBL+, a premium service, requires an access fee; see below.) Each blacklist is claimed by its maintainer to contain only hosts which meet specific criteria, as shown in Table 3.
 

Database | Contents
MAPS Real Time Black Hole List (RBL) | Hosts and networks which MAPS believes to be "friendly, or at least neutral, to spammers who use these networks either to originate or relay spam." Many mistakenly believe that the "real time" in this list's name refers to the speed with which hosts are added to the list. However, the name is intended to indicate that a mail server can block a message from the offending site in real time by querying the list. Getting a host or network added to the MAPS RBL is usually quite difficult. Entries are added only after substantial evidence has been presented that spam has been sent and complaints ignored. Queries of this database use the domain suffix rbl.maps.vix.com.
MAPS Dial-Up List (DUL) | IP addresses used by ISPs' dial-up modems, as well as some pools of addresses which are dynamically assigned to DSL and cable modems. Since it is normally desirable to send mail via a server with a fixed IP address, mail sent by direct SMTP from a non-dedicated address directly to its destination is often spam. Use of the DUL to filter messages is highly effective in that it blocks much of the spam sent from "throwaway" or trial accounts with dial-up ISPs. Queries use the domain suffix dialups.mail-abuse.org.
MAPS Relay Spam Stopper (RSS) | Mail servers whose bandwidth and computing power have been used to duplicate and relay spam to its final destination. In the overwhelming majority of cases, this has been done without the owner's knowledge or consent. These "open relays" are usually mail servers which run old or improperly configured mail transfer agents. Unlike ORBS (see below), the MAPS RSS only lists hosts which have been reported to have sent spam and have been confirmed to be open relays (that is, servers which will relay mail from an untrusted, outside source). It does not list "multihop relays" -- relays which consist of two or more servers working together. Queries use the domain suffix relays.mail-abuse.org.
MAPS RBL+ Master Service (RBL+) | IP addresses contained in any of the MAPS databases (RBL, DUL, or RSS). The RBL+ database combines results from the three others so that a mail server can "vet" a message with a single query. A flat fee, used to support the maintenance of all of the databases, is charged for the ability to query this blacklist or mirror it via DNS zone transfers.
Open Relay Behaviour-modification System (ORBS) | Mail servers which test positive for relaying, are manually entered into the ORBS database, block access when ORBS probes them, or are within the address blocks of known spammers. All servers which participate in a multihop relay are listed as open relays. The full ORBS database is queried via the domain suffix relays.orbs.org. Queries that use the suffix inputs.orbs.org instead will return a positive result only if a server has tested positive as an open relay, but not if it was blacklisted for other reasons. Some administrators see this so-called "ORBS Lite" as a good compromise between the cautious policy of the MAPS RSS and the very aggressive policy of the full ORBS database.
Table 3: Some popular DNS blacklists

In general, it is best to consult blacklists as early as possible when processing mail -- that is, at the MTA (or an MTA wrapper such as smtpd) rather than in a spam filter that is applied afterward. The next section explains how to configure sendmail to use one or more DNS blacklists.

Configuring sendmail to use DNS Blacklists

On systems running sendmail 8.10 or later, using blacklists is simple: just add lines invoking the built-in "dnsbl" feature to the .mc file. The lines to be added to the .mc file for some of the blacklists mentioned above look like this:
FEATURE(dnsbl,`rbl.maps.vix.com',`Rejected - see http://www.mail-abuse.org/rbl/')dnl
FEATURE(dnsbl,`dialups.mail-abuse.org',`Dialup - see http://www.mail-abuse.org/dul/')dnl
FEATURE(dnsbl,`relays.mail-abuse.org',`Open relay - see http://www.mail-abuse.org/rss/')dnl
FEATURE(dnsbl,`inputs.orbs.org',`Open relay - see http://www.orbs.org/')dnl
For instructions on how to configure other mail transfer agents, including older versions of sendmail, for DNS blacklists, see http://maps.vix.com/rss/how.html.

Normally, sendmail rejects connections from blacklisted servers so quickly that it doesn't even wait for any RCPT TO: commands. It is thus impossible to tell from the sendmail log files which unlucky user was the target of the spam. MAPS therefore suggests that administrators add the line

FEATURE(`delay_checks')dnl

to the .mc file. This changes the order in which the rules in sendmail.cf are applied so that the names of the intended recipients are logged before the sender's IP address and domain are checked. This slightly increases overhead but coaxes useful information out of the spammer's server; it reveals which users of a system are on commercially distributed spamming lists or have had their addresses "harvested."

Spam Filter Kits

The built-in anti-spam features in sendmail, combined with DNS blacklists, eliminate the majority of spam messages. However, a server's resistance to incoming spam can be enhanced via the use of freely redistributable "filter kits." These kits perform tests -- some of them quite clever -- that MTAs do not. They also allow individual users to perform tests that are not enabled globally in the MTA for policy reasons. (For example, if the MTA does not query the full ORBS blacklist, a user who wants to use that list can configure a filter kit to consult it for his or her mail only.) Most importantly, because they come with source code, these kits can be customized and enhanced by the site administrator or by individual users.

The most popular filters consist of "recipes" written for procmail, a local mail delivery agent (abbreviated LDA or MDA) which sorts and filters mail based on content. (For an excellent overview of what procmail does and how, see Procmail Mini-Tutorial: Automated Mail Handling by Jim Dennis.) procmail is available as a ported application for all of the BSDs which have port collections, and compiles from source on all of them as well. procmail is licensed under the GNU GPL. (Unfortunately, a non-GPLed equivalent is not currently available.)

Most procmail spam filters do not use procmail alone, but combine it with other programs or subroutine packages. These include perl, formail (included with procmail), mktemp, mimencode (part of metamail), or MIME::Base64 (a CPAN module for perl).

The Spam Bouncer, developed on FreeBSD by Catherine Hampton, is among the most popular Procmail spam fighting kits. It uses Procmail and formail to detect, block, and optionally complain to the host ISP about spam. SpamDunk, by Walter Dnes, is similar, as is Greg Sutter's junkfilter. John Hardin's Procmail Filters Kit is noteworthy in that it sorts its many spam detecting rules into easily recognizable categories. (For example, if an administrator or user does not want to block messages that appear to be unwanted religious diatribes, he or she can disable the "proselytize-trap" filter.) Other Procmail-based spam filters include Bob's Spam Filter, Steve Tucker's Spamkill, and Farhad Anklesaria's Spamtrap.

Administrators who use sophisticated procmail filters to catch spam and/or malware should be warned that they may incur significant overhead. An instance of procmail (as well as instances of other programs, such as the perl "comterpreter") will be spawned for each incoming message. At least one copy of the entire message -- perhaps more! -- will likely be created in memory. It is thus wise to provide plenty of RAM and swap space, limit the number of simultaneous incoming SMTP connections, limit the maximum system load at which sendmail will fork new processes, and/or set a maximum message size in the mail transfer agent. Otherwise, an extremely large message or a barrage of spam could soak up much or all of the server's memory as it passes through the filter. (The author has watched elaborate procmail filters exhaust virtual memory on heavily burdened servers.) To limit the sizes of incoming messages in sendmail, add a line such as

define(`confMAX_MESSAGE_SIZE',2097152)dnl
to the .mc file from which sendmail.cf is built. (The example above limits messages to 2 megabytes.) The number of concurrent sendmail processes can be limited by a line such as
define(`confMAX_DAEMON_CHILDREN',12)dnl
(This line limits the number of processes sendmail can fork to accept incoming messages or process its message queues to 12.) sendmail refuses to accept connections once it has reached its quota of child processes.

To prevent sendmail from forking processes to accept or deliver mail when the system load average is very high, add commands such as

define(`confREFUSE_LA',8)dnl
and
define(`confQUEUE_LA',6)dnl
to the .mc file. All of the commands mentioned here are documented in the readme file in the /cf directory of the sendmail distribution.

Using Procmail as the Local Mail Delivery Agent (LDA or MDA)

If only a few users need spam or malware filtering via procmail filter kits, it is easiest to encourage them to pipe their mail through procmail via their .forward files. However, if many users want filtering, or if the site's policy is to filter malware globally using a procmail filter kit (see Shielding Users Against Malware below), it is desirable to make procmail the local mail delivery agent for the entire system. On recent versions of sendmail, this can be done by installing procmail and adding the line
FEATURE(`local_procmail')dnl
to the .mc file from which sendmail.cf is built. (All other feature macros beginning with "local_" should be removed to avoid conflicts.) Global filters can then be added to the global procmailrc file (often at /usr/local/etc/procmailrc on BSD systems), and individuals can choose to invoke additional filters via ~/.procmailrc files in their home directories.

On most of the systems which the author administers, malware is filtered globally and unconditionally for safety's sake. sendmail (or, optionally, smtpd) is configured as a first line of defense against spam; it blocks unauthorized relaying, validates "From" addresses, and consults one or more DNS blacklists to identify suspect hosts. However, at all but a few sites, the use of content-based spam filter kits is optional; they are activated for individual users via ~/.procmailrc files.

smtpd/smtpfwdd

Obtuse Systems' smtpd and smtpfwdd are a complementary pair of SMTP proxy daemons that can be combined to serve as a front end for sendmail. As with the sendmail access_db feature, smtpd allows the administrator to create rules that govern the acceptance or rejection of mail. The format of these rules, described here, allows them to make decisions based on envelope addresses (that is, those used in the SMTP "MAIL FROM:" and "RCPT TO:" commands); the IP address, domain, or address range of the connecting host; DNS blacklists; or a user name obtained via identd. The rules allow fine-grained control of relaying, but do not examine RFC 822 headers, the body of the message, or attachments.

Because smtpd is far more "lightweight" than sendmail, it reduces the overhead of rejecting mail that fails to satisfy the rules. Also, use of smtpd and smtpfwdd to "wrap" sendmail may have security advantages. These daemons are much smaller and simpler than sendmail, and their code has been thoroughly audited. They also run in a chroot "jail" that confines them to /var/spool/smtpd. Like sendmail, smtpd can "fib" about its identity in the SMTP welcome message. A "skript kiddie" probing for an MTA with a known security hole may be misled by this ruse and not mount an attack on an otherwise vulnerable server.

Perhaps the best reason to use smtpd to wrap sendmail, however, is that the semantics of the smtpd access control rules are much more straightforward than the idiosyncratic ones of sendmail access_db patterns.

smtpd and smtpfwdd come as part of the standard OpenBSD installation but are not enabled by default. They must be installed as ported software or compiled from source on most other operating system distributions.


Defeating Address Harvesting

Spammers would have a difficult time if they could not easily obtain large numbers of e-mail addresses to spam. Unfortunately, newsgroup postings, mailing lists, and Web pages represent a bountiful crop of information, ripe for the harvesting, about potential victims. ISPs routinely warn new users that anyone who posts to a newsgroup via the utility included with his or her browser is effectively publishing his or her e-mail address for the whole world to see. Most mailing list software publishes the sender's e-mail address verbatim, and if the list is archived on a Web server or subscribed to by a spammer all of the participants' addresses are compromised. On AOL, chat rooms provide spammers with a rich source of spammable "screen names." And the "mailto:" links found on more than 25% of all Web pages can likewise be harvested by a simple Web "spider." Even the finger daemon, enabled by default in many operating system distributions, can be a bountiful source of addresses.

Securing the finger Daemon

In the early days of the Net, when it was used almost entirely by a safe and friendly group of academics, users allowed anyone, anywhere to see certain information about them. Typing finger bullwinkle@wassamatta.edu would tell you whether Bullwinkle was online, whether he had mail, when he'd last read his mail, and (optionally) anything else Bullwinkle wanted to say to introduce himself to the world. (This file of personal information became known, for historical reasons, as one's "finger plan.") If you typed finger @wassamatta.edu with no user name, you'd see the names of all of the users who were logged onto the machine.

Needless to say, it wasn't long before spammers realized that this was a great way to harvest addresses. In 1996, the author resumed use of a shell account on The WELL, a public access Unix host and conferencing system. He didn't send mail, post to newsgroups, or browse the Web from the account, yet within a day his mailbox -- which had received mail only occasionally before -- was suddenly awash in spam. His user name had been harvested via the finger daemon.

Harvesting via fingerd was such a serious problem on this particular system that administrators provided users with a way to delete their names from the output of the finger program. But by the time the author learned of this feature it was too late; the address was already on several widely circulated spam CD-Rs. The vacation program now directs those who send mail to that account to a Web e-mail form.

Fortunately, it is relatively easy to secure fingerd, the Berkeley finger daemon, and its derivatives against harvesting. The author often uses the following entry in /etc/inetd.conf:

finger stream tcp nowait nobody /usr/libexec/fingerd fingerd -s -l -p /usr/local/bin/nonetfinger
The -s option prevents the daemon from listing all of the users on the system if no user name is specified. -l enables logging, and -p directs network finger requests to a program called nonetfinger. The source for this almost trivial program appears in Listing 3.
 
#include <stdio.h>

/* Print a polite refusal in place of real finger output. */
int main(void) {
       puts("Sorry; for security reasons, and to prevent our users");
       puts("from being targeted for unsolicited \"junk\" e-mail, this");
       puts("site does not honor network finger requests. We apologize");
       puts("for any inconvenience.");
       return 0;
}
Listing 3: nonetfinger.c -- A tiny program that politely rejects network finger queries

Because the -s, -l, and -p options apply to fingerd (the finger daemon) but not the finger program, typing finger or w still works properly from the shell. One can also shut down fingerd altogether, or rewrite nonetfinger.c to provide bogus or useless output to a spammer, but in the author's experience a warning has proven to be the best policy.

Securing identd Against Spammers

Many servers on the Internet -- especially Web servers -- try to use identd (RFC 1413) to determine the identity of the user who is making a request. Some of these servers respond more slowly (or even reject requests) if the query fails, so it pays to activate identd and cause it to respond with something. However, the response should not include a user's login name (the default). If identd reveals login names, any user who visits a Web site or makes any other request from a remote server can become a target for spam.

Most versions of identd included with BSD-based operating systems can be configured to supply only the user's uid number (a unique identifier that isn't useful as an e-mail address) by placing the following line in /etc/inetd.conf:

ident stream tcp wait root /usr/local/sbin/identd identd -w -o -t120 -F%U
This protects against spamming, but makes it possible for the operator of a server (or a cross-site advertising service) to tell when the same user returns.

Alternatively, many versions of identd can send an encrypted 32-character string which can be decoded to reveal a uid number, IP addresses, port addresses, and a timestamp. The encrypted token allows the administrator of a remote system which has been subjected to abuse to forward the string to the local administrator so that the perpetrator can be identified. The encryption is weak (single DES) and is subject to known plaintext attacks, but it is sufficient to discourage tracking of users. Note that this mechanism offers protection against address harvesting even if the encryption is cracked, because only a uid number (not a user name) is present in the encrypted data. To use this facility, replace -F%U in the line above with -C, and place a passphrase in the first line of the file /etc/identd.key.

OpenBSD's version of identd dispenses with the weak encryption and offers a better solution. It can send a completely opaque token and record the token and the user's identity in the system log. Abusers can be identified so long as logs are retained.

There is also a GPLed identd replacement program called ident2 which can generate random replies or give each user control over what it sends. Since user-defined responses allow a user to hang the blame for mischief on someone else, and random responses prevent the administrator from identifying those who engage in network abuse, neither is a good option.

Web Page Address Harvesting

Web pages provide one of the richest troves of addresses for spammers. Address harvesting robots, often called "spambots," troll Web sites, ignoring the restrictions suggested by robots.txt files. Webmasters have therefore devised many ways to prevent address harvesting from their pages (or at least make it less fruitful).

Perhaps the simplest way Webmasters obscure addresses and mailto: links is via the use of HTML entities. Instead of

mailto:clueless@newbie.com
the Webmaster might code the link as
&#109;ailto&#58;clueless&#64;newbie&#46;&#99;om
Surprisingly, the majority of address harvesting programs do not recognize addresses obscured in this manner. Even if more are created which do understand this ruse, many spammers will use outdated or pirated software. So, it may be worthwhile (and certainly cannot hurt) to process one's pages to add this small bit of obscurity. The task of creating a perl script which automatically obscures colons, "@" signs, and some or all of the other characters of e-mail addresses in an HTML source file is left as a simple exercise for the reader. Unfortunately, it's easy to forget to run pages through such a filter, and the average user who creates his or her own personal or business Web pages may not understand the dangers of placing addresses there. Therefore, it may be desirable to create a filter which automatically performs this transformation on all of the outgoing traffic from a Web server.
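
As one possible shape for such a filter, here is a minimal sketch in Python rather than perl. The ADDRESS pattern is a deliberately loose assumption made for illustration (it is not a full RFC 822 address parser), and the function names are hypothetical. By default only colons, "@" signs, and dots are converted; the example earlier in this section also encodes a few letters, which the chars argument allows.

```python
import re

# Loose pattern for illustration only; it is not a full RFC 822 address parser.
ADDRESS = re.compile(r"(?:mailto:)?[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def to_entities(text, chars=":@."):
    """Replace each character found in `chars` with its decimal HTML entity."""
    return "".join("&#%d;" % ord(c) if c in chars else c for c in text)

def obscure_addresses(html):
    """Entity-encode the colons, at-signs and dots of every address in `html`."""
    return ADDRESS.sub(lambda m: to_entities(m.group(0)), html)

page = '<a href="mailto:clueless@newbie.com">Write to me</a>'
print(obscure_addresses(page))
# <a href="mailto&#58;clueless&#64;newbie&#46;com">Write to me</a>
```

Because the substitution touches only text that looks like an address, the rest of the page passes through unchanged, which is what makes it plausible to apply the same transformation to every page a server sends.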

Another good way to avoid being placed on many spammers' lists is to obtain an address in a .edu or (especially) .gov domain. Most (though not all) spammers purge their lists of such addresses to avoid running afoul of government entities.

Some address harvesting robots betray themselves via the "HTTP_USER_AGENT" field, allowing you to recognize them and refuse access. Charles Brabec's list of HTTP_USER_AGENT fields returned by known "spambots" explains how to block access in this way. Apache, running on BSD, makes it easy to do this via the mod_rewrite module.

Greg Sabino Mullane's excellent Spambot Beware site describes other methods by which spambots can be recognized, and explains how to "poison the well" by feeding spambots bogus addresses. All of the scripts on his site run unchanged, using perl and Apache, on all of the BSDs.

When all is said and done, however, the most certain way to prevent Web pages from providing addresses is not to put them there -- at least not in a form that a spambot can understand. Some users, for example, have taken to rendering their addresses as bitmaps. This prevents automatic scanning but may make addresses inaccessible to users of text-based browsers, who are frequently blind or visually impaired. (One way to avoid this pitfall is to place text which can be pronounced by a screen reader -- but doesn't look like an e-mail address -- in the image's ALT attribute, e.g. "clue less at new bee dot com".) It is also possible to create a mailto: link on the fly via a client-side script (which a spambot cannot execute), or supply the link via a POST method (which a spambot generally cannot use). Mullane covers these and other techniques in the avoidance section of his site.

Finally, some enlightened ISPs now provide Web e-mail input forms for every user. These allow a stranger to make an initial contact; if the message is legitimate, the recipient can respond via e-mail, revealing his or her address.

Address Harvesting from Usenet

Any e-mail address that appears in a Usenet posting (in the body or in the headers) is likely to receive spam. Therefore, the best way to avoid spam is to post with a return address which is either "mangled" (that is, it is not the correct address but a human can deduce the correct address from it) or nonexistent. One useful practice is to change the return address to the URL of a Web e-mail form, so that one can be contacted by individuals but cannot be added to mass mailing lists.

Unfortunately, e-mail clients with newsgroup reading capabilities, such as Netscape, often supply one's address whether one likes it or not. (In other clients, such as Opera, one can assume multiple identities with different addresses, but this feature requires some technical skill to configure.) Therefore, the best spam avoidance technique for the average user may be to use Web-based services such as Deja.com instead of a standard news server.

Address Harvesting from Domain Name Registries

Apparently, many spammers are uninformed enough to believe that the contacts listed by domain name registrars are likely prospects for spam. (In fact, they are extremely likely to be seasoned Internet veterans with no tolerance whatsoever for spam.) Unfortunately, it is unwise to forge domain name contact addresses, since billing information is often sent to them. And surprisingly, while it would be easy for domain name registrars to offer Web-based e-mail forms for use when one must reach a domain contact, no registrar has yet offered spam-resistance as a competitive feature. For the nonce, administrators can assign unique addresses for domain name contacts and filter the inevitable spam via blacklists and content-based filters.

"Rumplestiltskin" Attacks

In 1999, system administrators throughout the Internet noticed that their mail servers were being slowed by attacks from programs which attempted to guess user names -- as does the heroine of the Grimm Brothers' story Rumpelstilzchen (usually Anglicized as "Rumplestiltskin"). The author observed such an attack on one of his own servers in mid-August. The program would open several SMTP (Simple Mail Transfer Protocol) connections to the server, as if it were going to send mail. On each, it would send a HELO command, a MAIL FROM: command with a bogus return address, and a series of RCPT TO: commands. The first command contained a common first or last name, such as "mark", "brian", "smith", etc. Next, the attacker tried the name plus digits from 1 to 5 -- for example, "john1", "john2", etc. No more than six names would be sent per connection. Each connection was dropped immediately thereafter without an SMTP DATA command, a QUIT command or even a TCP/IP FIN packet. The mail server would reject most of the guessed addresses, but the few that were not rejected were recorded as target addresses for spam. A log excerpt from such an attack (with the victim's name changed to protect the innocent) appears in Listing 4.
 
Aug 11 21:16:01 myhost sendmail[5119]: VAA05119: <mark@myhost.com>... User unknown
Aug 11 21:16:01 myhost sendmail[5120]: VAA05120: <brian3@myhost.com>... User unknown
Aug 11 21:16:02 myhost sendmail[5119]: VAA05119: <mark1@myhost.com>... User unknown
Aug 11 21:16:02 myhost sendmail[5120]: VAA05120: <brian4@myhost.com>... User unknown
Aug 11 21:16:06 myhost sendmail[5120]: VAA05120: <brian5@myhost.com>... User unknown
Aug 11 21:16:07 myhost sendmail[5119]: VAA05119: <mark2@myhost.com>... User unknown
Aug 11 21:16:07 myhost sendmail[5126]: VAA05126: <smith@myhost.com>... User unknown
Aug 11 21:16:08 myhost sendmail[5126]: VAA05126: <smith1@myhost.com>... User unknown
Aug 11 21:16:10 myhost sendmail[5126]: VAA05126: <smith2@myhost.com>... User unknown
Aug 11 21:16:10 myhost sendmail[5135]: VAA05135: <wilson3@myhost.com>... User unknown
Aug 11 21:16:11 myhost sendmail[5137]: VAA05137: <me@myhost.com>... User unknown
Aug 11 21:16:13 myhost sendmail[5119]: VAA05119: <mark3@myhost.com>... User unknown
Listing 4: Log entries from a "Rumplestiltskin" attack

Other attacks have been observed in which larger numbers of addresses were tested per connection. Often, the RCPT TO: commands sent in these attacks were "pipelined;" that is, they were sent before the server had responded to the previous command. This consumed buffer space and rendered the server's attempts to pace requests by delaying responses less effective.

Because it acts as if it is actually about to send a message and is naming the recipients, this type of address harvesting "bot" is able to extract addresses even from mail servers whose administrators have wisely disabled the VRFY (Verify Address) command. Worse still, its modus operandi can bring mail servers to a screeching halt. The overhead of checking many addresses can choke sendmail, which runs addresses through many rules to verify them. As mentioned above, at least one of these "bots" never bothered to send a message or close the connection to the server properly; it simply left the server hanging until it timed out. Because mail servers traditionally have very generous timeouts (to accommodate slow modem connections and network congestion), and because most of them limit the number of incoming connections they'll accept at one time, mail servers attacked by this program simply stopped working.

Recent versions of sendmail automatically slow their responses after a certain number of bad commands have been sent. However, an analysis of the code of sendmail 8.11.0 shows that RCPT TO: commands specifying invalid recipient addresses are not counted as "bad" -- nor are they counted toward the quota of recipients per message! Also, slowing responses does not help if the attacker is patient. In fact, it may prolong the attack, increasing the amount of time for which the server is at its maximum number of connections. Thus, the current safeguards in sendmail do not defend well against this particular attack. (Most other MTAs are also susceptible.)

Countermeasures Against "Rumplestiltskin" attacks

Because legitimate senders should be notified when they send mail to invalid addresses, "pretending" to accept bad addresses with no warning is not a good option. Accepting all addresses during the SMTP transaction and later sending a "bounce" message is one possibility, and has the advantage that it clutters the spammer's list with worthless addresses. However, the RFC 822 From: header and envelope MAIL FROM: address, if present, may be forged. Therefore, any notification mechanism that operates after the fact should be designed so that it does not deluge an innocent bystander with mail. It should limit the number of messages it sends, and should not be activated when there is no SMTP DATA command or when return addresses are questionable.

One way to prevent an attacker from consuming the MTA's quota of incoming connections, or slowing the server with many quick connections, is to use inetd (or a similar program, such as juniperd) to limit the number of connections the server will accept from a single IP address in one minute. (sendmail can limit the rate at which it accepts connections, but not by IP address.) However, since sendmail is designed to run best as a stand-alone daemon, a better solution would be to modify sendmail itself so that it takes IP addresses into account. sendmail and other MTAs should limit the number of connections per minute and the number of simultaneous connections from any one address. It may also be desirable to limit the number of invalid recipients per message, and/or to count bad recipients toward the maximum number of bad commands and the maximum number of recipients.
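
The per-address limit proposed here amounts to a sliding-window counter. The following Python sketch shows the bookkeeping only; the class name, its parameters, and the default of ten connections per minute are assumptions chosen for illustration and do not describe how inetd or any MTA implements the feature. It covers the rate limit but not the limit on simultaneous connections.

```python
import time
from collections import defaultdict, deque

class ConnectionLimiter:
    """Allow at most `max_per_window` connections per address in `window` seconds."""

    def __init__(self, max_per_window=10, window=60.0, clock=time.monotonic):
        self.max_per_window = max_per_window
        self.window = window
        self.clock = clock
        self.recent = defaultdict(deque)   # address -> arrival times

    def allow(self, address):
        now = self.clock()
        arrivals = self.recent[address]
        # Discard arrivals that have aged out of the window.
        while arrivals and now - arrivals[0] >= self.window:
            arrivals.popleft()
        if len(arrivals) >= self.max_per_window:
            return False                   # over quota: refuse this connection
        arrivals.append(now)
        return True

limiter = ConnectionLimiter(max_per_window=3)
print([limiter.allow("192.0.2.1") for _ in range(5)])
# [True, True, True, False, False]
```

Refused attempts are not recorded, so a host regains access as soon as its accepted connections age out of the window. A production version would also need to expire idle addresses so that the table does not grow without bound.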

Finally, because most robots of the type described above send several (sometimes many!) RCPT TO: commands without waiting for the SMTP response after each one, it may be desirable for the server to insist that the conversation be synchronous. RFC 821 allows the server to do this:

The communication between the sender and receiver is intended to be an alternating dialogue, controlled by the sender.  As such, the sender issues a command and the receiver responds with a reply.  The sender must wait for this response before sending further commands.
Thus, a server that has not specifically authorized SMTP pipelining (RFC 1854) is "within its rights" to cut off any connection where another command is received before a response to the previous one has been sent. An option which turns off pipelining, and cuts off clients which attempt to pipeline commands anyway, should be considered for future releases of sendmail and other MTAs. (Postfix is the only MTA that, to the author's knowledge, already offers this feature.)
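
The test such an option would perform can be sketched as follows. This Python fragment assumes that the server hands it the bytes received since its last reply, that the session is in the command phase (not after DATA), and that pipelining was never offered to the client; both function names are hypothetical. It inspects only what has already arrived, so a real server would also need to check the socket for pending input before replying.

```python
def split_first_command(buffer):
    """Split received bytes into (first command line, whatever follows it)."""
    line, sep, rest = buffer.partition(b"\r\n")
    if not sep:
        return None, buffer        # no complete command yet
    return line, rest

def is_pipelining(buffer):
    """True if bytes beyond one complete command arrived before any reply."""
    line, rest = split_first_command(buffer)
    return line is not None and len(rest) > 0

burst = b"RCPT TO:<mark@myhost.com>\r\nRCPT TO:<mark1@myhost.com>\r\n"
print(is_pipelining(burst))                              # True
print(is_pipelining(b"RCPT TO:<mark@myhost.com>\r\n"))   # False
```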

If the MTA does not provide internal protection against name guessing attacks, it is possible to add it, either via patches or via a daemon which monitors the system's mail logs. This daemon would note when a particular host was generating many error messages, and could block it via a firewall rule, via a "blackhole" route in the system's routing table, or via an access control mechanism such as the smtpd rule file or the sendmail access_db feature. It is also possible to cause sendmail to limit the number or rate of connections per IP address via a "Milter" filter. However, this would incur much more overhead than inserting the feature directly into the program.
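
The counting half of such a log-watching daemon might look like the Python sketch below. It assumes entries shaped like those in Listing 4 and groups them by sendmail process ID, since each connection is served by its own process. Mapping a process ID back to the connecting host, and the blocking step itself, are left out because they depend on what else the site logs. The function names and the threshold of three are hypothetical.

```python
import re
from collections import Counter

# Matches entries shaped like those in Listing 4, with or without syslog colons.
UNKNOWN_USER = re.compile(r"sendmail\[(\d+)\]:? \S+?:? <[^>]*>\.\.\. User unknown")

def count_unknown_users(lines):
    """Count 'User unknown' rejections per sendmail process (one per connection)."""
    counts = Counter()
    for line in lines:
        match = UNKNOWN_USER.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def suspects(lines, threshold=3):
    """Return the process IDs that met or exceeded the rejection threshold."""
    return sorted(pid for pid, n in count_unknown_users(lines).items() if n >= threshold)

log = [
    "Aug 11 21:16:01 myhost sendmail[5119]: VAA05119: <mark@myhost.com>... User unknown",
    "Aug 11 21:16:02 myhost sendmail[5119]: VAA05119: <mark1@myhost.com>... User unknown",
    "Aug 11 21:16:07 myhost sendmail[5119]: VAA05119: <mark2@myhost.com>... User unknown",
    "Aug 11 21:16:11 myhost sendmail[5137]: VAA05137: <me@myhost.com>... User unknown",
]
print(suspects(log))  # ['5119']
```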


Detecting Outgoing Spam and Mail Bombing

One of the most difficult tasks faced by a network administrator, especially at an ISP, is detection of outbound spam and mail bombing. Spammers' software often connects directly to the target system and/or a relay host that is to be exploited, preventing logging of the attempt in a mail server.

One controversial technique used by an increasing number of ISPs is to block connections destined for hosts outside the local network on TCP port 25 (SMTP), especially when they originate from dial-up ports or from cable or DSL modems without fixed IP addresses. This is roughly equivalent to the function of the MAPS DUL blacklist, but since it operates at the source it protects those who do not subscribe to the blacklist as well. If this strategy is implemented, provisions should be made to make exceptions for users who have a legitimate need for outgoing SMTP and can be verified not to be engaged in spamming.

A more subtle technique which avoids complaints caused by outright blocking of outgoing SMTP is to redirect outgoing mail through a transparent proxy. This can be done in a router or by the IP Filter firewall software, which is available for the three cooperatively developed BSDs as well as several other BSD-derived OSes. A redirect rule for ipnat (part of the IP Filter package) which looks like

rdr ed0 0.0.0.0/0 port smtp -> 127.0.0.1 port smtp

(where ed0 is the internal interface on the gateway router) redirects all outbound SMTP connections to the mail server on the router regardless of the mail's final destination. If the server is correctly configured, it will relay and log all outgoing mail without becoming an "open" relay. For the sake of efficiency, trusted hosts can be allowed to bypass the proxy.

Once this mechanism is in place, either a sendmail "Milter" filter or a log monitor can be used to detect and stop abuse. A log monitor may visit the sendmail log file periodically and look at the last few entries; it may also be designed to accept piped output from syslogd, the Unix system log daemon. If there is evidence of abuse, the monitor can alert the system administrator via e-mail or pager and/or lock out the offending party. The author has found log monitors to be highly effective in his own work.

Kai's SpamShield (which was developed for BSD/OS and runs on all of the BSDs) is a good example. A simple log monitor written in Perl, it periodically examines the most recent entries in the sendmail log file and notes the number of recipients to which a host or local user is sending mail. If it sees an excessive amount of traffic in a short period of time, it can notify the administrator and/or quarantine an offending host by creating a "blackhole" route on the mail server. In some network and server configurations, it may be more efficient (or even necessary) to modify the script to block connections via other means, including the mail server's access control mechanism and/or firewall rules. swatch, a more general log monitoring program, may also be adapted for this purpose.
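
The core of a monitor of this kind can be sketched in a few lines. The Python example below is not SpamShield's code (which is written in Perl); it is a minimal illustration which assumes that the caller passes in only the recent log lines of interest, and that each message's "from=" line carries nrcpts= and relay= fields as sendmail's log entries normally do. The function names and the limit are hypothetical.

```python
import re
from collections import Counter

# Matches a sendmail "from=" log entry and captures its nrcpts= and relay= fields.
FROM_LINE = re.compile(
    r"from=(?P<sender>[^,]*),.*?\bnrcpts=(?P<n>\d+)\b.*?\brelay=(?P<relay>.+?)\s*$"
)


def recipients_by_relay(log_lines):
    """Sum the recipient counts of all logged messages, keyed by relay."""
    totals = Counter()
    for line in log_lines:
        m = FROM_LINE.search(line)
        if m:
            totals[m.group("relay")] += int(m.group("n"))
    return totals


def excessive_senders(log_lines, limit=100):
    """Return the relays whose total recipient count exceeds `limit`."""
    totals = recipients_by_relay(log_lines)
    return sorted(relay for relay, n in totals.items() if n > limit)
```

The relays returned by excessive_senders() are candidates for an alert to the administrator or for blocking by whichever mechanism suits the site.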


Shielding Users Against Malware

Computer viruses and "Trojan horse" programs have plagued computer users since the early days of commodity personal computers. At first, non-self-propagating Trojan horses, as well as boot sector and program-infecting viruses, were the norm; these were later followed by document-infecting viruses such as "Concept," which leveraged the macro capabilities of programs such as word processors, spreadsheets, and databases. However, the rise of the Internet and the skyrocketing popularity of e-mail greatly increased the risk by allowing malware to propagate far more quickly and effectively. E-mail is now the primary vector for malware. This section will describe ways in which software can examine e-mail traffic and guard users against these threats.

"Trojan Worms"

Internet users face constant assaults from self-propagating malware such as the Melissa, Happy99, PrettyPark, ExploreZip, and ILOVEYOU Trojan horses. Sometimes called "Trojan worms," these programs usually require the recipient to activate them (as does a Trojan horse) but then propagate on their own to new victims (as does a worm).

These programs often exploit "social engineering" techniques to persuade users to activate them. The ILOVEYOU worm, for example, makes use of a hidden extension exploit to make it appear that an executable attachment is an innocuous text file. ExploreZip -- perhaps the most cunning example of a "social engineering" Trojan worm to date -- is even more subtle. It operates as an e-mail autoresponder, replying immediately to incoming mail with a message that reads:

I received your email and I shall send you a reply ASAP.
Till then, take a look at the attached zipped docs.

Attached to the message is an executable file which appears at first glance to be a self-extracting archive file. It is actually a copy of the worm.

Because the Subject: header of the automatic response matches that of a recently sent message, and the From: address is familiar, the correspondent believes that the automatically generated message is part of an ongoing conversation and trustingly runs the attachment. Unfortunately, this particular worm carries a nasty payload: it destroys files not only on the victim's hard disk but on any shared drives or directories to which he or she has access.

Malware which taps users' e-mail address books for the addresses of likely victims, or otherwise attempts to exploit existing relationships between correspondents, is sometimes called a "Friends and Family virus," after MCI's famous promotional program for its long distance service.

From "Trojan Worm" to Worm

The danger from such programs has recently increased. Newly discovered security holes in mail clients, most commonly Microsoft's Outlook and Outlook Express but also others such as Eudora, make it practical and in fact simple to create "true" worms that propagate via e-mail but do not require user interaction to run on the recipient's machine. Many can also use network file sharing (especially common on Windows LANs) to spread automatically. Information about how to create such programs has been widely disseminated on Internet mailing lists such as Bugtraq. The exploits which these worms use to launch themselves usually fall into three categories: All of these vulnerabilities can also be exploited by malware that does not self-propagate but is targeted at a specific recipient.

Invasion of Privacy and DoS attacks via E-mail

E-mail may also contain exploits that compromise the recipient's privacy or render his or her system unusable. An HTML message containing an image tag will cause many e-mail clients to retrieve the image automatically when the mail is read. If the recipient's e-mail address (or any other unique identifier) is included in the image tag (e.g. <IMG SRC="http://images.spammer.com/picture.jpg?clueless@newbie.com">), a spammer can determine from his or her Web server logs that the address is valid and that the mail was opened. Such an image tag is sometimes called a "mail bug." Because the image will be retrieved via HTTP, the server may also be able to place a cookie on the recipient's machine if browser software is used to display the mail. (The most popular e-mail clients all use browser software to render mail. Outlook, Outlook Express, and AOL use Microsoft Internet Explorer; Netscape Communicator uses Netscape Navigator; Opera uses its own internal HTML rendering software; and Eudora uses Internet Explorer unless explicitly configured not to do so.) The user may not know that any invasion of privacy has taken place, especially if the image is a "clear GIF" or "Web bug." An e-mail message may also contain active content exploits which extract personal information from the recipient's computer.
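
A filter can recognize the simplest form of "mail bug" by looking for image tags whose URLs embed something that looks like an e-mail address. The Python sketch below uses the standard library's HTML parser for this purpose. It is illustrative only: the class and function names are hypothetical, the address pattern is deliberately loose, and a tag which carries an opaque identifier rather than an address will not be caught.

```python
import re
from html.parser import HTMLParser
from urllib.parse import unquote

# Loose pattern for something that looks like an e-mail address.
ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class MailBugFinder(HTMLParser):
    """Collect the SRC of every image tag whose URL embeds an address."""

    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        # Undo %-encoding so that "%40" is recognized as "@".
        if ADDRESS.search(unquote(src)):
            self.suspects.append(src)


def find_mail_bugs(html_text):
    """Return the SRC values of suspicious image tags in an HTML message body."""
    finder = MailBugFinder()
    finder.feed(html_text)
    finder.close()
    return finder.suspects
```

Applied to the body of a message containing the image tag shown above, find_mail_bugs() would report that tag's URL, while an image URL with no embedded address would pass unremarked.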

A hostile script embedded in e-mail may "take control" of the recipient's machine by opening an advertising or pornographic Web page in the user's browser. It can then prevent him or her from closing the window or shifting the focus. A malicious script can freeze the browser or the entire machine. (Many of these exploits are cross-platform due to the cross-platform nature of HTML, JavaScript, and Java.) A message with intentional formatting errors may crash some vulnerable e-mail clients or even some MTAs. For example, it was recently reported that some versions of Microsoft Exchange will halt, refusing to revive until queue files are manually deleted, if they encounter a null MIME boundary string.

Stopping Malware with procmail

Detecting and disabling e-mail exploits relies on careful examination of the message content. In some cases, the name of a MIME attachment is a telltale sign that something is amiss. For example, a message with an attachment named Happy99.exe or Pretty Park.exe is almost certain to be dangerous and should be flagged or quarantined. The names of executable attachments, as well as attached documents which may contain macro viruses, can be "mangled" so that the attachments cannot be activated by a careless click but can be opened if they are verified to be legitimate. Active content, such as image tags and "live" scripts, can likewise be disabled so as to prevent untoward results when the message is viewed.
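
The idea of "mangling" can be illustrated with a short Python sketch. The list of extensions and the DEFANGED- naming scheme used here are assumptions made for the sake of the example; they are not the behavior of any particular filter kit, and a production list would be considerably longer.

```python
# Hypothetical, deliberately short list of extensions treated as risky.
RISKY_EXTENSIONS = {"exe", "com", "bat", "pif", "scr", "vbs", "js", "doc", "xls"}


def mangle_filename(name):
    """Rename a risky attachment so that a careless click will not launch it."""
    # Trailing spaces and dots are ignored when judging the real extension.
    trimmed = name.rstrip(" .")
    stem, dot, ext = trimmed.rpartition(".")
    if dot and ext.lower() in RISKY_EXTENSIONS:
        return f"{stem}.DEFANGED-{ext}"
    return name
```

Because only the final extension is examined, a "hidden extension" name such as LOVE-LETTER-FOR-YOU.TXT.vbs is mangled even though it appears at first glance to be a text file. A recipient who has verified that an attachment is legitimate can restore it by renaming the file.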

The most common way of detecting such exploits is via procmail filter kits designed for this purpose. John Hardin's Procmail Sanitizer, perhaps the best of these, disables active content, "mangles" file extensions, optionally disables image tags, and can quarantine messages which are likely to contain malicious code. It can also scan Microsoft Office documents for macros and score them according to their potential virulence. Bjarni R. Einarsson's Anomy is similar and credits Hardin's work as inspiration.

procmail malware filter kits are installed in the same way as procmail filters for spam. (See the sections titled Open Source Spam Filter Kits and Using Procmail as the Local Mail Delivery Agent (LDA or MDA) above.) Since procmail is a mail delivery agent, procmail filters normally process only mail which is bound for local users on the system where they run. However, with special changes to rules in sendmail.cf, they can work with sendmail to filter all the mail that passes through a server.

Stopping Malware with sendmail "Milter" Filters

The sendmail "Milter" interface may also be used to write filters which "defang" messages bearing suspected malware. Again, refer to the "libmilter" directory of the most recent sendmail distribution for an example.

Using Commercial Virus Checkers on the Server

Inflex, Amavis, and Petr Rehor's Antivirus for Sendmail can invoke commercial virus checkers, such as those published by Kaspersky Lab and Network Associates, on e-mail that arrives at a server. While this technique is certainly useful, it is not advisable to rely on it as one's sole malware detection method for several reasons. First, new patterns for commercial products are often released only after a virus has spread in the wild, and may therefore come too late to prevent a crippling outbreak. Second, the patterns used by these products are often very specific, so as to prevent unsophisticated users from encountering false alarms. They can thus miss even the slightest mutation. (A variant of the PrettyPark "Trojan worm" in which the same executable file was simply delivered in an unpacked format was missed by most commercial virus checkers but was caught by the author's rule-based filters.) Finally, because malware can spread by means other than e-mail (for example, via network file sharing), a worm or Trojan horse program can propagate even if the mail server is secured.

Ideally, filtering on the mail server should be heuristic (that is, it should be able to recognize new as well as existing malware) and should focus on catching malware for which e-mail is an important vector. The best approach, in the author's experience, is a combination of rule-based checking on the server and a regularly updated commercial virus checker on each client. While the repertoires of the server and client software can and should overlap, both are necessary for good security.


Conclusion

Spam and malware -- like crabgrass and athlete's foot -- will always be nuisances and will crop up from time to time despite our best efforts to control them. Nonetheless, properly configured servers can reduce them from a relentless plague to a minor and occasional nuisance. BSD-derived operating systems -- together with programs such as sendmail, Perl, smtpd, Apache, log monitors, IP Filter, and specialized e-mail filters -- are the state-of-the-art software for such servers.

The elimination of spam is more art than science. So long as people allow themselves to be contacted by strangers (something which is not always undesirable and is sometimes quite delightful), opportunities will exist for spammers to send them unsolicited junk mail. However, the hijacking of computer systems and networks to send mass quantities of spam should be eliminated, as should spam which is an unwarranted intrusion or which is fraudulent. The spam fighting tools and techniques mentioned here rank among the best that have been developed to date.

Heuristic e-mail filters have proven to be remarkably effective at checking the spread of new malware. The filters on the author's servers have caught every copy of Melissa, ExploreZip, ILOVEYOU (all variants), Happy99, PrettyPark, and similar malicious programs sent to or through them, protecting users from massive damage and saving untold hours of cleanup work. Equally impressive has been the total absence of false positives in two years of operation. Nonetheless, because malware also spreads via means other than e-mail, the use of filters on the server does not and cannot eliminate the need for malware-eliminating tools on client machines. Since there is sometimes a delay of a week or more between the start of an outbreak and the time when new pattern sets for the "brand name" virus checkers are ready, the two can work in concert, with the server "holding the fort" until patterns are ready for the client machines. Ultimately, only a check at the client can deal with all malicious software regardless of the vector via which it arrived.


Acknowledgements

Many thanks to Terry Lambert, Joel Maslak, John Hardin, Scott Hazen Mueller, Jim Dennis, and Gregory Neil Shapiro, who suggested improvements to this paper and pointed out many misteaks and omssions. Thanks also to the authors and maintainers of the BSDs and of the other utilities mentioned in this document for their contributions to the state of the art. Trademarks mentioned in this document are the property of their respective owners.


About the Author

Brett Glass has more than 25 years of experience designing, building, writing about, and crash testing computer hardware and software. A consultant, author, and programmer based in Laramie, Wyoming, Brett obtained his Bachelor of Science degree in Electrical Engineering from the Case Institute of Technology and his MSEE from Stanford. He writes and architects software, designs hardware (including chips, embedded systems, and network servers), and has more than 1500 published articles to his credit. During his long and eclectic career, Brett has designed, written code for, or documented such widely varied products as Borland's Pascal "toolboxes" and compilers, Living Videotext's ThinkTank outline processor, Cisco Systems routers, Earth Computers' Earthstation diskless workstations, and Texas Instruments' TMS380 Token Ring networking chipset. When he's not writing, consulting, speaking, or cruising the Web in search of adventure, he may be playing the Ashbory bass, doing carpentry, teaching Internet courses for LARIAT (Laramie's community network and Internet users' group), cooking up a storm, or enjoying spicy ethnic food. He is available for consulting, writing, and speaking engagements and can be reached at http://www.brettglass.com/mailbrett.html (no spam please).