SPAM
E-mail has become the subject of much abuse, in the form of both spamming and E-mail worm programs. Both of these flood the in-boxes of E-mail users with junk E-mails, wasting their time and money, and often carrying offensive, fraudulent, or damaging content. This article describes the efforts being made to stop E-mail abuse and ensure that E-mail continues to be usable in the face of these threats.Defense against spam
There are a number of services and software systems that mail sites and users can use to reduce the load of spam on their systems and mailboxes. Some of these depend upon rejecting email from Internet sites known or likely to send spam. Others rely on automatically analyzing the content of email messages and weeding out those which resemble spam. These two approaches are sometimes termed blocking and filtering.Blocking and filtering each have their advocates and advantages. While both reduce the amount of spam delivered to users' mailboxes, blocking does much more to alleviate the bandwidth cost of spam, since spam can be rejected before the message is transmitted to the recipient's mail server. Filtering tends to be more thorough, since it can examine all the details of a message. Many modern spam filtering systems take advantage of machine learning techniques, which vastly improve their accuracy over manual methods. However, some people find filtering intrusive to privacy, and many mail administrators prefer blocking to deny access to their systems from sites tolerant of spammers.
Spam blocking and filtering techniques
Content-based filtering
Until recently, content filtering techniques relied on mail administrators specifying lists of words or regular expressions disallowed in mail messages. Thus, if a site receives spam advertising "herbal Viagra", the administrator might place these words in the filter configuration. The mail server would thence reject any message containing the phrase.Content based filtering can also filter based on content other than the words and phrases that make up the text of the message. Primarily, this means looking at the headers of the email, the part of the message that contains information about the message, and not the text of the message. Spammers will often spoof headers in order to hide their identities, or to try to make the email look more legitimate than it is; many of these spoofing methods can be detected. Also, spam sending software often produces headers that violate the [[RFC 2822]] standard on how email headers are supposed to be formed.
Disadvantages of this static filtering are threefold: First, it is time-consuming to maintain. Second, it is prone to false positives. Third, these false positives are not equally distributed: manual content filtering is prone to reject legitimate messages on topics related to products advertised in spam. A system administrator who attempts to reject spam messages which advertise mortgage refinancing may easily inadvertently block legitimate mail on the same subject.
Finally, spammers can change the phrases and spellings they use, or employ methods to try to trip up phrase detectors. This means more work for the administrator. However, it also has some advantages for the spam fighter. If the spammer starts spelling "Viagra" as "V1agra" (see leet) or "Via_gra", it makes it harder for the spammer's intended audience to read their messages. If they try to trip up the phrase detector, by, for example, inserting an invisible-to-the-user HTML comment in the middle of a word ("Via<---->gra"), this sleight of hand is itself easily detectable, and is a good indication that the message is spam. And if they send spam that consists entirely of images, so that anti-spam software can't analyze the words and phrases in the message, the fact that it is image only can be detected.
Statistical filtering
Statistical filtering was first proposed in 1998 by Mehran Sahami, et al., at the AAAI-98 Workshop on Learning for Text Categorization. A statistical filter is a kind of text classification system, and a number of machine learning researchers have turned their attention to the problem. Statistical filtering was popularized by Paul Graham's influential 2002 article, which used Naive Bayesian classification to predict whether messages are spam or not -- based on collections of spam and nonspam ("ham") email submitted by users.Statistical filtering, once set up, requires no maintenance per se: instead, users mark messages as spam or nonspam and the filtering software learns from these judgements. Thus, a statistical filter does not reflect its author's or administrator's biases as to content, but it does reflect the user's biases as to content; a biochemist who is researching Viagra won't have messages containing the word "Viagra" flagged as spam, because "Viagra" will show up often in his or her legitimate messages. It can also respond quickly to changes in spam content, without administrative intervention.
Spammers have attempted to fight statistical filtering by invisibly inserting many random but valid words into their messages, making more likely that the filter will classify the message is neutral; they make the words invisible by giving them a very tiny font, by making the words the same color as the background, or both. However, the countermeasures seem to have been largely ineffective.
Software programs that implement statistical filtering include Bogofilter, the e-mail programs Mozilla and Mozilla Thunderbird, and later revisions of SpamAssassin. Another interesting project is CRM114 which hashes phrases and does bayesian classification on the phrases.
You can also check POPFile [3] (http://popfile.sourceforge.net) that will sort mail in as many category as you want (family, friends, co-worker, spam, whatever) with bayesian filtering.
Checksum-based filtering
Checksum-based filter takes advantage of the fact that, for any individual spammer, all of the messages he or she sends out will be mostly identical, the only differences being web bugs, and when the text of the message contains the recipient's name or email address. Checksum-based filters will strip out everything that might vary between messages, reduces it to a checksum, and compares it to a database which collects the checksums of messages that email recipients consider to be spam (some people have a button on their email client which they can click to nominate a message as being spam); if the checksum is in the database, the message is likely to be spam.The advantage of this type of filtering is that it lets ordinary users help identify spam, and not just administrators, thus vastly increasing the pool of spam fighters. The disadvantage is that spammers can insert unique invisible gibberish -- known as hashbusters -- into the middle of each of their messages, thus making each message unique and having a different checksum. This leads to an arms race between the developers of the checksum software and the developers of the spam-generating software.
Checksum based filtering methods include:
- Distributed Checksum Clearinghouse
- Vipul's Razor
- Pyzor
Protocol extensions
A number of proposals and specifications have been written to extend the SMTP protocol to avoid spam, including:- Sender Policy Framework (SPF, formerly known as Sender Permitted From)
- Trusted Email Open Standard (TEOS)
- Tripoli protocol
Messages certified as not being spam
There are several third-party organizations which guarantee that certain messages aren't spam, and have the means to prevent spammers from fraudulently using their system, by fining or suing them, for example. Administrators can use this to let through messages that would otherwise be filtered or blocked as spam, thus reducing the false positive rate.Organizations that implement such systems include:
- Habeas Sender Warranted Email
- Bonded Sender.
Heuristic filtering
Heuristic filtering, such as is implemented in the program SpamAssassin, uses some or all of the various tests for spam mentioned above, and assigns a numerical score to each test. Each message is scanned for these patterns, and the applicable scores tallied up. If the total is above a fixed value, the message is rejected or flagged as spam. By ensuring that no single spam test by itself can flag a message as spam, the false positive rate can be greatly reduced.Spam tips for users
Aside from installing client-side filtering software, end users can protect themselves from the brunt of spam's impact in numerous other ways.Address munging
One way that spammers obtain email addresses to target is to trawl the Web and Usenet for strings which look like addresses. Thus, if one's address is never listed on these forums, they cannot find it. Posting anonymously, or with an entirely faked name and address, is one way to avoid this "address harvesting". Users who want to receive legitimate email regarding their posts or Web sites can alter their addresses in some way that humans can figure out but spammers haven't (yet). For instance, joe@example.net might post as joeNOS@PAM.example.net, or display his email address as an image instead of text. This is called address munging, from the jargon word "mung" meaning to break.Address munging does not, however, evade so-called "dictionary attacks" in which the spammer generates a number of likely-to-exist addresses out of names and common words. For instance, if there is someone with the address adam@example.com, where 'example.com' is a popular ISP or mail provider, it is likely that he frequently receives spam.
Disposable e-mail addresses
Many email users sometimes need to give an address to a site without complete assurance that the site will not spam, or leak the address to spammers. One way to mitigate the risk of spam from such sites is to provide a disposable email address -- a temporary address which forwards email to your real account, but which you can disable or abandon whenever you see fit.A number of services, such as Spamgourmet, SpamDay (http://spamday.com/) or E4ward.com (http://www.e4ward.com), provide disposable address forwarding. Addresses can be manually disabled, can expire after a given time interval, or can expire after a certain number of messages have been forwarded.
Another similar service is Mailinator (http://www.mailinator.com), which allows you to check all mail received at any username@mailinator.com-style address received in the past few hours.
Defeating Web bugs and JavaScript
Many modern mail programs incorporate Web browser functionality, such as the display of HTML and images. This can easily expose the user to pornographic or otherwise offensive images in spam. In addition, spam written in HTML can contain JavaScript programs to direct the user's Web browser to an advertised page, or to make the spam message difficult or impossible to close or delete. In some cases, spam messages have contained attacks upon security vulnerabilities in the HTML renderer, using these holes to install spyware. (Some computer viruses are borne by the same mechanisms.) Also, the images can be used to find out whether a spam message is actually read and seen by a user.Users can defend against these methods by using mail clients which do not automatically display HTML, images or attachments, or by configuring their clients not to display these by default.
Avoiding responding to spam
It is well established that some spammers regard responses to their messages -- even responses which say "Don't spam me" -- as confirmation that an email address refers validly to a reader. Likewise, many spam messages contain Web links or addresses which the user is directed to follow to be removed from the spammer's mailing list. In several cases, spam-fighters have tested these links and addresses and confirmed that they do not lead to the recipient address's removal -- if anything, they lead to more spam.In Usenet, it is widely considered even more important to avoid responding to spam. Many ISP have software that seeks out and destroys duplicate messages. Often someone sees a spam and responds to it before it's cancelled by their server. This can have the effect of reposting the spammer's spam for them... and since it's not just a duplicate, this reposted copy will actually last longer.
In late 2003, the FCC launched a public relations campaign to encourage email users to simply never respond to a spam email -- ever. This campaign stemmed from the tendency of casual email users to reply to spam, in order to complain about the spam and ask the spammer to stop sending spam. This has the effect of alerting spammers to the existence of a person who actually reads spam email, and it has the effect of increasing spam rather than stopping it.
Reporting spam
The majority of ISPs explicitly forbid their users from spamming, and eject from their service users who are found to have spammed. Tracking down a spammer's ISP and reporting the offense often leads to the spammer's service being terminated. Unfortunately, it can be difficult to track down the spammer -- and while there are some online tools to assist, they are not always accurate.Two such online tools are SpamCop and Network Abuse Clearinghouse. Both provide automated or semi-automated means to report spam to ISPs. Some spam-fighters regard them as inaccurate compared to what an expert in the email system can do; however, most email users are not experts.
Defense against email worms
In the past several years, scores of worm programs have used email systems as a conduit for infection. The worm program transmits itself in an email message, usually as a MIME attachment. In order to infect a computer, the executable worm attachment must be opened. In almost all cases, this means the user must click on the attachment. The worm also requires a software environment compatible with its programming.Email users can defend against worms in a number of ways, including:
- Avoiding email client software which supports executable attachments. The most frequently-targeted client software for email worms is Microsoft Outlook and Outlook Express, both of which can easily be made to open executable attachments. However, other Windows-based email software is not immune to worms.
- Using an operating system which does not provide an environment compatible with present worms. Essentially all current email worms affect only the Microsoft Windows operating system. They cannot execute on Macintosh, Unix, Linux, or other operating systems. In some cases, it is conceivable that a worm could be written for one of these systems; however, various security features militate against it.
- Using up-to-date anti-virus software to detect incoming worms and quarantine or delete them before they can take effect.
- Being skeptical of unsolicited email attachments. Since worms and other email-borne malware arrive in this form, some email users simply refuse to open attachments that the sender has not given them advance notice of.
Building lasting marketing bridges between business and the Internet
Internet strategic planning / Search Engine Marketing / Advertising and PR media / Interactive business communities / International marketing / New product introduction / Internet risk management consulting / custom application integration / automated E-Commerce and E-Business solutions / Interactive website solutions / start up website templates / Website development consulting / Custom programming solutions
