
What methods can be used to detect and remove Spam?
Because spammers are increasingly ingenious, several methods must now be combined to detect and block the various types of SPAM that are in the wild or under development. This is an ongoing battle between the spammers and the anti-SPAM industry, and no single ‘silver bullet’ is currently available to solve it.
The following methods are currently available for detecting SPAM:
A simple and quite effective way to block the most basic forms of spamming is to carry out reverse DNS lookups on the addresses that connect to the e-mail gateways. This checks that the address of the sending gateway is valid for the domain that the message purports to come from, but it can affect the performance of the SMTP gateway where large volumes of messages are received.
Reverse DNS Checks
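The reverse lookup described above can be sketched as follows. This is a minimal illustration, not a description of how any particular gateway implements it: the function names are hypothetical, and the rule that the resolved host name must fall under the claimed sender domain is an assumption (a strict one, since many legitimate senders use third-party mail hosts). The resolvers are passed in as parameters so the logic can be exercised without live DNS.

```python
import socket

def default_reverse(ip):
    """Return the PTR host name for ip, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None

def default_forward(hostname):
    """Return the set of addresses hostname resolves to (empty on failure)."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(hostname, None)}
    except OSError:
        return set()

def passes_reverse_dns(ip, claimed_domain,
                       reverse=default_reverse, forward=default_forward):
    """Hypothetical check: PTR exists, resolves back to ip, and matches the domain."""
    hostname = reverse(ip)
    if hostname is None:
        return False                      # no reverse record at all
    if ip not in forward(hostname):
        return False                      # PTR name does not resolve back to the IP
    hostname = hostname.rstrip(".").lower()
    claimed = claimed_domain.rstrip(".").lower()
    # Assumption: the host must be the claimed domain or a sub-domain of it.
    return hostname == claimed or hostname.endswith("." + claimed)
```

Each call with the default resolvers performs two DNS queries, which is where the performance cost on a busy gateway comes from.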
Lists can be maintained of addresses that are known to generate SPAM (a blacklist) or of addresses that are trusted (a whitelist). These lists carry a very high administration overhead if administered centrally, and they can have a much higher likelihood of false positives (valid messages that are identified as SPAM). Some products allow end-users to maintain their own blacklists and whitelists, which pushes the administration requirement out to the individual.
Sender Blacklists/ Whitelists
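A sender list check of this kind might look like the sketch below. The function name, the three return values, and the choice to let the whitelist take precedence over the blacklist are all assumptions made for illustration; real products differ in how they resolve conflicts between the two lists.

```python
def classify_sender(sender, whitelist, blacklist):
    """Return 'accept', 'reject' or 'scan' for a sender address (hypothetical policy)."""
    address = sender.strip().lower()
    domain = address.rpartition("@")[2]   # text after the last '@'
    # Assumption: a whitelist entry always wins over a blacklist entry.
    if address in whitelist or domain in whitelist:
        return "accept"
    if address in blacklist or domain in blacklist:
        return "reject"
    return "scan"                         # unknown sender: pass to the other checks
```

Per-user lists could be supported by passing each recipient's own sets instead of central ones.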
A number of free services on the Internet, such as Spamhaus and SpamCop, keep databases of sites that are known or likely originators of SPAM. Although they are free, they can be very effective, blocking 70-80% of SPAM, but they can also produce a number of false positives. Most SMTP gateways offer an option to look up the source of incoming mail against these services and to act on the result, for example by rejecting or controlling the message, depending on the features available.
Free Real-time Blacklists
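Real-time blacklists are normally queried over DNS: the octets of the connecting IPv4 address are reversed, the blacklist's zone name is appended, and an answer in the 127.0.0.0/8 range means the address is listed. The sketch below shows that lookup. The function names and the zone `dnsbl.example` used in the test are placeholders, and the resolver is injectable so that the logic can be tried without network access; consult the chosen service's documentation for its actual zone name and usage terms.

```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNS name to query, e.g. 192.0.2.10 -> 10.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the blacklist zone has an entry for ip."""
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except OSError:
        return False                  # no record (or lookup failure): not listed
    return answer.startswith("127.")  # listings are returned as 127.0.0.x
```

The gateway can then reject or control the message according to the result.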
Commercial real-time blacklist services, such as MAPS (now owned by the anti-virus company Trend Micro), offer various types of blacklist. In addition to the standard blacklists of sites with open relays and the like that the free services provide, they offer supplementary blacklists, such as lists of non-commercial ADSL addresses that should not host SMTP gateways.
Subscription Real-time Blacklists
In the earlier days of spamming, many messages contained standard words or phrases that could be identified through content checking. The content of SPAM messages has since become more complex, and in many cases it is generated randomly or contains words that are obfuscated so that they are not easily read by anything other than the human eye. Techniques have evolved, and many products now combine statistical methods (such as Bayesian filtering, where mathematical calculations provide a probability rating for SPAM based on the content of the message) with static rules to identify text.
Content Filtering
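To make the idea of a Bayesian probability rating concrete, here is a minimal naive Bayes sketch. It is not how any specific product works: the class and method names are hypothetical, the tokeniser is deliberately crude, and add-one smoothing is an assumption chosen to keep unseen words from producing a zero probability.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple tokeniser)."""
    return re.findall(r"[a-z0-9']+", text.lower())

class NaiveBayesFilter:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one message known to be 'spam' or 'ham' (legitimate mail)."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Return a value between 0 and 1; higher means more SPAM-like."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Prior probability of the class, with add-one smoothing.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            for word in tokenize(text):
                # Smoothed likelihood of each word given the class.
                score += math.log((counts[word] + 1) /
                                  (total_words + len(vocabulary) + 1))
            log_score[label] = score
        # Convert the two log scores to a probability; clamp to avoid overflow.
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(diff))
```

A gateway would typically compare the returned value with a configurable threshold rather than treat it as a yes/no answer.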
Many providers of SMTP gateway products offer subscriptions to downloadable SPAM signature databases that can be used to check for known SPAM that they have previously identified. They usually populate these databases using a combination of ‘honey pot’ mail accounts and reports from customers, producing signatures for SPAM that is in the wild. These signatures provide a very accurate method of detecting known SPAM with static content and have a low probability of false positives.
Signature based
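In its simplest form, a signature check is a lookup of a message fingerprint in a set of known fingerprints, as sketched below. Vendors' signature formats are proprietary and usually more tolerant of small changes; the exact SHA-256 hash of a whitespace-normalised, lowercased body used here is an assumption for illustration, and the function names are hypothetical.

```python
import hashlib
import re

def message_signature(body):
    """Fingerprint a message body after collapsing whitespace and case."""
    normalised = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def is_known_spam(body, signatures):
    """Return True if the body's fingerprint is in the signature database."""
    return message_signature(body) in signatures
```

Because the match is exact, this only catches SPAM with static content, which is also why false positives are unlikely.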
To protect against directory harvesting (dictionary) attacks, the SMTP gateway can monitor connections that send messages to a large number of invalid addresses and terminate the session. In many cases, if a single session sends to more than five invalid addresses, there is a good probability that it is a directory harvesting attack. When connections are dropped in this way, it takes a spammer a long time to obtain any useful information about the addresses within the organisation, and most will not want to expend the effort. Some products can also use a method known as tar-pitting, which slows down the reception of messages through suspect sessions and usually results in the spammer giving up.
Directory Harvesting Detection
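The per-session counting described above can be sketched as follows. The class name, the action strings, and the linear tar-pit delay are assumptions for illustration; only the threshold of five invalid addresses comes from the text.

```python
class SessionMonitor:
    """Tracks invalid recipients within one SMTP session (hypothetical helper)."""

    def __init__(self, valid_recipients, max_invalid=5):
        self.valid_recipients = {r.lower() for r in valid_recipients}
        self.max_invalid = max_invalid
        self.invalid_count = 0

    def check_recipient(self, recipient):
        """Return 'accept', 'reject' (unknown user) or 'drop' (end the session)."""
        if recipient.lower() in self.valid_recipients:
            return "accept"
        self.invalid_count += 1
        if self.invalid_count > self.max_invalid:
            return "drop"             # more than max_invalid bad addresses
        return "reject"

    def tarpit_delay(self, base_seconds=1.0):
        """Assumed tar-pit policy: wait longer for each invalid address seen."""
        return base_seconds * self.invalid_count
```

With the default threshold, the sixth invalid address in a session triggers the drop.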
This is a fairly new method, and its effectiveness is still debated by many companies. The sender and recipient details of a message are checked to see whether that combination has been received by the organisation before, and a database of trusted combinations is built up. An initial connection from a new sender/recipient combination is rejected, and the message is only accepted on the second attempt. This relies on the assumption that SPAM is sent on a send-and-forget basis and will never be retried, whereas a legitimate system will always retry.
In theory, this should build up a profile of real e-mail to the organisation and reject a large amount of SPAM. The main concern is the delay in message delivery, so many vendors have either not yet implemented the method or provide it only as an option.
Greylisting
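A minimal greylisting sketch follows. It keys on the sender/recipient combination as the text describes (many implementations also include the connecting IP address), and the first attempt is answered with a temporary failure so that a legitimate server will retry. The class name, the `min_delay` parameter (a minimum wait before a retry is accepted), and the injectable clock are assumptions for illustration.

```python
import time

class Greylist:
    """Defers the first delivery attempt for each new sender/recipient pair."""

    def __init__(self, min_delay=300, clock=time.time):
        self.min_delay = min_delay    # assumed minimum seconds before a retry counts
        self.clock = clock            # injectable so the logic can be tested
        self.first_seen = {}          # (sender, recipient) -> time of first attempt

    def check(self, sender, recipient):
        """Return 'tempfail' (SMTP 4xx, retry later) or 'accept'."""
        key = (sender.lower(), recipient.lower())
        now = self.clock()
        if key not in self.first_seen:
            self.first_seen[key] = now
            return "tempfail"         # new combination: ask the sender to retry
        if now - self.first_seen[key] >= self.min_delay:
            return "accept"           # retried after the delay: treat as trusted
        return "tempfail"             # retried too quickly
```

The delivery delay that concerns vendors is visible here: a first message from a new correspondent is held up by at least `min_delay` plus the sending server's retry interval.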
An increasing amount of SPAM spoofs the sender address so that the message appears, to both the recipient and the e-mail gateway, to have originated from an internal source and is therefore more likely to be trusted. Many products include anti-spoofing functionality so that these types of message can be either rejected or controlled.
Message Spoofing Detection
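One simple form of anti-spoofing check is to flag any message that claims an internal sender domain but arrives from an address outside the organisation's own networks. The sketch below illustrates that rule only; the function name and parameters are hypothetical, and the domains and network ranges in any real deployment would come from the gateway's configuration.

```python
import ipaddress

def is_spoofed(sender, client_ip, internal_domains, internal_networks):
    """Return True if an internal sender domain is used from an external address."""
    domain = sender.rpartition("@")[2].lower()
    if domain not in internal_domains:
        return False                  # not claiming to be internal: not this check
    address = ipaddress.ip_address(client_ip)
    # Spoofed if the connecting address is in none of the internal networks.
    return not any(address in ipaddress.ip_network(net)
                   for net in internal_networks)
```

Messages flagged in this way can then be rejected or controlled like any other suspected SPAM.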