
Posted: Wed, September 19, 2007

The rise of PDF spam

by Nick Johnston

Spammers are known to be highly creative and versatile in their attempts to bypass spam filters. For years, image spam has been very popular, with spammers using a variety of techniques to randomise their images and so make detection more difficult. As both MessageLabs and the wider anti-spam community have improved their image processing techniques, spammers are increasingly switching to a new format: PDF.


The beginning of PDF spam

PDF (Portable Document Format) is a popular document format invented by Adobe Systems, and is widely used for document exchange in the business world. As such, it is a "trusted" format, and many naïve anti-spam solutions automatically whitelist all messages containing a PDF file. Such is the importance and general acceptance of PDF in the business world that practically all computers in a corporate environment will have a PDF viewer installed. This makes PDF an excellent "vector" for spam messages.

MessageLabs first saw large-scale PDF spam in the middle of June 2007. This "spam run" or "campaign" was a "pump and dump" scam promoting a German stock. Many new types of spam start primitively, and PDF spam was no exception. This first spam run included exactly the same document in each message, making it easy to stop the messages using hashes or "fingerprints".
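
The hash-based blocking described above can be sketched as follows. This is a minimal illustration, not MessageLabs' implementation: the choice of SHA-256 and the names fingerprint and FingerprintBlocklist are assumptions made for the example.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a hex SHA-256 digest of the raw attachment bytes."""
    return hashlib.sha256(data).hexdigest()


class FingerprintBlocklist:
    """Hypothetical in-memory store of fingerprints of known spam files."""

    def __init__(self):
        self._known = set()

    def add(self, data: bytes) -> None:
        # Record the fingerprint of a file already identified as spam.
        self._known.add(fingerprint(data))

    def is_known_spam(self, data: bytes) -> bool:
        # Exact match only: the attachment must be byte-for-byte identical.
        return fingerprint(data) in self._known
```

Because changing a single byte produces a completely different digest, a check like this only works while every message in the run carries an identical file.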


PDF Spam example 1


PDF spam evolves quickly

Soon after seeing the first major PDF spam run, MessageLabs began seeing more. But this time, each message had a different PDF file attached. Spammers have long had the ability to randomise images, and have now updated their botnet software to simply insert these random images into PDF documents. This technique means that each PDF that a spammer sends out will be different, and will be more difficult to stop.


Randomised PDFs

The images below (taken from randomised PDFs) illustrate this concept well. Each image includes exactly the same text, but the shape of the image is different, as are the colours:


Randomised PDF Spam 1

Randomised PDF Spam 2

Randomised PDF Spam 3


In contrast to legitimate business PDF files, PDF files from this randomised spam run do not use standard paper sizes such as A4 or Letter. For example, one document might be 74.4 × 96 mm, and another might be 168.3 × 54.7 mm.
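
A page-size check along these lines could look like the sketch below. It assumes the page size appears as an uncompressed, directly written /MediaBox array in the file, which is not true of every PDF, and the function names and the 2 mm tolerance are hypothetical. PDF page dimensions are expressed in points, with 72 points to the inch.

```python
import re

POINTS_PER_MM = 72 / 25.4  # 72 points per inch, 25.4 mm per inch

# Standard paper sizes mentioned in the article, in millimetres.
STANDARD_SIZES_MM = {"A4": (210.0, 297.0), "Letter": (215.9, 279.4)}

_NUM = rb"([-+]?\d*\.?\d+)"
MEDIABOX_RE = re.compile(
    rb"/MediaBox\s*\[\s*" + rb"\s+".join([_NUM] * 4) + rb"\s*\]"
)


def page_sizes_mm(pdf_bytes: bytes):
    """Return (width, height) in mm for each literal /MediaBox found."""
    sizes = []
    for m in MEDIABOX_RE.finditer(pdf_bytes):
        x0, y0, x1, y1 = (float(v.decode("ascii")) for v in m.groups())
        sizes.append((abs(x1 - x0) / POINTS_PER_MM,
                      abs(y1 - y0) / POINTS_PER_MM))
    return sizes


def is_standard_size(width_mm, height_mm, tolerance_mm=2.0):
    """True if the page matches A4 or Letter in either orientation."""
    for w, h in STANDARD_SIZES_MM.values():
        for a, b in ((w, h), (h, w)):
            if (abs(width_mm - a) <= tolerance_mm
                    and abs(height_mm - b) <= tolerance_mm):
                return True
    return False
```

An unusual page size is only a hint, not proof of spam, so a check like this would normally feed into a wider scoring system rather than block a message on its own.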


Corrupted files to avoid detection

Many images in image spam were deliberately corrupted - in other words, the images were constructed without complying with the appropriate specification or standard. By corrupting files, spammers make it more difficult for the analysis tools used by the anti-spam companies to open and analyse the images. Some computer programs would fail to process such images, and indeed these images could cause some programs to become much slower, use more resources or crash. However, spammers rely on the fact that other applications (like many common email clients) are more forgiving and display the images without problems.

MessageLabs has seen similar tactics employed with PDF spam, and has detected many corrupted PDF documents. It is unclear whether this corruption is accidental or deliberate but, as with corrupt images, strict processing programs tend to fail on these PDF files, so analysis and identification become more difficult. Recipients can still view the messages, though, because Adobe's Acrobat Reader displays the PDF correctly (by rebuilding part of the PDF document's internal structure). Some older versions of Acrobat Reader briefly display a dialog box telling the user that the file is damaged and is being repaired, but this requires no interaction from the user.
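
To illustrate what a "strict" check might look for, the sketch below tests three structural features that a well-formed PDF normally has: a %PDF- header at the start, a %%EOF marker near the end, and a startxref offset that points at a cross-reference section. The function name and the exact set of checks are assumptions chosen for the example; real analysis tools check far more.

```python
import re


def strict_pdf_problems(data: bytes) -> list:
    """Return a list of structural problems found; empty means none found."""
    problems = []

    if not data.startswith(b"%PDF-"):
        problems.append("missing %PDF- header at offset 0")

    if b"%%EOF" not in data[-1024:]:
        problems.append("missing %%EOF marker near end of file")

    # Use the last startxref in the file, as updated files may have several.
    last = None
    for last in re.finditer(rb"startxref\s+(\d+)", data):
        pass
    if last is None:
        problems.append("missing startxref")
    else:
        offset = int(last.group(1))
        # A classic table starts with "xref"; a cross-reference stream
        # starts with an object header such as "12 0 obj".
        is_table = data[offset:offset + 4] == b"xref"
        is_stream = re.match(rb"\d+\s+\d+\s+obj", data[offset:offset + 32])
        if not (is_table or is_stream):
            problems.append("startxref does not point at a cross-reference section")

    return problems
```

A forgiving viewer can recover from a wrong startxref offset by scanning the file for objects, which is consistent with the repair behaviour described above; a strict parser simply reports the file as broken.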


Variable length PDF files

A more recent tactic seen by MessageLabs is the use of variable length PDF documents. Until recently, most PDF documents sent in spam were simple, single page documents. In contrast, with variable length PDF spam, the first half or so of the first page includes the spam message, and the rest of the page and a random number of subsequent pages contain text "poison".

This poison is designed to foil statistical anti-spam techniques such as Bayesian filtering. We have seen spam PDF documents containing up to 14 pages of poison. The poison can be random words, programmatically generated "nonsense text" or legitimate text "scraped" from popular web sites. Some examples of this text include:

    But in light of their back-stabbing, Artificial
    Intelligence-inspired offenses and their sinister,
    temptation-ridden environment this response is degenerate.

    Ships from and sold by Amazon.

    I also had my tripod and took several amazing long
    exposure shots of the interior.

It is likely that spammers think longer PDF documents are more likely to be considered legitimate business documents like reports, manuals and so on.
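
The effect of poison on a statistical filter can be shown with a toy example. The sketch below combines per-word spam probabilities by summing their log-odds, a simplified naive-Bayes-style calculation; the word probabilities are invented for illustration and the function name is hypothetical.

```python
import math

# Hypothetical per-word probabilities that a message containing the
# word is spam, as a trained filter might have learned them.
TOKEN_SPAM_PROB = {
    "stock": 0.95,
    "buy": 0.90,
    "tripod": 0.10,
    "exposure": 0.15,
    "interior": 0.20,
}


def spam_score(tokens, default=0.5):
    """Combine per-word probabilities into one score between 0 and 1."""
    log_odds = 0.0
    for token in tokens:
        # Unknown words get 0.5, which contributes nothing to the sum.
        p = TOKEN_SPAM_PROB.get(token.lower(), default)
        log_odds += math.log(p / (1 - p))
    return 1 / (1 + math.exp(-log_odds))


spam_words = ["stock", "buy"]
poison_words = ["tripod", "exposure", "interior"]

clean = spam_score(spam_words)
poisoned = spam_score(spam_words + poison_words)
```

With these made-up numbers, the innocent-looking words pull the combined score well below the score of the spam words alone, which is the dilution effect the poison is intended to achieve.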


PDF spam diversity

Most press coverage of PDF spam has concentrated solely on "pump-and-dump" stock spam. However, MessageLabs has also seen PDF documents used in other types of spam, such as pharmaceutical spam and online casino spam. Recent examples include:


PDF Spam example 2


PDF Spam example 3


PDF spam construction

Spammers are using a wide variety of tools to produce their PDF documents. Many tools include their name as the document "producer" or "creator" in the PDF file itself. Some spammers are using common office applications such as Microsoft Word and OpenOffice:

    /Producer(GNU Ghostscript 7.07)
    /Creator(OpenOffice.org 1.1.4)

    /Title(Microsoft Word - sancashtemplate.doc)
    /Creator(PScript5.dll Version 5.2.2)
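
Fields like these can be pulled out of a file with a simple pattern match, as in the sketch below. It assumes the values are stored as plain, uncompressed literal strings without nested parentheses, as in the examples above; the function name is hypothetical.

```python
import re

# Matches /Producer(...), /Creator(...) or /Title(...), allowing
# backslash-escaped characters inside the parentheses.
INFO_RE = re.compile(
    rb"/(Producer|Creator|Title)\s*\(((?:\\.|[^\\)])*)\)", re.DOTALL
)


def info_fields(pdf_bytes: bytes) -> dict:
    """Return the first Producer, Creator and Title values found."""
    fields = {}
    for m in INFO_RE.finditer(pdf_bytes):
        key = m.group(1).decode("ascii")
        # Keep the first occurrence of each field only.
        fields.setdefault(key, m.group(2).decode("latin-1"))
    return fields
```

For example, info_fields(b"/Producer(GNU Ghostscript 7.07)\n/Creator(OpenOffice.org 1.1.4)") returns a dictionary mapping "Producer" to "GNU Ghostscript 7.07" and "Creator" to "OpenOffice.org 1.1.4".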

Some spammers have also used tools like PowerPDF, text2pdf and so on to produce their PDF documents. More recently, spammers have written their own tools to produce PDF documents. This gives them maximum flexibility, and lets them specify random "producer" names and titles which are difficult to detect by anti-spam software, for example:

    Title: One of the most interesting things about the
    present development of the automobile is the trend to
    give cars a retro look.
    Producer: For pure and simple ugly no one has been able
     to beat them


    Title: , has a new promotion that puts its money where
    its mouth is.
    Producer: The flights will be convenient for travellers
     coming from the U

Although many people are familiar with PDF documents, there are also some related formats which are comparatively unknown. Recently MessageLabs has seen spam claiming to have FDF (Forms Data Format) attachments, which also open with Adobe's Acrobat Reader. The attachments are actually PDF files merely labelled with a '.fdf' extension. This is likely to be another attempt by the spammers to bypass anti-spam software that only looks at the file extension ('.fdf' in this case), rather than doing reliable checking of the actual file.
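
Checking the actual file rather than its name can be as simple as looking at the first few bytes. The sketch below relies on the fact that PDF files begin with "%PDF-" and FDF files with "%FDF-"; the function names are hypothetical, and a production check would inspect more than the header.

```python
def sniff_type(data: bytes) -> str:
    """Guess the file type from its leading bytes, ignoring the filename."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"%FDF-"):
        return "fdf"
    return "unknown"


def extension_mismatch(filename: str, data: bytes) -> bool:
    """True if the content is recognised but disagrees with the extension."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    actual = sniff_type(data)
    return actual != "unknown" and ext != actual
```

Here extension_mismatch("invoice.fdf", b"%PDF-1.4\n") is True, flagging exactly the kind of mislabelled attachment described above, while the same content named "invoice.pdf" gives False.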


Meet the threat

PDF spam is an increasing problem and now accounts for around 20% of spam. The damage that spam can cause any business should never be underestimated. Efficiency, productivity and profitability can all take a serious hit if electronic junk email gains access to inboxes, with valuable time and effort eaten up in identifying and deleting unwanted messages. MessageLabs stops PDF spam using several different broad techniques:

  • Skeptic® heuristics updated around the clock to ensure the highest protection possible from PDF spam
  • Automatic fingerprint-based blocking of known spam PDF files
  • Honeypot monitoring systems for identifying new PDF spam runs
  • Tools to detect corrupted PDF files
  • Generic approaches such as IP blacklisting


About the Author
Nick Johnston is with Anti-Spam Development at MessageLabs. The company offers a managed anti-spam service, allowing customers to benefit from seamless, continual system improvement. Its operations and development teams work 24 hours a day, 7 days a week, 365 days a year, ensuring that MessageLabs customers are always protected against PDF spam and other emerging spam threats.


For more information visit www.messagelabs.com.


Send a comment about this article to editor@itwales.com.






All material on this website ©2002-2008 ITWales