How to Measure the Quality of Anti-Spam

Sébastien Goutal

September 26, 2011

An anti-spam solution is a filtering engine that detects unsolicited e-mails. The entity hosting the solution then decides how to handle those e-mails.

An anti-spam solution can be integrated and distributed in several forms:

- Desktop: a module installed in the e-mail software that filters e-mails as they are received.

- SaaS (Software as a Service): the filtering is performed upstream from the client's infrastructure.

- Physical or virtual appliance: a physical or virtual device placed within the client's infrastructure, upstream from the e-mail software.

- SDK: a software library integrated with the e-mail software, within the client's infrastructure.

Today an anti-spam engine can classify e-mails well beyond simple spam:

- Solicited, and therefore legitimate, e-mails:

  - Interpersonal messages: e-mails you exchange professionally or privately via your messaging system.

  - Solicited advertising: advertising you requested.

  - Notifications: delivery status notifications (DSN) for undelivered e-mail, social network alerts, e-commerce site alerts, technical alerts (firewall, webcam...).

- Unsolicited, and therefore illegitimate, e-mails:

  - Spam: unwanted e-mails sent in bulk.

  - Phishing: online fraud attempts (theft of credit card numbers, messaging system IDs...).

  - Scams: e-mail fraud attempts (inheritance, money transfer...).

  - Viruses.

  - Unsolicited advertising: advertising you have not requested.

Today the anti-spam market has a large number of vendors, each with its own scope of analysis. You are therefore entitled to ask the following question:

How do you measure the quality of an anti-spam solution?

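The efficiency of an anti-spam solution can be assessed using the following criteria:

- False negative (FN) rate: the share of unsolicited e-mails that the filter lets through.

- False positive (FP) rate: the share of legitimate e-mails that the filter wrongly blocks.

- Message throughput (i.e., messages per second).

- Response time of the support team when handling client feedback (FP or FN).

- The types of e-mail the solution is able to categorize.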
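The first two criteria can be computed from a labelled test run. The following is a minimal sketch, not any vendor's method: the function name `error_rates` and the `(is_unsolicited, was_flagged)` pair layout are assumptions made for illustration, and the sample verdicts are invented.

```python
from typing import Iterable, Tuple


def error_rates(results: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (fn_rate, fp_rate) as fractions between 0 and 1.

    Each item is a hypothetical (is_unsolicited, was_flagged) pair:
    the ground-truth label and the filter's verdict for one e-mail.
    """
    fn = fp = unsolicited = legitimate = 0
    for is_unsolicited, was_flagged in results:
        if is_unsolicited:
            unsolicited += 1
            if not was_flagged:
                fn += 1  # unsolicited e-mail that slipped through
        else:
            legitimate += 1
            if was_flagged:
                fp += 1  # legitimate e-mail wrongly blocked
    fn_rate = fn / unsolicited if unsolicited else 0.0
    fp_rate = fp / legitimate if legitimate else 0.0
    return fn_rate, fp_rate


# Invented sample: 4 unsolicited e-mails (1 missed), 2 legitimate (1 blocked).
sample = [
    (True, True), (True, True), (True, False), (True, True),
    (False, False), (False, True),
]
fn_rate, fp_rate = error_rates(sample)
print(f"FN rate: {fn_rate:.0%}, FP rate: {fp_rate:.0%}")
```

Note that each rate has its own denominator: the FN rate is relative to the unsolicited e-mails, the FP rate to the legitimate ones.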

Here is an example applied to the false positive rate:

To measure a false positive rate, you should ideally evaluate the filter against your own corpus of legitimate e-mails. Count the e-mails marked as "spam" and divide by the total number of e-mails in the corpus: the result is the false positive rate of your anti-spam solution.

Consider a corpus of 1,000 legitimate e-mails. If the tested anti-spam solution blocks one of them, the false positive rate is 1/1,000 = 0.1%. This indicator can also be calculated in production, relative to the entire e-mail flow received (which contains spam, advertising, interpersonal e-mails, notifications...). If we received one million e-mails and had a single false positive, the false positive rate would be 1/1,000,000 = 0.0001%. The two figures use different denominators (legitimate e-mails only versus the whole flow), so they are not directly comparable.
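The arithmetic above can be checked with a few lines of Python. This is only a sketch; the helper name `false_positive_rate` is an assumption chosen for illustration.

```python
def false_positive_rate(false_positives: int, total: int) -> float:
    """Return the false positive rate as a percentage of `total` e-mails."""
    if total <= 0:
        raise ValueError("total must be a positive number of e-mails")
    return 100.0 * false_positives / total


# One blocked e-mail in a corpus of 1,000 legitimate e-mails.
corpus_rate = false_positive_rate(1, 1_000)

# One false positive in a production flow of 1,000,000 e-mails of all types.
production_rate = false_positive_rate(1, 1_000_000)

print(f"corpus: {corpus_rate:.4f}%  production: {production_rate:.4f}%")
```

The only difference between the two calls is the denominator, which is why a rate should always be quoted together with the population it was measured on.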

How to get the corpus needed for testing purposes?

There are different ways to obtain corpuses in order to improve filtering. The e-mail flow is constantly changing, and a corpus can quickly become obsolete, so corpuses must be renewed regularly. The ideal is to have a steady flow from which corpuses can be built on a daily or weekly basis.
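One way to picture a weekly renewal is to bucket incoming messages by ISO week, so that each week yields a fresh corpus. This is a minimal sketch under assumptions: the function name `weekly_corpuses`, the `(received_date, message_id)` layout, and the sample identifiers are all hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple


def weekly_corpuses(
    messages: Iterable[Tuple[date, str]]
) -> Dict[Tuple[int, int], List[str]]:
    """Group message IDs by (ISO year, ISO week) of their reception date."""
    buckets: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for received, message_id in messages:
        iso = received.isocalendar()  # (ISO year, ISO week, ISO weekday)
        buckets[(iso[0], iso[1])].append(message_id)
    return dict(buckets)


# Hypothetical message identifiers with their reception dates.
corpuses = weekly_corpuses([
    (date(2011, 9, 25), "msg-a"),  # Sunday, last day of ISO week 38
    (date(2011, 9, 26), "msg-b"),  # Monday, first day of ISO week 39
    (date(2011, 10, 2), "msg-c"),  # Sunday, last day of ISO week 39
])
for week, ids in sorted(corpuses.items()):
    print(week, ids)
```

Older buckets can then be retired once they no longer reflect the current flow.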

Some companies provide spam-type flows, mainly collected from honeypots. A honeypot is created by leaving e-mail addresses on the internet where spammers' harvesting robots will pick them up; spammers use these robots to build address lists for future mailings. Honeypots receive very little phishing, scam, or advertising e-mail: because these e-mails cost more to send than spam, their senders target valid addresses to maximize return on investment and do not rely on lists generated by harvesting robots. The flows provided by these companies therefore mainly let you test an anti-spam filter against spam only.

There are also public corpuses, such as the one proposed by TREC (Text REtrieval Conference). These corpuses are widely used, especially in academia, yet they are obsolete and often targeted at the U.S. market only. They are therefore not representative of the real e-mail flow.

The ideal is either to capture a production flow or to use client feedback via a feedback loop. These are the only mechanisms that give real knowledge of the e-mail flow. A feedback loop is also essential for determining whether advertising was solicited or not: whether an e-mail is solicited cannot be determined technically, and only the recipient's opinion can tell.

The Vade Secure Feedback loop

How does Vade benchmark its own solution?

To continuously improve its filtering engine, Vade partners with its clients: sales contracts include the right for the Vade laboratory to use e-mail flows to improve the solution. Vade therefore has a daily flow of several million e-mails, which makes it possible to view the e-mail problem as a whole (spam, phishing, scam, solicited or unsolicited advertising...) and to analyze developments and trends. Because these continuous flows feed its analysis corpuses, the laboratory measures a daily efficiency rate in line with the actual flow, which is the basis for Vade's claim to be qualified for e-mail flow filtering. At the time of writing (September 26, 2011), Vade observes the following distribution of the e-mail flow:

- 4% of interpersonal e-mails and notifications

- 1% of solicited advertisements

- 4% of unsolicited advertisements

- 89% of spam

- 1% of scam and phishing

- 1% of viruses

Figure: distribution of e-mail by type.
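As a quick consistency check on the distribution above, the percentages can be summed and grouped using the article's own legitimate/illegitimate split (which counts unsolicited advertising as illegitimate). This is a sketch; the dictionary keys and variable names are chosen for illustration.

```python
# Percentages of the e-mail flow as listed above (September 26, 2011).
flow_share = {
    "interpersonal e-mails and notifications": 4,
    "solicited advertisements": 1,
    "unsolicited advertisements": 4,
    "spam": 89,
    "scam and phishing": 1,
    "viruses": 1,
}

# Categories the article classes as solicited, and therefore legitimate.
LEGITIMATE = {
    "interpersonal e-mails and notifications",
    "solicited advertisements",
}

total = sum(flow_share.values())
legitimate = sum(pct for name, pct in flow_share.items() if name in LEGITIMATE)
illegitimate = total - legitimate

print(f"total: {total}%  legitimate: {legitimate}%  illegitimate: {illegitimate}%")
```

With these figures, only about one e-mail in twenty is legitimate, which is why a false positive rate measured over the whole flow looks so much smaller than one measured on legitimate e-mails alone.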

The Vade laboratory thus generates corpuses for each type of e-mail and tests the filtering engine against the various threats facing the end user. As a result, Vade measures separate filtering rates for spam and for advertising, consistent with the reality of the e-mail flow.
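Measuring a separate rate per category can be sketched as follows. This is an illustration only, not Vade's tooling: the function name `filtering_rates` is an assumption, and the counts are placeholder values, not measured results.

```python
from typing import Dict, Tuple


def filtering_rates(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Map each category to the percentage of its corpus that was detected.

    `counts` maps a category name to (detected, corpus_size).
    Categories with an empty corpus are skipped.
    """
    return {
        category: 100.0 * detected / size
        for category, (detected, size) in counts.items()
        if size > 0
    }


# Placeholder counts for illustration only (not real measurements).
example_counts = {
    "spam": (990, 1000),
    "unsolicited advertising": (180, 200),
}
rates = filtering_rates(example_counts)
for category, rate in rates.items():
    print(f"{category}: {rate:.1f}% detected")
```

Keeping one rate per category prevents a large category such as spam from masking weaker results on smaller ones.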

Sébastien GOUTAL, Filter Lab Manager
Adrien GENDRE, Product Manager