In a previous post we discussed artificial intelligence and machine learning, and how these emerging technologies are augmenting existing email security solutions to round out our defenses against email attacks. Now we turn to how AI is actually used for defense: machine learning algorithms.


Hardly any solution, vendor or organization today fails to mention artificial intelligence (AI), and its most sought-after application, machine learning (ML), in its strategy. But most announcements and marketing slogans amount to little more than “AI-washing” of existing technologies; in reality, these tools are closer to analytics than to intelligence, that is, the ability to automate behavior or processes. Little of this has reached email security, where AI is used by only a few vendors who know how to harness its potential against the greatest threats to enterprises: email-based phishing, business email compromise, malware and ransomware attacks.


In general, email defense is stuck in the past and struggles to evolve. The underlying email protocols are not well secured, and the traditional solutions used to protect them rest mainly on simple rules, address lists and signatures of known attacks. Meanwhile, hackers continue to improve their methods of attack. To protect themselves today, defenders must be reactive, rapidly detecting waves of attacks that are becoming ever more sophisticated. They must also be predictive, spotting the early signs of a threat in order to anticipate an attack. That’s where AI and ML become very useful, as complements to existing capabilities.


How AI works to protect email

Classic AI relies on algorithms, rules and instructions, often statistical, that describe problems and help resolve them. To separate infected emails from healthy ones, these algorithms are applied to email security using well-known rules from traditional solutions. Thanks to its self-learning capabilities, ML can draw on the enormous volumes of data coming from protected messaging systems and processed at Big Data scale. It can compare events (emails or waves of emails) to detect changes, particularly those that could hide a threat. Using these data, and those provided by administrators, ML then helps create new rules to feed the threat knowledge database.
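To make the idea of comparing events to detect changes more concrete, here is a minimal sketch in Python. It is not taken from any vendor's product: the function name `find_unusual_hours`, the six-hour baseline, the threshold of 3 and the sample counts are all assumptions chosen for illustration, and a simple statistical test stands in for what a real ML system would learn from far richer data.

```python
from statistics import mean, stdev

def find_unusual_hours(hourly_counts, baseline_hours=6, threshold=3.0):
    """Return the indices of hours whose email volume rises sharply
    above the preceding baseline window (a possible attack wave)."""
    flagged = []
    for i in range(baseline_hours, len(hourly_counts)):
        window = hourly_counts[i - baseline_hours:i]
        mu = mean(window)
        # Fall back to 1.0 so a perfectly flat baseline does not divide by zero.
        sigma = stdev(window) or 1.0
        # Flag the hour if it sits more than `threshold` deviations above the mean.
        if (hourly_counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Hypothetical hourly volumes of similar-looking emails; hour 7 is a sudden wave.
counts = [20, 22, 19, 21, 20, 23, 21, 95, 22, 20]
print(find_unusual_hours(counts))  # prints [7]
```

In this sketch a flagged hour is only a signal worth investigating; turning it into a new rule for the threat knowledge database would still involve the data and validation supplied by administrators, as described above.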


Among the many algorithms available, two families are used by the few email protection vendors able to develop and deploy them:

– Supervised algorithms

Although AI algorithms are generally very complex, the principle behind this type is ‘simple’: the vendor, who knows the nature of the threats, defines decision models that are continually fed by training the AI on both healthy and malicious emails. These data, verified and validated manually by operators, are transformed into a set of feature vectors characteristic of the threat to be detected; these are used to build a model that produces a result depending on the class of email. This is how ML can learn and reproduce procedures to qualify threats. Note that human expertise is still required to verify the characteristics one hopes to detect and the results obtained, to monitor and supervise the algorithms, to work on the precision of results, and to update the data.
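The supervised workflow described above (labelled emails, feature vectors, a trained model, a predicted class) can be sketched in a few lines of Python. This is an illustrative assumption, not the method of any particular vendor: it uses a small multinomial naive Bayes classifier over word counts, and the function names, the training emails and the labels "healthy" and "malicious" are all made up for the example.

```python
import math
import re
from collections import Counter

def extract_features(text):
    """Turn an email body into a feature vector of lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def train(labelled_emails):
    """Fit a multinomial naive Bayes model on (text, label) pairs."""
    class_counts = Counter()   # number of training emails per class
    token_counts = {}          # per-class token frequencies
    vocabulary = set()
    for text, label in labelled_emails:
        class_counts[label] += 1
        features = extract_features(text)
        token_counts.setdefault(label, Counter()).update(features)
        vocabulary.update(features)
    return class_counts, token_counts, vocabulary

def classify(model, text):
    """Return the class with the highest log-probability for this email."""
    class_counts, token_counts, vocabulary = model
    total_emails = sum(class_counts.values())
    features = extract_features(text)
    best_label, best_score = None, -math.inf
    for label, n_emails in class_counts.items():
        score = math.log(n_emails / total_emails)  # class prior
        total_tokens = sum(token_counts[label].values())
        for token, count in features.items():
            if token not in vocabulary:
                continue  # ignore words never seen during training
            # Laplace smoothing keeps unseen (class, token) pairs above zero.
            likelihood = (token_counts[label][token] + 1) / (total_tokens + len(vocabulary))
            score += count * math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical, manually validated training data.
TRAINING_SET = [
    ("Your account is suspended, verify your password now", "malicious"),
    ("Urgent wire transfer needed, click this link", "malicious"),
    ("Meeting notes attached for the quarterly review", "healthy"),
    ("Lunch on Thursday? The project review went well", "healthy"),
]

model = train(TRAINING_SET)
print(classify(model, "Please verify your password via this link"))  # prints malicious
print(classify(model, "Notes from the project meeting"))             # prints healthy
```

Four training emails are far too few for real use; the point is only to show where the human-validated labels, the feature vectors and the resulting model fit together. A production system would rely on much larger datasets and on features beyond body text.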

– Unsupervised algorithms

These algorithms learn from the data itself to identify new threats. Phishing emails and malware arrive in waves, changing regularly through small variations in implementation. These are modifications to the content of emails, or to the code of polymorphic malware, which can trick