The Difference Between True AI and Marketing B.S. in Cybersecurity
July 11, 2019
Depictions of artificial intelligence (AI) in film and television have shaped our perception of AI as futuristic and, at times, inexplicable. That confusion benefits cybersecurity companies that want to market their products as AI-powered. For business buyers navigating a market flooded with AI claims, it can be difficult to tell what’s real AI and what’s marketing B.S.
Misleading Claims Lead to Disappointing Results
If you’re a science fiction fan, AI is everything from predictive policing gone wrong to machine-driven, end-of-days war. In the real world, machines are being trained for more practical matters. In cybersecurity, AI is applied to threat prediction, monitoring, analysis, and remediation, with various subsets of AI, including machine learning and deep learning, doing the heavy lifting. To the average buyer, none of that makes sense without a long explanation they don’t have time to hear, and that’s what a lot of vendors are hoping for.
Because the average person doesn’t have a technical understanding of the subsets of AI, it’s easy to be misled about what a product is and isn’t. “A lot of cybersecurity companies are pretending to do AI,” says Sébastien Goutal, Chief Scientist at Vade, “but it’s just a rebranding of their core technology—which probably isn’t AI.”
Buyers who are sucked in by promises of AI only to discover that they’re paying for a platform with the predictive capabilities of a chatbot will no doubt find themselves disappointed with the results.
What AI Is and Isn’t
AI is more of a marketing term than a technology—an umbrella term that doesn’t describe what your technology is doing, says Goutal. The term we should be using is Machine Learning.
A subset of AI, Machine Learning trains models with data. In email security, the models are fed data from both legitimate and illegitimate emails—non-threats and threats. They learn to recognize patterns in both, including the content of normal business correspondence vs the urgent financial nature of spear phishing emails; clean URLs found in brand emails vs malicious URLs found in phishing emails; legitimate email addresses vs spoofed email addresses; and more.
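To make the idea of "patterns" concrete, here is a minimal sketch of how an email might be turned into numbers a model can learn from. It is an illustration only, not any vendor's actual pipeline: the `URGENT_TERMS` keyword list, the `extract_features` function, and the three features it returns are all hypothetical choices made for this example.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword list (an assumption for this sketch);
# a real system would learn such signals from data.
URGENT_TERMS = {"urgent", "immediately", "wire", "transfer", "invoice", "payment"}

def extract_features(sender, reply_to, body):
    """Turn one email into a small numeric feature vector (illustrative only)."""
    # Share of words that carry an urgent or financial tone.
    words = re.findall(r"[a-z]+", body.lower())
    urgent_ratio = sum(w in URGENT_TERMS for w in words) / max(len(words), 1)

    # Count links whose host is a raw IPv4 address rather than a domain name.
    urls = re.findall(r"https?://[^\s\"'>]+", body)
    ip_links = sum(
        bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", urlparse(u).hostname or ""))
        for u in urls
    )

    # A Reply-To domain that differs from the sender domain can hint at spoofing.
    sender_domain = sender.rsplit("@", 1)[-1].lower()
    reply_domain = reply_to.rsplit("@", 1)[-1].lower()
    domain_mismatch = int(sender_domain != reply_domain)

    return [urgent_ratio, ip_links, domain_mismatch]

suspicious = extract_features(
    "ceo@example.com", "ceo@examp1e.net",
    "Urgent: wire payment now via http://192.168.1.5/pay",
)
routine = extract_features(
    "news@brand.com", "news@brand.com",
    "Our summer catalog is here: https://brand.com/catalog",
)
```

A model never sees the email itself, only vectors like these, which is why the choice and quality of features matters so much.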
What AI is not is a silver bullet. Vendors who claim that their AI will catch 100 percent of threats are disingenuous at best, because no system is capable of catching all threats. Machine Learning models do make mistakes, but models can be retrained to learn from those mistakes and adapt.
Assessing an AI-based Solution
To distinguish between marketing speak and technical reality, buyers need to ask the right questions. One question to ask when searching for an AI-based email security product, Goutal says, is, “Is it Supervised Learning or Unsupervised Learning?”
Supervised Learning requires a trainer, typically a data scientist or a threat analyst, to label the data as malicious or legitimate and continually train the models. In one example, a predetermined set of email features is computed and scored by the trained model. “If trained correctly,” Goutal says, “Supervised Learning models are able to generalize, which means they are able to detect unknown attacks. This is important because the threat landscape is constantly moving.” In the case of Supervised Learning, you should also ask a vendor what features the model analyzes and whether the vendor applies Feature Selection, the practice of keeping only the features that actually improve predictions. Some vendors claim to analyze thousands of features. This is overkill, according to Goutal. “Quality is more important than quantity.”
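The labeling-and-training loop can be sketched in a few lines. The example below is a toy under stated assumptions: it uses logistic regression (one common Supervised Learning algorithm, chosen here for brevity and not confirmed as any vendor's method), three hypothetical features per email, and a handful of made-up examples standing in for data labeled by an analyst.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(samples, labels, epochs=2000, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def predict_proba(model, x):
    """Estimated probability that feature vector x is malicious."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical features: [urgent-word ratio, has IP-address link, reply-to mismatch].
# Labels come from a human trainer: 1 = malicious, 0 = legitimate.
X = [
    [0.30, 1, 1], [0.25, 0, 1], [0.40, 1, 0],
    [0.00, 0, 0], [0.05, 0, 0], [0.02, 0, 0],
]
y = [1, 1, 1, 0, 0, 0]

model = train_logistic(X, y)

# Emails the model has never seen: this is the "generalization" Goutal describes.
unseen_attack = predict_proba(model, [0.35, 0, 1])
unseen_legit = predict_proba(model, [0.01, 0, 0])
```

With only three features the model is easy to inspect, which echoes the point about quality over quantity: each extra feature is another weight to fit and another way to overfit the training set.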
Unsupervised Learning detects anomalies to identify malicious emails. Unlike Supervised Learning models, Unsupervised Learning algorithms do not require data labeled by an expert. Instead, they flag rare events that differ significantly from the majority of the data and are therefore suspicious. Unsupervised Learning is particularly well suited to identifying spear phishing emails, such as financially motivated requests, because these attacks are anomalies in the targeted organization’s inbound email traffic.
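Anomaly detection needs no labels at all, as the rough sketch below shows. It scores each email by how far a single hypothetical "urgency score" sits from the organization's median, using the median absolute deviation (a simple robust statistic chosen for this example; production systems typically use richer features and algorithms). The sample scores and the 3.5 cutoff are assumptions for illustration.

```python
import statistics

def anomaly_scores(values):
    """Score each value by its robust distance from the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    # 1.4826 rescales the MAD to be comparable to a standard deviation;
    # the tiny fallback avoids division by zero when most values are identical.
    scale = (1.4826 * mad) or 1e-9
    return [abs(v - med) / scale for v in values]

# Hypothetical urgency scores for ten inbound emails at one organization.
# No labels are provided: the algorithm only sees the numbers.
urgency = [0.02, 0.00, 0.03, 0.01, 0.02, 0.04, 0.01, 0.45, 0.02, 0.03]

THRESHOLD = 3.5  # assumed cutoff; tuning it trades missed attacks for false alarms
scores = anomaly_scores(urgency)
flagged = [i for i, s in enumerate(scores) if s > THRESHOLD]
```

Here only the email at index 7 is flagged, because its score is far outside what is normal for this inbox. The same email might be unremarkable at an organization where urgent payment requests are routine, which is why the baseline is computed per organization.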
Deep Learning is another subset of AI that is making the rounds in cybersecurity, and it’s possibly less understood than the previous subsets because it’s so complex. Unlike traditional Machine Learning, Deep Learning relies specifically on Artificial Neural Networks and requires a very large amount of data to outperform traditional Machine Learning algorithms. Deep Learning is particularly suitable for Computer Vision. In email security, it’s used to identify fraudulent brand images inserted in phishing emails and webpages.
Although a lot of cybersecurity companies claim to use Deep Learning, it’s not well adapted to non-stationary problems, Goutal says, and non-stationary problems are the essence of cybersecurity: attackers keep changing tactics, so the data a model sees keeps shifting. Deep Learning models are also costly to produce, further raising the barrier to entry. Ask whether a vendor’s Deep Learning model is pre-trained and with which dataset. This will provide some insight into its capabilities. Using a pre-trained model is common, but it should be adapted to the task with Transfer Learning, which gives the model additional training on domain-specific data.
Finally, ask about the origins of the data itself, including “What is the source of your data?” and “What is the size of your dataset?” The answers will reveal a lot about the product. Machine Learning models need to be trained with substantial amounts of data, Goutal says. This is a critical differentiator when comparing email security vendors who use—or claim to use—AI. Machine Learning models can only learn from the data they’re fed. If the data is not reliable, or if the dataset is too small, the models will be ineffective.
Additionally, because new threats are discovered daily and the tactics of cybercriminals are always changing, data needs to be constantly updated. How often does the vendor update their dataset? Every minute? Every week? If the data is stale, the model will miss threats. Vade protects more than 600 million mailboxes, including those of the world’s largest ISPs. This data feeds our dataset and continually trains our models.
The Ultimate AI Is More Than AI
When looking for an AI-based email security product, the AI should be combined with other technologies to create a performant solution. Supervised Learning, for example, is built to generalize, so it won’t detect an outlier. It has to be combined with another mechanism because there’s no perfect technology. “Cybersecurity is about stacking different technologies,” says Goutal. “An AI layer has to be combined with other more traditional security layers.”
Asking the right questions—and knowing some of the answers—is key to choosing an AI-based solution that is more than a marketing ploy or clever rebrand. If you’re an MSP considering a solution for your customers, you’ve probably seen your share of imitators. “There’s a joke in the data science community,” says Goutal: “The difference between Machine Learning and AI is that if it’s written in Python, it’s probably Machine Learning; if it’s written in PowerPoint, it’s probably AI.”