Remote Images Are Pushing Email Filters to Their Limits

Phishing emails impersonating popular brands like Microsoft or PayPal need visual content to be successful. From brand logos to colorful pictures, images provide a visual cue to the recipient that the email is harmless and legitimate.

But images do more than add a visual element of legitimacy to otherwise fraudulent emails: they also make the job of filtering emails much harder. Image spam has long been a popular way to circumvent textual content analysis, because no relevant content can be extracted from the text parts of the email; the textual content is in the image. Below is an example of a SunTrust phishing email: it contains no textual content, only a single large embedded image that mimics legitimate HTML content.

[Image: SunTrust phishing email]

While detecting identical images is relatively easy, thanks to signatures based on cryptographic hashing algorithms such as MD5, detecting similar images requires complex and costly algorithms. To evade detection, phishers alter the images slightly, adjusting the compression level, colorimetry, or geometry. Their goal is to make each image unique and so defeat signature-based technologies. Below is an example of an Alibaba logo that has been altered but remains identifiable by the end user.
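To illustrate why exact signatures break under small edits, here is a minimal sketch that compares an MD5 signature with a simple perceptual "average hash" on a toy 4x4 grayscale grid. The pixel values and function names are illustrative assumptions, not the method used by any particular filter, and a real average hash would first resize the image to a fixed small size.

```python
import hashlib

def md5_signature(pixels):
    """Exact signature: MD5 over the raw pixel bytes."""
    flat = bytes(value for row in pixels for value in row)
    return hashlib.md5(flat).hexdigest()

def average_hash(pixels):
    """Perceptual signature: one bit per pixel, set when brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]

def hamming_distance(bits_a, bits_b):
    """Number of positions at which two bit lists differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b))

# Toy 4x4 grayscale "logo" (assumed values, for illustration only).
original = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [200, 200, 10, 10],
    [200, 200, 10, 10],
]
# Slightly brighten every pixel, as a stand-in for a small colorimetry change.
altered = [[min(255, value + 3) for value in row] for row in original]

exact_match = md5_signature(original) == md5_signature(altered)
distance = hamming_distance(average_hash(original), average_hash(altered))
print("MD5 signatures match:", exact_match)
print("Average-hash Hamming distance:", distance)
```

The exact signature changes as soon as a single byte changes, while the perceptual bits stay the same for this small brightness shift. Geometric changes such as cropping or rotation would need more robust techniques than this toy hash.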


As the technique has grown in popularity among phishers, email security vendors have improved their ability to extract and analyze content from images. As a result, phishers have found a new way to deceive them.

Remote images

Remote images have emerged as the latest filter-bypassing technique used by hackers looking to exploit weaknesses in email security technology. Unlike embedded images, which email filters can analyze in real time, remote images are hosted on the web and must be fetched before they can be analyzed. In 2020, the use of remote image-based threats surged. In November 2020 alone, Vade Secure analyzed 26.2 million remote images and blocked 262 million emails featuring malicious remote images.
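The distinction between embedded and remote images can be sketched with Python's standard HTML parser. This is a simplified illustration that assumes embedded images are referenced with `cid:` or `data:` URLs and remote images with `http:` or `https:` URLs; the class name, function name, and sample addresses are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class ImageSourceCollector(HTMLParser):
    """Collects <img> sources and splits them into embedded and remote."""

    def __init__(self):
        super().__init__()
        self.embedded = []
        self.remote = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        scheme = urlparse(src).scheme.lower()
        if scheme in ("cid", "data"):
            # Image bytes travel with the message and can be analyzed directly.
            self.embedded.append(src)
        elif scheme in ("http", "https"):
            # Image must be fetched over the network before analysis.
            self.remote.append(src)

def classify_images(html_body):
    """Return (embedded_sources, remote_sources) for an HTML email body."""
    collector = ImageSourceCollector()
    collector.feed(html_body)
    return collector.embedded, collector.remote

# Hypothetical message body using reserved example domains.
sample = (
    "<html><body>"
    '<img src="cid:logo@example.invalid">'
    '<img src="https://images.example.invalid/banner.png">'
    "</body></html>"
)
embedded, remote = classify_images(sample)
print("embedded:", embedded)
print("remote:", remote)
```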


Analyzing a remote image requires fetching it over a network. Capitalizing on this weakness, cybercriminals use additional techniques to make the process more cumbersome for security scanners, such as:

  • Multiple redirections
  • Cloaking techniques
  • Abuse of high-reputation domains

Multiple redirections prolong the time needed to identify a phishing attack. JavaScript is also commonly used, which forces security vendors to rely on state-of-the-art web crawlers that are costlier and more difficult to scale.
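The cost of redirect chains can be sketched with a small resolver that follows hops under a fixed budget. In this sketch the `redirect_map` dictionary is a stand-in for real HTTP responses, and the URLs and the `max_hops` default are assumptions chosen for illustration. JavaScript-driven redirects would not be visible to this kind of resolver at all, which is why they call for a full crawler.

```python
def resolve_redirects(url, redirect_map, max_hops=5):
    """Follow redirects until a final URL, a loop, or the hop budget is reached.

    Returns (final_url_or_None, chain, status).
    """
    chain = [url]
    seen = {url}
    while url in redirect_map:
        if len(chain) - 1 >= max_hops:
            return None, chain, "too_many_redirects"
        url = redirect_map[url]
        chain.append(url)
        if url in seen:
            return None, chain, "redirect_loop"
        seen.add(url)
    return url, chain, "ok"

# Simulated redirect chain (hypothetical URLs on a reserved example domain).
redirects = {
    "https://a.example.invalid/img": "https://b.example.invalid/r1",
    "https://b.example.invalid/r1": "https://c.example.invalid/r2",
    "https://c.example.invalid/r2": "https://d.example.invalid/final.png",
}

final_url, chain, status = resolve_redirects("https://a.example.invalid/img", redirects)
print(status, final_url, len(chain) - 1, "hops")
```

Each extra hop adds a network round trip, so a scanner with a tight time or hop budget may give up before reaching the final image.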

Cloaking techniques may also be used to ensure that it is the intended victim that is fetching the image and not a security vendor. For example, a phishing campaign targeting customers of a Canadian bank may only deliver the malicious content to web connections originating from Canada.
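One way to look for geographic cloaking is to fetch the same URL from several vantage points and compare what comes back. The sketch below assumes a hypothetical `fetch(url, region)` callable that returns the response bytes as seen from a given region; here it is replaced by a simulated server. Differing content is only a signal, since legitimate sites also localize responses.

```python
import hashlib

def detect_geo_cloaking(url, fetch, regions):
    """Fetch url from several regions and report whether the content differs.

    fetch is a hypothetical callable (url, region) -> bytes.
    Returns (content_differs, digests_by_region).
    """
    digests = {}
    for region in regions:
        body = fetch(url, region)
        digests[region] = hashlib.sha256(body).hexdigest()
    content_differs = len(set(digests.values())) > 1
    return content_differs, digests

def fake_fetch(url, region):
    """Simulated server that only serves the lure to visitors from Canada."""
    return b"phishing-image-bytes" if region == "CA" else b"blank-pixel"

cloaked, digests = detect_geo_cloaking(
    "https://img.example.invalid/logo.png", fake_fetch, ["CA", "US", "FR"]
)
print("content differs by region:", cloaked)
```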

Additionally, hosting remote images on high-reputation websites renders domain reputation-based detection ineffective. From Wikipedia to GitHub, websites with high domain and trust scores are continually exploited by cybercriminals.

The result is that many of these emails go undetected. For users, this often means receiving a phishing email and reporting it, only to receive it again, and in some cases, multiple times.

Blocking remote image-based threats

Blocking image-based threats requires Computer Vision, a scientific field that deals with how computers can gain a high-level understanding of visual content. Vade Secure implemented its first Computer Vision technology, based on Deep Learning models (VGG-16, ResNet), in early 2020 to detect brand logos in emails and websites.

The Deep Learning models have been trained on a combination of collected images and artificially generated images. The use of artificially generated images is crucial to ensure that our technology is resilient to the different techniques used by cybercriminals and also to unexpected visual configurations (different backgrounds, different sizes and positions of the logo). Below is an example of such an image.
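The idea of generating training images artificially can be sketched as follows. This is a toy illustration on small grayscale grids, assuming only three sources of variation (background shade, logo scale, logo position); it is not a description of Vade Secure's actual generation pipeline, and all names are hypothetical.

```python
import random

def scale_logo(logo, factor):
    """Nearest-neighbour upscale of a 2D grid by an integer factor."""
    return [
        [value for value in row for _ in range(factor)]
        for row in logo
        for _ in range(factor)
    ]

def synthesize_sample(logo, canvas_size, rng):
    """Paste the logo on a random background at a random scale and position.

    Returns (image, bounding_box), with bounding_box = (top, left, height, width).
    """
    background = rng.randint(0, 255)
    image = [[background] * canvas_size for _ in range(canvas_size)]
    max_factor = canvas_size // max(len(logo), len(logo[0]))
    scaled = scale_logo(logo, rng.randint(1, max_factor))
    height, width = len(scaled), len(scaled[0])
    top = rng.randint(0, canvas_size - height)
    left = rng.randint(0, canvas_size - width)
    for r in range(height):
        for c in range(width):
            image[top + r][left + c] = scaled[r][c]
    return image, (top, left, height, width)

logo = [[255, 0], [0, 255]]  # toy 2x2 "logo" (assumed values)
rng = random.Random(0)       # fixed seed so runs are repeatable
samples = [synthesize_sample(logo, 16, rng) for _ in range(3)]
for _, box in samples:
    print("logo bounding box:", box)
```

The bounding box returned with each image can serve as the training label, so every generated sample comes with its ground truth for free.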

Since then, we have leveraged OCR (Optical Character Recognition) combined with NLP (Natural Language Processing) models to detect malicious textual content in images. Below are several examples of malicious remote images blocked by Vade Secure.

[Image: examples of malicious remote images]

It is worth mentioning that we have trained several NLP models to detect threats in different languages, such as English, German, or Italian. More and more cyberthreats are localized, so several NLP models are needed to achieve maximal filtering accuracy.
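The need for one model per language can be illustrated with a deliberately simple stand-in. The sketch below assumes the text has already been extracted from the image by an OCR step and that its language is known; the keyword sets are invented for illustration and stand in for trained NLP models, which they do not resemble in accuracy.

```python
# Hypothetical per-language keyword sets standing in for trained NLP models.
SUSPICIOUS_TERMS = {
    "en": {"verify", "account", "password", "suspended"},
    "de": {"konto", "passwort", "gesperrt", "bestätigen"},
    "it": {"conto", "password", "sospeso", "verifica"},
}

def score_text(text, language):
    """Fraction of the language's suspicious terms that appear in the text."""
    terms = SUSPICIOUS_TERMS.get(language)
    if terms is None:
        raise ValueError(f"no model for language {language!r}")
    words = set(text.lower().split())
    return len(words & terms) / len(terms)

# Text assumed to come from an OCR pass over a remote image.
german_text = "Ihr Konto wurde gesperrt"
print("scored with German model:", score_text(german_text, "de"))
print("scored with English model:", score_text(german_text, "en"))
```

Scoring the German text with the English term set finds nothing, which is the toy version of why a localized threat calls for a model trained on that language.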

Preparing for emerging phishing techniques

As AI and Computer Vision become more prominent in email security, cybercriminals are being forced to innovate, and they are answering that call. For every detection method that is developed, cybercriminals are following closely behind and developing new phishing techniques to evade detection.

Image manipulation and remote images will grow in both prominence and sophistication due to the limited ability of most solutions to analyze images. Cybercriminals are known for researching their targets—a quick search for a business’s MX record will reveal the email security solution protecting the business’s email. With this information in hand, they will learn to break through.

Vade Secure for Microsoft 365 is integrated with Microsoft via API—invisible in an MX search. This gives us an advantage over cybercriminals who are looking for weaknesses in protection. This, along with our continued research and investments in Computer Vision technology, allows us to identify malicious emails that other solutions would miss and prepare for new image techniques that will emerge.