An Update on Logo Detection Technology

Web scanners analyze dozens of webpage elements, including the URL, page structure, and CSS, to determine whether a page is a phishing page. What many scanners are not equipped to analyze are images. 

Trained to view webpages and emails as humans see them, Vade Computer Vision technology analyzes images to extract relevant features used in phishing attacks, such as brand logos, QR codes, and suspicious textual content. The Computer Vision technology is just one component of our anti-phishing technology and an added layer of protection against highly sophisticated phishing attacks. 

Building on the growing capabilities of our Computer Vision technology, we have updated our Deep Learning-based Logo Detection technology. We have implemented a proprietary Active Learning algorithm to optimize both labeling cost and Deep Learning model performance. New brand logos, including Adobe, Citibank, Desjardins, Instagram, and WeTransfer, are now supported by the VGG-16 and ResNet models.

Logo Detection now detects more than 60 brands, including Microsoft, PayPal, Facebook, and eBay. Our Logo Detection technology can detect small and altered logos. Similar Deep Learning-based technologies may miss these logos because they have not been trained specifically on electronic documents, such as graphical renderings of webpages and emails. We will continue to improve the technology and support other brands as they grow in popularity among scammers. 
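The Active Learning algorithm mentioned above is proprietary, so its details are not covered here. As a rough illustration of the general idea only, the sketch below uses least-confidence sampling, a common textbook strategy: the images a model is least sure about are sent to human labelers first, so each labeling hour goes where it helps most. The function name, image IDs, and scores are all hypothetical.

```python
def select_for_labeling(predictions, budget):
    """Return the `budget` image IDs whose top-class confidence is lowest.

    `predictions` maps an image ID to the model's class probabilities.
    This is generic least-confidence sampling, not Vade's algorithm.
    """
    ranked = sorted(predictions.items(), key=lambda item: max(item[1]))
    return [image_id for image_id, _ in ranked[:budget]]


# Hypothetical model outputs for four unlabeled images (three brand classes).
predictions = {
    "img_a": [0.98, 0.01, 0.01],  # confident, little to gain from a label
    "img_b": [0.40, 0.35, 0.25],  # very uncertain
    "img_c": [0.55, 0.44, 0.01],  # torn between two brands
    "img_d": [0.90, 0.05, 0.05],
}

to_label = select_for_labeling(predictions, budget=2)
print(to_label)  # ['img_b', 'img_c']
```

With a labeling budget of two, only the two most ambiguous images are queued for annotation, and the confidently classified ones are skipped.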

Background 

Image manipulation, which includes blurring and minute changes to color or geometry, is an increasingly popular evasion technique. Even the slightest modification to an image changes the image’s cryptographic hash, confusing filters that rely on signature-based and statistical technology and easily bypassing blacklists. 
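To see why hash-based blacklists are so fragile, consider the minimal Python sketch below. It uses a short byte string as a stand-in for real image data (an assumption made for brevity) and changes a single color channel of a single pixel by one step. The variable names and the use of SHA-256 are illustrative choices.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Stand-in for raw image bytes: 16 white RGB pixels.
original = bytes([255, 255, 255] * 16)

# Nudge one channel of one pixel; the change is invisible to a human.
modified = bytearray(original)
modified[0] = 254
altered = bytes(modified)

original_hash = sha256_hex(original)
altered_hash = sha256_hex(altered)

# A blacklist that only knows the original image's hash.
blacklist = {original_hash}

print(original_hash)
print(altered_hash)
print(altered_hash in blacklist)  # False
```

The two digests share nothing recognizable, so a filter that matches on exact hashes treats the altered image as brand new.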

Below is an example of image manipulation: a Microsoft phishing email with a modified logo. To avoid detection, the hacker has placed the Microsoft logo on a colored background, changing the image’s signature. 

logo-detection-1

To build a Logo Detection technology resilient to such changes, the Vade research team leverages image augmentation and image generation techniques. Below is an example of a generated image in which the logo appears in an unexpected configuration (position and background). 

logo-detection-2

Training on such generated images helps ensure that the Deep Learning models recognize logos regardless of their position, size, and background, and despite the image manipulation techniques mentioned previously. 
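The idea of generating training images can be sketched in a few lines of Python. The example below is a toy illustration under stated assumptions, not the research team's pipeline: images are plain grids of RGB tuples, the "logo" is a 4x4 pattern, and the helper names (`make_background`, `scale_nearest`, `synthesize`) are hypothetical. A real pipeline would use an imaging library and many more transformations.

```python
import random


def make_background(width, height, rng):
    """Return a height x width grid filled with one random RGB color."""
    color = (rng.randrange(256), rng.randrange(256), rng.randrange(256))
    return [[color for _ in range(width)] for _ in range(height)]


def scale_nearest(image, factor):
    """Resize an image by `factor` using nearest-neighbor sampling."""
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * factor))
    dst_w = max(1, round(src_w * factor))
    return [
        [image[min(src_h - 1, int(y / factor))][min(src_w - 1, int(x / factor))]
         for x in range(dst_w)]
        for y in range(dst_h)
    ]


def synthesize(logo, width, height, rng):
    """Paste a randomly scaled logo at a random position on a random background.

    Returns the image and the logo's bounding box (left, top, width, height),
    which serves as the training label.
    """
    canvas = make_background(width, height, rng)
    scaled = scale_nearest(logo, rng.uniform(0.5, 2.0))
    logo_h, logo_w = len(scaled), len(scaled[0])
    left = rng.randrange(width - logo_w + 1)
    top = rng.randrange(height - logo_h + 1)
    for dy, row in enumerate(scaled):
        for dx, pixel in enumerate(row):
            canvas[top + dy][left + dx] = pixel
    return canvas, (left, top, logo_w, logo_h)


# Toy 4x4 "logo": four colored quadrants.
RED, GREEN, BLUE, YELLOW = (242, 80, 34), (127, 186, 0), (0, 164, 239), (255, 185, 0)
logo = [[RED, RED, GREEN, GREEN]] * 2 + [[BLUE, BLUE, YELLOW, YELLOW]] * 2

rng = random.Random(0)  # seeded so runs are repeatable
image, box = synthesize(logo, width=32, height=24, rng=rng)
print(box)
```

Each call produces a different placement, scale, and background, along with a bounding box that tells the model where the logo is.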

Scammers are increasingly using images to bypass traditional email filters. The current trend is to send emails that contain only a link to a remotely hosted image, which is a graphical rendering of the HTML content. To address the challenge of remote images, Vade developed RIANA (Remote Image ANAlysis). RIANA uses Optical Character Recognition (OCR), a Computer Vision technology, to extract text from images and then applies Natural Language Processing models in English, French, Dutch, German, and other languages to detect suspicious textual content. Below are some recent images blocked by RIANA. 

logo-detection-3
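The two-stage flow described above (extract text from the image, then analyze the text) can be outlined as follows. This is a simplified sketch and not RIANA's implementation: the OCR step is passed in as a callable so that any engine could be plugged in, a fixed phrase list stands in for the Natural Language Processing models, and every name here (`score_text`, `is_suspicious_image`, `fake_ocr`, `SUSPICIOUS_PHRASES`) is hypothetical.

```python
from typing import Callable

# Stand-in for trained NLP models: a few phrases typical of phishing lures.
SUSPICIOUS_PHRASES = (
    "verify your account",
    "password expires",
    "confirm your identity",
)


def score_text(text: str) -> int:
    """Count how many suspicious phrases appear in the extracted text."""
    normalized = " ".join(text.lower().split())  # collapse whitespace and line breaks
    return sum(phrase in normalized for phrase in SUSPICIOUS_PHRASES)


def is_suspicious_image(
    image_bytes: bytes,
    ocr: Callable[[bytes], str],
    threshold: int = 1,
) -> bool:
    """Run OCR on the image, then flag it if the text score meets the threshold."""
    return score_text(ocr(image_bytes)) >= threshold


def fake_ocr(image_bytes: bytes) -> str:
    """Placeholder OCR for the demo: treats the bytes as already-readable text."""
    return image_bytes.decode("utf-8")


lure = b"Your password expires today.\nVerify your   account now."
newsletter = b"Our spring catalog is here."

print(is_suspicious_image(lure, fake_ocr))        # True
print(is_suspicious_image(newsletter, fake_ocr))  # False
```

Keeping the OCR step separate from the text analysis means the language models can be improved or extended to new languages without touching the image-handling code.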

For some context on the scale of the challenges presented by remote images, RIANA has blocked 500 million remote images in the last 90 days.  

Computer Vision provides reinforcements against highly sophisticated attacks that rely on images to evade detection. Vade has made significant investments in Computer Vision technology and will continue to explore additional uses for the technology.