Combating phishing attacks using AI and machine learning technologies

Phishing attacks rank among the most significant cybersecurity threats that target individuals and enterprise environments. A study by Deloitte found that 91% of all cyber attacks begin with a phishing email.

The growth in AI technology and its wide accessibility to the public has added significant power to cyber attackers, who leverage it to craft more convincing phishing messages to victims. But it's not all bad news – AI technology can be used on the good side, too.

Defenders are fighting back by effectively discovering and preventing phishing attacks before they cause damage to finances and reputations. Let's check out some of the tools and techniques they're using to protect users from this common but increasingly sophisticated cyber threat.

What is a phishing attack?

Phishing attacks work when malicious actors send a legitimate-looking message that pretends to originate from a trusted entity (e.g., your bank, a work colleague, or an online merchant you deal with). The aim is to fool the recipient into giving up sensitive information, such as credit card details or other account credentials, or into clicking a malicious link in the message that installs malware on their system.

Cybercriminals use different channels to deliver phishing messages. Email is the most common, but other channels, such as internet messaging, social media platform messages, and SMS text messages, are also used effectively.

When crafted well, phishing attacks can have devastating effects on your enterprise. Many high-profile breaches began with phishing attacks and resulted in massive financial and reputational losses, including the Google and Facebook, Colonial Pipeline, and Crelan Bank incidents.

How machine learning algorithms are used to fight phishing attacks

Traditional security solutions cannot stop phishing attacks completely, especially those based on zero-hour threats. Although they have reduced the number of phishing attacks that reach enterprise IT environments, a large volume of phishing emails can still bypass these solutions and reach end users' devices.

A study by SlashNext found a significant increase in zero-hour threats: 54% of the threats SlashNext detected in 2022 were zero-hour attacks, and 76% of them were spear-phishing credential harvesting (see Figure 1). Such attacks have not been seen before, which makes them more likely to bypass traditional anti-phishing solutions. The study reveals that threat actors are:

  • Developing more intelligent ways to craft phishing attacks based on previous failed attempts.
  • Leveraging automation and machine learning (ML) technology to send large volumes of customized phishing messages (spear phishing) to target enterprises, increasing the likelihood of infection.
  • Using three primary methods for phishing attacks: link-based attacks, malicious attachments, and natural language threats.

Figure 1 - Growth in Previously Unknown Zero-Hour Attacks | Source:

Machine learning algorithms can effectively stop zero-hour phishing attacks in addition to other common cyber threats.

Detecting phishing emails

To detect phishing attacks, ML algorithms must first be trained on a large dataset of normal (honest) and phishing (suspicious) emails so that they learn to detect anomalies and recognize common malicious patterns in phishing emails. ML uses three main methods to detect phishing emails:

Social graph analysis: In this method, the enterprise builds a social graph of the normal communication flow between its employees, which allows it to detect abnormal communications and flag them as suspicious. For example, it is common for the marketing department to communicate with the public relations department. However, emails exchanged between accounting and the company CEO could be rare, making such communications suspicious.
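A minimal sketch of the idea, assuming the baseline is simply a count of how often each sender and recipient pair has communicated. The function names, the department labels, and the min_count threshold are illustrative assumptions, not part of any specific product:

```python
from collections import Counter


def build_social_graph(history):
    """Count how often each (sender, recipient) pair appears in past mail."""
    return Counter((sender, recipient) for sender, recipient in history)


def is_unusual(graph, sender, recipient, min_count=2):
    """Flag a pair that was seen fewer than min_count times in the baseline."""
    return graph[(sender, recipient)] < min_count


# Hypothetical baseline of department-level communications.
history = [
    ("marketing", "public_relations"),
    ("marketing", "public_relations"),
    ("public_relations", "marketing"),
    ("accounting", "ceo"),
]
graph = build_social_graph(history)

frequent_pair_flagged = is_unusual(graph, "marketing", "public_relations")
rare_pair_flagged = is_unusual(graph, "accounting", "ceo")
```

A real system would likely weight edges by recency and volume rather than use a fixed threshold, but the principle is the same: rare edges in the graph deserve a closer look.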

Employee communication profiling: Everyone has a writing style and a specific tone when writing emails. For example, some people begin or end their emails with a specific phrase. They may use spacing within emails in a particular way for clarity, prefer a certain sentence structure, or favor specific phrases. These idiosyncrasies can be used to determine whether a particular email message was really sent by a specific employee. Natural Language Processing (NLP), a subfield of AI, can be used to analyze written text and extract patterns from email content to identify the writing style of a specific person.
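As a rough illustration, writing style can be approximated with character n-gram counts and compared using cosine similarity. This is a deliberately crude stand-in for a real NLP model. The sample emails and function names below are made up for the example:

```python
import math
from collections import Counter


def style_profile(text, n=3):
    """Character n-gram counts as a crude fingerprint of writing style."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical emails previously sent by one employee.
known_emails = [
    "Hi team, quick update on the budget report. Cheers, Dana",
    "Hi team, quick note about the budget meeting. Cheers, Dana",
]
baseline = style_profile(" ".join(known_emails))

genuine = style_profile("Hi team, quick update on the meeting. Cheers, Dana")
suspect = style_profile("URGENT!!! WIRE $9,000 TO ACCT 4471 NOW OR LOSE ACCESS!!!")

genuine_score = cosine_similarity(baseline, genuine)
suspect_score = cosine_similarity(baseline, suspect)
```

The message that matches the employee's usual greeting and sign-off scores closer to the baseline than the out-of-character one. Production systems would use far richer features and much more history per sender.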

Email structural analysis: ML can be used to analyze the technical metadata associated with emails to identify suspicious messages, for instance, by checking the IP addresses of all the servers an email hopped through before reaching its final destination. If an email claims to come from Outlook (Microsoft servers) while its header information shows that it passed through Gmail servers, the email could be spoofed or modified.
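The Outlook versus Gmail check can be sketched with Python's standard email package. The EXPECTED_RELAYS mapping, the host names, and the sample message are illustrative assumptions. Real relay host names vary, and a trained model would weigh this signal alongside many others:

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Hypothetical mapping: sender domain -> host suffixes expected in Received headers.
EXPECTED_RELAYS = {
    "outlook.com": (".outlook.com",),
    "gmail.com": (".google.com",),
}


def relay_hosts(msg):
    """Return the host named after 'from' in each Received header."""
    hosts = []
    for header in msg.get_all("Received", []):
        match = re.search(r"\bfrom\s+([\w.-]+)", header)
        if match:
            hosts.append(match.group(1).lower())
    return hosts


def sender_domain(msg):
    """Extract the domain part of the From address."""
    _, address = parseaddr(msg.get("From", ""))
    return address.rpartition("@")[2].lower()


def looks_spoofed(msg, expected_relays):
    """True when no relay host matches what the claimed sender domain should use."""
    suffixes = expected_relays.get(sender_domain(msg))
    if suffixes is None:
        return False  # no baseline for this domain, so no verdict
    # A message with no Received headers at all is also treated as suspicious.
    return not any(host.endswith(suffixes) for host in relay_hosts(msg))


raw = (
    "Received: from mail-example.google.com (mail-example.google.com [192.0.2.10])\n"
    "\tby mx.example.test with ESMTPS; Mon, 1 Jan 2024 10:00:00 +0000\n"
    "From: Support <support@outlook.com>\n"
    "To: user@example.test\n"
    "Subject: Verify your account\n"
    "\n"
    "Please verify your account.\n"
)
msg = message_from_string(raw)
spoofed = looks_spoofed(msg, EXPECTED_RELAYS)
```

Here the message claims an outlook.com sender but was relayed by a google.com host, so it is flagged.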

Content analysis

AI-powered systems can scan email contents to detect suspicious phishing emails and raise red flags. Phishing emails tend to use specific linguistic patterns, such as:

Sense of urgency: They try to convince the recipient to act promptly without considering the consequences. For example, "Act now, or your account will get suspended."

Using general greetings: Bulk phishing attacks use general greetings rather than the recipient’s name because they are sent to a large number of recipients at once.

Ask for sensitive information: Phishing emails ask recipients to provide sensitive information, such as banking details or other online account information.

Links to malicious websites: Phishing emails commonly contain links to fake websites that host a false login page or exploit kits. AI systems can promptly extract the embedded links (URLs) and the SSL certificates of the linked websites and analyze them for malicious patterns. For example, red flags that ML models can use to identify suspicious phishing domains include a short domain name registration period and the absence of an SSL certificate.

Contains attachments: Phishing emails may contain malicious attachments. The most common attachments used in phishing emails are MS Office documents, PDFs, and compressed files.

In addition to linguistic analysis, AI systems can use NLP to extract and analyze entities included in the email content. For example, they can check sender and subject names, company names, and locations in the email content to identify potential impersonation attempts.
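The patterns listed in this section can be turned into simple features that a classifier could consume. The sketch below uses hand-written regular expressions purely for illustration. The keyword lists, the extension list, and the sample message are assumptions, and a trained model would learn such signals from data instead of relying on fixed rules:

```python
import re

# Illustrative keyword lists (assumptions, not an exhaustive vocabulary).
URGENCY = re.compile(r"\b(act now|immediately|urgent|suspended)\b", re.IGNORECASE)
GENERIC_GREETING = re.compile(r"^\s*dear (customer|user|client|account holder)\b", re.IGNORECASE)
SENSITIVE = re.compile(r"\b(password|credit card|pin|account number)\b", re.IGNORECASE)
URL = re.compile(r"https?://[^\s<>]+", re.IGNORECASE)
# MS Office documents, PDFs, and compressed files.
RISKY_EXTENSIONS = (".doc", ".docx", ".docm", ".xls", ".xlsx", ".xlsm", ".pdf", ".zip", ".rar", ".7z")


def extract_features(body, attachments=()):
    """Map an email body and its attachment names to simple phishing signals."""
    return {
        "urgency": bool(URGENCY.search(body)),
        "generic_greeting": bool(GENERIC_GREETING.search(body)),
        "asks_sensitive_info": bool(SENSITIVE.search(body)),
        "links": URL.findall(body),
        "risky_attachment": any(
            name.lower().endswith(RISKY_EXTENSIONS) for name in attachments
        ),
    }


sample = (
    "Dear customer,\n"
    "Act now, or your account will get suspended. "
    "Confirm your password at http://example.test/login\n"
)
features = extract_features(sample, attachments=["invoice.zip"])
benign = extract_features("Hi Sam, lunch at noon?")
```

The extracted links would then be passed on for the URL and certificate checks described above.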

Image analysis

A Convolutional Neural Network (CNN), which is a type of deep learning model, can be trained to identify manipulated images on phishing websites or those included within the email content. For example, threat actors may modify original photos taken from legitimate websites and post them on phishing sites to make them look genuine. CNNs have advanced image recognition capabilities that can identify fake and manipulated images that are indistinguishable to the human eye.

Blacklist/whitelist management

A blacklist is a list or database of IP addresses and URLs that are known to be malicious or to have participated in phishing attacks. Managing blacklists and whitelists involves keeping their entries current and enforcing them across the enterprise IT environment to improve phishing detection.

ML models can improve the process of managing/maintaining black/whitelists by:

  • Automatically adding suspicious websites to the blacklist after analyzing suspicious URLs and inspecting other elements within incoming emails (embedded URLs and sender domain names).
  • Automatically adding websites to the whitelist (the database of legitimate websites) when the user trusts them.
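A small sketch of how such list management might be wired together. The UrlListManager class, the phishing_score input (standing in for the output of some ML model), and the 0.8 threshold are all hypothetical:

```python
from urllib.parse import urlsplit


class UrlListManager:
    """Keeps host-level blacklist and whitelist sets up to date."""

    def __init__(self, block_threshold=0.8):
        self.blacklist = set()
        self.whitelist = set()
        self.block_threshold = block_threshold

    @staticmethod
    def host_of(url):
        """Lowercased host name of a URL, or an empty string if it has none."""
        return (urlsplit(url).hostname or "").lower()

    def record_model_score(self, url, phishing_score):
        """Blacklist the host when the model's score crosses the threshold."""
        host = self.host_of(url)
        if host and host not in self.whitelist and phishing_score >= self.block_threshold:
            self.blacklist.add(host)

    def record_user_trust(self, url):
        """Whitelist a host the user trusts, unless it is already blacklisted."""
        host = self.host_of(url)
        if host and host not in self.blacklist:
            self.whitelist.add(host)

    def verdict(self, url):
        """Return 'block', 'allow', or 'unknown' for a URL."""
        host = self.host_of(url)
        if host in self.blacklist:
            return "block"
        if host in self.whitelist:
            return "allow"
        return "unknown"


manager = UrlListManager()
manager.record_model_score("http://login.example.test/reset", 0.93)
manager.record_model_score("https://intranet.example.test/", 0.20)
manager.record_user_trust("https://intranet.example.test/")
```

In this sketch a user cannot whitelist a host that the model has already blacklisted. That is one possible policy, chosen here to keep user trust from overriding a high-confidence detection.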

Anomaly detection

AI-powered systems can monitor user logins to sensitive resources to detect abnormal behavior instantly. Relevant attributes of user behavior include the geographic location of the login (e.g., derived from the user's IP address), the device type (e.g., laptop, workstation, smartphone, tablet), the device operating system (macOS, Windows, Android), and the time of login. These attributes can be compared with the user's normal behavior to detect cyberattacks such as phishing attempts or suspicious email senders.
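One simple way to picture this comparison is to score a new login by the fraction of its attributes that have never appeared in the user's history. The Login fields, the three-hour tolerance, and the sample data are assumptions for illustration. A production system would use a statistical or learned model instead of exact matching:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Login:
    country: str           # e.g., resolved from the login IP address
    device_type: str       # laptop, workstation, smartphone, tablet
    operating_system: str  # macOS, Windows, Android
    hour: int              # hour of day, 0 to 23


def hour_distance(a, b):
    """Distance between two hours on a 24-hour clock."""
    diff = abs(a - b) % 24
    return min(diff, 24 - diff)


def anomaly_score(history, login, hour_tolerance=3):
    """Fraction of attributes (0.0 to 1.0) that deviate from past logins."""
    checks = [
        login.country not in {past.country for past in history},
        login.device_type not in {past.device_type for past in history},
        login.operating_system not in {past.operating_system for past in history},
        all(hour_distance(login.hour, past.hour) > hour_tolerance for past in history),
    ]
    return sum(checks) / len(checks)


# Hypothetical login history for one user.
history = [
    Login("DE", "laptop", "Windows", 9),
    Login("DE", "laptop", "Windows", 14),
    Login("DE", "smartphone", "Android", 19),
]
usual_score = anomaly_score(history, Login("DE", "laptop", "Windows", 10))
odd_hour_score = anomaly_score(history, Login("DE", "laptop", "Windows", 3))
unusual_score = anomaly_score(history, Login("BR", "tablet", "macOS", 3))
```

A login that matches the user's usual pattern scores 0.0, one that differs only in time of day scores 0.25, and one that differs in every attribute scores 1.0. Note that a user with no history scores 1.0 by construction, so new accounts would need separate handling.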

Fight evolving threats

ML algorithms can learn from normal user behavior to make their trained models more effective at detecting phishing attacks. We can also use other sources, such as user feedback and threat intelligence feeds, to make these models more accurate. This allows ML-powered security solutions to detect zero-day attacks and respond better to phishing attacks that leverage evolving techniques, which have become more common.

Machine learning algorithms enhance traditional security solutions that still rely on rules-based methods to detect phishing attacks. Phishing attacks evolve continuously, with new techniques appearing over time. ML can automatically identify new patterns and learn from new data, allowing detection to adapt constantly.