Voice messages may be a new frontier for cybercriminals

Voice messages are on the rise, and bad actors are ready to use them to their advantage.

The popularity of audio content has been growing steadily for quite some time. Audiobooks have become a multibillion-dollar industry, major music streaming platforms have expanded their podcast offerings, and startups have launched several audio-dedicated social networks, the most recent being Airchat.

There are multiple reports indicating that voice messages are becoming more popular, especially among the younger generation. In one study, 41% of respondents said that they noticed an increase in voice notes in recent years, with 84% of Gen Z saying they are utilizing voice messages.

As the popularity of voice notes grows, malicious actors are trying to use this trend to their advantage.

The first instance of an AI-generated audio deepfake being successfully used in a scam is thought to have happened in 2019, when scammers impersonated the chief executive of a UK-based energy firm's parent company, tricking the firm's CEO into sending $243,000.

Since then, there have been multiple attempts to employ audio deepfakes in various fraudulent schemes.

Overall, the number of audio and video deepfakes combined rose 245% in Q1 compared with the same period of the previous year, with the US among the countries with the most deepfakes detected, according to estimates from identity verification and deepfake solution provider Sumsub.

With elections happening in the US this year, we are likely to see more examples of this technology being used for malicious purposes.

Superpowers for bad actors

Audio deepfakes are increasingly used in cyberattacks, especially those related to identity, says Aaron Painter, the CEO of security solutions provider Nametag.

Account takeover attacks are especially common, as they offer the highest value to the bad actor.

"Taking over someone's account lets you control whatever that account has access to. For an employee account, this means planting ransomware or accessing sensitive company data. For a customer account, you could hijack social media or bank accounts. Deepfakes make it easier. They are superpowers for bad actors," Painter says.

An example of an account takeover attack is a SIM-swap attack. A bad actor tries to transfer a legitimate owner's phone number to a fraudster's SIM card. A successful attempt opens the door to receiving codes from banks and other financial institutions, possibly inflicting financial damage.

According to Painter, we are likely to see more examples of account takeover attacks accompanied by audio deepfakes in the future.

Another area where he expects audio deepfakes to be widely used is impersonating other people, possibly even causing political tension between nations.

The main reason audio deepfakes are on the rise is the rapid evolution of technologies used to create them.

In 2020, platforms like Descript required 20 minutes of script to produce an audio deepfake, whereas now a few seconds of a podcast recording are enough to do the job for a malicious actor, Painter points out.

"If you're trying to trick an advanced voice biometric system, then you might need a higher quality. But you don't necessarily always need to have very high quality in order to be successful in your attack."

Expect spikes in malware attacks

Roman Zrazhevskiy, CEO of MIRA Safety, expects bad actors to exploit the growing trend of voice messaging.

In the past, we saw the rise of text-based phishing, which came on the heels of email fraud, which replaced traditional phone call and voicemail-based schemes. Zrazhevskiy believes that the next wave of cybercriminal activity will begin with voice messaging.

According to him, basic fraudsters will try to get victims to text or voice-memo back things like account passwords, credit card numbers, banking info, or even risky location details.

"More advanced criminals will go deeper, though, likely trying to impersonate those within your circle for an added layer of trust and urgency. Often, these schemes look to extort money or financial information," Zrazhevskiy says.

"Though we'll also likely see spikes in malware attacks, likely driven by victims prompted by their voice notes to download some app they thought a friend recommended via voice message and followed up with a direct link."

He points out that the real issue is the availability of deepfake technology. It doesn't take a particularly advanced tech specialist to upload a few audio files and then prompt AI-backed generators to create a similar audio clip.

Younger users less aware of the risks

Jason Glassberg, the co-founder of Casaba Security, also expects the next wave of cyberattacks to exploit the habit of sending voice notes.

"Keep in mind, most people – especially those who are younger – are now well aware of the risks of phishing, smishing, and even conversation hijacking in the context of text-based exchanges," he says. "They're more likely to be skeptical when something seems off or not quite right in a written message, such as a financial request. But a voice message is different. It's more convincing."

Glassberg anticipates audio deepfakes complementing every type of malicious attack, from more elaborate stock shorting and pump-and-dump stock schemes to virtual romance scams or virtual kidnapping.

Another area where audio deepfakes may pose significant risks is influencing court cases, says Michael Hess, cybersecurity expert and senior analyst at Code Signing Store.

"Consider a situation in which a hacker creates a plausible audio recording of a crucial witness in a court case by using deepfake technology. This could undermine the legal system by influencing the trial's verdict," he explains.

How do you detect an audio deepfake?

As audio deepfakes evolve, so do the tools for detecting them. Many researchers are utilizing the latest advancements in AI to spot fake content. However, bad actors are outpacing detection.

"The problem is that it's an arms race or a cat-and-mouse game. It's AI versus AI. Someone is always going to be slightly ahead. And more often today, it's the bad actor who is one step ahead. The bad actor then is using slightly better AI technology than the detector," Painter says.

According to him, the best way to determine whether an audio clip is a deepfake is to evaluate the context, such as who the sender is and what channel is being used.

One should be extra cautious about recordings in large group messages. If it's a one-on-one chat, it might be worth reaching out to the person via a different communication channel to verify its authenticity.

Glassberg outlines several key methods for determining whether a recording is an audio deepfake. Besides noticing edits or unnatural noises, it may be helpful to pay attention to breathing, as many deepfake voices do not breathe. Other indicators of a deepfake might include out-of-character remarks.