Typing on a Zoom call can expose your password

Deep learning models can be trained to recognize keystroke sounds and deduce which characters users are typing, researchers say. Audio from a smartphone’s built-in microphone or a Zoom call is enough to achieve over 90% accuracy.

AI’s potential for use in cyberattacks appears far-reaching, a recent paper from UK researchers shows. A team of scientists trained a deep learning model to match laptop keystroke sounds to characters, suggesting that acoustic attacks could soon become practical.

“When trained on keystrokes recorded by a nearby phone, the classifier achieved an accuracy of 95%, the highest accuracy seen without the use of a language model. When trained on keystrokes recorded using the video-conferencing software Zoom, an accuracy of 93% was achieved, a new best for the medium,” researchers claim.

Meanwhile, Zoom responded to the paper, saying users can employ built-in features to reduce the risk of their keystrokes being captured through call audio.

“Zoom users can also configure our background noise suppression feature to a higher setting, mute their microphone by default when joining a meeting, and mute their microphone when typing during a meeting to help keep their information more secure,” a Zoom spokesperson said.

According to the researchers, even when the AI failed to identify exactly what was typed, its guess was off by only one or two characters. Attackers could easily use that partial result to fill in the missing pieces and reconstruct victims’ passwords.
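A back-of-the-envelope calculation illustrates why near misses matter. The sketch below assumes, purely for illustration, that each keystroke is classified independently at the reported 95% per-key accuracy. The paper does not state this, and the function name and the 10-character password length are hypothetical.

```python
from math import comb


def prob_at_most_k_errors(length: int, per_key_accuracy: float, k: int) -> float:
    """Probability that at most k of `length` keystrokes are misclassified.

    Assumes every keystroke is classified independently (a simplification).
    """
    error_rate = 1.0 - per_key_accuracy
    return sum(
        comb(length, i) * error_rate**i * per_key_accuracy ** (length - i)
        for i in range(k + 1)
    )


if __name__ == "__main__":
    # Hypothetical 10-character password, 95% per-key accuracy (phone recording).
    for k in range(3):
        p = prob_at_most_k_errors(10, 0.95, k)
        print(f"at most {k} wrong characters: {p:.1%}")
```

Under these assumptions, a fully correct guess of a 10-character password happens roughly 60% of the time, while a guess with no more than two wrong characters is close to 99%.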

Researchers note that, while deciphering passwords from keyboard sounds is not new, previous attempts mainly focused on older mechanical keyboards, which produce far more sound than the laptop keyboard used in the experiment.

To get their results, researchers used a 2021 MacBook Pro with an M1 Pro processor, a popular off-the-shelf device. Before the experiment, they trained the AI to recognize the laptop’s keystrokes by pressing each key 25 times.

The first recording was made with the integrated microphone of an iPhone 13 Mini, placed less than seven inches away from the laptop. The second recording session was carried out using the built-in recording function of the video-conferencing application Zoom. The iPhone recording produced better results, with 95% accuracy, while the Zoom recording achieved 93%.

The paper’s results are worrying because attackers could use the tactic alongside traditional password-cracking techniques, says Ryan McConechy, CTO of cybersecurity firm Barrier Networks.

“If scientists are already achieving a 95% accuracy rate, this will excite adversaries who will now try to replicate the software with even more precision. However, because the technology relies on sound, this means criminals would have to physically be close to someone typing in their password, plus the sound on their computer would have to be turned on, without headphones plugged in,” McConechy said.

The paper’s authors offer several mitigation techniques. Changing one’s typing style is the most effective, they say, because the AI would find it difficult to adapt to changes in the sound of the keystrokes. Additionally, researchers said attackers would find it much more difficult to guess a password made up of random symbols instead of coherent phrases.
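As one way to follow the random-symbols advice, a password can be generated rather than invented. This is a minimal sketch using Python’s standard `secrets` module. The character set and the default length of 16 are arbitrary assumptions, not recommendations from the paper.

```python
import secrets
import string

# Assumed character set: letters, digits, and ASCII punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def random_password(length: int = 16) -> str:
    """Return a password drawn uniformly at random from ALPHABET."""
    if length < 1:
        raise ValueError("length must be positive")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(random_password())
```

Because every character is chosen independently, a misheard character cannot be corrected by guessing the surrounding word, which is the weakness of passwords built from coherent phrases.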

Other techniques involve adding randomly generated fake keystrokes to the transmitted audio, using two-factor authentication, and avoiding typing while transmitting audio.
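The fake-keystroke idea can be sketched in a few lines. This toy example mixes copies of a short keystroke clip into outgoing audio at random positions. It is not the researchers’ implementation: the function name, the plain list-of-floats audio format, and the 48 kHz sample rate are all assumptions made for illustration.

```python
import random


def add_fake_keystrokes(audio, keystroke_clip, count, rng=None):
    """Return a copy of `audio` with `keystroke_clip` mixed in at random offsets.

    `audio` and `keystroke_clip` are lists of float samples in [-1.0, 1.0].
    """
    if rng is None:
        rng = random.Random()
    if len(keystroke_clip) > len(audio):
        raise ValueError("keystroke clip is longer than the audio")
    mixed = list(audio)  # leave the caller's buffer untouched
    last_start = len(audio) - len(keystroke_clip)
    for _ in range(count):
        start = rng.randint(0, last_start)
        for i, sample in enumerate(keystroke_clip):
            # Sum the two signals and clamp to the valid sample range.
            mixed[start + i] = max(-1.0, min(1.0, mixed[start + i] + sample))
    return mixed


if __name__ == "__main__":
    silence = [0.0] * 48_000              # one second at an assumed 48 kHz
    click = [0.5, -0.4, 0.3, -0.2, 0.1]   # toy stand-in for a recorded keystroke
    noisy = add_fake_keystrokes(silence, click, count=5, rng=random.Random(0))
    print(sum(1 for s in noisy if s != 0.0), "samples altered")
```

A real deployment would need recorded keystroke sounds from the user’s own keyboard and would have to keep the added noise from disturbing other call participants.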

Updated August 16 [12:55 PM GMT] with a statement from Zoom.