AI can spot a face, but not a vibe


AI might ace a math test, but it still can’t tell if two people are flirting or about to get into a fistfight.

Artificial intelligence has come a long way. It can write your emails, suggest your next binge-watch, and even pretend to flirt with you on dating apps. But ask it to interpret a three-second video of humans interacting, and it’s suddenly out of its depth.

That’s the takeaway from new research out of Johns Hopkins University, which finds that humans are still way better than machines at reading social cues – especially when those cues are moving.


The study, led by cognitive science professor Leyla Isik, paints a sobering picture for anyone hoping that AI will smoothly integrate into the real world anytime soon.

“AI for a self-driving car, for example, would need to recognize the intentions, goals, and actions of human drivers and pedestrians,” Isik explained.

“You would want it to know which way a pedestrian is about to start walking, or whether two people are in conversation versus about to cross the street.”

In other words, AI doesn’t just need eyes – it needs intuition. And right now, it doesn’t have it.


The team had human participants watch three-second clips of people doing everything from chatting to walking side-by-side, then asked them to rate how socially interactive the scenes were. Unsurprisingly, humans mostly agreed on what was going on. Then they ran the same test on over 350 different AI models – including some of the biggest and best – and watched them flail.

Language models, which were given short written descriptions of the clips rather than the footage itself, did slightly better than their video-analyzing counterparts. But none of the models – no matter how many GPUs or billions of parameters they had – could consistently match human interpretations.

“Any time you want an AI to interact with humans, you want it to be able to recognize what people are doing,” Isik said.


“I think this sheds light on the fact that these systems can't right now.”

Kathy Garcia, a doctoral student in Isik’s lab and co-first author of the study, said this shows a glaring gap in how AI systems are being trained.

“It’s not enough to just see an image and recognize objects and faces. That was the first step, which took us a long way in AI,” she said.

“But real life isn’t static. We need AI to understand the story that is unfolding in a scene.”

So why can’t AI keep up? According to the researchers, today’s AI systems are modeled after the parts of the brain that process static visuals – think snapshots, not stories.

That works great when you want an algorithm to recognize a dog in a photo. It works a lot less well when the goal is figuring out if that dog is playing fetch with a kid or getting ready to bite someone.

“There are a lot of nuances, but the big takeaway is none of the AI models can match human brain and behavior responses to scenes across the board, like they do for static scenes,” Isik said.

“I think there’s something fundamental about the way humans are processing scenes that these models are missing.”
