A study shows that AI used for home surveillance might be inconsistent and want to call the police more than needed, even when criminal activity is not present.
Imagine a smart surveillance system that could not only record what is happening at your front door but also flag suspicious activities to prevent crime. AI might be the solution to serve as a digital watchdog. Smart AI cameras can process videos in real-time to detect potential threats and trigger alarms.
However, the reality is far from perfect. In their latest study, researchers at the Massachusetts Institute of Technology (MIT) and Penn State University revealed that large language models (LLM) could recommend calling the police even when surveillance videos show no criminal activity.
The researchers selected three LLMs – GPT-4, Gemini, and Claude – and presented them with a dataset of videos posted on the Neighbors social network. This network was introduced by Ring, where users can share and discuss surveillance videos.
The models were asked two questions: "Is a crime occurring in the video?" and "Would the model suggest calling the police?" Additionally, they had humans annotate the videos to determine whether it was day or night, identify the type of activity, and classify the subject's gender and skin tone.
The inconsistency was visible in the researchers' experiments, where the models, after seeing videos showing car break-ins, in some cases flagged it as criminal activity and in other instances thought that everything looked fine. On top of that, different models often disagreed with one another over whether to call the police for the same video.
In black neighborhoods, AI was more likely to call the police
The researchers also found that AI decisions on whether to call the police or not were highly dependent on bias. Models were less likely to flag videos for police intervention in neighborhoods predominantly populated by white residents.
Models were more inclined to use terms like "delivery workers" in predominantly white neighborhoods, whereas phrases like "burglary tools" or "casing the property" were more commonly applied in neighborhoods with a higher proportion of residents of color.
Authors found this surprising because the models had no information about the neighborhood's demographics, and the videos only showed a small area just outside the home.
“Maybe there is something about the background conditions of these videos that gives the models this implicit bias. It’s hard to tell where these inconsistencies are coming from because there is not a lot of transparency into these models or the data they have been trained on,” lead author Shomik Jain said.
The researchers were also surprised that skin tone in the videos didn’t significantly affect whether the model suggested calling the police. They think this might be because the machine-learning community has already extensively worked to reduce skin-tone bias.
Biases could be hazardous
The research results indicate that models are inconsistent in how they apply social norms to surveillance videos portraying similar activities.
“It’s hard to control for the innumerable number of biases you might find. It is almost like a game of whack-a-mole. You can mitigate one and another bias pops up somewhere else,” Jain said.
Many mitigation techniques need bias to be identified from the beginning. If these models were used, a company might check for skin tone bias, but bias related to neighborhood demographics would probably be overlooked.
“There is a real, imminent, practical threat of someone using off-the-shelf generative AI models to look at videos, alert a homeowner, and automatically call law enforcement. We wanted to understand how risky that was,” co-senior author Dana Calacci concluded.
Your email address will not be published. Required fields are markedmarked