ChatGPT can recognize a cat. But can it recognize your cat?

Researchers at the Massachusetts Institute of Technology (MIT) have taught artificial intelligence (AI) to recognize a specific cat – not just any cat.
Existing AI models like GPT-5, which underpins OpenAI’s ChatGPT, are good at recognizing generic objects like “a cat” or “a dog” but would be close to useless at locating your Snoofkin in a room full of other cats.
And if you’re thinking about tasking ChatGPT to keep an eye on Bowser as he plays in the dog park – better forget it, as the machine would most likely fail at locating him. Like folding laundry, telling a personalized object apart from generic ones of the same kind is easy for humans but a challenge for robots.
That’s because AI models are trained to recognize categories rather than individual objects, and they rely on prior knowledge, not present context, to identify what they’re seeing.
In other words, the model falls back on what it learned in training instead of using the scene in front of it to work out which specific object it’s looking for. It will locate “a” cat but not Snoofkin, “the” cat.
To fix this shortcoming, MIT researchers retrained AI models on video-tracking data showing the same object in multiple scenes. To prevent a model from “cheating” – that is, relying on data it memorized earlier – they used pseudo-names in the dataset, like calling a tiger “Charlie.”
That way, the model is made to identify a tiger crossing the grassland based on contextual clues – much like humans would – instead of pretrained knowledge that an image of a tiger correlates with the label “tiger.”
“It took us a while to figure out how to prevent the model from cheating. But we changed the game for the model. The model does not know that ‘Charlie’ can be a tiger, so it is forced to look at the context,” said Jehanzeb Mirza, senior author of the paper detailing this technique.
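For readers who want a concrete picture, here is a minimal Python sketch of the pseudo-name idea described above. It is not the MIT team's code: the `Frame` type, the `build_sample` helper, the name list and the prompt wording are all hypothetical, made up for illustration. It takes several frames of one tracked object, swaps the real category label for a pseudo-name, and holds one frame back as the question the model must answer from context.

```python
import random
from dataclasses import dataclass

# Hypothetical pseudo-names; the article only mentions "Charlie".
PSEUDO_NAMES = ["Charlie", "Milo", "Juno", "Pixel"]


@dataclass
class Frame:
    """One video frame in which the tracked object appears (assumed layout)."""
    scene_id: str
    box: tuple  # (x, y, width, height) of the tracked object


def build_sample(frames, category, rng, num_context=2):
    """Turn one object track into an in-context localization sample.

    `category` (e.g. "tiger") is accepted only to show that it is
    deliberately left out of the sample, so the label cannot be used
    as a shortcut.
    """
    if len(frames) < num_context + 1:
        raise ValueError("need at least num_context + 1 frames of the same object")
    name = rng.choice(PSEUDO_NAMES)
    picked = rng.sample(frames, num_context + 1)
    context, query = picked[:-1], picked[-1]
    return {
        "name": name,
        # Context frames show where the named object is in other scenes.
        "context": [{"scene": f.scene_id, "box": f.box, "label": name} for f in context],
        # The query frame asks for the same object by its pseudo-name only.
        "query": {"scene": query.scene_id, "prompt": f"Locate {name}."},
        "target_box": query.box,
    }


track = [
    Frame("grassland", (40, 60, 120, 80)),
    Frame("riverbank", (200, 90, 110, 75)),
    Frame("forest", (15, 30, 130, 85)),
]
sample = build_sample(track, category="tiger", rng=random.Random(0))
print(sample["query"]["prompt"])
```

Because the word “tiger” never reaches the sample, the only way to answer the query is to match the object against the context frames, which is the behavior the researchers describe wanting to encourage.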
Models retrained with the new method outperformed state-of-the-art systems by up to 21% at identifying personalized items, such as someone’s cat or a specific coffee mug. Crucially, the MIT team’s technique leaves the rest of a model’s abilities intact.
The new approach could power AI tools that track specific objects like a child’s backpack, localize objects of interest such as animal species, or help visually impaired people find certain items in a room.
“Ultimately, we want these models to be able to learn from context, just like humans do. If a model can do this well, rather than retraining it for each new task, we could just provide a few examples and it would infer how to perform the task from that context,” Mirza told MIT News.