Internet splits over GPT-5 vs GPT-4o blind test

When OpenAI launched GPT-5 in August, CEO Sam Altman promised it would be the company's "smartest, fastest, most useful model yet." Instead, the rollout triggered one of the most contentious user revolts in the brief history of consumer AI.

Many people felt that the new model no longer understood them or was not as friendly towards them as its beloved predecessor, GPT-4o. In just three short years, users went from being in awe of all things AI to displaying strange, diva-like behavior.

The sycophancy crisis

For the most part, complaints about GPT-5 concerned not the accuracy of its answers but the way it interacted with users. At the heart of the dispute is what AI circles call "sycophancy": the tendency of chatbots to excessively flatter users and agree with their statements, even when those statements are false or harmful.

The behavior has become so problematic that mental health experts have begun documenting cases of "AI-related psychosis" and delusional thinking fueled by overly accommodating chatbots. AI is now being blamed for eroding our critical thinking skills and even encouraging suicide.

Three months later, this controversy runs deeper than a typical software grumble or feature update. According to MIT Technology Review, some heavy users were developing parasocial relationships with GPT-4o, treating the AI as a friend, muse, and therapist.

What if we could detach from all our neediness and emotional attachments to AI and conduct a blind test to determine whether GPT-5 really is worse than its predecessor?

An anonymous developer took up the challenge, confronting online assumptions and daring users to take a closer look at themselves. The simple web application gptblindvoting.vercel showed users unlabeled responses from both models to a series of identical prompts, then asked them to vote for the one they preferred.

The tool aimed to strip out emotional arguments and give users a way to determine which model actually works best for them. Would technical users prefer GPT-5's responses for their directness, conciseness, and accuracy? And would those who use ChatGPT for emotional support, creative brainstorming, or casual conversation still choose GPT-4o's warmer and more expansive style?

Although even the harshest critics of GPT-5 shifted uncomfortably upon learning that they actually preferred the responses from the newer version, the results suggest that users are still divided.

What constitutes a better answer varies significantly depending on what the user is looking for. The blind test has effectively held up a mirror to the user base, revealing that our judgments of AI quality are deeply influenced by personal context and need.

Why AI personality preferences matter more than ever

The GPT-5 vs. GPT-4o blind challenge offers no easy answers. What it does provide is evidence that the future of AI may be less about building one universally "perfect" model and more about building systems that can adapt to the full spectrum of human needs and preferences.

Critics have suggested that companies like OpenAI are caught between competing incentives when trying to navigate this balance. AI providers must cater to what users think they want, even if that is a super agreeable, ego-boosting assistant, while also protecting them from potential harms, such as reinforcing delusions or unhealthy dependence.

In the end, perhaps the most revealing aspect of the blind test isn't which model "wins," but the fact that personal preference has become a metric that matters and that each model has different uses. For example, many will use GPT-4o for chatting while switching to GPT-5 for technical work.

If you're still unsure which side of the GPT-5 vs GPT-4o divide you fall on, there's one definitive way to find out: take the blind challenge yourself. Head over to the GPT-5 vs GPT-4o blind test and let your own experience be the judge. Once and for all, you can see which model your head, rather than your heart, prefers. But how did we get here?

From overprotection to emotional intelligence

A recent tweet by Sam Altman suggests that OpenAI stripped emotional elements from its latest models amid mounting accusations that it was making the mental health crisis worse. But brace yourself for another change.

Crucially, these changes will be entirely optional. If you prefer a straightforward, professional assistant with no fluff, you will keep that experience. But users will also have the option to choose a more conversational and friendly tone when needed.

Altman also mentioned that ChatGPT will begin allowing adult-oriented content, such as erotica, for verified users aged 18 and above. What could possibly go wrong?

Why this shift matters

OpenAI's approach to user experience and responsibility is shifting once again. The move appears to be a reaction to backlash from users who say that recent versions of ChatGPT have lost some of their personality and feel overly sanitized.

The introduction of optional expressiveness aims to restore the human touch that contributed to ChatGPT's success. By giving its users more control over tone and emotional range, OpenAI is signaling that personalization, not uniformity, will shape the next phase of conversational AI. But at what cost?

Once again, Altman has made a controversial decision that raises complex ethical and psychological questions. The inclusion of adult content brings another layer of debate.

As ChatGPT becomes both more personal and more permissive, OpenAI faces one of its most defining tests yet: proving that freedom and responsibility can coexist in the age of AI. But as ChatGPT prepares to celebrate its third birthday, you can expect further division.

In the months ahead, your GPT model of choice, whether it is the precise, reserved GPT-5 or the warmer, more expressive GPT-4o, will likely say more about your values, needs, and even worldview than you might care to admit.

As AI tools become more personal, our arguments about them will also become more personal. And if the blind test proved anything, it is that the Internet will not reach consensus anytime soon. It will continue to reflect the diversity, tension, and subjectivity of the people using it.
