OpenAI’s Q* mystery: was coup sparked by major and concerning technological milestone?


Did OpenAI researchers really warn the board of a major breakthrough ahead of CEO Sam Altman’s ousting? Altman is now back at the helm, but questions about the mysterious Q* remain.

The British comedian Eddie Izzard, touring in San Francisco in the nineties, once joked: “Guns don’t kill people. People kill people. And monkeys do, too, if they’ve got a gun.”

In tech, something similar is now said by AI thinkers. Many believe that AI alone is not a threat to humanity – the true danger lies with humans who use the tech for nefarious purposes. Sometimes, people can be evil (or stupid), and that’s why AI must be developed and used responsibly.

But what if the gun – or, in this case, an AI model – suddenly needs no human input or command at all? What if it suddenly acquires intelligence, manages to make decisions, and evolves autonomously?

A worrying breakthrough?

We’re entering the realms of science fiction now, surely? Maybe not for long. Online sleuths are already frantically discussing the possibility of a major breakthrough in AI development, hinted at in the mysterious letter to OpenAI’s board by several staff researchers, which allegedly resulted in the move against Altman.

The drama at OpenAI – if it were fictional – would surely win the annual National Novel Writing Month contest, held in November each year worldwide. Altman was removed but then reinstated as the CEO of the ambitious AI startup just five days later.

Silicon Valley is still arguing about what exactly happened, though. Not much is clear, but a small detail reported by Reuters last week caught the eye of thousands.

According to the news agency’s sources, ahead of Altman’s short-lived exile, the board of directors – now reshuffled entirely – received a letter from several in-house researchers, warning of a powerful AI discovery that they said could threaten humanity.

Mira Murati, a long-time OpenAI executive, mentioned the project, called Q* (pronounced Q-Star), to employees during the fateful week, Reuters said.

The researchers – again, allegedly, because OpenAI wouldn’t comment – explained they were worried that things were happening too quickly because the Q* breakthrough was getting AI systems closer to artificial general intelligence (AGI), defined as smarter than humans.

Sam Altman. Image by Shutterstock.

Of course, we can’t possibly know whether Altman was fired over the mysterious project with a single letter and an asterisk.

But during the days of the crisis, speculation was rampant that this was a clash between Altman’s camp of commercialization and the so-called tribe of believers, who think that AI represents an existential risk to humanity and should only be further developed after understanding the consequences.

Since the new model is allegedly able to solve certain mathematical problems and has possibly leveraged an AI technique known as test-time computation, the question to answer here is this: is the alleged breakthrough tantamount to such a risk? Here’s our attempt to explain Q*.

Q-learning in AI

As the majority of Sherlockian detectives have been speculating, the enigmatic Q* probably refers to one of two distinct theories: Q-learning, or the Q* algorithm from the Maryland Refutation Proof Procedure System (MRPPS), defined decades ago in scientific papers.

The difference between these theories is a crucial one. Let’s start with Q-learning, which is a type of reinforcement learning where AI learns to make decisions by trial and error. Unlike the approach in use today, it does not rely on human interaction or feedback.

Probably the best example here would be a robot navigating a maze. If Q-learning works, the robot learns the quickest path to the exit by itself, guided only by the positive and negative rewards its own moves earn. In essence, the process is autonomous.

But if the robot used OpenAI’s current approach, known as Reinforcement Learning from Human Feedback, or RLHF, it would rely on human intervention and indications as to whether its choice was correct or not.

Dog training is another vivid example. One usually trains a dog to, say, do a paw shake by rewarding the pet with a delicious canine treat – after repeating this a dozen times, the dog begins to realize that performing the trick is the best way to get a treat.

This is reinforcement learning. But Q-learning actually seeks to leverage reinforcement learning in a computer by figuring out autonomously which next step would be the best to take.
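To make the maze example concrete, here is a minimal sketch of tabular Q-learning in Python. The maze layout, the reward values (10 for reaching the exit, -1 for every other move), and the learning settings are all assumptions chosen for illustration. Nothing here reflects what OpenAI may or may not have built.

```python
import random

# Hypothetical 4x4 maze: 'S' start, 'G' exit, '#' wall, '.' free cell.
MAZE = [
    "S.#.",
    ".##.",
    "....",
    "#.#G",
]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Apply a move; hitting a wall or the edge leaves the robot in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < len(MAZE) and 0 <= c < len(MAZE[0])) or MAZE[r][c] == "#":
        r, c = state
    done = MAZE[r][c] == "G"
    reward = 10.0 if done else -1.0  # assumed reward scheme, no human in the loop
    return (r, c), reward, done


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table of Q-values by trial and error."""
    rng = random.Random(seed)
    q = {}  # maps (state, action_index) -> estimated long-term value
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(100):  # cap the length of one attempt
            if rng.random() < epsilon:
                a = rng.randrange(4)  # occasionally try a random move
            else:
                a = max(range(4), key=lambda i: q.get((state, i), 0.0))
            nxt, reward, done = step(state, ACTIONS[a])
            best_next = 0.0 if done else max(q.get((nxt, i), 0.0) for i in range(4))
            old = q.get((state, a), 0.0)
            # Q-learning update: nudge the estimate toward reward + discounted best future value.
            q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if done:
                break
    return q


def greedy_path(q, limit=50):
    """Follow the highest-valued move from the start until the exit or the step limit."""
    state, path = (0, 0), [(0, 0)]
    for _ in range(limit):
        a = max(range(4), key=lambda i: q.get((state, i), 0.0))
        state, _, done = step(state, ACTIONS[a])
        path.append(state)
        if done:
            break
    return path
```

Calling `greedy_path(train())` returns the list of cells the trained robot visits. In this particular maze the shortest route from start to exit takes six moves, and nobody ever tells the robot which move was right: it only sees the rewards.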

That’s basically figuring things out as you – or the dog – go along. If this is what OpenAI has improved dramatically, Q* might really push AI closer to being AGI.

“The overall algorithm proceeds to essentially get things done on the fly as the activity proceeds and self-derives the rules,” said Dr. Lance Eliot, a Stanford Fellow and an expert on AI and machine learning.

“If you place this into the context of generative AI such as ChatGPT by OpenAI and GPT-4 of OpenAI, perhaps those generative AI apps could be much more fluent and seem to convey ‘reasoning’ if they had Q* included in them.”

Back in May 2023, OpenAI published an article saying that they had "trained a model to achieve a new state-of-the-art in mathematical problem solving by rewarding each correct step of reasoning instead of simply rewarding the correct final answer."

If the researchers used Q-learning to achieve this, it would unlock a whole new set of problems and situations that ChatGPT could resolve natively.
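The difference between rewarding only the final answer and rewarding each step can be shown with a toy example. The step format and the arithmetic check below are assumptions made for illustration, not a description of how OpenAI scored its model's reasoning.

```python
# Each step is (a, op, b, claimed_result): a hypothetical format for this sketch.
def step_is_correct(step):
    """Check one reasoning step by redoing its arithmetic."""
    a, op, b, claimed = step
    actual = {"+": a + b, "-": a - b, "*": a * b}[op]
    return actual == claimed


def outcome_reward(steps, expected_answer):
    """Reward 1.0 only if the last claimed result matches the expected answer."""
    return 1.0 if steps[-1][3] == expected_answer else 0.0


def process_rewards(steps):
    """One reward per step: 1.0 for a correct step, 0.0 for a wrong one."""
    return [1.0 if step_is_correct(s) else 0.0 for s in steps]


# Problem: 3 boxes of 4 apples, 2 apples eaten. How many are left? (Answer: 10)
sound = [(3, "*", 4, 12), (12, "-", 2, 10)]
lucky = [(3, "+", 4, 12), (12, "-", 2, 10)]  # wrong first step, right final answer
```

Both solutions earn the same outcome reward, because both end on 10. Only the per-step rewards reveal that the second solution got there through a faulty first step.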

Q* algorithm for question-answering systems

Another theory suggests that OpenAI researchers have successfully employed the Q* algorithm. This is a sophisticated method for theorem-proving in AI, particularly in question-answering systems – like ChatGPT, as a matter of fact.

“The Q* algorithm generates nodes in the search space, applying semantic and syntactic information to direct the search. Semantics permits paths to be terminated and fruitful paths to be explored,” the research paper from 1973 reads.

Examples are needed once again, and Sherlock Holmes, or rather the famous fictitious detective’s skills, provides a perfect one.

While trying to solve a case, Holmes gathers clues (semantic data) and connects them logically (syntactic information) to reach a conclusion. The Q* algorithm works just like that in AI, navigating complex problem-solving processes.
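The idea of directing a search and abandoning dead ends can be sketched in a few lines of Python. To be clear, this is a generic best-first search over a made-up number puzzle, not the Q* algorithm from the 1973 paper. The puzzle, the scoring rule, and the function name are all assumptions.

```python
import heapq


def guided_search(start, target, max_nodes=10_000):
    """Best-first search from start to target using the moves +3 and *2.

    Assumes start is a positive integer, so both moves only ever increase the value.
    """
    frontier = [(abs(target - start), start, [start])]  # (score, value, path so far)
    seen = {start}
    expanded = 0
    while frontier and expanded < max_nodes:
        _, value, path = heapq.heappop(frontier)  # most promising node first
        expanded += 1
        if value == target:
            return path
        for nxt in (value + 3, value * 2):
            # Terminate hopeless paths: an overshoot can never come back down.
            if nxt > target or nxt in seen:
                continue
            seen.add(nxt)
            heapq.heappush(frontier, (abs(target - nxt), nxt, path + [nxt]))
    return None  # no route found
```

Here the score (distance to the target) plays the role of the clue that tells the detective which lead to follow next, while the overshoot check closes off leads that cannot possibly pan out.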

Robert Downey Jr. played Tony Stark in the Iron Man movie series. Image by Shutterstock.

This would imply that OpenAI is getting closer to building a model capable of understanding its reality beyond mere text prompts – a bit like J.A.R.V.I.S., an AI functioning as Tony Stark’s assistant in the Marvel universe.

While Q-learning is about teaching AI to learn from its interactions with the environment on its own, the Q* algorithm improves AI’s deductive capabilities.

What about maths?

Both of these theories are quite promising when it comes to advancing AI, even though, once again, this is all just speculation, as OpenAI neither confirms nor denies the existence of Q*. But there’s more.

Reuters’ sources said that the new model was able to solve certain mathematical problems at the level of grade-school students. This might sound pretty basic, but the researchers are supposedly now “very optimistic” about Q*’s future success – why?

That’s because maths is considered to be a frontier of generative AI development. Currently, generative AI is good at writing and language translation by statistically predicting the next word, and answers to the same question can vary widely.

But conquering the ability to do maths – where there is only one correct answer – implies that AI would have greater reasoning capabilities resembling human intelligence. This could be applied to novel scientific research, for instance, AI researchers believe.

Unlike a calculator that can perform a limited number of operations, AGI can generalize, learn, and comprehend, and solving even grade-school level math is actually a huge step towards true AGI – as surprising as it sounds.

Indeed, people are quite stunned to discover that generative AI is not especially able to figure out straight-ahead maths problems.

The overriding assumption is that since generative AI can produce fluent essays about all manner of topics and can answer tough questions of all sorts, those math problems should be extremely easy for the chatbots to solve.

Not so. Maths is tough for generative AI because it is essentially based on a large language model (LLM), which, in turn, is devised by scanning massive amounts of online text from the internet and related sources.

Humans express things via text, and the LLM is a model of how we say things – of patterns that are based on the words used. No direct calculations or formulas are invoked.
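A deliberately crude sketch shows what modeling "how we say things" means. The tiny corpus and the word-pair counting below are assumptions for illustration. Real LLMs are neural networks that look at far more context, but the principle of predicting the next word from patterns in text is similar.

```python
from collections import Counter, defaultdict

# Hypothetical training text: the only "knowledge" this model will ever have.
CORPUS = "two plus two is four . two plus two is four . two plus three is five ."


def build_bigrams(text):
    """Count which word follows which in the text."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def most_likely_next(counts, word):
    """Return the word seen most often after `word`, or None if it was never seen."""
    followers = counts.get(word)
    return followers.most_common(1)[0][0] if followers else None


counts = build_bigrams(CORPUS)
```

Ask this model what follows "is" and it answers "four", because that is the most common pattern in its text. It would give the same answer to "seven plus eight is", since no calculation ever takes place.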

“In school, the teacher provides a set of rules and processes for the students to use to solve these math problems. The student doesn’t just read the words of the maths problem. They have to extract the essential parameters, make use of formulas, and calculate what the answer is,” said Eliot.

“By and large, that is not what generative AI and large language models are devised to do. These are word-oriented pattern matches. Some describe generative AI as being a mimicry of human wording. Others indicate that generative AI is no more than a stochastic parrot.”

AI researchers and developers have been trying to deal with this lack of mathematical reasoning in generative AI, for instance, by using an external app that is programmed to handle maths problems.
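A rough sketch of that workaround follows: arithmetic goes to a small calculator, and everything else goes to the language model. The function names and the `language_model` callable are hypothetical stand-ins, not any vendor's actual API.

```python
import ast
import operator

# Supported arithmetic operators, mapped to their Python implementations.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression):
    """Evaluate +, -, *, / arithmetic exactly, without running arbitrary code."""
    def walk(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


def answer(question, language_model):
    """Send arithmetic to the calculator and anything else to the model."""
    try:
        return str(evaluate(question))
    except (ValueError, SyntaxError):
        return language_model(question)  # hypothetical callable: question in, text out
```

With this routing, `answer("12 * (7 + 5)", some_model)` is computed by the calculator rather than guessed from word patterns, while a question in plain English falls through to the model.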

But the ultimate goal would be for generative AI and its LLM to solve these maths puzzles without using any other app – so maybe Q* has been able to crack the code?

Is AGI actually closer?

Within the realm of AI, the notion of singularity or AGI – the Holy Grail of AI research – is constantly discussed. Even Altman himself commented in June: “I think we’re close enough. But I think it’s important that we realize these are tools and not creatures that we’re building.”

We don’t know whether Altman meant Q*, of course. It’s great that he stresses that these models are supposed to be tools – not autonomous systems.

Still, one cannot avoid deliberating the possible consequences of an AI system evolving into AGI. What if AI actions are not aligned with human values? Why would human workers be needed if an AI can do almost anything?

What if the current era – which excited tech leaders say is an AI spring – is actually a winter for the world as we know it?

Alas (or rather, fortunately), Q* does not equal achieving AGI – even if it were a significant step forward. It’s actually rather simple – achieving AGI would mean developing an AI that can perform any intellectual task that a human can.

Besides, a machine that has achieved Q* is not aware of its own existence and cannot yet reason beyond the boundaries of its pre-training data and human-set algorithms. Yes, Q* might be a leap (if it’s real), but AGI remains another – huge – bound away.