FDA’s bellyflop a sign that AI is no silver bullet for public safety – interview


In an interview with Cybernews, Brooke Hartley Moy, cofounder and CEO of AI fact-checking platform Infactory, admits she wasn’t exactly surprised when the Food and Drug Administration’s new AI tool Elsa, introduced to help fast-track drug approvals, began cooking up fake research data.

Imagine, just for one sad moment, you’re in a life-or-death situation. You’re terribly ill and you’re running out of time: if you’re not given a certain revolutionary drug soon, you’re going to die.

Thankfully, the FDA is here to help. Armed with its new generative AI tool, Elsa, the agency has sped up exhaustive drug approval processes, and the new cure reaches you while you’re still hanging in there.

Great! Or is it? Unfortunately, Elsa – just like most other similar programs – is prone to hallucinations, and the approval of this specific drug was based on nonexistent medical studies. Your condition keeps deteriorating, and soon, there’s a funeral.

Okay, all of the above is a fictional scenario. Nothing of the sort has happened. However, it could if the FDA doesn’t fix major – and very real – issues with Elsa.

In July, FDA insiders told CNN that Elsa was inventing nonexistent medical studies and misrepresenting research. According to them, the tool is essentially unfit for clinical reviews because it misinterprets important data far too often.

“Anything that you don’t have time to double-check is unreliable. It hallucinates confidently,” one FDA employee told the network.

Importantly, the FDA seems to have collaborated with OpenAI on Elsa. And sure, using large language models could indeed eventually speed up the slow drug review process, which can drag on excruciatingly for years.

Not now, though, Hartley Moy tells Cybernews in an exclusive interview. Her company, Infactory, transforms articles, data, and content archives into AI-ready formats that can be easily queried, cited, and licensed.

AI models are still far from perfect and have a track record of hallucinating facts, oversimplifying nuance, and generating confident nonsense.

Just this week, a new study by researchers at the Icahn School of Medicine at Mount Sinai found that widely used AI chatbots are highly vulnerable to repeating and elaborating on false medical information, revealing a critical need for stronger safeguards before these tools can be trusted in health care.

To be fair, when it’s just play, it’s all good. But when critical decisions about public health (or finance, for that matter) are on the table, human intervention is and will remain vital.

“In an industry like healthcare, small errors have significant impacts. Sure, you can apply AI in order to digest large amounts of information, but it has to be an augmentation tool for human capacity. That’s where you really get the superpower effect,” says Hartley Moy.

We misunderstand the LLMs

Why and how could the FDA's rush to implement the generative AI tool without proper safeguards endanger public health in America?

There are a few things. There has been a lot of excitement and a lot of hype around the potential of AI, particularly in the last few years with the launch of ChatGPT, which really made large language models mainstream.

It’s something that is now not just in the public zeitgeist but actually on the to-do list of every major corporation and government organization.

They have been under a lot of pressure to start adopting these tools as quickly as possible to keep up with innovation. So, they’re really trying to push these technologies into a place that they may not be ready for.

One thing that has been broadly misunderstood is large language models, which are the underlying tech for pretty much all of the generative AI that we're seeing today.

What are they good at? What are they weaker at? What are their shortcomings and pitfalls? When they launched at the very beginning, there was a lot of talk around the idea that when you take a probabilistic model, you're going to have some degree of randomness and chaos in the system.

That's the nature of LLMs themselves. At the end of the day, hallucinations are a feature in many senses, not a bug, because they are what creates a lot of the interesting and dynamic use cases that drive AI.

But that means that they're also incredibly poorly suited to things that require a high degree of precision, accuracy, and trust. It's that mental misunderstanding and mismatch that has misled not just the FDA but almost every organization.

The examples are endless. We've seen Google and Meta run into tricky situations. They know exactly what the technology's limitations are; they aren’t coming from a place of ignorance. So, to me, it's kind of a perfect storm.

Sure, there’s a lot of excitement and interest, which I think is positive. Ultimately, we're AI optimists over here – we wouldn't be running an AI company if we weren't.

But at the same time, you have such a massive gap in the knowledge, expertise, talent, and other things that are really required to ensure that you're putting the right safety mechanisms in place and that you are prepared to launch these kinds of projects in a way that doesn’t compromise public safety.

Too comfortable too quickly

In this specific case of health care, what kind of standards would you have in place before AI is anywhere near, say, drug approvals? These may be life-or-death situations, right? Some drugs might not be approved in time for someone to survive an illness, for example. What kind of path do you see for AI in healthcare in general?

I believe that AI has tremendous potential for the healthcare industry. Not everything is a life-or-death situation in healthcare. When we talk about healthcare, that's such a broad spectrum of things.

There are many ways to use it: in services like WebMD, for example, or as digital scribes that can take much more accurate notes than a human scribbler. All this has clearly made healthcare safer and more effective.

So I don't think it's necessarily a question of whether AI belongs in the industry. It definitely does and will have a place in it going forward.

Where it becomes thorny and where you’re seeing some of the largest concerns is that we have become increasingly and surprisingly quickly comfortable with letting AI make decisions without any degree of human intervention.

In fact, that's the dream being pitched across Silicon Valley right now, with all the talk of AI agents and the idea of full autonomy with no human interference or interaction.

That's a really interesting path, and we may get there over the next few years or the next decade. But we’re not there yet.

In my mind, there isn't yet a fully autonomous AI that can make critical decisions. We do need some degree of manual oversight from humans.

We've been working with a large organization in the healthcare training world. They provide a lot of the educational content for folks who are nurses or hospital staff on questions like proper procedures, medication, equipment use, and even hand-washing protocols in the hospital.

What's interesting is that when you talk to the executives of this company, they're very keen to do as much as possible with AI.

They understand that, in some ways, less human intervention could mean fewer mistakes, particularly when workers might otherwise go on Google or YouTube and look up more dubious sources of information.

If you can control the sources, you're potentially going to end up with safer and more accurate information.

Of course, this should not fully replace a human-in-the-loop mechanism that always asks whether a given output is grounded in fact. That will still be necessary, at least in industries where that degree of fidelity is so key.

The government can’t afford super AI pros

Do you not think, though, that because the system is not perfected and you have to double-check everything, the whole process ends up taking even longer? Doesn't AI create more work even as it claims to reduce the workload?

Yes, I think there's a lot of bad AI out there, and I have a lot of conversations where I talk about how it's a square peg in a round hole.

Not everything should be solved with AI at this stage. There are many things that continue to be extremely effective using just older technologies or traditional machine learning.

I would be cautious, though, about the idea that it's always about double-checking. Sometimes, that's a simplification of situations when we talk about a human in the loop.

In my own team, we run a fairly lean engineering team right now because what used to require a substantial number of engineers is now done with a few senior engineers and coding assistants.

They don’t need to critically confirm that every line of code is perfect, but they’re able to monitor the machine, if you will, and supervise it almost the same way you would supervise a junior employee. And the efficiency gains there are incredible.

Obviously, that's not quite the same degree of fidelity that's required in an industry like healthcare or even industries like finance and others, where small errors have significant impacts.

But it does suggest that if you build your workflows the right way and apply AI where it’s at its best (which is often digesting large amounts of information in ways that humans cannot, as well as creating, troubleshooting, and reasoning), and if you use it as an augmentation tool, that's where you really get the superpower effect AI can provide.

What do you think is the way forward for the FDA in this specific case of using the AI tool?

There are solutions for this particular use case. We focus on combining more deterministic pathways and the ability to actually show that a fact is a fact with the summarization capabilities and scalability effects of generative AI.

There are three things that I often see causing AI projects to fail. First, there’s a degree of hubris about what AI can and can’t do.

That’s largely our industry's fault because we like to talk about artificial general intelligence and the future, and we're all pretty bullish on where AI is going.

That has led adjacent industries to proceed with less caution than they otherwise should.

Second, in the case of the FDA, you probably don't have an organization full of deep AI and machine learning specialists and natural language processing experts. The talent pool just isn't there: it's concentrated in a few very large companies.

The demand for AI across so many industries is robust. When I'm on LinkedIn looking at how many jobs there are for someone with 10 years, 15 years, or 20 years of AI experience, there are just not enough people to fill those jobs. They don't exist.

If they do exist, they are well compensated at a large tech company, so trying to get them to go and work for the government is a tall order.

Lastly, there's the speed issue. There's a lot of pressure within organizations to get AI tools out as quickly as possible to show that they're making progress.

Certainly, within the government right now, there's a lot of interest in showing results as quickly as possible. And traditionally, organizations like the FDA haven’t been known for being particularly fast.

Of course, we've recognized that’s actually a good thing, given the nature of the challenges that they are working on. So for the FDA, it requires rebuilding public trust: this is going to come from taking a much more conservative, cautious approach rather than trying to advertise some sort of magic silver bullet.

Brooke, you mentioned finance a little earlier. What do you think are other areas where the use of AI could be risky or dangerous if not properly controlled or regulated?

Finance is a little different right now because, as I've observed in the industry, a lot of the early interest in AI finance use cases is deeply embedded in the parts of the industry that have always been risk-taking.

I mean hedge funds, for example, that have been using machine learning models for a long time. And finance itself is sort of professional gambling at times. Even though precision is really important, risk is built into the game.

Our other major focus is in the news and publishing space. The majority of our customers are publishers, large newspapers, and media companies.

They’re under tremendous pressure to identify how they’re going to participate in the AI ecosystem. What does AI mean for subscriptions? What does that mean for eyeballs on homepages and ad revenue? What does AI mean for content that’s being scraped and repackaged?

The fundamental issue, even with all that anxiety about revenue, is how information in the news is going to be represented. And we all know the concept of fake news: it can spread much more quickly, given the systems that are in place.