How can AI improve software engineering?


When the tech journalist Brian Merchant put out a call for those whose jobs had been affected by AI, many of those who responded were developers.

For instance, one software engineer at Google reported that generative AI-based coding assistance tools are leading to a decline in open-source code quality and a decrease in new engineers' readiness and willingness to learn.

Meanwhile, a current CrowdStrike employee stated that AI investments were explicitly cited as a reason for recent layoffs, directly linking AI to job elimination at the company.

Boosting performance

Of course, that's not the narrative from the industry. One MIT study suggested that Microsoft's Copilot was making developers up to 22% more efficient.

It's becoming increasingly clear that AI as a basic technology can be beneficial, but it can also be ruinous. The key is how it is applied by managers and organizations, which often don't understand what it can and cannot do.

A second MIT study attempts to provide a bit of clarity to the situation by offering a roadmap for using AI that goes beyond simply generating lines of code.

A truer picture

The researchers begin by defining the kind of tasks AI typically performs today. They argue that most AI-based coding consists of "undergraduate tasks": relatively simple, self-contained jobs involving perhaps a few hundred lines of code.

Even within these confines, the technology often risks data leakage and usually ignores real-world contexts. As such, the researchers believe that current benchmarks overlook higher-stakes scenarios and therefore provide an incomplete picture of where AI stands today.

Equally, current AI tends to work in isolation, which is not how most developers operate. Ask a system to communicate, especially with humans, and it will struggle. It also struggles to extend its capabilities by using other software tools, something at which humans excel.

Lastly, AI isn't very good at knowing its strengths and weaknesses. Of course, humans also succumb to the illusory superiority bias, but so long as AI is incapable of showing how confident it is in its output or deferring to humans when it thinks another pair of eyes is required, it runs the evident risk of pushing poor-quality code through the system.

This is compounded by the fact that most models learn from public code on GitHub. That data is useful, but it lacks the nuance and specific qualities of each organization, so specific requirements and proprietary coding conventions are hard for AI to master.

Making it up

Hallucinations are common across AI-generated works, and software development is no exception. Because a model is not trained on an organization's own codebase, it can violate internal style rules or call non-existent functions. The hallucinations can look plausible enough, but they nonetheless fail to adhere to internal conventions or architectural patterns.
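
To make the "non-existent functions" failure concrete, here is a minimal Python sketch of one way a team might flag such calls before review. It is an illustration rather than anything the researchers propose: the function `undefined_calls`, its `known_names` parameter, and the example helpers `normalize_items` and `apply_house_discount` are all hypothetical, and the check only looks at plain function calls within a single snippet.

```python
import ast
import builtins


def undefined_calls(source: str, known_names: set[str]) -> list[str]:
    """Return plain function names that are called but never defined.

    `known_names` stands in for the helpers a real codebase exposes
    (an assumption for this sketch; a real tool would index the repo).
    """
    tree = ast.parse(source)
    defined = set(known_names) | set(dir(builtins))

    # Collect every name the snippet itself introduces.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.arg):
            defined.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            defined.add(node.id)

    # Any call to a bare name outside that set is suspicious.
    missing = {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id not in defined
    }
    return sorted(missing)


# Hypothetical AI-generated snippet: one helper exists, the other does not.
generated = '''
def total_price(items):
    cleaned = normalize_items(items)
    return sum(apply_house_discount(i) for i in cleaned)
'''

print(undefined_calls(generated, known_names={"normalize_items"}))
# prints ['apply_house_discount']
```

A check like this only catches the crudest cases. It ignores method calls and says nothing about style rules or architectural patterns, which is part of why the problem is hard.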

“Standard retrieval techniques are very easily fooled by pieces of code that are doing the same thing but look different,” the researchers explain.
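
A small Python sketch can show the kind of mismatch the researchers describe. It assumes a deliberately naive retrieval signal, Jaccard overlap between token sets, which is a stand-in chosen for illustration and not the technique any particular tool uses. The two example functions and the helper names (`token_set`, `jaccard`, `load`) are hypothetical.

```python
import io
import tokenize

# Two hypothetical snippets that compute the same thing in different ways.
LOOP_VERSION = '''
def total(values):
    result = 0
    for v in values:
        result += v
    return result
'''

BUILTIN_VERSION = '''
def add_up(numbers):
    return sum(numbers)
'''

# Token types that carry layout only, not content.
LAYOUT = {
    tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
    tokenize.DEDENT, tokenize.COMMENT, tokenize.ENDMARKER,
}


def token_set(source: str) -> set[str]:
    """Distinct names, operators, and literals that appear in the source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return {tok.string for tok in tokens if tok.type not in LAYOUT}


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of tokens the two sets have in common."""
    return len(a & b) / len(a | b)


def load(source: str, name: str):
    """Execute a snippet and return the function it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


same_behaviour = (
    load(LOOP_VERSION, "total")([1, 2, 3])
    == load(BUILTIN_VERSION, "add_up")([1, 2, 3])
)
similarity = jaccard(token_set(LOOP_VERSION), token_set(BUILTIN_VERSION))

print(same_behaviour, round(similarity, 2))
```

The two functions return the same result, yet they share fewer than half of their tokens, and most of the shared ones are syntax such as `def` and `return`. A search that relies on surface overlap would likely miss that they are equivalent.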

The authors mention that, since there is no silver bullet for these issues, they are instead calling for community-scale efforts. That means curating richer datasets that capture the actual process of writing software—what code gets kept versus discarded, how snippets evolve through refactoring, how bugs are fixed, and whether those fixes endure.

Alongside that, they envision shared evaluation suites to measure not just whether AI can spit out functional code, but how well that code survives migration, how reliably it can be improved over time, and whether it aligns with the standards that human teams have painstakingly built.

Transparency is key

Equally important is a new kind of transparency. Tooling that allows models to surface their uncertainty, inviting human oversight rather than passively presenting answers as truth, would go a long way toward rebuilding trust. The researchers frame the entire agenda as a “call to action,” one that demands open‑source collaboration on a scale no single lab or tech giant could muster on its own.
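
What surfacing uncertainty might look like in practice can be sketched in a few lines of Python. This assumes a model that reports a confidence score with each suggestion, which today's tools do not reliably do. The `Suggestion` class, the `route` function, and the 0.8 threshold are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    code: str
    confidence: float  # hypothetical model-reported score between 0 and 1


def route(suggestion: Suggestion, threshold: float = 0.8) -> str:
    """Defer to a human unless the reported confidence clears the threshold."""
    if not 0.0 <= suggestion.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if suggestion.confidence >= threshold:
        return "propose"
    return "needs-human-review"


# Hypothetical suggestions with made-up scores.
confident = Suggestion(code="return sorted(items)", confidence=0.95)
unsure = Suggestion(code="return legacy_sort(items)", confidence=0.40)

print(route(confident))  # prints propose
print(route(unsure))     # prints needs-human-review
```

The routing itself is trivial. The difficult part, and the one the researchers point to, is getting a confidence signal that is trustworthy enough to act on.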

The path forward isn’t a sudden leap but a series of small, targeted advances—each research result nibbling at a specific challenge, each one feeding back into commercial tools. Together, these steps could gradually move AI from a glorified autocomplete into something closer to a true engineering partner.

And why does that matter? Because software already underpins everything—finance, transportation, healthcare, and the minutiae of daily life. The human effort required to build and maintain these systems safely is becoming a bottleneck. An AI that can shoulder the grunt work and do so without introducing hidden failures would free developers to focus on what they do best: creativity, strategy, and ethics.

“Our goal isn’t to replace programmers,” the researchers conclude. “It’s to amplify them. When AI can tackle the tedious and the terrifying, human engineers can finally spend their time on what only humans can do.”