What changes with AI?

There are two extremes when it comes to predicting the future of software engineering with the arrival of AI. There are those who dread the speed with which AI is improving and fear for their jobs. Then there are those who treat a single piece of slop as evidence that all of it will collapse like a house of cards. Of course, there are still many who take a balanced view of things, and I recommend listening to them.

I have been using AI for a while now and have seen firsthand the tangible boost in productivity that it brings. Let’s first understand why that is so and what it means for our careers as engineers.

Why do LLMs work so well in software?

A while back I started a hobby project to build my own note taker, fed up with synchronising my notes across devices manually. This was years before Obsidian had been released. I vividly remember picking up new languages and frameworks to solve the problem and promptly forgetting about them once I was done with the project. It took me over a month to build it and while my objectives were met, it was not the best code I had written. Just for fun, I recently rebuilt it using Claude. With a proper spec written for it and some back and forth I got a much better product in under two hours.

So why does AI work so well in software? Current AI applications are built around LLMs, which work extremely well with tokenised patterns. Code is full of such patterns, and it comes with a large corpus of graded data in the form of open source repositories, code review comments, bug trackers, software reviews, and style guides. All this made coding the first “killer app” for AI. The more people use it, the more investment it gets from AI providers, and the better it gets.

The compiler analogy

No wonder then that people compare the productivity boost from LLMs to the boost offered by compilers and, later still, by ISA-, OS-, and language-agnostic virtual machines. That is where the analogy must end though. Compilers translate between unambiguous languages with nearly context-free grammars. The correctness of this translation can, in principle, be formally verified. Once a compiler is built and tested, its users rarely need to know the target language. Bugs in the most commonly used compilers are extremely rare.

The same is not true for LLM-based coding assistants at large. Natural languages are ambiguous and lack context-free grammars, and LLMs are probabilistic by design. The same input, whether it is a prompt, a plan, or a spec, can generate very different codebases. Of course, no coding agent, human or otherwise, has to generate the same output for the same requirement, but it is precisely this variability that stops us from making watertight guarantees about LLM-generated output. In short, saying that we can all code in English is taking the compiler analogy too far.
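To make "probabilistic by design" concrete, here is a minimal sketch of temperature sampling, the usual way a next token is drawn from a model's scores. The vocabulary and the scores are invented for illustration and do not come from any real model; the function names are mine.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(vocab, logits, temperature, rng):
    """Draw the next token at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

def greedy_token(vocab, logits):
    """Always pick the highest-scoring token: same input, same output."""
    return vocab[logits.index(max(logits))]

# Hypothetical next-token candidates and scores, made up for this example.
VOCAB = ["for", "while", "map", "reduce"]
LOGITS = [2.0, 1.6, 1.2, 0.4]

if __name__ == "__main__":
    rng = random.Random()  # unseeded, so repeated runs can differ
    print([sample_token(VOCAB, LOGITS, 1.0, rng) for _ in range(8)])
    print([greedy_token(VOCAB, LOGITS) for _ in range(8)])
```

The sampled list can change from run to run while the greedy list never does. A compiler behaves like the second line; a coding assistant, in its usual configuration, behaves like the first, and one different early token can steer everything generated after it.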

But take my Claude-generated note taker as an example again. While the first version worked exactly as I wanted it to, every time I added a small tweak, Claude rewrote large sections of the app, and over time the app became increasingly brittle. It was then that I properly reviewed the codebase and applied software engineering 101 to the project: I reduced duplication, defined its interfaces more cleanly, and isolated its modules. I was able to do this despite not being fluent in React or TypeScript for two reasons. I was still asking Claude to make the changes, and I had once been a .NET front-end developer; the principles of designing thick clients are not very different from those required for modern web applications. It took me a few hours to achieve all this, so I am still doing well compared to the more than a month I had spent on my original artisan project.

I’ve seen this pattern repeated on many real-world projects and I’ve seen only the best engineers wield AI without compromising on software quality. The right compiler analogy is not that we don’t need to know hardware internals or assembly language; it’s that the best high-performance computing engineers get their edge from a thorough understanding of computer architecture with just enough knowledge of ISAs.

The art of verification

Earlier this year, Barreto used ChatGPT-5.2 to solve Erdős Problem #728. The solution can be found here. For those who, like me, are not very familiar with number theory: the problem was ambiguously framed by its authors and, to the best of our knowledge, had evaded a solution. So it was quite interesting to see ChatGPT-5.2 not only reframe it in a way that resolved the ambiguity while preserving the spirit in which the problem had been posed, but also solve it. The real question though is: how do you know if it solved it correctly? It was not until the solution was peer reviewed, most notably by Terence Tao, a mathematician of some repute, that most of us could know that it was correct.

This will be the future of software engineering for as long as doubts about the correctness of LLM outputs persist. There is considerable value to be created in building systems that verify correctness and control for quality at scale. Uncle Bob has for years advocated writing tests as if they were production code and treating flaky tests as bugs. Now, more than ever, we need to heed his advice and use TDD and BDD frameworks properly.
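As a small illustration of treating a flaky test as a bug, here is a sketch in the style of pytest test functions. The function is_stale and its five-minute threshold are hypothetical, loosely modelled on the sync logic a note taker might need. The design choice that matters is that the current time is passed in as an argument instead of being read from the system clock inside the function, so the tests give the same answer on every run.

```python
from datetime import datetime, timedelta, timezone

def is_stale(last_synced, now, max_age=timedelta(minutes=5)):
    """Return True when a note was last synced more than max_age before now."""
    return now - last_synced > max_age

# A fixed, hypothetical "current time" so the tests never depend on when they run.
NOW = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

def test_recent_sync_is_fresh():
    assert not is_stale(NOW - timedelta(minutes=4), NOW)

def test_exact_boundary_is_fresh():
    # Exactly five minutes old is not yet stale; the test pins that decision down.
    assert not is_stale(NOW - timedelta(minutes=5), NOW)

def test_old_sync_is_stale():
    assert is_stale(NOW - timedelta(minutes=6), NOW)
```

Had is_stale called datetime.now() internally, the boundary test could pass or fail depending on timing, which is exactly the kind of flakiness that erodes trust in a suite meant to check AI-generated changes.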

Tests are only a part of this puzzle. We must go further and ensure we build systems that are observable and can be verified automatically with little effort.

A new investment in our craft

Recent research by Shen and Tamkin at Anthropic found that aggressive use of AI in software engineering comes with trade-offs, particularly for skill development. To quote directly from the paper’s conclusion:

Given time constraints and organizational pressures, junior developers or other professionals may rely on AI to complete tasks as fast as possible at the cost of skill development—and notably the ability to debug issues when something goes wrong.

and

Cognitive effort—and even getting painfully stuck—is likely important for fostering mastery

While the study is preliminary, its conclusions are not very different from what we would expect. Practice makes perfect, after all. Early in our careers, we spent tens of thousands of hours coding and designing systems by hand. We learnt from the masters in the field and built a muscle memory for what good code and great systems look like. We coached newcomers to the field in best practices, and the tradition continued. So how are we to build this muscle memory if we are not practising in our early years?

Let’s look at some other industries. The invention of the mechanised mill meant that very few people today need to know how to weave cloth. The craft is not dead, but it isn’t essential to keeping us comfortable and looking trendy. Instead, the best clothing companies today invest in sourcing the best materials, coming up with the best designs, and building robust quality controls. Similarly, many mass transit systems rely on automation to drive the trains, with engineers (as train drivers are also called) trained to verify the automation’s decisions in real time and intervene when necessary.

The same is likely to happen in software engineering. Because AI is nondeterministic, today’s engineers need a strong sense of what good code and great designs look like. And while coding and system design, like mathematics, are not spectator sports, code and design reviews are. To take an analogy from music: you cannot become a violin maestro simply by attending concerts, but you can indeed become a better critic by listening to lots of music and by learning enough about music.

Until AI can match the accuracy of compilers, we need engineers who can review AI-generated code and architectures almost as well as they could once write them. It’s not easy, for we tend to trust AI, and even other humans, more than we should. We need to review everything with a sceptic’s eye and not get carried away by AI’s accuracy: a model update from your provider could easily cause a regression, and you cannot be asleep at the wheel. Be the trained engineer driving that train.
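One way to stay awake at the wheel is to compare regenerated code against a slow but obviously correct reference on many inputs, so that a regression shows up as a concrete failing input. The sketch below is only an illustration: the slugify functions, the character set, and the trial count are all hypothetical and chosen for this example.

```python
import random
import re
import string

def slugify_reference(title):
    """Slow but easy to audit: lowercase, keep letters and digits, join words with '-'."""
    words, current = [], ""
    for ch in title.lower():
        if ch in string.ascii_lowercase or ch in string.digits:
            current += ch
        elif current:
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return "-".join(words)

def slugify_candidate(title):
    """Stand-in for regenerated code under review (hypothetical)."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def find_regression(candidate, reference, trials=1000, seed=0):
    """Return the first random input on which the two disagree, or None if they agree."""
    rng = random.Random(seed)  # seeded so a failure can be reproduced
    alphabet = string.ascii_letters + string.digits + " -_/.!"
    for _ in range(trials):
        title = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 20)))
        if candidate(title) != reference(title):
            return title
    return None

if __name__ == "__main__":
    print(find_regression(slugify_candidate, slugify_reference))
    # A plausible-looking rewrite that only handles spaces is caught quickly.
    print(repr(find_regression(lambda t: t.lower().replace(" ", "-"), slugify_reference)))
```

A check like this does not prove the candidate correct. It only narrows the space in which a reviewer has to rely on judgement, which is where the review effort belongs.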

Conclusion

I don’t have a view on whether software engineers will be needed in the future or not. But everything we have seen about AI’s capabilities tells us that the house-of-cards view is definitely wrong. The real value right now is in human judgement about what to ship. Developing the skills that enable that judgement now will pay off in the long run: keen reviews, a mastery of the principles of software engineering, and a functional knowledge of the domain in which your software works.