
The Real Bottleneck in AI-Assisted Software Delivery
If you want to understand where software delivery is heading, pay less attention to those amazing demos where an AI tool spins up a working API in 90 seconds. Instead, focus on what happens after: when those huge blocks of AI-generated code have to survive review, security checks, integration, performance testing, and the reality of production. That’s where delivery can grind to a halt.
Here’s a scene playing out across engineering teams: someone submits a massive pull request with hundreds of lines of code changed. It passes the happy-path checks. It even looks clean at first glance. But it’s much longer than expected, more verbose than the team’s usual style, and full of small choices that are hard to quickly validate. Reviewers start asking, “Why did you write it this way?” The author shrugs: “That’s how the model did it.” Now the burden shifts to the team: do we merge it, rewrite it, or spend two days validating something we don’t fully understand?
This is the realization I’ve come to in recent months: Generative AI doesn’t eliminate engineering constraints; it relocates them. As code generation accelerates, the most important work for engineering leaders has become defining “trustworthy code,” embedding verification into their SDLC, and aligning metrics and accountability so speed doesn’t come at the cost of maintainability, security, or compliance. In other words: code is becoming more abundant. Confidence is becoming scarce.
The Bottleneck Has Moved From “Writing” to “Trusting”
In a recent McKinsey interview, Sonar CEO Tariq Shaukat described something many digital leaders are feeling but haven’t yet named: we’ve largely moved beyond code generation being the bottleneck. The challenge now is clarity and trust: specifically, the downstream bottleneck where humans still need to review and verify everything that gets produced.
This is the part of the AI conversation that so often gets skipped. It’s easy to hail productivity enhancements when you’re talking about a prototype or a greenfield service. It’s harder, and more essential, to talk about what happens when you introduce AI-generated output into a mature enterprise environment that already has years of accumulated architectural decisions, deeply intertwined dependencies, non-negotiable security and regulatory expectations, and on-call rotations with tribal knowledge of every shortcut you ever took.
Shaukat makes a point that’s especially relevant for CTOs and other engineering leaders: as AI models improve, they may also become harder to review and maintain. In Sonar’s benchmarking, newer models solve problems better overall but also generate dramatically more code. He specifically mentions GPT-5 producing about three times as many lines of code as earlier versions. This increases maintenance complexity and creates even more places for subtle bugs to hide. That’s the paradox: better output can still increase risk if it expands the “surface area” your team must validate.
“Vibe and Verify” Is Not Just a Meme
You may have heard the phrase “vibe and verify.” Sure, it sounds a little glib, but the point is operationally true: you can move fast with code generation, but you can’t ship until the work has been verified in your environment and against your standards. As Shaukat frames it: go ahead and generate quickly, but then validate before deployment to ensure the code fits your environment, meets compliance requirements, and can be trusted. This matters more than you might think, because it changes what “good” looks like.
In the pre-AI world, engineering excellence was often measured by the craft of writing and designing code. In the AI-assisted world, excellence increasingly shows up as the ability to deliver reliable change consistently. That requires a delivery practice that can absorb higher volumes of code without collapsing under review load, rework, or incident response.
To put it another way: if AI increases your throughput, your delivery system must become more truthful. It must tell you quickly when something is wrong, risky, non-compliant, or simply inconsistent with your architecture and standards.
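To make that concrete, here is a minimal sketch of what a pre-merge verification gate can look like. The tools and thresholds named below (a linter, a dependency audit, a test-coverage floor) are illustrative assumptions, not a prescribed toolchain; substitute whatever checks your organization already trusts.

```python
# verification_gate.py -- illustrative pre-merge gate; the tools and thresholds
# named here are assumptions, not a prescribed toolchain.
import subprocess
import sys

# Each check is a (label, command) pair. Order them fastest-first so the gate
# surfaces problems as early as possible.
CHECKS = [
    ("static analysis", ["ruff", "check", "."]),
    ("dependency audit", ["pip-audit"]),
    ("tests with coverage floor", ["pytest", "--cov", "--cov-fail-under=80"]),
]


def run_gate() -> int:
    failures = []
    for label, command in CHECKS:
        print(f"Running {label}: {' '.join(command)}")
        result = subprocess.run(command)
        if result.returncode != 0:
            failures.append(label)
    if failures:
        # Fail loudly so reviewers see the risk before they invest time in the change.
        print("Gate failed: " + ", ".join(failures))
        return 1
    print("Gate passed: the change meets the agreed verification bar.")
    return 0


if __name__ == "__main__":
    sys.exit(run_gate())
```

Wired into CI, a gate like this is what lets “generate quickly” coexist with “validate before deployment”: the system, not the reviewer’s patience, is the first thing AI-generated code has to satisfy.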
Accountability Starts With Defining “Trustworthy”
This is where senior leadership needs to step in. AI doesn’t eliminate accountability; it just makes accountability harder to enforce when your organization hasn’t defined what matters most. Shaukat frames it clearly: accountability starts with defining what “trustworthy” means for your organization. A large manufacturer and a startup will have very different thresholds. Most enterprises care about explainability, transparency, and repeatability, and they’re uncomfortable with the same AI model evaluating its own output. He points to the practice of using independent models for verification (i.e., one model checking another) as an emerging best practice.
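To make “one model checking another” concrete, here is a minimal sketch of cross-model verification. It assumes each model is wrapped behind a simple text-in, text-out callable; the function names and prompts are hypothetical, not any specific vendor’s API.

```python
# Sketch of independent-model verification: one model generates, a different model audits.
# `generator` and `verifier` are placeholders for your own LLM clients, not a real library API.
from typing import Callable


def cross_verify(
    task: str,
    generator: Callable[[str], str],  # wraps model A, the one that writes the code
    verifier: Callable[[str], str],   # wraps an independent model B, the one that reviews it
) -> tuple[str, bool, str]:
    """Generate code with one model, then ask an independent model to audit it."""
    code = generator(f"Write code for the following task:\n{task}")

    review = verifier(
        "You are reviewing code produced by another system.\n"
        "Check it for correctness, security issues, and violations of our standards.\n"
        "Reply with APPROVE or REJECT on the first line, then your reasoning.\n\n"
        f"Task:\n{task}\n\nCode:\n{code}"
    )
    approved = review.strip().upper().startswith("APPROVE")

    # The verdict is advisory: a human still owns the merge decision.
    return code, approved, review
```

The design choice that matters here is independence: the reviewing model should not share the generating model’s context, and its verdict feeds a human decision rather than replacing one.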
Shaukat’s framing is useful because it moves the debate away from “Should we let engineers use AI?” and toward something far more productive:
- What level of evidence do we require before code ships?
- What risks are acceptable in different domains?
- What does “repeatable and explainable” mean in our specific context?
- What can be automated, and what must remain in the realm of human judgment?
These are governance questions, of course, but they’re also engineering questions. They shape how you build, how you review, how you test, and how you operate. And they’re exactly the kind of questions that separate organizations that simply use AI from organizations that substantially benefit from AI.
Where Are the Productivity Gains We Were Promised?
A lot of the hype around AI coding is built on a particular kind of story: a startup building from scratch, writing almost everything with AI, and moving at an eye-watering pace. That story is real, but it’s not the environment most of us lead in.
Shaukat calls this out directly: the “greenfield” examples don’t translate cleanly to enterprises with millions of lines of existing code and high internal and external regulatory standards. The lagging productivity many leaders report isn’t a failure of the technology. It’s the reality that integration, training, and governance take time, just like they did with cloud computing a decade ago. This is a critical leadership point: AI doesn’t replace the need for engineering discipline; it makes discipline the pivot point that determines whether AI helps or hurts.
If your organization is struggling to translate pilots into production value, it’s often because AI is exposing weak spots that were already there:
- Architecture intent that exists only in tribal knowledge instead of documented boundaries and patterns (see the sketch after this list)
- Code review practices that were considered “nice to have” rather than treated as a core part of the process
- Quality and security checks that occur too late, well after the volume of change has become overwhelming
- Metrics that reward output over outcomes, encouraging speed even when it increases long-term drag
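Making architecture intent executable is a practical antidote to the first weakness above. The sketch below assumes a src/ tree whose top-level folders correspond to layers; the layer names and rules are illustrative assumptions, not a recommended structure.

```python
# architecture_check.py -- a sketch of turning documented boundaries into an
# executable check. The layer names and rules below are illustrative assumptions.
import ast
import pathlib

# Documented rule: which layers each layer is forbidden to import from.
FORBIDDEN = {
    "domain": {"api", "infrastructure"},  # domain code stays free of outer layers
    "infrastructure": {"api"},            # infrastructure never reaches up into api
}


def boundary_violations(src_root: str = "src") -> list[str]:
    violations = []
    for path in pathlib.Path(src_root).rglob("*.py"):
        layer = path.relative_to(src_root).parts[0]
        banned = FORBIDDEN.get(layer, set())
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                modules = [node.module or ""]
            else:
                continue
            for module in modules:
                if module.split(".")[0] in banned:
                    violations.append(f"{path}: {layer} must not import {module}")
    return violations


if __name__ == "__main__":
    problems = boundary_violations()
    print("\n".join(problems) if problems else "No boundary violations found.")
    raise SystemExit(1 if problems else 0)
```

A check like this runs in seconds and turns “we don’t structure code that way here” from tribal knowledge into feedback every AI-assisted change receives automatically.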
You can adopt AI and continue to deliver the same way you always have. But if you do, you’ll likely find that you’ve merely amplified the most painful parts of the system.
How Trissential Can Help
If you’re feeling the tension between unlocking AI’s potential and keeping your delivery system from buckling under the weight, you’re not alone, and you don’t need another massive transformation initiative to get unstuck. What you need is clarity on where your system is breaking and how to fix it.
Trissential works with engineering leaders to:
- Define what “trustworthy code” means in your environment, not in theory, but in practice, with clear thresholds your teams can apply consistently
- Identify the chokepoints where AI-assisted throughput is outpacing your ability to verify, review, and ship with confidence
- Build the guardrails and verification muscle that let you move faster without gambling on quality, security, or compliance
We’re not here to sell you another tool or run another pilot. We’re here to help you turn AI adoption into sustained delivery capacity, the kind that shows up in your cycle times, your defect rates, and your ability to sleep at night.
Ready to make AI work for your team instead of against it? Let’s connect.
Learn more about Trissential’s Digital Engineering Services: Software Engineering, Quality Assurance & Testing, Cloud Strategy, PLM
Talk to the Expert

Brian Zielinski – Sr. Director, Digital Engineering
brian.zielinski@trissential.com