AI still doesn’t work very well, businesses are faking it, and a reckoning is coming
Summary
Codestrap founders Dorian Smiley (CTO) and Connor Deeks (CEO) warn that many organisations are rushing to adopt AI without the right metrics, controls or understanding of failure modes. They argue large language models are fallible, produce misleading code and content, and that firms are often “pretending” to have solved integration and governance problems. The pair call for a reality check: new performance metrics, better feedback loops and honest conversations about legal, financial and insurance risk.
Key Points
- Enterprise AI adoption is ahead of understanding: few organisations have clear reference architectures or validated use cases.
- LLMs are inherently fallible — generated code can pass unit tests yet be catastrophically inefficient or incorrect (example: an AI-driven SQLite rewrite in Rust that regressed performance).
- Typical productivity metrics (lines of code, pull request counts) are misleading; meaningful engineering metrics are deployment frequency, lead time, change failure rate, mean time to recovery (MTTR) and incident severity.
- Codestrap suggests new AI-specific measures, for example tokens burned per approved pull request, to assess real impact (a rough sketch of computing such a metric follows this list).
- Misaligned incentives in consultancies and service firms can encourage unchecked AI use, producing poor outputs, refunds and legal exposure.
- Insurers are wary of underwriting AI risk, which may leave firms that rely heavily on AI without cover and with greater liability.
- Smiley predicts code-quality problems surfacing within eight to nine months for heavy AI users; Deeks expects more lawsuits and pricing pressure as clients demand discounts when vendors use AI.
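As a rough illustration of the kind of AI-specific metric the founders describe, the sketch below computes tokens burned per approved pull request from hypothetical usage logs. The record structure, field names and sample figures are assumptions for illustration only, not anything Codestrap has published.

```python
from dataclasses import dataclass

@dataclass
class PullRequest:
    """Hypothetical record joining LLM usage logs to a pull request."""
    pr_id: str
    tokens_used: int   # total tokens consumed by AI assistance on this PR
    approved: bool     # whether the PR was approved and merged

def tokens_per_approved_pr(prs: list[PullRequest]) -> float:
    """Total tokens burned across all PRs divided by the number approved.

    Charging *all* tokens (including those spent on rejected work) against
    only the approved PRs captures waste that a per-PR average would hide.
    """
    approved = sum(1 for pr in prs if pr.approved)
    if approved == 0:
        return float("inf")  # tokens were burned but nothing shipped
    total_tokens = sum(pr.tokens_used for pr in prs)
    return total_tokens / approved

# Example with made-up numbers: three PRs, two approved, 90k tokens in total
sample = [
    PullRequest("PR-101", tokens_used=25_000, approved=True),
    PullRequest("PR-102", tokens_used=40_000, approved=False),
    PullRequest("PR-103", tokens_used=25_000, approved=True),
]
print(tokens_per_approved_pr(sample))  # 45000.0 tokens per approved PR
```

Tracked over time alongside change failure rate and MTTR, a ratio like this would show whether heavier AI use is actually producing more shipped work or just burning more tokens per merged change.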
Content Summary
Smiley and Deeks — both veterans of PwC who now run AI advisory Codestrap — say companies have rushed to adopt generative models without adequate feedback loops. They point out that models are non-deterministic, bad at reliably retrieving facts, and cannot check their own work. Organisations measure the wrong things (for example, lines of code), so apparent productivity gains may mask declines in quality and performance.
They cite concrete consequences: an AI-generated rewrite of SQLite that passed tests but ran thousands of times slower, and a Deloitte refund to the Australian government after AI-introduced errors. Together these illustrate how unchecked AI outputs can cause costly failures. The pair also highlight the business effects: customers pressuring suppliers for lower prices when AI is used, insurers excluding AI from cover, and the prospect of litigation when AI-derived advice goes wrong.
Context and Relevance
This is a timely reality check for CIOs, CISOs, architects and procurement teams. After a wave of rapid AI adoption, the article connects technical weaknesses in LLMs to measurable business risks: degraded system performance, legal exposure, pricing pressure and insurance complications. It feeds into broader conversations about governance, observability and the need for new operational metrics for AI-driven work.
Author style
Punchy — the founders’ warnings are framed bluntly: dial down the hype, instrument outcomes properly, and fix incentive problems now. If you run technology or risk in an organisation, this article isn’t just commentary — it’s a prompt to act before problems compound.
Why should I read this?
Look, if you’re involved in making AI decisions or paying for them, this is your wake-up call. It shows where shiny demos hide real-world failures, explains which metrics actually matter, and flags the immediate commercial risks (refunds, lawsuits, insurance gaps and price pushback). Short version: don’t treat AI as magic; treat it like production software that can break things badly.
