Why LLMs write horrible code
I'm not surprised at all by the "90% of code in our organization is written by LLMs" claims. It's because LLM-written code is way too verbose. I explain why below.
- They optimize locally, given their context.
- They are optimized for the code to run, not for its prettiness or elegance.
I think a month of use of Claude Code (after 4 months of Cursor) has finally made me good at vibe coding. I know when to ask the LLM to write the code, what to ask it to write, and when I should just write the code by myself. When I ask a question, I know when to blindly accept its suggestions and when to add my own value.
In a sense, my personal vibe coding (which I’d written about earlier) is like a long put option (combined with the LLM-written code, which is a “one delta stock”) - because I know what the code does, and what the larger system does, I know when it is doing rubbish, and can thus get the upsides of vibe coding and none of its downsides.
If you have vibe coded, you will know that LLMs write extremely verbose code. They are extremely “defensive” - way too many try-catch blocks (so that even genuine bugs go unnoticed for months together), way too many checks on what to do in different situations, etc. Basically they write a lot of “slop” (a word used a lot with AI nowadays) around the actual code. For a human reading the code, it can be a bit of a problem.
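A hypothetical sketch of the pattern (the function and data are mine, not from any real codebase): the broad `except` and the pile of pre-checks mean that a genuine bug, like a typo in a key name, silently produces a default value instead of failing loudly.

```python
# LLM-style "defensive" version: every imaginable failure mode is
# caught, so a real bug (say, a misspelled key upstream) just yields
# 0.0 and goes unnoticed for months.
def get_average_price(records):
    try:
        if not records:
            return 0.0
        if not isinstance(records, list):
            return 0.0
        total = 0.0
        count = 0
        for r in records:
            if r is not None and "price" in r:
                total += r["price"]
                count += 1
        return total / count if count > 0 else 0.0
    except Exception:
        return 0.0

# What a human who trusts the upstream producer might write instead;
# a misspelled key here raises a KeyError immediately.
def get_average_price_trusting(records):
    return sum(r["price"] for r in records) / len(records)
```

Both return the same answer on well-formed input; the difference only shows up when something upstream is broken, which is exactly when you want the loud failure.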
Like I had written in my previous post on the topic, once you’ve used LLMs to write code, it becomes so humanly unreadable that you need to forever use only LLMs to edit it.
But then I try.
Yesterday I was doing a major refactor of my company’s code. I do this once in a while, though sometimes I think I need to do this more often (like I found a bug that I had written in November 2024!). And when I do these refactors, I take the opportunity (and great pleasure, as it happens) to strip out LLM-written slop in our code base (all of us here vibe code. Even the said refactor was done primarily by Claude; and even when stripping out the LLMese, Claude does most of the work).
During this latest round of refactoring, I got rid of a lot of deadwood “defensive code”. And in the process, understood why LLMs write such defensive code.
Basically they are “optimizing locally”. Say you highlight one function and ask the LLM to rewrite it: it optimizes for that piece of code to be “well software engineered”. While it might be able to see the rest of the codebase, and keep it in context, the LLM assumes that it doesn’t know how the data being processed in this piece of code was produced (I’m only talking about data science code here; the rest is out of my syllabus).
And so, for example, you might produce a data frame with columns A, B, C, D and E in one part of the code. If this has to be manipulated by another function, the LLM assumes that it doesn’t know how this data frame is being produced, and puts all sorts of checks and balances to make sure the code doesn’t error out.
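To make that concrete, here is a hypothetical sketch (the frame, column names A and B, and both functions are my own illustration, not anyone's real code). The producer guarantees the schema two lines up, yet the locally-optimized consumer re-verifies it from scratch:

```python
import pandas as pd

# Producer: the schema is fixed and known at this point in the codebase.
def make_frame():
    return pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6],
                         "D": [7, 8], "E": [9, 10]})

# LLM-style consumer: treats the producer as a black box and re-checks
# everything it could have known from reading the rest of the repo.
def total_defensive(df):
    if df is None or df.empty:
        return 0
    for col in ["A", "B"]:
        if col not in df.columns:
            raise ValueError(f"missing column {col}")
        if not pd.api.types.is_numeric_dtype(df[col]):
            raise TypeError(f"column {col} is not numeric")
    return int(df["A"].sum() + df["B"].sum())

# Globally-aware consumer: the schema is guaranteed upstream, so just use it.
def total(df):
    return int(df["A"].sum() + df["B"].sum())
```

Both give the same answer; the defensive one just buries two lines of actual logic under a wall of schema checks that the rest of the codebase already makes redundant.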
I don’t know if it is a function of RLHF (reinforcement learning from human feedback), but the coding agents all optimize for writing code that runs without errors. There is seemingly no penalty for verbose or overly defensive code in their training, and so that’s what they optimize for.
The only way I can see LLMs getting less sloppy at coding is if they are able to look at the entire repo at once, and optimize globally rather than locally. That way, they’ll know how something was produced and so will be able to use it far more easily without having to resort to a hundred checks.
Until then, it is all up to supervised vibe coding to get the best out of LLMs!
Oh, and if you wanted to know, yes, 90% of the code in my company’s codebase has been written by an LLM (I use Claude Code, some people use Cursor, some others GitHub Copilot; I haven’t come across a Windsurf user in my company. Yet).


