The curse of being early
Turns out when you build-build-build, you have to be prepared to tear it all down.
Last week I found a piece of code that enforced turn-based conversations with an LLM. Like a chess clock. You say something, you wait, the model responds, you wait, you say something back. Strict alternation. No interrupting.
I stared at it for a good thirty seconds before I remembered why it existed.
Our first model was text-davinci-002. If you’re not familiar: it wasn’t a chat model. It was a completion model. You’d give it a blob of text and it would try to continue it. There was no concept of “messages” or “roles” or “system prompts.” You’d format your own conversation by hand, something like Human: ... Assistant: ..., and pray the model understood the pattern.
It usually did. Until it didn’t. And when it didn’t, you got an AI that would respond to itself, or start roleplaying as the human, or just wander off into increasingly creative fiction. The turn-based enforcement existed because without it, the whole thing would derail.
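For anyone who never had to do this: here's roughly what that hand-rolled formatting looked like. The `complete()` call is a hypothetical stand-in for a generic completion endpoint; the stop sequences were the "chess clock" that kept the model from answering on the human's behalf.

```python
# Sketch of hand-rolled conversation formatting for a completion model.
# `complete()` is a hypothetical stand-in for a real completion API.

STOP_SEQUENCES = ["Human:", "\nHuman:"]  # cut the model off before it
                                         # starts roleplaying as the user

def format_conversation(turns: list[tuple[str, str]]) -> str:
    """Flatten (role, text) turns into the Human:/Assistant: pattern
    the completion model was expected to continue."""
    lines = [f"{'Human' if role == 'user' else 'Assistant'}: {text}"
             for role, text in turns]
    lines.append("Assistant:")  # leave the model mid-pattern, on its turn
    return "\n".join(lines)

prompt = format_conversation([
    ("user", "What is a completion model?"),
    ("assistant", "A model that continues whatever text you give it."),
    ("user", "So there are no roles?"),
])
# response = complete(prompt, stop=STOP_SEQUENCES)  # hypothetical call
```

Everything a chat API now does for free, reimplemented in string concatenation and prayer.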
That code was still running in production. In 2026.
The art of throwing stuff away
Everyone in tech talks about building. Shipping features. Adding capabilities. The whole culture is additive. More tools, more integrations, more options. Your product roadmap is a list of things you’re going to add.
Nobody talks about removing things. But if you’ve been building on top of LLMs since the early days, removing is the single most important skill you can develop.
A significant portion of our codebase, and I mean significant, is patches. Workarounds. Guardrails. Things we built because the model at the time couldn’t do what we needed it to do.
Turn-based conversation management? Built it. Context window management with sliding windows and summarization? Built it. Our own tool-calling framework because the model didn’t support function calling natively? Built it. Structured output parsing with regex and retry loops because the model couldn’t reliably return JSON? Built that too.
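To make that last one concrete, here's a sketch of what a retry-on-malformed-JSON loop typically looked like. The names are illustrative and `generate` stands in for any prompt-to-text model call; the shape of the workaround is the point.

```python
import json
import re

JSON_RE = re.compile(r"\{.*\}", re.DOTALL)  # grab the outermost braces

def parse_json_with_retries(generate, prompt, max_retries=3):
    """Ask the model for JSON, scrape it out with a regex, and retry
    with an error nudge when parsing fails. `generate` is any
    prompt -> text callable (a hypothetical model client)."""
    for attempt in range(max_retries):
        text = generate(prompt)
        match = JSON_RE.search(text)
        if match:
            try:
                return json.loads(match.group(0))
            except json.JSONDecodeError:
                pass  # fall through to the retry nudge
        prompt += "\n\nThat was not valid JSON. Respond with JSON only."
    raise ValueError(f"no valid JSON after {max_retries} attempts")
```

Multiply this pattern by every place in the product that needed structured data, and you get a sense of how much of the codebase was patches.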
Every one of those was the right decision at the time. Every one of those is now, to varying degrees, obsolete.
Modern models handle multi-turn conversations natively. Context windows went from 4k tokens to over a million. Tool calling is a first-class API feature. Structured outputs with guaranteed JSON schemas ship out of the box.
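The contrast is stark. The entire retry-and-regex apparatus collapses into a request parameter. This sketch follows the OpenAI-style `response_format` payload; other providers spell it differently, so check your provider's docs rather than copying this verbatim.

```python
# What the regex-and-retry workaround collapses into with native
# structured outputs. Payload shape follows the OpenAI-style
# `response_format`; treat it as a sketch, not gospel.

request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Extract the invoice total."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",
            "strict": True,  # the API guarantees schema-conforming output
            "schema": {
                "type": "object",
                "properties": {"total": {"type": "number"}},
                "required": ["total"],
                "additionalProperties": False,
            },
        },
    },
}
# The parsing layer is now just json.loads. No regex, no retries.
```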
But the code is still there. And it’s not inert. It’s not just sitting in a corner collecting dust. It’s actively running, actively adding complexity, actively creating bugs that wouldn’t exist if you just... removed it and trusted the model.
The bleeding edge trap
There’s a pattern I keep seeing. It goes like this.
You try something ambitious. Something on the absolute bleeding edge of what’s possible with the current model. It works in your demo. It works in your test suite. You ship it.
Then reality hits. Edge cases. Weird inputs. The model does something unexpected 5% of the time. That 5% is enough to break the experience for real users.
So you build a fix. A guardrail. A fallback system. A retry mechanism. Sometimes an entire parallel pipeline. This takes weeks. It’s clever engineering. You’re proud of it.
Six months later, a new model drops. The thing that failed 5% of the time? It now fails 0.01% of the time. The model just... got better. Your elaborate fix is now solving a problem that barely exists anymore.
But nobody removes the fix. Because it’s working. It’s tested. It’s in production. It has its own monitoring. Someone wrote documentation for it. Removing it feels riskier than keeping it. What if the model regresses? What if there’s a subtle edge case it still catches?
So it stays. And the next time you build something on top of that system, you’re building on top of a layer of complexity that shouldn’t be there. Your architecture reflects the limitations of a model that no longer exists.
RAG and the cure that’s worse than the disease
Retrieval-Augmented Generation is the poster child for this pattern.
Sometimes RAG is exactly right. You have a massive knowledge base, the model can’t possibly know about your internal documentation, and you need grounded answers with sources. Perfect use case. RAG shines.
But often, RAG is solving a problem for you by creating a bigger one. The original problem: the model doesn’t know about your specific data. The RAG solution: build a retrieval pipeline, chunk your documents, create embeddings, manage a vector database, tune your retrieval parameters, handle relevance scoring, deal with chunk boundaries cutting sentences in half, figure out re-ranking, and then hope the model actually uses the retrieved context correctly instead of hallucinating anyway.
You’ve traded one problem for twelve.
And with context windows getting larger and models getting better at reasoning over long documents, the question becomes: do you actually need retrieval, or can you just... put the documents in the prompt?
I’ve watched us build elaborate RAG pipelines for datasets that now comfortably fit in a single context window. The retrieval adds latency, introduces relevance failures, and occasionally surfaces the wrong chunk at the wrong time. The “just put it in the prompt” approach burns more tokens per request, but it gets the right answer more often.
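The decision itself can be almost embarrassingly simple. Here's a sketch of the gate I mean, with a crude chars-per-token heuristic and an illustrative budget; the names and numbers are assumptions, not a recommendation.

```python
# Sketch of a "just put it in the prompt" gate. The token estimate is
# a crude ~4-chars-per-token heuristic and the budget is illustrative.

CONTEXT_BUDGET_TOKENS = 900_000  # leave headroom in a ~1M-token window

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic, good enough for a gate

def build_prompt(question: str, documents: list[str]) -> str:
    corpus = "\n\n".join(documents)
    if estimate_tokens(corpus) <= CONTEXT_BUDGET_TOKENS:
        # Small enough: skip retrieval entirely and send everything.
        return f"{corpus}\n\nQuestion: {question}"
    # Too big: this is where a retrieval pipeline would earn its keep.
    raise NotImplementedError("corpus exceeds budget; retrieve instead")
```

When the corpus fits, the whole pipeline reduces to string concatenation. When it doesn't, RAG is still the right tool. The mistake is keeping the pipeline after the corpus stopped needing it.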
But the RAG pipeline exists. It’s instrumented. It has dashboards. Nobody wants to be the person who proposes ripping it out.
What removal actually looks like
Removing code that works is emotionally difficult. It’s also politically difficult. You’re essentially telling whoever built it, which is often past-you, that the work is no longer needed.
But I’ve started thinking about it differently. Every line of code in your codebase has a carrying cost. It’s one more thing that can break. One more thing a new developer has to understand. One more layer between you and what the model can actually do today.
When I find old workaround code now, I don’t ask “is this still working?” I ask “is the model still bad enough to need this?”
Usually the answer is no. Usually, the model got better while we weren’t paying attention.
The turn-based conversation code I found? Removed. Twelve files deleted. Nothing broke. The tests all passed. The model doesn’t need a chess clock anymore. It knows how conversations work.
Building for tomorrow’s model
There’s a deeper tension at play. We build software using Shape Up principles. You discover problems by working, not by imagining them upfront. You encounter a real issue, you solve it.
But in the LLM world, that cycle has a twist. You discover a problem today. You spend a week building a solution. By the time you’ve shipped the fix, the next model update has already eliminated the behavior that caused the problem.
So you’ve built a solution for a problem that no longer exists, and that solution is now load-bearing infrastructure in your system.
I don’t have a clean answer for this. You can’t just not fix things. Users are hitting the problem right now. You can’t tell them to wait for GPT-Next.
What I’ve started doing is building fixes that are easy to remove. Thin wrappers instead of deep integrations. Feature flags instead of architectural changes. Code that knows it might be temporary. It’s harder to build this way. It requires admitting, while you’re writing it, that this clever thing you’re making might be worthless in six months.
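In practice, that can be as simple as keeping the workaround behind one flag and inside one function. A minimal sketch, with illustrative names; the real version would read the flag from config rather than a constant.

```python
# Sketch of a fix built to be deleted: the workaround lives behind one
# flag and inside one function, so removal is one flag flip and one rm.

WORKAROUND_ENABLED = True  # flip to False when the next model
                           # makes the patch obsolete

def call_model(prompt: str, generate) -> str:
    """Thin wrapper: callers never know the workaround exists.
    `generate` is any prompt -> text callable (hypothetical client)."""
    if WORKAROUND_ENABLED:
        return _patched_call(prompt, generate)
    return generate(prompt)

def _patched_call(prompt: str, generate) -> str:
    # The entire workaround is contained here. Deleting this function
    # and the flag removes the fix without touching any caller.
    return generate(prompt + "\n\nRespond in plain text only.")
```

The structure admits, in code, that the patch is temporary. Which is exactly the admission that's hard to make.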
But six months later, when the model is better and the fix is obsolete, you’ll be grateful you made it easy to rip out.
The archaeological record
Our codebase is an archaeological record of every model’s limitations. Layer by layer, you can see what each generation of LLMs couldn’t do.
The deepest layer: turn-based enforcement, manual prompt formatting, temperature tuning hacks.
Above that: context window management, conversation summarization, sliding window implementations.
Above that: custom tool-calling frameworks, JSON parsing with regex, retry-on-malformed-output loops.
Above that: RAG pipelines for datasets that now fit in the context window.
Each layer was essential when it was built. Each layer is now, at best, unnecessary overhead. At worst, it’s actively interfering with what the model can do natively.
The curse of being early isn’t that you made bad decisions. You made the best decisions you could with the models you had. The curse is that those decisions calcified into infrastructure, and removing infrastructure is always harder than adding it.
The companies that will build the best AI products aren’t the ones that build the most. They’re the ones willing to throw the most away.
Excavating the codebase at neople.io.