Your Agent’s Memory Is Broken. Here’s Why.
The best AI agent in the world currently relies on grep.
When Anthropic built Claude Code, they did their homework. They evaluated the vector embeddings that a $2 billion industry told them were the future of retrieval.
They threw them in the bin. Chose grep instead. A tool from 1973.
This wasn’t a mistake. It was a verdict. Vector embeddings are frozen the moment you ingest them. They capture what a document meant at one point in time. But code changes every minute. Grep is crude, but at least it’s honest. It searches what actually exists right now.
Anthropic isn’t an outlier. The smartest teams building agents are quietly walking away from the retrieval stack they were sold. They’re right to walk away. But they’re also throwing out something important in the process.
The Vector Database Identity Crisis
Here’s the thing nobody wants to say out loud: vector databases were never designed for this.
They were built for static similarity search. Product catalogs, image archives, document collections that don’t change much. You embed once, you index, you look stuff up later. It’s a lookup structure, not a brain. A perfectly fine technology for that problem.
But somewhere along the way, we told an entire generation of developers this was the foundation for intelligent retrieval. Just embed your docs, throw them in a vector DB, do RAG. It became a checkbox feature. Step 3 of every AI tutorial.
Meanwhile, look at what the vector DB companies actually compete on: latency, throughput, cost per query, index size. Storage metrics. Not one of them benchmarks retrieval quality. Not one measures whether the documents returned actually help the agent get the right answer. The whole industry is optimized for making the wrong answer faster and cheaper.
That’s not a retrieval revolution. That’s a commodity storage business wearing an AI costume.
Baby, Meet Bathwater
So teams like Anthropic looked at this and made a reasonable call: if the retrieval layer can’t be trusted, route around it. Use grep, keyword search, brute-force context stuffing. Anything that gives you freshness guarantees, even if you sacrifice semantic understanding.
This works for code. Codebases are structured, self-contained, and lexically searchable. If you’re looking for memory_implementation, the files containing those exact tokens are probably what you want.
But try grepping two million documents to answer “What changed in Python 3.12’s exception handling?” Keyword matching falls apart the moment the query and the answer use different words. Which, in the real world, is most of the time.
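Here's that failure in miniature. A toy sketch: the query and both documents are invented, and the "score" is naive token overlap, which is roughly what a grep-style search gives you.

```python
# Toy illustration of the vocabulary-mismatch problem. The query and both
# documents are invented; the "score" is naive token overlap, which is
# roughly what a grep-style search gives you.

query = "what changed in exception handling"

docs = {
    "relevant": "release notes: errors raised inside task groups now propagate as a single group",
    "irrelevant": "style guide: avoid broad exception handling in application code",
}

def keyword_score(query: str, doc: str) -> int:
    """Count how many query tokens literally appear in the document."""
    doc_tokens = set(doc.lower().split())
    return sum(1 for tok in query.lower().split() if tok in doc_tokens)

for name, text in docs.items():
    print(name, keyword_score(query, text))
# relevant 0, irrelevant 3: the document that actually answers the question
# shares no tokens with the query, so lexical matching ranks it dead last.
```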
In rejecting frozen embeddings, the industry accidentally rejected the idea that retrieval should understand meaning at all. That’s the baby going out with the bathwater.
Better Shelves, Same Broken Catalog
There’s a whole wave of academic and open-source work trying to patch this. Projects like xMemory, Mem0, A-MEM, Letta (née MemGPT), MemoryOS. They’re doing genuinely interesting things. Organizing memories into hierarchies. Managing context windows like an OS manages RAM. Building knowledge graphs over conversation history.
But they all share the same blind spot: they assume the underlying retrieval is fine. They’re building world-class shelving systems for a library where the card catalog is broken.
Drill down to the atomic level of any of these systems, to the actual moment it decides "is this document relevant to this query?" It's still just a cosine similarity score between two frozen vectors. A score that gives "revenue increased" and "revenue decreased" a similarity of 0.92.
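Here's roughly what that atomic decision looks like, as a minimal sketch. The encoder is an arbitrary off-the-shelf choice and the exact number varies by model, but surface-similar opposites like these typically land near the top of the scale.

```python
# A minimal sketch of the relevance check at the bottom of these stacks:
# one cosine similarity between two frozen vectors. The encoder is an
# illustrative off-the-shelf choice; the exact number varies by model, but
# surface-similar opposites like these typically score near the top.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
a, b = model.encode(["revenue increased", "revenue decreased"])

print(util.cos_sim(a, b).item())  # typically around 0.9: opposite meanings, near-identical vectors
```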
You can organize your memories beautifully. You can build the most elegant hierarchy in the world. If the retrieval layer can’t tell up from down, none of it matters.
The Context Window Money Pit
The other popular escape route: skip retrieval entirely. Just cram everything into a long context window and let the model figure it out.
The economics should scare you. Filling a million-token context window takes about 60 seconds and can cost anywhere from $0.50 to $20 per request. For an agent making dozens of retrieval calls per task, that math gets ugly fast.
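Run those numbers. A rough sketch, assuming "dozens" means about two dozen calls:

```python
# Back-of-the-envelope math using the figures above. The cost range and the
# ~60-second fill time come from the text; the number of retrieval calls per
# task is an assumed stand-in for "dozens".
calls_per_task = 24                    # assumption
cost_low, cost_high = 0.50, 20.00      # dollars per full context fill
seconds_per_fill = 60

print(f"${calls_per_task * cost_low:.2f} to ${calls_per_task * cost_high:.2f} per task")
print(f"~{calls_per_task * seconds_per_fill // 60} minutes spent just filling context")
# $12.00 to $480.00 per task, and roughly 24 minutes of waiting, before the
# model produces a single useful token.
```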
But the deeper problem is simpler. Long context doesn’t learn. Every session starts at zero. If the agent discovers a document is misleading on Monday, it’ll fall for the same document again on Tuesday. You’re paying to re-teach the same lessons, hour after hour, forever.
In-context learning is the most powerful mechanism in AI. It’s also the most expensive way to store knowledge. Volatile RAM that gets zeroed out on every restart.
The Gap Nobody’s Talking About
In any other domain, this would be considered insane. Imagine a search engine that never learned from clicks. A spam filter that never adapted to new patterns. A recommendation engine that forgot your preferences between sessions. We solved this for web search twenty years ago: retrieval that learns from how people use it.
And yet. Here we are building the most sophisticated AI agents in history on a retrieval layer with the learning capacity of a hash table.
The memory organization people are building better shelves. The context engineering people are brute-forcing around the problem. Everyone’s patching. Nobody’s fixing the foundation.
The question isn’t “how do we organize documents better around frozen embeddings?” It’s simpler and harder than that: why are we freezing embeddings at all?
If retrieval could learn. If the system that routes queries to documents actually got smarter with every interaction. Adapted to corrections. Improved the more agents used it. Most of the elaborate workarounds we’re building today wouldn’t be necessary. The hierarchical organizers, the context stuffing, the re-ranking pipelines. They’re all patches on a foundation that refuses to learn.
It’s time someone fixed that foundation.
I’m Ram Sriharsha. I spent 15 years building AI infrastructure: vector search at Pinecone (where I was CTO), data platforms at Splunk, ML systems at Yahoo and Databricks. Then I realized the thing I helped build was architecturally wrong for what comes next. Now I’m working on what comes after frozen retrieval. More on that soon.

