Observability has three mature output formats: logs, metrics, traces. All three answer what happened. None of them answer why. That answer has always lived somewhere else — in a dashboard, in a runbook, in the head of the engineer who got paged at 3am.
For thirty years the stack got away with this. The consumer of observability data was a human. Humans are good at causal reconstruction. They look at a spike in latency, remember there was a deploy ten minutes ago, grep the logs for the service name, open the traces view, cross-reference the timestamps, and conclude "probably the new code." The stack gives them five unrelated data streams; they assemble the story in their head.
That model is now breaking.
Agents don’t reconstruct, they traverse
When the thing reading your observability data is an AI agent, the five-unrelated-streams model falls apart. Not because the agent is incapable. Given enough context, it performs the same reconstruction a human does. But the context cost is enormous. The agent has to grep the same logs, open the same traces, cross-reference the same timestamps, and hold the whole chain in its working memory. Every incident pays this tax from scratch. Every time.
The reason humans got away with this workflow is that they accumulated a second artifact alongside the data: the mental model. After six months on a platform, an engineer knows that when node-7’s memory climbs past 90%, two pods start restarting, and checkout latency spikes, that’s the Tuesday 3am batch job. It’s not written down. It’s in their head. When they leave, it leaves with them.
Agents don’t accumulate. Every investigation starts from zero. Unless the system itself carries the knowledge.
Semantics belong in the data, not the dashboard
A dashboard is a semantic layer. It’s the place where raw numbers become meaningful: this red line is checkout error rate, this alert fires when it crosses 2%, this graph’s y-axis is dollars lost per minute of downtime. None of that meaning is in the metric itself. It’s in the chart, and the chart was made by a human for other humans.
This worked when humans were the only readers. It breaks when the reader is an LLM with no prior context. The LLM sees the metric. It does not see the chart.
The fix is not to put an LLM inside the observability tool and ask it to re-derive the meaning from natural-language reasoning over raw events. That’s expensive, slow, and non-deterministic. The fix is to move the semantics out of the dashboard and into the data itself.
Traditional event:
{ "level": "error", "message": "pool exhausted" }
Semantic event:
{
  "what_failed": "auth pool exhausted in auth-service",
  "why_it_matters": "checkouts will time out in ~30s",
  "possible_causes": ["db connection leak", "traffic spike"],
  "affected_scope": ["checkout", "billing"],
  "suggested_fix": "restart pod, inspect pool config",
  "confidence": 0.82
}
The difference isn’t verbosity, it’s intent. The second event was written to be reasoned about. It tells a reader, human or otherwise, what the fault actually means for the system around it. No dashboard required.
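As a minimal sketch, assuming the fields in the example event are the whole schema (the actual FALSE Protocol schema is not given here), a semantic event could be modeled as a typed record. The `SemanticEvent` class and its validation rule are hypothetical, chosen only to show meaning being checked at write time.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class SemanticEvent:
    """Hypothetical record; field names mirror the example event above."""
    what_failed: str
    why_it_matters: str
    possible_causes: list[str] = field(default_factory=list)
    affected_scope: list[str] = field(default_factory=list)
    suggested_fix: str = ""
    confidence: float = 0.0

    def __post_init__(self) -> None:
        # Validate at write time so no reader has to guess what the number means.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def to_json(self) -> str:
        return json.dumps(asdict(self))


event = SemanticEvent(
    what_failed="auth pool exhausted in auth-service",
    why_it_matters="checkouts will time out in ~30s",
    possible_causes=["db connection leak", "traffic spike"],
    affected_scope=["checkout", "billing"],
    suggested_fix="restart pod, inspect pool config",
    confidence=0.82,
)
print(event.to_json())
```

A reader of this event, human or agent, gets the impact and the candidate causes from the record itself, without consulting a chart.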
Correlation is a data structure, not an algorithm
The cleanest way to see this is Git.
Git doesn’t have a correlation engine. It doesn’t have an ML model that figures out which commit caused which regression. It has commits that point to their parents. You follow the pointer. The data structure is the correlation.
When you ask "what changed between these two states of the codebase," Git doesn’t compute anything clever. It walks the commit graph. When you ask "who wrote this line," git blame traverses a chain that was already there. The answer isn’t inferred at query time. It was recorded at write time.
Current observability has the inverse model. Events go in flat, unrelated. The causal relationships between them get inferred at query time, usually by a human, sometimes by a correlation scoring engine that looks at timestamps and tries to guess. The cost is paid on every query, and the quality is probabilistic.
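To make the query-time cost concrete, here is a minimal sketch of timestamp-window correlation. It illustrates the general approach, not any particular vendor's engine; `FlatEvent`, `guess_causes`, the event kinds, and the 600-second window are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlatEvent:
    kind: str
    timestamp: float  # seconds; the event carries no causal fields at all


def guess_causes(symptom, events, window_s=600.0):
    """Return events in the window before the symptom, nearest first.

    Proximity in time is the only evidence available, so the result
    is a ranked guess, not a recorded cause.
    """
    candidates = [
        e for e in events
        if 0 < symptom.timestamp - e.timestamp <= window_s
    ]
    return sorted(candidates, key=lambda e: symptom.timestamp - e.timestamp)


# Made-up events for illustration only.
deploy = FlatEvent("deploy.finished", timestamp=0.0)
batch = FlatEvent("batch.job.started", timestamp=500.0)
spike = FlatEvent("latency.spike", timestamp=600.0)

guesses = guess_causes(spike, [deploy, batch, spike])
for g in guesses:
    print(g.kind)
```

Both earlier events come back as candidates, and nothing in the data says which one, if either, actually caused the spike. That ambiguity is recomputed on every query.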
What if observability worked like Git instead? Every change to infrastructure is recorded as an event. Every change to an entity — a pod, a node, a service — produces a new version, and the new version points back to the event that caused it. When something breaks, you don’t query for related events. You follow the pointer.
❯ ahti blame pod-checkout-api
pod-checkout-api state: CrashLoopBackOff
  version 47 caused_by: kernel.oom_kill (0.94)
    node-7 memory at 97.2%, 3 pods competing
    pattern: kernel.oom_kill → container.terminated
      (learned, 847 observations)
  version 46 caused_by: k8s.pod.scheduled
  version 45 caused_by: k8s.deployment.rollout (canary 25%)
Every version points to the event that caused it. Every event points to its own cause. The chain is walkable, not searchable. No correlation engine. No fuzzy timestamp matching. No scoring.
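The pointer-following idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Ahti's implementation: the `Event` and `EntityVersion` classes and the `blame` and `root_cause` helpers are hypothetical names, and the data reuses the versions from the sample output above.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class Event:
    kind: str                            # e.g. "kernel.oom_kill"
    caused_by: Optional["Event"] = None  # upstream event, recorded at write time


@dataclass(frozen=True)
class EntityVersion:
    entity: str
    number: int
    caused_by: Event                     # event that produced this version
    parent: Optional["EntityVersion"] = None


def blame(head: EntityVersion) -> Iterator[tuple[int, str]]:
    """Walk from the newest version to the oldest by following stored pointers."""
    version: Optional[EntityVersion] = head
    while version is not None:
        yield version.number, version.caused_by.kind
        version = version.parent


def root_cause(event: Event) -> Event:
    """Follow an event's own caused_by chain back to its origin."""
    while event.caused_by is not None:
        event = event.caused_by
    return event


oom = Event("kernel.oom_kill")
terminated = Event("container.terminated", caused_by=oom)

v45 = EntityVersion("pod-checkout-api", 45, Event("k8s.deployment.rollout"))
v46 = EntityVersion("pod-checkout-api", 46, Event("k8s.pod.scheduled"), parent=v45)
v47 = EntityVersion("pod-checkout-api", 47, oom, parent=v46)

for number, kind in blame(v47):
    print(f"version {number} caused_by: {kind}")
print("root cause of termination:", root_cause(terminated).kind)
```

Neither function searches or scores anything. Each step is a single pointer dereference, so the cost of an answer scales with the length of the chain rather than the volume of events.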
Patterns as first-class objects
The other thing Git has that observability doesn’t is learned history.
In Git, you can ask "when did this test first start failing," and git bisect finds the answer by searching the commit history that is already stored. Observability systems can’t answer analogous questions, such as "when did this latency spike first become a Tuesday 3am thing," because the patterns aren’t stored. The system sees individual events. The pattern lives in the engineer’s intuition.
A semantically rich observability system treats patterns as first-class data: kernel.oom_kill → container.terminated, 847 observations, 94% confidence, median delay of 2.3s between the two events. Learned from history. Queryable. Explainable.
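One way such a pattern record could be derived from recorded causal links is sketched below. The `Pattern` class, the `learn_patterns` function, and the definition of confidence (the share of cause occurrences that led to the effect) are assumptions made for illustration, and the sample numbers are invented rather than taken from a real system.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Pattern:
    cause: str
    effect: str
    observations: int
    confidence: float      # assumed: share of `cause` occurrences followed by `effect`
    median_delay_s: float


def learn_patterns(links, cause_counts):
    """Aggregate recorded cause -> effect links into pattern records.

    links: iterable of (cause_kind, effect_kind, delay_seconds)
    cause_counts: dict mapping cause_kind to how often that kind occurred
    """
    delays = defaultdict(list)
    for cause, effect, delay_s in links:
        delays[(cause, effect)].append(delay_s)
    return [
        Pattern(cause, effect, len(d), len(d) / cause_counts[cause], median(d))
        for (cause, effect), d in delays.items()
    ]


# Invented history: four OOM kills, three of which terminated a container.
links = [
    ("kernel.oom_kill", "container.terminated", 2.1),
    ("kernel.oom_kill", "container.terminated", 2.3),
    ("kernel.oom_kill", "container.terminated", 2.6),
]
patterns = learn_patterns(links, {"kernel.oom_kill": 4})
for p in patterns:
    print(p)
```

Because the result is an ordinary record, it can be stored, queried, and shown next to the events it explains.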
This is the piece that makes the stack accumulate rather than reset. After six months, the system knows what the engineer knows. The engineer leaves. The knowledge stays.
The market is already naming this
This argument isn’t just internal. Gartner’s recent research on agent reliability arrives at the same place. The firm predicts that more than 50% of AI agent systems will run on context graphs by 2028, naming semantic retrieval and causal structure as the reliability layer that separates agents that work from agents that don’t.
The diagnosis of failure is consistent. The primary cause of agent underperformance isn’t the LLM — it’s retrieval against what Gartner calls ROT content: redundant, obsolete, trivial. The fix, per the same research, is governance enforcing APT content — accurate, pertinent, trusted — at the retrieval layer.
The broader forecast is that by 2030, 50% of AI agent deployments will result in financial or reputational loss traceable to exactly this problem, insufficient contextual grounding. For anyone deploying agents at scale, the question is which side of that statistic they want to be on.
AI-native does not mean AI-inside
One thing worth stating directly, because the industry is muddled on it: semantically rich observability is not observability with a chatbot on top.
A chatbot on top of a traditional observability stack is a translator. It reads the same flat, unrelated data and tries to compose a sensible answer in English. Sometimes it succeeds. Often it hallucinates. It’s slow, expensive, and non-deterministic, because the underlying data doesn’t carry the meaning it’s being asked to generate.
A semantically rich observability stack is structured so that the meaning is already in the data when the model arrives. The model doesn’t generate it; it presents it. The result is a much cheaper query, a much faster response, and a much more defensible answer.
AI-native observability is observability whose output is legible to a model. It is not observability that runs a model internally.
Where this goes
The category currently labeled "observability" was built around three assumptions: the reader is human, causality is reconstructed at query time, and meaning lives in the visualization layer. All three are now wrong.
What replaces them is a stack where events are semantic, entities are versioned, causality is data, and patterns are learned. Ahti is False Systems’ attempt at the causality and pattern layers. The FALSE Protocol is the event schema underneath, with meaning baked into every occurrence. Together with the supporting stack (event routing, context memory, kernel observation), they show what an operations layer looks like when the primary reader is no longer human.
The infrastructure isn’t different. The thing reading about it is.
Ahti and the FALSE Protocol are part of False Systems, the operations layer for a world where agents are the operators. Both are in active development on GitHub.