DataHQ, AI thesis, The Data Product is the missing layer

The missing layer for enterprise AI is the Data Product, and the right way to build Data Products is above the engine, not inside it.

Everyone is layering AI on top of warehouses, lakehouses, and ERPs, and everyone is hitting the same ceiling. The model isn't the bottleneck; the context is. And the context you need (a versioned, governed, semantically precise view of how your business actually works) does not live in the engine. It has to be built on top of it, as a first-class product.

DataHQ · April 2026 · 12 minute read

Every event becomes a data point. Every data point posts to a driver. The DataHQ ledger is the dictionary.

Most BI products start at the report. We start at the dictionary, the contract that says what each event is, what each driver is, and how each one maps to a row on the General Ledger. Once that dictionary exists, three things become trivial that are otherwise impossible: closing the books in days, answering an agent's question without hallucination, and simulating the business by dragging a driver.
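A minimal sketch of what such a dictionary could look like, under stated assumptions: the names `EventDef`, `DriverDef`, `post`, and `simulate_revenue`, the account code `4000`, and the sample values are all hypothetical illustrations, not DataHQ's actual schema. It shows an event becoming a data point, the data point posting to a driver, the driver resolving to a General Ledger account, and a what-if run that changes only the driver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriverDef:
    """A business driver and the General Ledger account it maps to (hypothetical)."""
    name: str
    gl_account: str


@dataclass(frozen=True)
class EventDef:
    """Contract for one event type: which driver it posts to, and in what unit."""
    name: str
    driver: str
    unit: str


# Illustrative dictionary entries; real ones would come from the governed ledger.
DRIVERS = {"units_sold": DriverDef("units_sold", gl_account="4000")}
EVENTS = {"order_shipped": EventDef("order_shipped", driver="units_sold", unit="count")}


def post(event_name: str, value: float, totals: dict) -> str:
    """Post one data point to its driver and return the GL account it rolls up to.

    Unknown events raise KeyError: anything outside the dictionary is rejected
    rather than guessed at.
    """
    event = EVENTS[event_name]
    driver = DRIVERS[event.driver]
    totals[driver.name] = totals.get(driver.name, 0.0) + value
    return driver.gl_account


def simulate_revenue(units_sold: float, unit_price: float) -> float:
    """'Dragging a driver': recompute an outcome from a changed driver value."""
    return units_sold * unit_price


totals = {}
post("order_shipped", 3, totals)
account = post("order_shipped", 2, totals)

baseline = simulate_revenue(totals["units_sold"], unit_price=10.0)
what_if = simulate_revenue(totals["units_sold"] * 1.2, unit_price=10.0)
```

The design point is that the report, the agent's context, and the simulation all read the same two lookup tables, so changing a definition changes all three at once.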

LLMs cannot plan alone. That isn't opinion, it's benchmark.

Work presented at ICML 2024 on the LLM-Modulo framework, together with PlanBench results, makes the case formally. Frontier models, on their own, fail a large share of standard planning tasks, and their success rate collapses to near zero when the same domain is obfuscated (Mystery Blocksworld in the table below). The fix is not a bigger model. The fix is a framework where the LLM generates, and an external, formal, domain-aware layer of critics verifies, reformats, and back-prompts until a valid solution is reached. That external layer is exactly what a data product looks like.
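The generate, verify, and back-prompt loop described above can be sketched in a few lines. This is an illustrative toy, not the framework's reference implementation: `llm_modulo`, `toy_generate`, and `holding_critic` are hypothetical names, the generator is a stand-in for a real LLM call, and the single critic checks only one Blocksworld rule (the hand can hold one block at a time).

```python
from typing import Callable, Optional

# A critic returns None when the candidate passes, or a message describing the violation.
Critic = Callable[[list], Optional[str]]


def llm_modulo(generate: Callable[[str], list], critics: list,
               prompt: str, max_rounds: int = 5) -> Optional[list]:
    """Generate-test loop: accept a candidate only when every critic passes it."""
    for _ in range(max_rounds):
        candidate = generate(prompt)
        complaints = [m for m in (critic(candidate) for critic in critics) if m is not None]
        if not complaints:
            return candidate  # verified by the external critics, not by the model
        # Back-prompt: feed the critics' objections into the next attempt.
        prompt = prompt + "\nFix: " + "; ".join(complaints)
    return None  # no verified plan within the budget, so fail closed


def toy_generate(prompt: str) -> list:
    """Stand-in for an LLM call: improves its plan only after being back-prompted."""
    if "put-down" in prompt:
        return ["unstack A B", "put-down A", "pick-up B", "stack B A"]
    return ["unstack A B", "stack B A"]


def holding_critic(plan: list) -> Optional[str]:
    """Toy domain critic: the hand holds at most one block, and only a held block can be placed."""
    holding = None
    for i, step in enumerate(plan, start=1):
        action, block = step.split()[:2]
        if action in ("unstack", "pick-up"):
            if holding is not None:
                return f"step {i}: hand already holds {holding}"
            holding = block
        elif action in ("stack", "put-down"):
            if holding != block:
                return f"step {i}: must put-down {holding} and pick-up {block} before '{step}'"
            holding = None
    return None


plan = llm_modulo(toy_generate, [holding_critic], "Swap blocks A and B")
```

Note that correctness comes entirely from the critic: the loop returns `None` rather than an unverified plan when the budget runs out.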

Source · Kambhampati et al., LLM-Modulo, ICML 2024 · PlanBench, as of 8/27/2024

Domain	Shot	Claude 3.5 Sonnet	Claude 3 Opus	GPT-4o	GPT-4	GPT-4 Turbo	LLaMA-3.1 405B	LLaMA-3 70B	Gemini Pro
Blocksworld	One-shot	346/600 (57.6%)	289/600 (48.1%)	170/600 (28.3%)	206/600 (34.3%)	138/600 (23.0%)	284/600 (47.3%)	76/600 (12.6%)	68/600 (11.3%)
Blocksworld	Zero-shot	329/600 (54.8%)	356/600 (59.3%)	213/600 (35.5%)	210/600 (34.6%)	241/600 (40.1%)	376/600 (62.6%)	205/600 (34.2%)	3/600 (0.5%)
Mystery Blocksworld	One-shot	19/600 (3.1%)	8/600 (1.3%)	5/600 (0.8%)	26/600 (4.3%)	5/600 (0.8%)	21/600 (3.5%)	15/600 (2.5%)	2/500 (0.4%)
Mystery Blocksworld	Zero-shot	0/600 (0.0%)	0/600 (0.0%)	0/600 (0.0%)	1/600 (0.2%)	1/600 (0.2%)	5/600 (0.8%)	0/600 (0.0%)	0/500 (0.0%)

User trust · false confidence

Reasoning traces are persuasive but not informative. They engender false trust regardless of correctness.

Between-subject user study: participants shown an LLM's chain-of-thought or a post-hoc explanation accepted the AI's answer at the same rate whether it was correct or wrong. Only a contrastive dual explanation, arguments for and against the AI's answer, genuinely improved users' ability to spot incorrect outputs.

Palod V., Biswas U., Kambhampati S. · Arizona State, May 2026 · "Evaluating the False Trust Engendered by LLM Explanations" · arXiv:2605.10930

DataHQ implication →

Asking an agent to "show its reasoning" is theatre when the reasoning is internal tokens. The verifier is the explanation. Every Pilot answer must come with the chart, the SQL query, and the row that proves it, and ideally the counter-case as well. Ungrounded "explanations" actively make decision-makers more confident in wrong answers.
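One way to read "the verifier is the explanation" is as a data structure: an answer is only released together with the query and rows that reproduce it. The sketch below assumes a hypothetical `GroundedAnswer` record and `ground` helper, with an in-memory SQLite table of made-up invoices standing in for the governed source; it is not the Pilot implementation.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class GroundedAnswer:
    claim: float     # the number the agent asserts
    sql: str         # the query that is supposed to prove it
    rows: list       # the rows the query returned when re-run
    verified: bool   # True only if the re-run result matches the claim


def ground(conn: sqlite3.Connection, claim: float, sql: str) -> GroundedAnswer:
    """Re-run the agent's query and compare its single result with the claimed value."""
    rows = conn.execute(sql).fetchall()
    verified = (
        len(rows) == 1
        and rows[0][0] is not None
        and abs(rows[0][0] - claim) < 1e-9
    )
    return GroundedAnswer(claim, sql, rows, verified)


# Made-up sample data; a real deployment would query the governed data product
# over a read-only connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [("north", 120.0), ("north", 80.0), ("south", 50.0)],
)

sql = "SELECT SUM(amount) FROM invoices WHERE region = 'north'"
right = ground(conn, 200.0, sql)   # claim matches the evidence
wrong = ground(conn, 250.0, sql)   # confident claim, contradicted by the rows

# A simple counter-case: what the same measure looks like outside the claimed slice.
counter = conn.execute(
    "SELECT SUM(amount) FROM invoices WHERE region <> 'north'"
).fetchall()
```

The wrong claim is caught without reading any reasoning trace: the evidence either reproduces the number or it does not.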

Chain-of-Thought is not reasoning

Models trained on corrupted reasoning traces match, and often outperform, models trained on correct traces.

Transformers trained from scratch on formally-verifiable traces still produce invalid traces while arriving at correct answers. More striking: models trained on intermediate steps that bear no relation to the problem perform on par with correct-trace training, and generalise better on out-of-distribution tasks. The semantic content of the trace is largely irrelevant to task performance.

Valmeekam K., Palod V., Stechly K., Gundawar A., Kambhampati S. · TMLR April 2026 · "Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens"

DataHQ implication →

The output of a "reasoning model" is not a proof of reasoning. You cannot use the trace as audit evidence. The only durable verifier is external: a governed semantic model, a domain critic, a query that returns the same answer whether the LLM produced it or not. Which is, again, the data product above the engine.

The LRM tax

o1 and DeepSeek R1 improve planning accuracy, but at a large and unpredictable inference-time cost; compound systems match them at the same price.

First comprehensive PlanBench-style evaluation of Large Reasoning Models. o1-preview and o1-mini do beat frontier LLMs on accuracy, but the accuracy gain comes with token bills and latency that are an order of magnitude higher and an order of magnitude more variable. At the same effective price point, compound systems, LLM + external verifier in the loop, perform comparably. The verifier remains the cheaper and more reliable lever.

Valmeekam K., Stechly K., Gundawar A., Kambhampati S. · TMLR April 2025 · "A Systematic Evaluation of the Planning and Scheduling Abilities of the Reasoning Model o1"

DataHQ implication →

You don't have to pay the LRM tax to get the answer right. The compound system wins on cost, latency, and reliability. The data product is what makes the compound system possible: it's where the critic bank, the rules, and the semantic contracts live. DataHQ Pilot is the operator-facing instantiation of exactly this architecture.

Source systems

Process logs & events → data points

DataHQ FinOps Ledger, the analytics-integrable Chart of Accounts

Your General Ledger CoA (Zoho · SAP · Oracle · Tally)

What it enables

The dictionary, not the report.

The context engine, because of structure.

The simulation engine, because of drivers.

Above the engine, not inside it.

Two ways to build a data product. Only one compounds.

Data product inside the engine

Data product above the engine

PlanBench, frontier-model results

LLM-Modulo · a principled framework for planning where LLMs play multiple constructive roles.

Six things the research community now accepts.

LLMs trained just on web corpora have severe limits on planning and reasoning tasks.

They can still be good arbiters of style, though.

In a Generate-Test cycle, LLMs become robust generators.

The improved behavior can be "compiled" back into the base LLM.

The resulting LRMs still have no correctness guarantees; they're just better generators.

The anthropomorphization of intermediate tokens as "reasoning traces" is questionable.

The case keeps building. Same conclusion: external verifier, not bigger model.

The critics, the meta-controller, the blackboard, the synthetic-data loop: these are the data product.

Why now, why this layer, why DataHQ.

Models are cheap, context is not.

The warehouse is the wrong chassis.

Agents need a surface, not a SQL prompt.

Planning and reporting converge on the same object.

Six properties, non-negotiable.

Addressable

Versioned

Governed

Observable

Lineaged

Writable

We are building the layer we wish every enterprise had ten years ago.
