There is a word that runs through the major enterprise AI studies of 2025, and it is not "agents," "multimodal," or "reasoning." It is trust. McKinsey's State of AI survey reports that 74% of organisations rate inaccuracy as a relevant risk — the single most-cited concern, ahead of the 72% who name cybersecurity. Accenture's Technology Vision 2025 finds that 77% of executives believe the true benefits of AI will only materialise when built on a foundation of trust, and 81% say their trust strategies must evolve in lockstep with their technology strategies. These are not adjacent observations. They describe one structural constraint from two angles: enterprises cannot scale what they do not trust, and they do not trust what they cannot observe, govern, or validate.
The trust deficit is not a perception problem awaiting better marketing from AI vendors. It is an operational bottleneck. When three-quarters of leaders rate AI inaccuracy a material risk, they respond rationally: they confine AI to low-stakes work where errors are cheap. Summarising meeting notes. Drafting first versions. Answering questions where a wrong answer costs nothing. The high-value applications (pricing decisions, claims adjudication, credit assessment, production planning) stay off-limits. And those are precisely the applications where the economic case for AI actually concentrates.
What trust actually means in operational terms
Trust is not a sentiment. Accenture frames it as "cognitive trust" built on four measurable properties: accuracy (the system produces correct outputs), predictability (it behaves consistently across similar inputs), consistency (it holds performance over time), and traceability (every output can be explained and audited). That is a meaningful departure from the "responsible AI" conversation of 2023 and 2024, which centred on bias and fairness. Those concerns remain valid, but they cover only a slice of what a Geschäftsführer (managing director) means when he says he does not trust the system.
When a CFO says she does not trust the AI-generated forecast, she is not making a philosophical point. She is saying the system produced a number, and she has no way to verify how it was derived, whether the same inputs would yield the same output tomorrow, or what data was included and excluded. She does not lack confidence in AI as a concept. She lacks observability into a specific system making specific claims about her revenue pipeline.
That distinction changes the intervention. You do not fix an observability problem with ethics training. You fix it with monitoring infrastructure, output logging, confidence scoring, and validation workflows — the operational architecture that makes AI outputs auditable.
The risk landscape validates the concern
McKinsey's survey paints a consistent picture across risk categories. Inaccuracy leads at 74%; cybersecurity follows at 72%. And these are not hypothetical worries. Roughly half of organisations using AI report having experienced at least one negative consequence, with inaccuracy the most common. The risk perception, in other words, is grounded in lived operational experience, not paranoia.
That experience is specific and familiar. Organisations that deployed generative AI broadly encountered hallucinations in customer-facing systems, data exposure through prompt injection, inconsistent outputs across equivalent queries, and silent behaviour changes after a provider pushed a model update no one inside the company was monitoring. Each incident reinforced the instinct to keep AI away from anything that touches the income statement or a regulator. The enterprise hallucination risk analysis documents these failure modes in detail — they are not theoretical.
Governance maturity has not kept pace. Deloitte's State of AI in the Enterprise finds that only about one organisation in five has a mature model for governing autonomous AI agents — the very systems that take actions rather than merely generating text. As companies move from copilots toward agents that book, send, approve, and transact, the gap between deployment ambition and governance readiness widens rather than closes. Experience with AI does not automatically breed confidence. More often it reveals complexity that was invisible from the outside.
Why high performers face the same risks differently
McKinsey's most useful finding is also its most sobering: only around 6% of organisations report material, enterprise-wide EBIT impact from AI, even as the large majority now use it somewhere. These high performers do not operate in a lower-risk environment. They meet the same accuracy concerns, the same cybersecurity threats, the same reliability challenges. What sets them apart is structural — and the structure has three load-bearing parts.
The difference is observability. High performers monitor what their AI systems actually produce. They track accuracy against ground truth, log inputs and outputs for audit, and set confidence thresholds below which human review is mandatory. When a system drifts outside expected parameters, they detect it through automated alerts — not through a customer complaint three weeks later. That turns trust from a feeling into a measurement.
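To make the observability pattern concrete, here is a minimal sketch in Python. It assumes the AI system returns a confidence score alongside each output, and every name in it (AuditLog, REVIEW_THRESHOLD, drift_alert, the 0.85 and 0.05 values) is hypothetical and chosen for illustration, not taken from any of the cited studies. It shows three things from the paragraph above: logging inputs and outputs for audit, flagging low-confidence outputs for mandatory human review, and raising an automated alert when accuracy against ground truth drifts below a baseline.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical threshold: outputs below this confidence go to a human reviewer.
REVIEW_THRESHOLD = 0.85


@dataclass
class AuditRecord:
    timestamp: str
    model_version: str
    inputs: dict
    output: str
    confidence: float
    needs_human_review: bool


@dataclass
class AuditLog:
    """In-memory audit trail; a real system would persist this durably."""
    records: list = field(default_factory=list)

    def record(self, model_version: str, inputs: dict, output: str,
               confidence: float) -> AuditRecord:
        entry = AuditRecord(
            timestamp=datetime.now(timezone.utc).isoformat(),
            model_version=model_version,
            inputs=inputs,
            output=output,
            confidence=confidence,
            needs_human_review=confidence < REVIEW_THRESHOLD,
        )
        self.records.append(entry)
        return entry


def rolling_accuracy(outcomes: list, window: int = 50):
    """Share of correct outputs among the last `window` checked cases.

    `outcomes` holds booleans: True where the output matched ground truth.
    Returns None when nothing has been checked yet.
    """
    recent = outcomes[-window:]
    return sum(recent) / len(recent) if recent else None


def drift_alert(outcomes: list, baseline: float,
                tolerance: float = 0.05, window: int = 50) -> bool:
    """True when rolling accuracy falls more than `tolerance` below baseline."""
    accuracy = rolling_accuracy(outcomes, window)
    return accuracy is not None and accuracy < baseline - tolerance
```

In use, every production call would pass through AuditLog.record, and a scheduled job would call drift_alert on the most recent ground-truth comparisons. The thresholds are placeholders; each use case would need its own, agreed with the business owner of the process.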
The difference is governance. High performers define what their AI can and cannot do, and they implement those definitions as system constraints rather than policy PDFs filed in SharePoint. The model cannot approve a claim above a set value, cannot change a price without review, cannot send an external communication without a human in the loop. Notably, McKinsey finds the value-capturing cohort is far more likely to redesign workflows around AI rather than bolt it onto existing ones — governance is built into the process, not appended to it. The governance framework for midmarket companies provides the operational structure for these controls.
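The difference between a policy document and a system constraint can be sketched in a few lines of Python. This is an illustrative guard under stated assumptions, not a reference implementation: the action names, the Decision enum, and the 1,000 limit are all hypothetical. The point is that the three rules named above (claim value ceiling, price changes under review, external messages with a human in the loop) are checked in code before any action executes, and anything not explicitly listed is denied by default.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_HUMAN = "require_human"
    DENY = "deny"


@dataclass(frozen=True)
class ProposedAction:
    kind: str            # e.g. "approve_claim", "change_price"
    amount: float = 0.0  # monetary value, where relevant


# Hypothetical ceiling: claims above this value always go to a human.
CLAIM_AUTO_APPROVE_LIMIT = 1_000.0

# Actions that are never executed without human sign-off.
ALWAYS_REVIEWED = {"change_price", "send_external_message"}


def authorize(action: ProposedAction) -> Decision:
    """Decide whether an AI-proposed action may run without a human."""
    if action.kind == "approve_claim":
        if action.amount <= CLAIM_AUTO_APPROVE_LIMIT:
            return Decision.ALLOW
        return Decision.REQUIRE_HUMAN
    if action.kind in ALWAYS_REVIEWED:
        return Decision.REQUIRE_HUMAN
    # Default deny: anything not explicitly delegated is off-limits.
    return Decision.DENY
```

The default-deny branch is the design choice that matters most here. New capabilities have to be added to the rule set deliberately, which is what separates an enforced constraint from a guideline.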
The difference is validation. High performers prove their outputs are reliable before they scale them. They run structured evaluations against known-correct datasets, compare AI outputs to expert judgement, and measure not just accuracy but consistency and edge-case behaviour. Validation is not a one-time gate before launch; it runs continuously in production, catching degradation before it reaches a customer or a financial statement.
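A structured evaluation of this kind can be sketched as follows. The sketch assumes the model can be called as a plain Python function that returns a hashable value (a string or number), and that a labelled set of known-correct cases exists; the function names and the 0.95 and 0.98 thresholds are hypothetical. It measures accuracy against ground truth and, by repeating each query, how often the model gives the same answer to the same input.

```python
from collections import Counter


def evaluate(model, labelled_cases, repeats=3):
    """Score a model on accuracy and consistency.

    `labelled_cases` is a list of (inputs, expected_output) pairs.
    Each case is run `repeats` times; the majority answer is compared
    with the expected output, and the case counts as consistent only
    if every run returned the same answer.
    """
    if not labelled_cases:
        raise ValueError("evaluation needs at least one labelled case")

    correct = 0
    consistent = 0
    for inputs, expected in labelled_cases:
        outputs = [model(inputs) for _ in range(repeats)]
        majority, count = Counter(outputs).most_common(1)[0]
        correct += majority == expected
        consistent += count == repeats

    total = len(labelled_cases)
    return {"accuracy": correct / total, "consistency": consistent / total}


def release_gate(report, min_accuracy=0.95, min_consistency=0.98):
    """True only if both metrics clear their (hypothetical) thresholds."""
    return (report["accuracy"] >= min_accuracy
            and report["consistency"] >= min_consistency)
```

Running the same evaluate call on a schedule against fresh production samples, rather than once before launch, is what makes the check continuous. The labelled set should include the edge cases the business cares about, not only typical inputs.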
The EU AI Act makes trust infrastructure mandatory
For DACH enterprises, the trust conversation carries a regulatory dimension that markets outside the EU lack. The EU AI Act maps almost directly onto the trust architecture above. Article 15 requires high-risk AI systems to achieve an appropriate level of accuracy, robustness, and cybersecurity, and to perform consistently in those respects throughout their lifecycle. It demands declared accuracy metrics in the instructions for use; resilience against errors and faults, including technical redundancy and fail-safe measures; and explicit protection against data and model poisoning, adversarial examples, and unauthorised attempts to alter a system's behaviour. The obligations for high-risk systems become enforceable from 2 August 2026, which for many Mittelstand firms is no longer a distant horizon but the current planning year.
The regulatory requirement and the operational requirement converge. A firm that builds observability, governance, and validation to satisfy the EU AI Act simultaneously builds the trust foundation that lets it scale. A firm that treats compliance as paperwork — drafting policies without implementing controls — satisfies neither the regulator nor the executive team that has to trust the output before deploying it in a critical process. The compliance-by-design approach folds both objectives into a single architecture rather than running them as separate projects.
The trust roadmap
Make AI observable first. Before trust can be built, it has to be measured. Implement output logging, accuracy tracking, and confidence scoring for every system in production, and define what "trustworthy performance" means for each specific use case. Monitor it continuously, not quarterly.
Make AI governable. Define delegation rules for each workflow — what the AI decides, what it recommends, what it cannot touch — and implement them as enforced constraints rather than guidelines. Tighten or loosen them as the system's track record earns it.
Make AI provable. Build validation pipelines that compare outputs to ground truth on an ongoing basis, run structured evaluations before every material model update, and give business stakeholders internal accuracy evidence they can actually act on.
Then expand trust progressively. Trust is not binary. It is earned in a constrained domain and extended as the evidence base grows. Start where the cost of error is lowest and data quality highest, prove reliability, and move to the next workflow.
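One way to picture progressive expansion is as a rule that moves a workflow between autonomy levels based on its reviewed track record. The sketch below is illustrative only: the three level names, the TrackRecord type, and every threshold (200 reviewed cases, 98% to promote, 95% to demote) are assumptions made for the example, and real values would depend on the cost of error in each workflow.

```python
from dataclasses import dataclass

# Hypothetical autonomy levels, from most to least constrained.
LEVELS = ["suggest_only", "act_with_review", "act_autonomously"]


@dataclass
class TrackRecord:
    reviewed: int  # outputs checked against ground truth or expert judgement
    correct: int   # of those, how many were right

    @property
    def accuracy(self) -> float:
        return self.correct / self.reviewed if self.reviewed else 0.0


def next_level(current: str, record: TrackRecord, min_reviewed: int = 200,
               promote_at: float = 0.98, demote_below: float = 0.95) -> str:
    """Return the autonomy level the evidence supports.

    Promotion needs both enough reviewed cases and high accuracy;
    demotion happens as soon as measured accuracy drops too low.
    """
    index = LEVELS.index(current)
    if record.reviewed >= min_reviewed and record.accuracy >= promote_at:
        return LEVELS[min(index + 1, len(LEVELS) - 1)]
    if record.reviewed > 0 and record.accuracy < demote_below:
        return LEVELS[max(index - 1, 0)]
    return current
```

The asymmetry is deliberate: promotion requires a minimum volume of evidence, while demotion does not, so trust is slow to extend and quick to withdraw.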
The 74% who rate inaccuracy as a relevant risk are not wrong. They are describing the current state of deployment, where most systems run without adequate observability, governance, or validation. The answer is not to argue that AI is trustworthy. It is to build the infrastructure that makes it so, and in the EU that increasingly overlaps with what the law already requires.
References: McKinsey & Company, "The State of AI: How Organizations Are Rewiring to Capture Value," 2025 (https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai-how-organizations-are-rewiring-to-capture-value); Accenture, "Technology Vision 2025" (https://newsroom.accenture.com/news/2025/accenture-technology-vision-2025-new-age-of-ai-to-bring-unprecedented-autonomy-to-business); Deloitte, "State of AI in the Enterprise" (https://www.deloitte.com/global/en/issues/generative-ai/state-of-ai-in-enterprise.html); EU AI Act, Regulation (EU) 2024/1689, Article 15 — Accuracy, Robustness and Cybersecurity (https://artificialintelligenceact.eu/article/15/).
