Every failing AI initiative I have diagnosed had a working model. The language model did exactly what it was supposed to do the moment you fed it clean data in a controlled test. The problem was never the model. The problem was everything that happened before the model received its input — and that is the part nobody buys a licence for, demos in a sales pitch, or puts on a slide.

This is not a niche observation. MIT's NANDA initiative studied the gap directly in its report "The GenAI Divide: State of AI in Business 2025", which drew on 150 leader interviews, a survey of 350 employees, and an analysis of 300 public AI deployments. It found that roughly 95 percent of enterprise generative AI pilots delivered no measurable return on the profit-and-loss statement. The diagnosis was blunt: the core issue is not the quality of the models but a "learning gap", meaning generic tools that never adapt to a specific organisation's workflows and data. That is the context layer problem, described from the outside by people who were not looking for it.

The context layer is the second component of the AI Operating System. It defines how data reaches the AI workflow, in what shape, at what speed, and with what domain knowledge attached. Get it right and a mediocre model produces excellent results. Get it wrong and a state-of-the-art model produces nothing usable. Most companies keep spending their attention on model selection, prompt engineering, and fine-tuning — the levers a vendor is happy to sell — while ignoring the unglamorous truth that their data cannot reach the workflow in a form the model can act on. The frontier moved past the model two product generations ago. For the DACH Mittelstand, the binding constraint is almost never which model you choose. It is whether your data can get to it.

What context actually means

Context is not a synonym for data. Data is raw material. Context is data that has been made accessible, shaped for the task, kept current, and enriched with domain knowledge. Four properties define a functioning context layer, and a workflow that is weak on any one of them will disappoint regardless of how good the model is.

Accessibility is the first and most underestimated. The question is never "do we have the data" — that is almost always answered yes. The question is whether there is a programmatic path from where the data lives to where the model needs it. In a typical DACH Mittelstand company, the data sits across SAP, a handful of Excel files maintained by specific individuals, a document management system that predates the smartphone, and tribal knowledge locked in the heads of three people who have been with the company for twenty years. The data exists. The path does not.

Quality is the second, and it does not mean perfection. It means fitness for purpose. A workflow that classifies incoming insurance claims needs the claim description, the damage type, and the policy number. If those three fields are consistently populated, the data quality is sufficient even if the customer address has formatting inconsistencies. The mistake is treating quality as a binary: either "our data is clean" or "our data is a mess." Neither framing is useful. The useful question is narrower: for this specific workflow, are the required fields populated and consistent enough to produce reliable outputs? European regulators have now made this discipline mandatory at the high-risk end. Article 10 of the EU AI Act, which applies to high-risk systems from 2 August 2026, requires that training, validation, and testing data sets be "relevant, sufficiently representative, and to the best extent possible, free of errors and complete in view of the intended purpose." Fitness for purpose is no longer just good engineering. For regulated use cases it is a legal standard you must be able to document.
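The narrower question lends itself to a quick measurement. What follows is a minimal sketch in Python, assuming records arrive as dictionaries. The field names and the 0.98 threshold are illustrative assumptions, not values from any real system.

```python
from typing import Iterable, Mapping

# Hypothetical field names for the claims-classification example.
REQUIRED_FIELDS = ("claim_description", "damage_type", "policy_number")


def fill_rates(records: Iterable[Mapping[str, object]],
               required: tuple = REQUIRED_FIELDS) -> dict:
    """Return the share of records in which each required field is populated."""
    rows = list(records)
    if not rows:
        return {name: 0.0 for name in required}
    rates = {}
    for name in required:
        populated = sum(
            1 for row in rows
            if row.get(name) is not None and str(row.get(name)).strip() != ""
        )
        rates[name] = populated / len(rows)
    return rates


def fit_for_purpose(rates: Mapping[str, float], threshold: float = 0.98) -> bool:
    """True when every required field meets the assumed fill-rate threshold."""
    return all(rate >= threshold for rate in rates.values())
```

Fields outside the required set, such as the customer address, are deliberately ignored. That is the fitness-for-purpose idea expressed as code.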

Freshness is the third. How old is the data when it reaches the model? A claims-triage workflow that processes yesterday's claims is useful; one that processes last month's claims is useless. A product recommendation engine running on last week's inventory will cheerfully recommend out-of-stock items. Requirements vary by workflow — some need real-time data, most need data less than 24 hours old, almost none can tolerate data more than a week stale — but the requirement must be named and enforced, not assumed.

Domain context is the fourth and hardest. Raw data without domain context produces raw outputs. The model needs to know that a Kaskoschaden is not a Haftpflichtschaden; that a €5,000 claim on a commercial policy is routine but the same amount on a private liability policy is unusual; that "Lieferant A" has been reliable for fifteen years while "Lieferant B" defaulted twice last quarter. This is the institutional knowledge experienced employees carry in their heads. Making it explicit and available to the workflow is the most valuable and most labour-intensive part of building a context layer.

The gap between "we have data" and "the AI can use it"

Every enterprise I work with has data. None of them has data that is immediately usable by an AI workflow. The gap between possession and usability is the context layer problem, and it shows up in three predictable patterns across DACH industries.

Data locked in SAP is the first. The data is in SAP. The system has no modern API. Extraction means either a custom ABAP report — which IT estimates at four months — or a manual export that someone runs every Friday afternoon. Neither is a foundation for a production workflow. This pattern is especially common in manufacturing and retail, where the data that would make AI transformative (order histories, inventory levels, supplier performance) is technically accessible but practically walled off.

Excel as the integration layer is the second. The real data pipeline turns out to be a collection of spreadsheets maintained by specific people: the procurement specialist's file tracking supplier lead times, the quality manager's defect log, the sales director's customer-segment profitability sheet. These spreadsheets contain genuine business intelligence. They are also single points of failure, never version-controlled, and in practice out of reach for any automated pipeline.

Tribal knowledge is the third and the most valuable. The best context is in no system at all. It is the claims handler who knows which repair shop inflates estimates, the procurement manager who knows a supplier's quoted lead times always run two weeks optimistic, the service agent who can tell from the tone of a complaint whether it will escalate. This knowledge is real, decisive, and completely invisible to any AI system that has no mechanism to capture and encode it.

Building the context layer

The context layer is not a data warehouse project. It is a focused effort to make the right data available for specific workflows, and it follows a practical sequence.

Start with the workflow, not the data. For each AI workflow you intend to deploy, document exactly what the model needs to produce its output — field names, formats, freshness, volume. A claims-triage workflow might need the claim description as free text, the damage category as a code, the claim amount as a number, the policy type as a code, and the customer's last five claims. That is five specific data points, not "all customer data." The specificity is the point: it turns an unbounded data problem into a bounded engineering task.
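One way to pin the requirement down is to write it as a small data contract. The sketch below is illustrative only: the field names, the type labels, and the 24-hour ages are assumptions chosen to mirror the claims-triage example, not a prescribed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str            # hypothetical field name
    kind: str            # "text", "code", "number" or "list"
    max_age_hours: int   # freshness the workflow needs (assumed value)


# The five data points named above for claims triage.
CLAIMS_TRIAGE_CONTRACT = (
    FieldSpec("claim_description", "text", 24),
    FieldSpec("damage_category", "code", 24),
    FieldSpec("claim_amount", "number", 24),
    FieldSpec("policy_type", "code", 24),
    FieldSpec("last_five_claims", "list", 24),
)


def missing_fields(record: dict, contract=CLAIMS_TRIAGE_CONTRACT) -> list:
    """List the contract fields that are absent from a record."""
    return [spec.name for spec in contract if spec.name not in record]
```

A contract like this stays short on purpose. If it grows past a dozen fields, the workflow is probably drifting back towards "all customer data."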

Trace the data path honestly. For each required field, follow it from where it lives to where the model needs it. Is there an API, a database connection, a file export, or a person who copies it by hand? If the path is "Maria exports it from SAP every Friday and emails it to Thomas," write that down verbatim. That sentence is your real architecture, and naming it is the first step to replacing it.
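The trace itself can be kept as plain structured records, so the manual hops are easy to list. This is a hedged sketch: the `DataPath` class, the mechanism labels, and the example entries are hypothetical.

```python
from dataclasses import dataclass

# Mechanisms treated as automated in this sketch (an assumption).
AUTOMATED = {"api", "database", "scheduled_export"}


@dataclass
class DataPath:
    field_name: str
    source: str
    mechanism: str   # "api", "database", "scheduled_export" or "manual"
    owner: str
    note: str = ""   # the verbatim description of how the data really moves


def manual_steps(paths: list) -> list:
    """Return every path that still depends on a person moving the data."""
    return [p for p in paths if p.mechanism not in AUTOMATED]


paths = [
    DataPath("claim_amount", "SAP", "manual", "Maria",
             "Maria exports it from SAP every Friday and emails it to Thomas"),
    DataPath("policy_type", "policy database", "database", "IT"),
]

for step in manual_steps(paths):
    print(f"{step.field_name}: {step.note}")
```

The `note` field holds the sentence exactly as someone said it, which keeps the map honest.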

Build the minimum viable pipeline. You do not need a real-time streaming architecture to start. You need a reliable, automated pipeline that delivers the required data in the required format at the required frequency. For many DACH Mittelstand workflows that means a scheduled query or API call running nightly, depositing a structured file in a defined location, triggering a validation check, and processing only if the file is valid — otherwise raising an alert. This is not glamorous. It is reliable, and reliability is the entire point.
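The validate-then-process step described above might look like the following. It is a minimal sketch, assuming the nightly job writes a CSV file. The column names are hypothetical, and `process` and `alert` are placeholders for whatever the workflow and the on-call channel actually are.

```python
import csv
from pathlib import Path
from typing import Callable

# Hypothetical columns the nightly export is expected to contain.
REQUIRED_COLUMNS = {
    "claim_id", "claim_description", "damage_category",
    "claim_amount", "policy_type",
}


def validate_file(path: Path) -> list:
    """Return a list of problems; an empty list means the file is valid."""
    if not path.exists():
        return [f"file not found: {path}"]
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            return [f"missing columns: {sorted(missing)}"]
        rows = list(reader)
    if not rows:
        return ["file contains no rows"]
    return []


def run_nightly(path: Path,
                process: Callable[[Path], None],
                alert: Callable[[str], None]) -> bool:
    """Process the file only if it validates; otherwise raise an alert."""
    problems = validate_file(path)
    if problems:
        alert("; ".join(problems))
        return False
    process(path)
    return True
```

The scheduling itself (cron, a job runner, or similar) is left out on purpose. The point of the sketch is the gate: nothing reaches the model unless the file passes.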

Encode the domain context. Take the tribal knowledge and make it explicit, usually as a structured knowledge base capturing the rules, exceptions, and judgement heuristics experienced staff apply. For claims triage that means a table mapping damage types to expected claim ranges, a list of flagged repair shops, and rules for when a claim escalates regardless of amount. These are not training data; they are reference material the workflow consults at runtime. Retrieval-augmented generation is the standard pattern here — the model retrieves relevant domain context before it generates — and the quality of that knowledge base sets a hard ceiling on the quality of the output.
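To make the runtime lookup concrete, here is a deliberately small sketch. A plain dictionary lookup stands in for the retrieval step, where a production setup would more likely use search or embeddings. Every figure, shop name, and function name below is invented for illustration.

```python
from typing import Optional

# Illustrative reference data. All figures and names are invented.
EXPECTED_RANGE_EUR = {
    ("commercial", "kasko"): (500, 20_000),
    ("private", "haftpflicht"): (50, 2_000),
}
FLAGGED_REPAIR_SHOPS = {"Beispiel Werkstatt GmbH"}


def retrieve_context(policy_type: str, damage_type: str, amount_eur: float,
                     repair_shop: Optional[str] = None) -> list:
    """Look up the domain notes that apply to one claim."""
    notes = []
    bounds = EXPECTED_RANGE_EUR.get((policy_type, damage_type))
    if bounds is None:
        notes.append("No expected range on file for this combination.")
    elif bounds[0] <= amount_eur <= bounds[1]:
        notes.append("Amount is routine for this policy and damage type.")
    else:
        notes.append("Amount is unusual for this policy and damage type.")
    if repair_shop in FLAGGED_REPAIR_SHOPS:
        notes.append("Repair shop is flagged: escalate regardless of amount.")
    return notes


def build_prompt(claim_text: str, notes: list) -> str:
    """Place the retrieved notes ahead of the claim, before generation."""
    context = "\n".join(f"- {note}" for note in notes)
    return f"Domain context:\n{context}\n\nClaim:\n{claim_text}\n\nTriage this claim."
```

With these invented ranges, a 5,000 euro claim is marked routine on the commercial policy and unusual on the private liability policy. The model is never asked to guess that difference. It is handed it.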

Establish freshness guarantees. Define and enforce a freshness commitment for each source. If claims data must be less than 24 hours old, build monitoring that alerts when the pipeline has not run. If the knowledge base must reflect current policy, assign an owner who reviews it on a fixed cadence. Freshness is not a technical property. It is an operational commitment, and someone has to own it by name.
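A freshness commitment can be checked with very little code. The sketch below assumes each source records a timezone-aware timestamp of its last successful update. The source names and the 90-day review cadence for the knowledge base are assumptions, and only the 24-hour figure comes from the example above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Maximum tolerated age per source (the 90-day cadence is an assumed value).
MAX_AGE = {
    "claims": timedelta(hours=24),
    "knowledge_base": timedelta(days=90),
}


def stale_sources(last_updated: dict, now: Optional[datetime] = None) -> list:
    """Return the sources whose last update is missing or older than allowed."""
    now = now or datetime.now(timezone.utc)
    stale = []
    for source, limit in MAX_AGE.items():
        seen = last_updated.get(source)
        if seen is None or now - seen > limit:
            stale.append(source)
    return stale
```

A source with no recorded update counts as stale. Silence from the pipeline is treated as a failure, not as good news.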

A context readiness check before you deploy

Before deploying any AI workflow, pressure-test the context layer across the same four properties. On accessibility, ask whether each required field can be reached programmatically, whether the path is documented and owned by a named person, and whether you can provision a staging environment without touching production. On quality, ask whether the workflow-critical fields are sufficiently populated, whether coded fields are clean of duplicates and stray free text, and whether there is a defined process for missing or invalid records. On freshness, ask whether the requirement is defined per source, whether monitoring exists for pipeline failures, and whether the current pipeline actually meets the requirement. On domain context, ask whether experts have been interviewed to capture their decision heuristics, whether the knowledge base is structured and queryable, and whether someone owns keeping it current.

Score each area as blocking, weak, adequate, or strong. Two or more blocking scores mean the context layer needs work before deployment is viable — and shipping anyway is precisely how a working model ends up in the 95 percent that return nothing. The check costs an afternoon. The pilot that skips it costs a quarter and the credibility of the next AI request that crosses the Geschäftsführung's desk.
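The scoring rule is simple enough to write down directly. This is a sketch of the rule as stated here, with the area names as assumed identifiers and the threshold exposed as a parameter.

```python
from enum import Enum


class Score(Enum):
    BLOCKING = 0
    WEAK = 1
    ADEQUATE = 2
    STRONG = 3


# The four properties of the context layer (identifier names are assumptions).
AREAS = ("accessibility", "quality", "freshness", "domain_context")


def ready_to_deploy(scores: dict, max_blocking: int = 1) -> bool:
    """Apply the rule: two or more blocking scores mean not ready."""
    missing = [area for area in AREAS if area not in scores]
    if missing:
        raise ValueError(f"unscored areas: {missing}")
    blocking = sum(1 for area in AREAS if scores[area] is Score.BLOCKING)
    return blocking <= max_blocking
```

An unscored area raises an error instead of passing quietly, so the check cannot be satisfied by leaving a question unanswered.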

Why context compounds

The context layer is not a one-time build. It is an asset that appreciates. Every workflow that uses it validates and enriches the underlying data: claims that are correctly triaged confirm the domain rules, and claims that are mis-triaged expose gaps that, once fixed, sharpen the next cycle. This is how AI initiatives actually compound — not through better models but through richer context. The learning component of the AI Operating System formalises this feedback loop, and it is exactly the adaptation the MIT research found missing in the pilots that stalled. A competitor who buys the same model as you has bought nothing you do not already have. A competitor who has spent two years encoding their domain context into a layer their workflows can query has built something you cannot purchase and cannot shortcut.

The companies that build a production-grade context layer for their first workflow find the second one dramatically easier. The data paths already exist, the knowledge base is started, the freshness guarantees are in place. The marginal cost of context falls with every deployment, which is why the first workflow is less a use case than an investment in the layer underneath it.

Where to start

If you suspect your AI initiatives are failing on context rather than models — and the evidence says they almost certainly are — take two actions. First, pick any stalled AI project and map its actual data path from source to model, documenting every manual step, every Excel handoff, every tribal-knowledge dependency. Second, hold that map against the four readiness properties above. The gap between where you are and where you need to be is your context layer project. It is less exciting than model selection. It is more important than everything else combined.

A Fit Call maps the real data path behind one stalled workflow and tells you whether the context layer — not the model — is what is holding it back, before you spend another quarter blaming the wrong thing.

References: MIT NANDA initiative, "The GenAI Divide: State of AI in Business 2025," 2025 (reported by Fortune, fortune.com); EU Artificial Intelligence Act, Article 10 "Data and data governance," applicable to high-risk systems from 2 August 2026 (artificialintelligenceact.eu/article/10).