Every AI workflow makes decisions. That is the point. The question is not whether AI should decide — it is which decisions, under what conditions, with what authority, and with what fallback when it gets it wrong.

Most enterprises get this wrong in one of two directions. They over-automate — letting AI make decisions that require human judgment, creating compliance risk and eroding trust. Or they under-delegate — requiring human approval for every output, which eliminates most of the operating leverage that justified the investment.

Decision architecture is the third component of the AI Operating System. It sits between the context layer (which defines what the AI knows) and workflow design (which defines what the AI does). Its job is to define who decides what.

The decision spectrum

Decisions in an AI workflow exist on a spectrum. The mistake is treating them as binary — either the human decides or the AI decides. In practice, there are five distinct configurations:

Fully automated. The AI decides and acts without human involvement. The output is delivered, the action is taken, the record is logged. Humans monitor aggregate performance metrics, not individual decisions. Example: classifying incoming support emails by category and routing them to the correct queue.

AI acts, human is notified. The AI decides and acts, but a human receives a notification of every decision. The human can intervene retroactively if something is wrong but does not need to approve each decision in advance. Example: auto-approving expense reports under €500 with notification to the finance team.

AI recommends, human decides. The AI analyses the inputs and presents a recommendation with supporting evidence. The human makes the final decision. Example: claims triage where the AI recommends approval or further investigation, and the claims handler makes the call.

AI prepares, human decides. The AI structures and summarises information but does not make a recommendation. The human receives an organised briefing rather than raw data. Example: due diligence summaries for M&A transactions where the AI compiles and structures data but the investment committee makes the judgment call.

Human only. The AI is not involved in the decision. Some decisions should remain entirely human — not because the AI cannot process the inputs, but because the consequences of error, the need for empathy, or the regulatory requirements make human judgment non-negotiable. Example: employee termination decisions.
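As a rough illustration, the five configurations can be written down as an ordered type so that workflows refer to them by name rather than by ad hoc flags. This is a minimal sketch, not a prescribed implementation: the names `DecisionLevel` and `requires_human_approval` are our own assumptions for this example.

```python
from enum import IntEnum


class DecisionLevel(IntEnum):
    """The five configurations, ordered from least to most human involvement."""
    FULLY_AUTOMATED = 1         # AI decides and acts; humans watch aggregates
    AI_ACTS_HUMAN_NOTIFIED = 2  # AI acts; human can intervene retroactively
    AI_RECOMMENDS = 3           # AI recommends; human makes the final decision
    AI_PREPARES = 4             # AI structures information; no recommendation
    HUMAN_ONLY = 5              # AI takes no part in the decision


def requires_human_approval(level: DecisionLevel) -> bool:
    """True when a human must decide before anything is acted on."""
    return level >= DecisionLevel.AI_RECOMMENDS
```

Ordering the levels makes it possible to express rules such as "at least AI recommends, human decides" as a simple comparison.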

Mapping decisions to the right level

The right position on the spectrum is determined by three factors: consequence severity, decision structure, and regulatory requirement.

Consequence severity

What happens when the decision is wrong? If a misclassified email goes to the wrong support queue, the consequence is a minor delay. If a fraudulent insurance claim is auto-approved, the consequence is financial loss and potential regulatory exposure.

Low-consequence decisions can be pushed toward full automation. High-consequence decisions need human involvement — but that does not mean human-only. It often means AI recommends, human decides.

Decision structure

How rule-based is the decision? If the decision can be fully expressed as a decision tree — if condition A and condition B and not condition C, then approve — it is a candidate for full automation regardless of consequence severity, because the decision logic can be validated exhaustively.

If the decision requires weighing ambiguous evidence, considering context that is difficult to formalise, or exercising judgment that experienced professionals develop over years — it needs human involvement. The AI can still add value by structuring the evidence and highlighting relevant factors, but the judgment call stays human.

Regulatory requirement

The EU AI Act and sector-specific regulations impose explicit requirements on certain decision categories. High-risk AI systems, including those used for risk assessment and pricing in life and health insurance, credit scoring, and employment decisions, require human oversight by law. This is not a design choice. It is a compliance requirement.

For DACH enterprises, this means certain decisions must be "AI recommends, human decides" regardless of what the technical capability would allow. Build this into your architecture from day one, not as a retroactive compliance fix. For details on EU AI Act classification, see the EU AI Act compliance guide.
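The three factors can be combined into a first-pass assignment rule. The sketch below is one possible reading, under loud assumptions: each factor is reduced to a yes/no flag, low-consequence decisions default to full automation even when they are not rule-based, and a regulatory oversight requirement acts as a floor of "AI recommends, human decides." The names `DecisionProfile` and `initial_level` are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class DecisionLevel(IntEnum):
    FULLY_AUTOMATED = 1
    AI_ACTS_HUMAN_NOTIFIED = 2
    AI_RECOMMENDS = 3
    AI_PREPARES = 4
    HUMAN_ONLY = 5


@dataclass(frozen=True)
class DecisionProfile:
    high_consequence: bool    # is a wrong decision costly?
    fully_rule_based: bool    # can the logic be validated exhaustively?
    oversight_mandated: bool  # does regulation require human oversight?


def initial_level(profile: DecisionProfile) -> DecisionLevel:
    """Starting position on the spectrum; a hypothesis to calibrate later."""
    if profile.fully_rule_based or not profile.high_consequence:
        level = DecisionLevel.FULLY_AUTOMATED
    else:
        level = DecisionLevel.AI_RECOMMENDS
    if profile.oversight_mandated:
        # Assumption: regulation sets a floor regardless of technical capability.
        level = max(level, DecisionLevel.AI_RECOMMENDS)
    return level
```

A real assessment would use finer gradations than three booleans. The point is only that the regulatory check runs last and can only move a decision toward more human involvement.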

The confidence threshold model

One of the most effective patterns we deploy is the confidence threshold model. Rather than assigning every instance of a decision type to the same position on the spectrum, the system routes individual decisions based on the AI's confidence in its output.

Here is how it works for an insurance claims triage workflow:

Above 95% confidence, below €2,000 claim value: Fully automated. The AI classifies the claim, determines the appropriate handler, and routes it. The claims handler sees a pre-classified, pre-routed claim.

Between 80% and 95% confidence, or €2,000–€10,000 claim value: AI recommends, human decides. The AI presents its classification with a confidence score and the evidence it used. The claims handler reviews and either confirms or overrides.

Below 80% confidence, or above €10,000 claim value: AI prepares, human decides. The AI structures the available information but does not make a recommendation. The claims handler reviews the full case.

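Fraud indicators present: Human only, flagged for specialist review. The AI flags the indicators but takes no classification action. This rule takes precedence over the confidence and value tiers above. More generally, when a claim meets the conditions of more than one tier, the more cautious tier applies.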

This model achieves two things simultaneously. It captures the efficiency gains of automation for straightforward cases — which are the majority. And it preserves human judgment for cases that need it — without burdening humans with reviewing every routine decision.
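The claims triage tiers described above can be sketched as a single routing function. The thresholds are the ones given in the text. The function name, the `Route` type, and the handling of exact boundary values (for example, exactly 95% confidence falls into "AI recommends") are assumptions for illustration, as is the rule that the more cautious tier wins when conditions overlap.

```python
from enum import Enum


class Route(Enum):
    FULLY_AUTOMATED = "fully automated"
    AI_RECOMMENDS = "AI recommends, human decides"
    AI_PREPARES = "AI prepares, human decides"
    HUMAN_ONLY = "human only, specialist review"


def route_claim(confidence: float, claim_value_eur: float,
                fraud_indicators: bool) -> Route:
    """Route one claim; checks run from the most to the least cautious tier."""
    if fraud_indicators:
        return Route.HUMAN_ONLY
    if confidence < 0.80 or claim_value_eur > 10_000:
        return Route.AI_PREPARES
    if confidence > 0.95 and claim_value_eur < 2_000:
        return Route.FULLY_AUTOMATED
    # Everything in between: 80-95% confidence or EUR 2,000-10,000.
    return Route.AI_RECOMMENDS
```

For example, `route_claim(0.97, 800, False)` is routed to full automation, while the same claim at €5,000 goes to a handler with a recommendation attached.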

The thresholds are not set once and forgotten. They are calibrated during deployment and refined based on outcome data. If the auto-approved claims under €2,000 show an acceptable error rate after 90 days, the threshold might be raised to €3,000. If the 80–95% confidence band shows too many overrides, the lower threshold might be raised to 85%. This calibration is part of the review cycle.
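One way to make that calibration step concrete is to compute the override rate in the recommendation band from logged outcomes and raise the lower threshold when it is too high. This is a hedged sketch: the `Outcome` record, the 15% tolerance, and the 5-point step are hypothetical values chosen for the example, not figures from a real deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    confidence: float  # AI confidence at decision time
    overridden: bool   # did the handler override the recommendation?


def override_rate(outcomes: list[Outcome], low: float, high: float) -> float:
    """Share of recommendations with confidence in [low, high] that were overridden."""
    band = [o for o in outcomes if low <= o.confidence <= high]
    if not band:
        return 0.0
    return sum(o.overridden for o in band) / len(band)


def suggest_lower_threshold(outcomes: list[Outcome], current: float = 0.80,
                            step: float = 0.05, upper: float = 0.95,
                            max_override_rate: float = 0.15) -> float:
    """Raise the lower threshold by one step if overrides are too frequent."""
    if override_rate(outcomes, current, upper) > max_override_rate:
        return min(round(current + step, 2), upper)
    return current
```

The function only suggests a value. Whether to adopt it remains a decision for the review cycle.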

Decision architecture in practice: DACH examples

Insurance: claims processing

A German insurance group processes 4,000 property damage claims per month. Before implementing decision architecture, every claim was reviewed by a human handler — regardless of whether it was a €200 broken window or a €50,000 water damage case.

The decision architecture:

  • Claims under €1,000 with clear damage category, high confidence match to standard repair costs, and no fraud indicators: auto-approved and routed for payment processing. Human notified.
  • Claims €1,000–€10,000 with standard damage patterns: AI recommends an approval amount and the handler reviews it. Handlers accept 89% of the AI's recommendations.
  • Claims above €10,000 or with unusual patterns: AI prepares a structured case summary. Handler decides from scratch with enriched information.
  • Any claim with fraud indicators: flagged for specialist review. No AI recommendation.

Result: 45% of claims are auto-processed. Average handling time for the remaining 55% dropped by 35% because handlers receive pre-structured information. Total claims capacity increased by 40% without additional headcount.

Industrial: incoming order processing

A manufacturing company in Baden-Württemberg receives 200 orders per day across email, fax (yes, fax), and their web portal. Order processing involves classifying the order, checking inventory availability, confirming delivery dates, and routing to production planning.

The decision architecture:

  • Standard catalogue items, stock available, standard delivery terms: auto-confirmed. Customer receives confirmation within 2 hours instead of 48.
  • Standard items but stock below threshold or delivery date conflict: AI recommends alternatives (substitute product, adjusted delivery date). Sales confirms or modifies.
  • Custom specifications, volume above threshold, or new customer: AI prepares order summary with customer history and margin calculation. Key account manager decides.

Result: 60% of orders are auto-confirmed. Sales team focuses on the high-value 40% that actually requires their expertise. Order-to-confirmation time dropped from an average of 2 days to 4 hours across all orders.

Retail: pricing decisions

An e-commerce retailer with 15,000 SKUs adjusts prices based on competitor data, inventory levels, and margin targets.

The decision architecture:

  • Commodity products with clear competitive benchmarks and stable margins: automated repricing within defined corridors (never below floor, never above ceiling). Category manager reviews weekly aggregates.
  • Seasonal products or products with volatile demand: AI recommends price adjustments with supporting data. Category manager approves or modifies daily.
  • Strategic products (brand-defining, loss leaders, new launches): human-only pricing decisions with AI-provided competitive intelligence.
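The corridor rule for commodity products ("never below floor, never above ceiling") amounts to clamping the recommended price. A minimal sketch, assuming per-SKU floor and ceiling values are maintained elsewhere; the function name `reprice` is our own.

```python
def reprice(recommended_price: float, floor: float, ceiling: float) -> float:
    """Clamp an AI-recommended price into the approved corridor."""
    if floor > ceiling:
        raise ValueError("floor must not exceed ceiling")
    return max(floor, min(recommended_price, ceiling))
```

Because the corridor is enforced outside the model, a poor recommendation can at worst land on the floor or the ceiling, which is what makes weekly aggregate review sufficient for this tier.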

Common mistakes

Mistake 1: Uniform decision authority

Applying the same level of human oversight to every decision in a workflow. If a human must review every classified email, you have eliminated the time savings. If no human reviews any claims decision, you have accepted risk that regulators will not tolerate.

The fix is granular authority assignment. Different decisions within the same workflow can and should sit at different positions on the spectrum.

Mistake 2: Static thresholds

Setting confidence thresholds and authority levels at deployment and never revisiting them. The right thresholds are empirical — they depend on actual error rates, actual consequence patterns, and actual team capacity. They should be calibrated quarterly based on outcome data.

Mistake 3: Confusing transparency with authority

Some organisations respond to concerns about AI decision-making by making the AI's reasoning visible to humans — without changing who makes the decision. Transparency is important, but it is not the same as human authority. If the human is expected to review the AI's reasoning and approve every decision, that is "AI recommends, human decides." If the human sees the reasoning but is not expected to intervene, that is "AI acts, human is notified." These are different configurations with different resource implications.

Mistake 4: Ignoring the cost of human review

Every human review step has a time cost. If an AI workflow processes 500 items per day and each human review takes 3 minutes, that is 25 person-hours per day of review work. Before requiring human review, calculate the cost and compare it against the risk of the alternative. Sometimes the cost of reviewing every decision exceeds the expected cost of the errors that review would prevent.
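The comparison can be done on the back of an envelope, or in a few lines of code. The sketch below reproduces the 500 items at 3 minutes example and adds a simple break-even check. The hourly cost, error rate, cost per error, and catch rate are inputs you would need to estimate yourself; all parameter names are assumptions.

```python
def review_hours_per_day(items_per_day: int, minutes_per_review: float) -> float:
    """Person-hours of review work per day."""
    return items_per_day * minutes_per_review / 60


def review_is_worth_it(items_per_day: int, minutes_per_review: float,
                       hourly_cost_eur: float, error_rate: float,
                       cost_per_error_eur: float, catch_rate: float = 1.0) -> bool:
    """True if the expected loss prevented by review exceeds the cost of review."""
    review_cost = review_hours_per_day(items_per_day, minutes_per_review) * hourly_cost_eur
    prevented_loss = items_per_day * error_rate * cost_per_error_eur * catch_rate
    return prevented_loss > review_cost


hours = review_hours_per_day(500, 3)  # 25.0 person-hours per day
```

This ignores regulatory requirements, which can mandate review regardless of the arithmetic.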

Decision architecture and the EU AI Act

The EU AI Act mandates human oversight for high-risk AI systems. This is not optional for DACH enterprises. But the Act does not prescribe a specific implementation. It requires that humans can understand the AI's outputs, can decide not to use the system, can intervene in real time or stop the system, and can override the AI's output.

The confidence threshold model satisfies these requirements when implemented correctly. The human can see the AI's classification and confidence score (understanding). The human can override any individual decision (override). The system can be stopped or bypassed at any time (stopping, and the option not to use the system). And the thresholds can be adjusted to shift more decisions to human review (intervention).

The key is building these capabilities into the architecture from the start, not adding them as a compliance layer afterward. Decision architecture that is designed with regulatory requirements in mind is both more compliant and more efficient than architecture that treats compliance as an afterthought.

Building your decision architecture

Start with a single workflow. List every decision point in that workflow. For each decision point, assess consequence severity, decision structure, and regulatory requirements. Assign an initial position on the spectrum. Define the confidence thresholds if applicable. Document the escalation path for cases that fall outside the defined parameters.
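The inventory of decision points can live in a spreadsheet, but a small structured record makes gaps easy to spot. The following is a sketch under assumptions: the field names and the `missing_escalation` helper are hypothetical and would be adapted to your own workflow documentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecisionPoint:
    name: str
    consequence_severity: str  # e.g. "low", "medium", "high"
    rule_based: bool
    oversight_mandated: bool
    initial_level: str         # one of the five configurations
    confidence_threshold: Optional[float] = None
    escalation_path: Optional[str] = None


def missing_escalation(points: list[DecisionPoint]) -> list[str]:
    """Names of decision points that still lack a documented escalation path."""
    return [p.name for p in points if not p.escalation_path]


workflow = [
    DecisionPoint("classify claim", "low", False, False, "fully automated",
                  confidence_threshold=0.95, escalation_path="claims handler"),
    DecisionPoint("approve payout", "high", False, True,
                  "AI recommends, human decides"),
]
```

Here `missing_escalation(workflow)` would flag "approve payout" as incomplete before deployment.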

Then deploy, measure, and calibrate. The initial architecture is a hypothesis. The production architecture is what emerges after 90 days of outcome data.

The full framework for designing decision architecture, including the consequence-structure matrix and threshold calibration methodology, is in Chapter 05 of The AI Operating System. For the related question of when to automate versus augment, see Automation vs. Augmentation.

For a conversation about designing decision architecture for your AI workflows, book a Fit Call.
