"We need to self-host for data sovereignty" is the most common AI infrastructure assertion in DACH boardrooms. It is also the most frequently incorrect.
Data sovereignty is a legitimate requirement. Self-hosting is one way to meet it — but not the only way, and rarely the cheapest. The decision deserves a framework, not a reflex. What usually happens instead is that a single sentence in a board meeting — "our data cannot leave the house" — quietly commits the company to a seven-figure operational obligation that nobody priced and nobody can staff. This article is the framework we use to take that decision apart before the money is spent.
The decision tree
Three questions determine whether self-hosting is genuinely necessary. Answer them honestly and the architecture usually decides itself.
Question 1: Does your data actually need to stay on your own servers?
Most companies conflate two very different statements: "this data should not be processed in the US" and "this data must never leave our infrastructure." The first is a residency and transfer requirement. The second is a control requirement. They have different legal bases, and the first can be met with far cheaper solutions than the second.
The GDPR does not mandate on-premise processing. It requires that any third party processing personal data on your behalf gives sufficient guarantees and operates under a written contract. Article 28 spells out exactly what that contract must contain: processing only on your documented instructions, confidentiality, appropriate technical and organisational security, sub-processor controls, breach notification, and audit rights. A standard data processing agreement with an EU-region endpoint — Azure West Europe, AWS Frankfurt, Google Cloud's Belgium or Frankfurt regions — satisfies this. The hard problem the GDPR cares about is the transfer of personal data outside the EU, not the building it sits in. Keep the data in an EU region under a proper DPA and the residency question is answered without a single GPU on your premises.
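The Article 28 elements listed above lend themselves to a simple pre-contract checklist. The sketch below uses clause names of our own invention (they paraphrase the elements named in this section, not the legal text) and an illustrative allow-list of four EU region identifiers for the providers mentioned. It flags missing clauses and non-EU regions, and it is no substitute for legal review.

```python
from dataclasses import dataclass, field

# Hypothetical clause identifiers paraphrasing the Article 28 elements named in the text.
REQUIRED_DPA_CLAUSES = frozenset({
    "documented_instructions",  # process only on the controller's documented instructions
    "confidentiality",          # people processing the data are bound to confidentiality
    "security_measures",        # appropriate technical and organisational security
    "subprocessor_controls",    # sub-processors only with authorisation, same obligations
    "breach_notification",      # the controller is told about personal data breaches
    "audit_rights",             # information and audits made available to the controller
})

# Illustrative subset: Azure West Europe, AWS Frankfurt, Google Cloud Belgium and Frankfurt.
EU_REGIONS = frozenset({"westeurope", "eu-central-1", "europe-west1", "europe-west3"})


@dataclass
class ProcessorSetup:
    region: str
    dpa_clauses: set[str] = field(default_factory=set)


def residency_gaps(setup: ProcessorSetup) -> list[str]:
    """Return missing DPA clauses plus a note if the region is outside the EU allow-list."""
    gaps = sorted(REQUIRED_DPA_CLAUSES - setup.dpa_clauses)
    if setup.region not in EU_REGIONS:
        gaps.append(f"region '{setup.region}' is not in the EU allow-list")
    return gaps


compliant = ProcessorSetup("westeurope", set(REQUIRED_DPA_CLAUSES))
print(residency_gaps(compliant))  # []
```

An empty list means the two things this section cares about, the contract and the region, are in place; it says nothing about security posture or sector-specific rules.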
The EU AI Act does not change this. Article 10, the data-governance obligation for high-risk systems, is about the quality and provenance of your training, validation and test data — relevance, representativeness, bias examination and mitigation, documented design choices. It is demanding, and from 2 August 2026 it carries real penalties for high-risk systems. But nowhere does it require that inference or training happen on hardware you own. It regulates how you govern data, not where the servers live.
NIS2 is the third regulation people invoke here, and it points the same way. Germany's transposition through the new BSI Act took effect in late 2025 and pulls a far broader set of mid-market companies into scope — broadly those above roughly €10m turnover or fifty employees in covered sectors — with personal liability for management who fail to approve and oversee cybersecurity risk-management measures. It mandates risk management, supply-chain security and governance. It does not mandate on-premise AI. If anything, an unstaffed self-hosted GPU cluster that nobody patches is a worse NIS2 posture than a hardened managed endpoint with a contractual security baseline.
Self-hosting becomes genuinely necessary in a narrow set of cases: where a sector regulator explicitly requires on-premise processing of the specific data class, where your own data-classification policy — a written policy, not a preference — prohibits any third-party processing, or where you handle material under national-security classification. For the vast majority of DACH Mittelstand companies, none of these apply, and an EU-hosted API satisfies sovereignty at a fraction of the cost.
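The three narrow cases can be written down as explicit triggers, which forces the question to be answered with evidence rather than preference. A minimal sketch; the `DataClassProfile` type and its flag names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataClassProfile:
    """Hypothetical profile of one data class; each flag is one of the narrow cases."""
    regulator_mandates_on_prem: bool = False      # explicit sector-regulator requirement
    written_policy_bans_processors: bool = False  # a written policy, not a preference
    national_security_classified: bool = False


def self_hosting_triggers(profile: DataClassProfile) -> list[str]:
    """Return the reasons self-hosting is required; an empty list means it is not."""
    reasons = []
    if profile.regulator_mandates_on_prem:
        reasons.append("sector regulator requires on-premise processing")
    if profile.written_policy_bans_processors:
        reasons.append("written classification policy prohibits third-party processing")
    if profile.national_security_classified:
        reasons.append("national-security classified material")
    return reasons


# A typical Mittelstand data class: no triggers, so an EU-hosted API is sufficient.
print(self_hosting_triggers(DataClassProfile()))  # []
```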
Question 2: Can you afford the operational reality?
Self-hosting is not a hardware purchase. It is a permanent operational commitment, and the commitment is mostly people, not silicon.
The independent cost analyses disagree on the exact multiplier but agree on the uncomfortable direction: the GPU is the smaller part of the bill. Once you account for everything beyond the GPU (provisioning, monitoring, security patching, incident response and the recurring cost of model updates), DevTk.AI's 2026 breakdown puts a realistic figure at 1.3 to 2 times raw GPU spend, while Braincuber puts the all-in number at three to five times raw GPU cost in real client engagements. At Braincuber's multiplier, a setup that looks like €3,000 a month in compute behaves like €9,000 to €15,000 once the hidden costs surface. The healthcare deployment Braincuber documents ran roughly €4,300 in GPU plus €6,100 in engineering time per month, about 5.6 times the cost of the equivalent API.
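The arithmetic behind these figures is a single multiplication, sketched here so the ranges can be rerun against your own compute bill. The function name is ours; the inputs are the figures quoted above, and the implied API cost in the last line is our back-calculation from the 5.6x ratio, not a number reported by Braincuber.

```python
def all_in_monthly_cost(raw_gpu_eur: float, multiplier: float) -> float:
    """Fully loaded monthly cost: raw GPU spend scaled by an overhead multiplier."""
    return raw_gpu_eur * multiplier


# Braincuber's 3-5x range applied to a €3,000/month compute bill.
print(all_in_monthly_cost(3_000, 3), all_in_monthly_cost(3_000, 5))  # 9000 15000

# Braincuber's healthcare example: GPU plus engineering time per month.
healthcare_total = 4_300 + 6_100
print(healthcare_total)                    # 10400
print(round(healthcare_total / 4_300, 2))  # 2.42: effective multiplier on GPU spend
print(round(healthcare_total / 5.6))       # 1857: implied monthly API cost in EUR
```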
The scarcest input is the team. A production self-hosted operation needs ML-infrastructure engineers who can own provisioning, observability, security patching, compliance documentation and a model-update cadence that recurs every few weeks. In the DACH market these people are expensive and slow to hire — senior roles routinely take three to six months to fill, and when an inference node falls over at month-end, the on-call competence has to already exist. Buying GPUs is easy. Standing up the team that keeps them serving traffic at 99.9 percent is the part that quietly fails.
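It helps to translate the availability target into minutes. A sketch assuming a 30-day month and a 365-day year; the function name is illustrative.

```python
def downtime_budget_minutes(availability: float, period_hours: float) -> float:
    """Minutes of downtime allowed by an availability target over a period."""
    return (1.0 - availability) * period_hours * 60


MONTH_HOURS = 30 * 24   # assumption: a 30-day month
YEAR_HOURS = 365 * 24   # assumption: a non-leap year

print(round(downtime_budget_minutes(0.999, MONTH_HOURS), 1))      # 43.2 minutes per month
print(round(downtime_budget_minutes(0.999, YEAR_HOURS) / 60, 2))  # 8.76 hours per year
```

Roughly 43 minutes a month is less than a single unattended overnight outage, which is why the on-call competence has to be in place before the first incident.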
Question 3: Does the volume justify the fixed cost?
Self-hosting has high fixed costs and low marginal costs. APIs invert that: near-zero fixed cost and a constant per-token price, so the bill grows linearly with volume. The crossover is therefore a function of volume and, critically, of which API you are comparing against.
The break-even is far more sensitive to model tier than most cost cases admit. Against premium frontier APIs, a single well-utilised A100 can pay for itself at only a few million tokens a day — DevTk.AI puts the line against a top-tier model somewhere around five to nine million tokens daily. Against the cheap, high-efficiency tiers — the DeepSeek- and Gemini-Flash-class endpoints most Mittelstand workloads can actually use — the same analysis pushes break-even out to roughly 190 to 230 million tokens per day before owned hardware wins. That is an enormous gap, and it is the gap most self-hosting business cases ignore. They benchmark against the premium API to make the GPU look cheap, then plan to run a cheap model in production.
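The crossover falls out of setting the two monthly bills equal and solving for daily volume. A minimal sketch: the €3,000 fixed cost is the raw-compute figure used earlier (not the fully loaded one), the marginal self-hosting cost defaults to zero, and both per-million-token API prices are hypothetical blended figures chosen for illustration, not prices from the cited analyses.

```python
def break_even_tokens_per_day(monthly_fixed_eur: float,
                              api_eur_per_million: float,
                              self_host_eur_per_million: float = 0.0,
                              days: int = 30) -> float:
    """Daily volume at which owned hardware and an API cost the same per month.

    Solves  fixed + v * days * c_self / 1e6  ==  v * days * c_api / 1e6  for v,
    where v is tokens per day and c_* are prices per million tokens.
    """
    margin_per_million = api_eur_per_million - self_host_eur_per_million
    if margin_per_million <= 0:
        # The API is no more expensive per token, so self-hosting never catches up.
        return float("inf")
    return monthly_fixed_eur / (days * margin_per_million) * 1_000_000


# Hypothetical blended prices per million tokens, for illustration only.
premium = break_even_tokens_per_day(3_000, api_eur_per_million=15.0)
budget = break_even_tokens_per_day(3_000, api_eur_per_million=0.45)
print(f"premium tier: {premium / 1e6:.1f}M tokens/day")  # premium tier: 6.7M tokens/day
print(f"budget tier:  {budget / 1e6:.0f}M tokens/day")   # budget tier:  222M tokens/day
```

With these made-up prices both results land inside the ranges quoted above, because the ratio between them is simply the ratio of the API prices (15 / 0.45, about 33). Because break-even is linear in the fixed cost, feeding in the fully loaded €9,000 to €15,000 instead of raw compute moves both lines three to five times further out.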
The honest reading, for workloads that can run on those cheaper tiers: below roughly 50 million tokens a day, which covers the great majority of Mittelstand use cases, API deployment is cheaper even after the EU-residency premium. Once you are reliably running hundreds of millions of tokens a day through a stable set of use cases, owned infrastructure starts to win decisively, provided, and only provided, you have the team from Question 2. In between, the answer depends on your growth trajectory and how many workloads you can consolidate onto the same hardware.
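The volume bands, gated by the staffing question, fit in a few lines. The 50-million ceiling comes from the text; the 200-million floor is our assumed stand-in for "hundreds of millions" and should be replaced by your own break-even against the tier you will run in production. Names are hypothetical.

```python
from enum import Enum


class Recommendation(Enum):
    API = "EU-hosted API"
    EVALUATE = "model growth and consolidation before deciding"
    SELF_HOST = "owned infrastructure"


API_CEILING = 50_000_000       # from the text: below this, APIs are cheaper
SELF_HOST_FLOOR = 200_000_000  # assumption: stand-in for "hundreds of millions"


def volume_recommendation(tokens_per_day: int, can_staff_ops: bool) -> Recommendation:
    """Map daily token volume and team capability (Question 2) to a recommendation."""
    if tokens_per_day < API_CEILING:
        return Recommendation.API
    if tokens_per_day >= SELF_HOST_FLOOR:
        # Owned hardware only wins here if the operations team exists.
        return Recommendation.SELF_HOST if can_staff_ops else Recommendation.API
    return Recommendation.EVALUATE


print(volume_recommendation(300_000_000, can_staff_ops=False).value)  # EU-hosted API
```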
The middle path: managed private deployment
Between full self-hosting and shared public APIs sits an option that matured considerably through 2025 and 2026, and that fits most DACH constraints better than either extreme: managed private deployment.
Azure AI's private endpoints, AWS Bedrock's VPC configurations and a handful of specialist European inference providers will run models on dedicated, isolated infrastructure inside EU data centres, with contractual data-isolation guarantees — and they carry the operational burden for you. You pay a premium over shared API pricing, typically in the region of 30 to 50 percent, and in exchange you delete the ML-engineering headcount, the hardware lifecycle and the 2am incident pager. For a company whose real requirement is data isolation rather than physical control of the machine, this is usually the optimal architecture. It buys the sovereignty outcome of self-hosting without the operational liability that sinks most self-hosting projects.
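To see why the premium is usually worth paying, set it against the fully loaded self-hosting range from Question 2. A sketch that assumes, hypothetically, that the workload behind the €3,000-a-month compute setup would cost €2,000 a month on a shared API; the 30 to 50 percent premium and the €9,000 to €15,000 range are the figures quoted in this article.

```python
def managed_private_monthly(shared_api_eur: float, premium: float) -> float:
    """Monthly cost of a managed private deployment priced at a premium over shared API."""
    return shared_api_eur * (1 + premium)


SHARED_API_EUR = 2_000                 # hypothetical monthly shared-API bill
SELF_HOST_RANGE_EUR = (9_000, 15_000)  # fully loaded self-hosting range quoted earlier

low = round(managed_private_monthly(SHARED_API_EUR, 0.30))
high = round(managed_private_monthly(SHARED_API_EUR, 0.50))
print(low, high)                      # 2600 3000
print(high < SELF_HOST_RANGE_EUR[0])  # True: the top of the premium range is still cheaper
```

The conclusion flips only when the shared-API bill itself approaches the self-hosting range, which is the volume question again.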
The recommendation framework
The right answer depends on what is actually driving the decision — so name the driver first, then choose.
If your driver is GDPR compliance, EU-hosted APIs under a proper Article 28 DPA are sufficient for the overwhelming majority of use cases. Self-hosting adds cost without adding a compliance benefit, because the regulation never asked for the building — it asked for the contract and the EU region.
If your driver is sector regulation, read the actual regulatory text before committing. Most DACH industry rules require data protection and governance, not on-premise processing. Where on-premise genuinely is required for a specific data class, managed private deployment frequently satisfies the letter of it without a self-built cluster.
If your driver is cost at scale, self-hosting earns its place above the volume break-even — but only against the model tier you will actually run, and only if you can staff the operation. Benchmark against your real production model, not the premium API that flatters the GPU. Without operational capability, the savings evaporate in downtime and incident response.
If your driver is latency, small models in the three-to-seven-billion-parameter range, self-hosted on a single GPU, can deliver response times that an API round-trip cannot match. For genuinely real-time production applications, this can be the deciding factor regardless of the cost arithmetic.
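The four drivers reduce to a small dispatch on whichever driver is named first. A minimal sketch; the `Driver` enum, the keyword flags and the returned labels are hypothetical, and each branch simply restates the corresponding paragraph.

```python
from enum import Enum


class Driver(Enum):
    GDPR = "gdpr"
    SECTOR_REGULATION = "sector_regulation"
    COST_AT_SCALE = "cost_at_scale"
    LATENCY = "latency"


def recommend(driver: Driver, *, on_prem_mandated: bool = False,
              above_break_even: bool = False, can_staff_ops: bool = False,
              realtime: bool = False) -> str:
    """Return an architecture label for the named driver (flags are hypothetical inputs)."""
    if driver is Driver.GDPR:
        return "EU-hosted API under an Article 28 DPA"
    if driver is Driver.SECTOR_REGULATION:
        # Read the regulatory text first; on-premise mandates are the exception.
        return ("managed private deployment, self-host only if it fails the text"
                if on_prem_mandated else "EU-hosted API")
    if driver is Driver.COST_AT_SCALE:
        # above_break_even must be measured against the tier you will actually run.
        return "self-host" if above_break_even and can_staff_ops else "EU-hosted API"
    if driver is Driver.LATENCY:
        return "self-hosted small model on a single GPU" if realtime else "EU-hosted API"
    raise ValueError(f"unknown driver: {driver}")


print(recommend(Driver.COST_AT_SCALE, above_break_even=True))  # EU-hosted API
```

The usage line shows the point of the cost branch: volume above break-even is not enough on its own without the team to run the hardware.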
Most boardrooms that open with "we need to self-host" are answering a control question with an infrastructure answer they have not priced. The framework above is how you find out — before the capital is committed — whether you have a genuine sovereignty constraint or an expensive reflex.
A Fit Call pressure-tests your self-hosting decision against your real regulatory requirements, data sensitivity, volume projections and team capability — before you commit a seven-figure operational obligation to an architecture you do not need.
References: GDPR Art. 28 (processor obligations), gdpr-info.eu/art-28-gdpr; EU AI Act Art. 10 (data and data governance), artificialintelligenceact.eu/article/10; NIS2 transposition in Germany via the BSI Act, DLA Piper, dlapiper.com; DevTk.AI, "Self-Host LLM vs API: Real Cost Breakdown 2026," devtk.ai; Braincuber, "Self-Hosted LLM vs API: Breakeven Cost, GPU Math & When It's Worth It," 2026, braincuber.com.
