GPU Infrastructure Economics: On-Premise vs. Cloud vs. Hybrid for DACH

GPU capacity is the most consequential infrastructure decision in enterprise AI, and the one most often made on instinct. Buy too early and you sink six figures into hardware that depreciates on a three-year clock while the next generation halves the price. Rent indefinitely and you hand a margin to a cloud provider on every token you will ever serve. The right answer is rarely the one a vendor calculator hands you, and for a German mid-market company it is almost never the same answer a hyperscaler-scale US analysis would give.

The pricing has moved fast. H100 rental rates have fallen sharply from their launch-era peaks, used datacentre GPUs are flooding the secondary market as enterprises upgrade to Blackwell, and a fresh tier of cost-efficient inference cards has appeared. But the variables that decide the outcome for a DACH buyer — electricity prices, the Energy Efficiency Act, and how the German tax office treats hardware — sit outside every generic model. This is how to read the decision with those factors in.

The hardware landscape in 2026

NVIDIA does not publish official datacentre GPU prices, so every figure below comes from reseller and market data rather than a list price. Treat them as ranges, not quotes.

The H100 remains the reference point. A single 80GB card runs roughly $25,000 at the low end to $40,000 for the SXM variant, with bulk orders pulling toward the bottom of that band. A full 8-GPU HGX system, once chassis, networking, and storage are in, lands in the low-to-mid six figures. The more interesting movement is on the rental side: on-demand H100 capacity now spans roughly $1.50 to $7.00 per GPU-hour depending on provider and commitment, with the bulk of the market clustering between $2 and $4. Specialist providers such as Lambda sit near the bottom of that range; hyperscaler on-demand rates sit several times higher.

The H200 buys you memory, not just speed. Its 141GB of HBM3e lets a single card hold models that would otherwise force a multi-GPU split, which matters more for inference economics than raw throughput. An 8-GPU H200 system sits around $310,000 to $315,000. Cloud availability has improved but still trails the H100.

The L40S is the card most Mittelstand workloads actually need. At roughly $7,500 to $10,000 per unit, it runs small-to-mid models efficiently, fits standard air-cooled racks without the liquid-cooling burden of an HGX node, and sidesteps the assumption — baked into most analyses — that serious AI means H100-class hardware. For a company serving one or two production models in the single-digit-billion-parameter range, this is usually the honest choice.

Used A100s are the value play. With inventory pouring onto the secondary market as buyers move to newer silicon, used 80GB A100s now trade in the region of $4,000 to $9,000 — well below half their original pricing. For proven workloads that do not need the latest generation, that price-performance is hard to argue with, provided you accept faster depreciation and no warranty.

Reading the three-year cost

Forget headline sticker prices and look at three years of total cost, because that is the useful life the standard German depreciation table assigns to the hardware anyway. The shape of the decision matters more than any single number, so here is the logic rather than a spuriously precise table.

Cloud on a reserved commitment converts the whole thing into a predictable monthly operating cost that already absorbs hardware refresh, cooling, networking, and basic availability. At a sustained $2-to-$4 GPU-hour equivalent under commitment, a single production GPU running around the clock costs roughly $1,500 to $3,000 a month (about 730 hours): visible, flexible, and free of capital risk. What it does not include, on any provider, is the engineering time to deploy, monitor, and keep a model healthy in production. That cost is identical whether you rent or own, and it is the line most calculators omit.
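The monthly figure follows from simple arithmetic, sketched here. The $2 and $4 rates are the band quoted above; the 730-hour month and the function name are illustrative assumptions, not any provider's billing model.

```python
# Minimal sketch: monthly and three-year cost of one always-on rented GPU.
# Assumption: an average month has 8,760 / 12 = 730 hours.
HOURS_PER_MONTH = 730


def monthly_cloud_cost(rate_per_gpu_hour: float, gpus: int = 1,
                       hours: float = HOURS_PER_MONTH) -> float:
    """Cost in dollars of keeping `gpus` GPUs rented for `hours` hours."""
    return rate_per_gpu_hour * gpus * hours


low = monthly_cloud_cost(2.00)   # bottom of the $2-to-$4 band
high = monthly_cloud_cost(4.00)  # top of the band

print(f"One GPU, always on: ${low:,.0f} to ${high:,.0f} per month")
print(f"Over 36 months:     ${low * 36:,.0f} to ${high * 36:,.0f}")
```

The three-year total is the number to hold against a purchase price, since it already contains the refresh, cooling, and networking costs that an owned card adds on top of its sticker price.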

On-premise purchase front-loads the spend — the card, plus chassis, networking, and installation — then runs at the cost of electricity, cooling, maintenance, and rack space. The breakeven against reserved cloud is genuinely attractive, but only at sustained high utilisation. A GPU that sits idle two-thirds of the day is more expensive owned than rented, because you have paid for capacity you are not converting into tokens. Ownership wins on steady, predictable, round-the-clock load and loses on everything spiky.
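A rough way to see where ownership starts to win is to compute the utilisation at which three years of owned cost equals three years of paying a cloud rate for the hours actually used. The sketch below does that. The $35,000 all-in purchase cost, $400 monthly running cost, and $3.00 rental rate are hypothetical placeholders to replace with your own quotes.

```python
# Minimal sketch: breakeven utilisation for owning versus renting one GPU.
# All dollar figures are hypothetical inputs, not market quotes.
HOURS_3Y = 3 * 8760  # hours in three years


def owned_cost_3y(capex: float, monthly_opex: float) -> float:
    """Owned cost is fixed: you pay it whether the GPU is busy or idle."""
    return capex + 36 * monthly_opex


def rented_cost_3y(rate_per_hour: float, utilisation: float) -> float:
    """Rented cost scales with the share of hours actually used."""
    return rate_per_hour * HOURS_3Y * utilisation


def breakeven_utilisation(capex: float, monthly_opex: float,
                          rate_per_hour: float) -> float:
    """Utilisation above which owning is cheaper than renting.

    A result above 1.0 means owning never breaks even at that rate.
    """
    return owned_cost_3y(capex, monthly_opex) / (rate_per_hour * HOURS_3Y)


capex, opex, rate = 35_000, 400, 3.00  # hypothetical inputs
owned = owned_cost_3y(capex, opex)
breakeven = breakeven_utilisation(capex, opex, rate)

print(f"Owned, three years: ${owned:,.0f}")
print(f"Rented at one-third utilisation: ${rented_cost_3y(rate, 1 / 3):,.0f}")
print(f"Breakeven utilisation: {breakeven:.0%}")
```

With these placeholder inputs the card has to be busy well over half the time before ownership pays, which is the point of the idle-GPU warning above. A cheaper rental rate pushes the breakeven higher still.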

Hybrid is where most Mittelstand companies should land: owned or reserved capacity for the steady-state production model that runs every hour, cloud for development, experimentation, and the bursts that would otherwise force you to over-provision hardware you only need occasionally. It is not a hedge so much as matching each workload to the cost structure it deserves.
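The matching of workload to cost structure can be sketched as a small sizing exercise: pick the number of owned GPUs that minimises the total cost of one day, with cloud covering whatever demand exceeds the owned baseline. The hourly demand profile and both rates below are hypothetical. The owned rate stands for an amortised all-in cost per GPU-hour that is paid whether or not the card is busy.

```python
# Minimal sketch: size the owned baseline, burst the rest to cloud.
# Demand profile and rates are hypothetical illustrations.

def hybrid_cost(demand, owned_gpus, owned_rate, cloud_rate):
    """Daily cost: owned GPUs are paid every hour, cloud only for overflow."""
    owned = owned_gpus * owned_rate * len(demand)
    overflow_gpu_hours = sum(max(0, d - owned_gpus) for d in demand)
    return owned + overflow_gpu_hours * cloud_rate


# GPUs needed in each of 24 hours: steady base, daytime bump, short peak.
demand = [2] * 16 + [6] * 6 + [10] * 2
OWNED_RATE = 1.50  # amortised cost per owned GPU-hour (assumption)
CLOUD_RATE = 3.50  # on-demand cost per GPU-hour (assumption)

costs = {n: hybrid_cost(demand, n, OWNED_RATE, CLOUD_RATE)
         for n in range(max(demand) + 1)}
best = min(costs, key=costs.get)

print(f"Cloud only:      ${costs[0]:.2f} per day")
print(f"Own the peak:    ${costs[max(demand)]:.2f} per day")
print(f"Own {best}, burst rest: ${costs[best]:.2f} per day")
```

The rule hiding in the numbers is that an additional owned GPU only earns its place if it is busy for more than the ratio of the owned rate to the cloud rate, about 43 percent of hours in this illustration. The steady base clears that bar; the bump and the peak do not.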

The DACH factors that change the answer

Electricity is the structural disadvantage. German industrial electricity prices sit among the highest in Europe, broadly in the high-teens cents per kilowatt-hour, against single-digit-to-low-teens figures in France and the UK. A continuously loaded GPU draws a meaningful amount of power (an H100 SXM is rated at up to 700 W before any cooling overhead), and at German rates that running cost is real money over three years, several times what a US operator on cheaper power would pay for the identical card. This is the single factor that most weakens the on-premise case in Germany relative to the US analyses you will read, and it is why utilisation discipline matters even more here. There is partial relief: from January 2026 Germany introduced a temporary subsidised industrial electricity price for energy-intensive companies, capping part of eligible consumption near five cents per kilowatt-hour through 2028, but eligibility is narrow and most mid-market AI operators will not qualify.
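To put a number on the running cost, the sketch below multiplies power draw, cooling overhead, hours, and price. The 700 W draw corresponds to an H100 SXM at its rated maximum. The PUE of 1.3 and the 0.18 and 0.05 per-kWh prices are illustrative assumptions standing in for the high-teens and subsidised figures mentioned above. The result is in whatever currency the price is quoted in.

```python
# Minimal sketch: three-year electricity cost of one continuously loaded GPU.
# PUE and prices are assumptions; substitute your own contract values.
HOURS_3Y = 3 * 8760


def electricity_cost(gpu_watts: float, price_per_kwh: float,
                     pue: float = 1.3, hours: float = HOURS_3Y,
                     utilisation: float = 1.0) -> float:
    """Energy cost including facility overhead (PUE) for the given hours."""
    kwh = gpu_watts / 1000 * pue * hours * utilisation
    return kwh * price_per_kwh


standard = electricity_cost(700, 0.18)  # high-teens cents per kWh
capped = electricity_cost(700, 0.05)    # subsidised rate, if eligible

print(f"Three years at 0.18/kWh: {standard:,.0f}")
print(f"Three years at 0.05/kWh: {capped:,.0f}")
```

The sketch covers the GPU alone. Host CPUs, networking, and storage draw power too, so a per-node estimate should start from measured node power rather than the card's rating.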

The Energy Efficiency Act adds a procurement obligation. Under the EnEfG, datacentres with a connected load of 300kW or more must cover 50 percent of their electricity from renewable sources from January 2024, rising to 100 percent from January 2027; purchasing guarantees of origin satisfies the requirement on a balance-sheet basis. A single rack of GPUs sits well below that threshold, so an in-house cabinet is unlikely to trigger it directly — but the moment you take colocation space in a German facility, that obligation is priced into what you pay, and from 2027 it applies in full. It is a cost and a compliance line, not a footnote.
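The thresholds described above can be written down as a small lookup, shown here as a simplified sketch. It encodes only the 300 kW threshold and the two dates from this section. The function name is hypothetical, and the Act's other datacentre obligations and any exemptions are not modelled.

```python
# Minimal sketch of the renewable-electricity share as summarised above.
# Simplification: only the 300 kW threshold and the 2024 / 2027 steps.
from datetime import date

THRESHOLD_KW = 300


def required_renewable_share(connected_load_kw: float, on: date) -> float:
    """Share of electricity (0.0 to 1.0) that must come from renewables."""
    if connected_load_kw < THRESHOLD_KW:
        return 0.0  # below the threshold: not covered by this obligation
    if on >= date(2027, 1, 1):
        return 1.0
    if on >= date(2024, 1, 1):
        return 0.5
    return 0.0


# A single in-house rack versus a colocation facility (hypothetical loads).
print(required_renewable_share(40, date(2026, 6, 1)))    # in-house cabinet
print(required_renewable_share(2000, date(2026, 6, 1)))  # colocation site
print(required_renewable_share(2000, date(2027, 1, 1)))  # same site, 2027
```

For a colocation tenant the relevant load is the facility's, not the tenant's cabinet, which is why the obligation shows up in the price even for a small deployment.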

German depreciation rules cut both ways, and there is a real option here. The standard AfA table assigns computer hardware a three-year useful life on a linear basis. But since a 2021 Federal Ministry of Finance directive, businesses may apply a one-year useful life to computer hardware and software — meaning the full purchase cost can be written off in the year of acquisition. For a GPU bought outright, that is a genuine tax lever: it lets you expense the capital immediately rather than dragging it across three years. Cloud spend is already fully deductible as it is incurred. So the CapEx-versus-OpEx framing is softer than it looks; with the accelerated write-off, owning need not mean locking the cost into a slow depreciation schedule. The right call depends on your tax position, not a generic rule.
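The value of the accelerated write-off is one of timing rather than amount, which a short present-value comparison makes visible. In the sketch below the 30,000 purchase cost, 30 percent tax rate, and 5 percent discount rate are hypothetical. It also assumes full-year deductions taken at year end and enough taxable profit to use them, which a real tax calculation would need to check.

```python
# Minimal sketch: present value of tax savings under two write-off schedules.
# Cost, tax rate, and discount rate are hypothetical; simplified timing.

def pv_tax_savings(cost: float, useful_life_years: int,
                   tax_rate: float, discount_rate: float) -> float:
    """Discounted tax saved when cost is deducted linearly over the life."""
    annual_deduction = cost / useful_life_years
    return sum(
        annual_deduction * tax_rate / (1 + discount_rate) ** year
        for year in range(1, useful_life_years + 1)
    )


COST, TAX_RATE, DISCOUNT = 30_000, 0.30, 0.05

linear = pv_tax_savings(COST, 3, TAX_RATE, DISCOUNT)     # standard AfA table
immediate = pv_tax_savings(COST, 1, TAX_RATE, DISCOUNT)  # one-year option

print(f"Three-year linear:  {linear:,.2f}")
print(f"One-year write-off: {immediate:,.2f}")
print(f"Timing advantage:   {immediate - linear:,.2f}")
```

Both schedules deduct the same total. The one-year option simply brings the tax saving forward, so its worth grows with your cost of capital and shrinks to nothing if there is no profit to offset in the year of purchase.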

Datacentre capacity is tight, and power is the constraint. Frankfurt is Europe's largest datacentre hub and one of its most supply-constrained, with availability, particularly for dense, high-power AI cabinets, measured in lead times rather than spot vacancy. Across primary markets, grid power, not floor space, has become the binding limit. For a Mittelstand buyer this means that colocation for GPU density is something to plan and pre-commit, not something to procure on short notice, and that secondary markets such as Munich, Berlin, or Hamburg may quote sooner even where connectivity is thinner.

How to decide

Buy on-premise when one production workload runs at genuinely high, sustained utilisation around the clock, you already hold datacentre space with the power and cooling headroom, you have the infrastructure engineering capacity in-house to operate it, and your three-year horizon is stable enough to justify the capital — bearing in mind the one-year write-off can make that capital easier to stomach than it first appears.

Lean on cloud when volumes are modest or unpredictable, when you want the freedom to jump GPU generations as new silicon lands, or when you simply do not have the engineers to babysit hardware. Renting is not the timid option; it is the correct one whenever your load does not fill the capacity you would otherwise own.

Go hybrid — and most companies with more than a handful of AI workloads should — when you have a stable production model that earns reserved or owned capacity alongside experimentation that thrives on cloud elasticity. The discipline is not picking a side; it is refusing to pay owned-hardware prices for spiky demand, or cloud premiums for load that never moves.
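The three rules above can be condensed into a first-pass screening function, sketched below. The field names and the 0.7 cut-off for "genuinely high" utilisation are assumptions made for illustration. The output is a starting point for the cost modelling, not a substitute for it.

```python
# Minimal sketch: the buy / rent / hybrid rules as a screening function.
# Field names and the utilisation threshold are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Situation:
    steady_utilisation: float        # 0.0 to 1.0 for the main workload
    has_powered_space: bool          # datacentre space with power and cooling
    has_infra_engineers: bool        # in-house capacity to operate hardware
    stable_three_year_horizon: bool  # workload unlikely to change shape
    has_spiky_workloads: bool        # development, experiments, bursts


def recommend(s: Situation, high_utilisation: float = 0.7) -> str:
    steady = s.steady_utilisation >= high_utilisation
    can_operate = (s.has_powered_space and s.has_infra_engineers
                   and s.stable_three_year_horizon)
    if steady and s.has_spiky_workloads:
        return "hybrid"      # reserved or owned base, cloud for the rest
    if steady and can_operate:
        return "on-premise"
    return "cloud"


print(recommend(Situation(0.9, True, True, True, False)))
print(recommend(Situation(0.9, True, True, True, True)))
print(recommend(Situation(0.3, False, False, True, True)))
```

Note that a steady workload without the space or engineers to run hardware still falls through to cloud, where a reserved commitment gives it the predictable pricing it needs.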

A Fit Call models your GPU economics against your actual workloads and the DACH factors generic calculators leave out — electricity, the EnEfG, and the depreciation lever — before you commit capital to the wrong side of the buy-versus-rent line.


References: IntuitionLabs, "NVIDIA AI GPU Pricing Guide," 2026 (intuitionlabs.ai/articles/nvidia-ai-gpu-pricing-guide); GetDeploying, "H100 Cloud Pricing: Compare 45+ Providers," 2026 (getdeploying.com/gpus/nvidia-h100); White & Case, "Data center requirements under the new German Energy Efficiency Act" (whitecase.com/insight-alert/data-center-requirements-under-new-german-energy-efficiency-act); Gleiss Lutz, "Germany cuts costs for electricity-intensive companies from 1 January 2026: the new industrial electricity price" (gleisslutz.com); Clean Energy Wire, "Germany set to introduce 'industrial electricity price' by beginning of 2026"; Euronews, "Germany is a leader in renewables, so why does it have one of the highest EU electricity prices?," June 2026; Haufe, "BMF: Verkürzung der Nutzungsdauer von Computer-Hardware und -Software" (haufe.de).
