RAG vs. Fine-Tuning vs. Prompt Engineering: A Decision Framework

The question "should we use RAG or fine-tuning?" is the wrong question. The right question is: where should intelligence live in your system — in the model's weights, in external knowledge, or in the instructions you give it?

Each approach solves a different problem. Confusing them is the most expensive mistake we see in enterprise AI architecture — and, since the EU AI Act's general-purpose AI rules took effect, one of the few that can quietly hand you a regulatory obligation you never intended to take on.

Three approaches, three problems

Prompt engineering shapes behaviour through instructions. It changes how the model responds without changing what it knows or how it thinks. System prompts, few-shot examples, structured output schemas, chain-of-thought instructions — these are fast to implement, require no training infrastructure, and work immediately. For most enterprise tasks, this is where you should start, and more often than not, where you should stop.

Retrieval-augmented generation (RAG) gives the model access to external knowledge at inference time. The system retrieves relevant passages from a search index — typically vector-based, often hybrid with keyword search — then the model generates an answer grounded in that content. RAG solves the knowledge problem: the model needs information it was never trained on. Your internal policies, product catalogues, customer histories, the version of a regulation that applies this quarter. None of that lives in a frontier model's weights, and you would not want it to.

Fine-tuning changes the model's weights through additional training on domain-specific data. It changes how the model behaves — its tone, its output format, its classification boundaries, its handling of domain vocabulary. Fine-tuning solves the behaviour problem: you need the model to act in a specific, repeatable way that prompting alone cannot reliably enforce across thousands of interactions.

The decision framework

Start with prompt engineering. If a well-engineered prompt cannot reach the required quality, diagnose the gap before you reach for heavier machinery. Almost every gap is one of two kinds: the model lacks knowledge, or the model's behaviour is wrong. The two demand different tools, and mistaking one for the other is where the budget disappears.

If the model lacks knowledge, implement RAG. Your company's documentation, product data, compliance manuals, customer records — RAG makes them accessible without retraining. The decisive advantage is freshness: change a document in the knowledge base, and the next query reflects it. Nothing to retrain, nothing to redeploy. For any domain where the ground truth shifts — pricing, policy, regulation, inventory — this property alone usually settles the argument.

If the model's behaviour is wrong, consider fine-tuning. The model understands the content but produces it in the wrong shape: the wrong format, an off-brand tone, classification calls that do not match your domain's boundaries. Fine-tuning earns its keep when you need reliably structured output in a fixed schema, a consistent voice at scale, or decision boundaries that generic instruction-following keeps getting wrong.

If both, use both — and this is where the strongest 2026 architectures land. Fine-tune a smaller model for behaviour, then layer RAG on top for knowledge. You get fast, on-brand, citable answers, and you get them without routing every token through a frontier model. The behaviour is baked into cheap weights; the facts stay external, current, and auditable.

What each layer actually costs a Mittelstand budget

Prompt engineering costs little beyond engineering time. No GPUs, no training corpus, no pipeline. The ceiling is reliability: complex behaviour is hard to pin down through instructions alone, and the long system prompts you write to compensate quietly inflate per-call token costs once you are at volume.

RAG needs a vector or hybrid index, an embedding model, a retrieval pipeline, and a chunking strategy. A production-grade system is typically a few weeks of work, not a few months. Running costs — embedding compute and index storage — are modest at the document volumes a typical Mittelstand knowledge base reaches. The hard part is never standing RAG up; it is making it good. Chunking, retrieval quality, reranking, and context-window discipline are what separate a system that cites the right paragraph from one that retrieves the wrong policy and synthesises it with total confidence.

Fine-tuning needs curated training data — usually hundreds to a few thousand high-quality examples — compute to train, and an evaluation harness to prove the result is actually better. Full fine-tuning of a large model is costly and risks catastrophic forgetting: the model gains your domain and loses general capability. Parameter-efficient methods are the practical answer. LoRA freezes the base model and trains a small set of low-rank matrices instead; the original authors reported reducing trainable parameters by up to ten thousand times versus full fine-tuning of GPT-3 175B, and cutting the GPU memory requirement by roughly three times, while holding quality (Hu et al., 2021). For the overwhelming majority of enterprise use cases, LoRA or QLoRA matches full fine-tuning at a fraction of the cost — and, as we will see, keeps you well under a regulatory line that full retraining can cross.

The EU AI Act trap that turns a fine-tune into a provider obligation

This is the part most architecture discussions skip, and it is the part a Geschäftsführer cannot afford to. Under the EU AI Act, the obligations on a provider of a general-purpose AI model are far heavier than those on a deployer. The European Commission's July 2025 guidelines clarified when fine-tuning a third-party model makes you the provider of a new model in your own right — and therefore liable for the documentation, training-data summary, and copyright-policy obligations that come with that status.

The line is drawn by compute. A modification is treated as creating a new model when the compute used to fine-tune exceeds roughly one third of the compute used to train the original. As an indicative threshold, the guidelines point to one third of the 10²³ FLOP marker that defines a general-purpose model in the first place — and a stricter regime applies to models classed as systemic-risk. Cross that line and the obligations attach only to your modification, not the whole base model; but they attach. Enforcement of the GPAI provisions, including fines, applies from 2 August 2026.

The practical reading for a Mittelstand team is reassuring and sharp at once. Ordinary LoRA-style fine-tuning on a few thousand examples is nowhere near one third of a foundation model's training compute — you stay a deployer, and the heavy obligations do not attach. It is the ambition to fully retrain or heavily continue-train a large open-weight model that walks you toward provider status. That is a strategic decision, not a quiet engineering choice to be made in a sprint. If your roadmap includes deep retraining, the compliance cost belongs in the business case from day one, not as a surprise in 2026.

Two further obligations apply regardless of the build choice and are easy to forget. From 2 August 2026, Article 50 requires that users be told when they are interacting with an AI system unless it is obvious, and that AI-generated or manipulated content be marked as such — including text published to inform the public on matters of public interest, unless it has passed human editorial review. A RAG assistant on your website and a fine-tuned drafting tool fall under the same transparency rule; design the disclosure in from the start rather than retrofitting it under deadline.

Where DACH enterprises go wrong

Three patterns recur, and all three are avoidable.

Reaching for fine-tuning when RAG would suffice. A team spends months fine-tuning a model on compliance documentation that changes every quarter. A RAG system over the same documents, built in weeks, would have produced better answers and stayed current — because the fine-tuned model goes stale the moment the rules move, and re-baking it on every update is a treadmill no one budgeted for. If the knowledge changes, the knowledge belongs outside the weights.

Building RAG without investing in data quality. RAG is only as good as what it retrieves. Outdated policies, contradictory guidelines, and unstructured documents do not get smarter inside a vector index — they get retrieved and synthesised into fluent, confident, wrong answers. The expensive, unglamorous work of cleaning, de-duplicating, and structuring source content is the work. Treat it as the project, not the prerequisite.

Skipping prompt engineering entirely. Teams jump to RAG or fine-tuning before testing what a disciplined prompt can do. A clear role definition, an explicit output schema, and a handful of worked examples routinely close gaps that looked like they demanded retraining. Try the cheap thing first and instrument it; you will be surprised how often it wins.

Making the right architectural choice

The decision is not permanent, and it should not be made all at once. Start with prompt engineering. Add RAG when the gap is genuinely a knowledge gap. Add fine-tuning only when behaviour remains inconsistent after the knowledge is in place. Each layer adds cost, operational surface, and — in the case of deep fine-tuning — potential regulatory weight. Add each one only when the layer beneath it is demonstrably not enough.

The organisations building the most effective AI systems are not the ones reaching for the most sophisticated technique. They are the ones using the simplest technique that reliably solves the problem — and knowing, before they commit, which side of the EU AI Act's provider line their choice puts them on.

A Fit Call maps your use case to the right architecture — prompt, RAG, fine-tune, or hybrid — and flags any EU AI Act provider exposure before you commit budget to the wrong layer.

Book a Fit Call →

References: Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models," arXiv:2106.09685, 2021 (https://arxiv.org/abs/2106.09685); European Commission, "Guidelines on the scope of obligations for providers of general-purpose AI models," 2025 (https://digital-strategy.ec.europa.eu/en/policies/guidelines-gpai-providers); EU Artificial Intelligence Act, Article 50 — Transparency Obligations (https://artificialintelligenceact.eu/article/50/).

RAG vs. Fine-Tuning vs. Prompt Engineering: A Decision Framework

Three approaches, three problems

The decision framework

What each layer actually costs a Mittelstand budget

The EU AI Act trap that turns a fine-tune into a provider obligation

Where DACH enterprises go wrong

Making the right architectural choice

Related articles

LLM Weight Classes: Which Model Fits Which Enterprise Task

Small Language Models for Enterprise: When 7B Parameters Beat 70B

The AI Context Layer: Why Most Enterprise AI Fails on Data, Not Models

Ready for the next step?