Every AI vendor says "data is the new oil." None of them tell you that most enterprise data is closer to crude sludge than refined fuel — and that deploying AI on top of low-quality data does not produce bad results. It produces confidently wrong results at scale.
The relationship between data quality and AI performance is not linear. It has thresholds, cliffs, and failure modes that the research quantifies clearly.
What the research shows
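Label noise. Label noise is the share of examples that carry the wrong label, and it is more common than most teams assume. Northcutt et al. (NeurIPS 2021) estimated an average of at least 3.3 percent label errors in the test sets of ten widely used benchmark datasets, which is enough to change which model appears to perform best. The effect on training is non-linear. Classification accuracy typically degrades gracefully up to roughly 10 percent label noise, meaning 10 percent of training examples are incorrectly labelled. Beyond 10 percent, accuracy drops precipitously. At 20 percent label noise, most models lose 15 to 25 percentage points of accuracy. At 30 percent, model performance becomes unreliable regardless of architecture or scale.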
For enterprise training data — manually labelled by domain experts, often inconsistently — label noise of 5 to 15 percent is typical. This means most enterprise fine-tuning projects are operating near or beyond the threshold where data quality materially degrades outcomes.
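The noise rate decides which side of the threshold a project sits on, so it is worth measuring rather than guessing. The following is a minimal sketch. It assumes a trusted reviewer can re-label a random sample of the training set. The disagreement rate then estimates label noise, and a Wilson score interval shows how far that estimate can move at the given sample size. The function name and the audit setup are illustrative, not a standard tool.

```python
import math


def estimate_label_noise(original, audited, z=1.96):
    """Estimate the label-noise rate from a re-labelled audit sample.

    original: labels currently in the training data
    audited:  labels a trusted reviewer assigned to the same examples
    Returns (observed_rate, lower_bound, upper_bound); z=1.96 gives ~95%.
    """
    if not original or len(original) != len(audited):
        raise ValueError("need two non-empty label lists of equal length")
    n = len(original)
    errors = sum(1 for o, a in zip(original, audited) if o != a)
    p = errors / n
    # Wilson score interval: behaves better than p +/- z * standard error
    # when n is small or p is near 0 or 1. Clamped to guard against rounding.
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Example: 24 of 200 audited labels disagree with the training labels.
original = ["refund"] * 176 + ["complaint"] * 24
audited = ["refund"] * 200
rate, low, high = estimate_label_noise(original, audited)
print(f"observed {rate:.1%}, 95% interval {low:.1%} to {high:.1%}")
```

In this example the point estimate of 12 percent already exceeds the 10 percent threshold. The interval, roughly 8 to 17 percent, shows that an audit of 200 examples cannot pin the rate down much more tightly.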
Data drift. The NeurIPS 2023 dataset shift benchmark demonstrated that production ML models experience measurable performance degradation within 30 to 90 days of deployment as input distributions shift. Financial data drifts fastest — transaction patterns, market conditions, and customer behaviours change continuously. Manufacturing data drifts seasonally. Customer service data drifts with product releases and marketing campaigns.
Without drift monitoring, a model deployed in January may be operating 10 to 15 percent below its initial accuracy by April — and nobody notices until a business outcome breaks.
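Drift monitoring can start simple. One widely used statistic is the Population Stability Index (PSI). It compares how a feature is distributed in a reference window, such as the training data, with its distribution in a recent production window. The sketch below bins values by the reference sample's quantiles. The function name is illustrative. The alert levels in the docstring (around 0.1 to watch, 0.25 to act) are an industry rule of thumb, not a guarantee.

```python
import bisect
import math
import random


def population_stability_index(reference, current, bins=10, eps=1e-6):
    """PSI = sum((cur_i - ref_i) * ln(cur_i / ref_i)) over bins.

    Bin edges are quantiles of the reference sample, so each reference bin holds
    roughly the same share of observations. Common rule of thumb (a convention,
    not a law): < 0.1 stable, 0.1 to 0.25 watch, > 0.25 investigate.
    """
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref_sorted = sorted(reference)
    cuts = [ref_sorted[len(ref_sorted) * k // bins] for k in range(1, bins)]

    def shares(values):
        counts = [0] * bins
        for v in values:
            counts[bisect.bisect_right(cuts, v)] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref_shares, cur_shares = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares))


# Example: transaction amounts whose mean has crept upward since training.
rng = random.Random(42)
january = [rng.gauss(100, 20) for _ in range(5000)]
april = [rng.gauss(110, 20) for _ in range(5000)]
print(f"PSI = {population_stability_index(january, april):.3f}")
```

Running a check like this per feature on a schedule is what turns "nobody notices" into an alert.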
Document quality for RAG. Practitioner experience and benchmark data consistently show that RAG system accuracy is more sensitive to document quality than to model quality. Upgrading the underlying model typically improves RAG accuracy by a few percentage points. Cleaning and restructuring the source documents often improves accuracy by 15 percentage points or more. The implication: investing in document preparation delivers 3 to 5 times more accuracy improvement than investing in a better model.
The five data quality dimensions for AI
Not all data quality problems are equal. Five dimensions matter most for AI readiness.
Completeness. Missing fields, partial records, gaps in time series. A customer churn model trained on data where 30 percent of customers have no interaction history will learn to predict churn based on available features — which may be the wrong features. Completeness thresholds vary by use case, but below 80 percent completeness, most models compensate in ways that reduce reliability.
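Completeness is the easiest of the five dimensions to measure. The following is a minimal sketch. It assumes records arrive as Python dictionaries, and it treats `None`, empty strings, and a few placeholder strings as missing; adjust that set to match your systems. The 80 percent default mirrors the threshold above. The field names are hypothetical.

```python
from typing import Any, Dict, Iterable, Mapping, Sequence, Tuple

# Placeholder values treated as missing; extend with whatever your systems use.
MISSING = (None, "", "n/a", "N/A", "unknown")


def completeness_report(records: Sequence[Mapping[str, Any]], fields: Iterable[str],
                        threshold: float = 0.8) -> Dict[str, Tuple[float, bool]]:
    """Return {field: (share_present, meets_threshold)}; absent keys count as missing."""
    report = {}
    for field in fields:
        present = sum(1 for r in records if r.get(field) not in MISSING)
        share = present / len(records) if records else 0.0
        report[field] = (share, share >= threshold)
    return report


customers = [
    {"id": 1, "segment": "SME", "last_interaction": "2024-03-02"},
    {"id": 2, "segment": "SME", "last_interaction": None},
    {"id": 3, "segment": "", "last_interaction": "2024-01-15"},
    {"id": 4, "segment": "Enterprise"},  # interaction field absent entirely
]
for field, (share, ok) in completeness_report(customers, ["segment", "last_interaction"]).items():
    print(f"{field}: {share:.0%} {'ok' if ok else 'below threshold'}")
```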
Consistency. The same entity described differently across systems. "Siemens AG" in the CRM, "Siemens" in the ERP, "SIEMENS AKTIENGESELLSCHAFT" in the contract database. Entity resolution, which reconciles these into a single canonical form, is a prerequisite for any cross-system AI application. Inconsistency rates of 15 to 30 percent across enterprise systems are normal in DACH companies that have grown through acquisition.
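A first pass at entity resolution is often a normalised matching key. The name is casefolded, accents and punctuation are stripped, and legal-form suffixes are dropped, so that all three Siemens spellings collapse to one key. The sketch below assumes a short, hypothetical list of German legal forms. Production entity resolution usually adds fuzzy matching, registry identifiers, and human review, because aggressive normalisation can also merge entities that are genuinely different.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical, deliberately short lists; real ones are longer and per-jurisdiction.
LONG_FORMS = {
    "aktiengesellschaft": "ag",
    "gesellschaft mit beschrankter haftung": "gmbh",
}
LEGAL_SUFFIXES = {"ag", "gmbh", "se", "kg", "kgaa"}


def canonical_key(name: str) -> str:
    """Reduce a company name to a key for grouping likely-identical entities."""
    # Decompose accented characters (ä -> a + combining mark), then drop the marks.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).casefold()
    text = " ".join(re.sub(r"[^\w\s]", " ", text).split())  # punctuation -> spaces
    for long_form, short_form in LONG_FORMS.items():
        text = re.sub(rf"\b{re.escape(long_form)}\b", short_form, text)
    tokens = text.split()
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()  # "siemens ag" -> "siemens"
    return " ".join(tokens)


records = [("CRM", "Siemens AG"), ("ERP", "Siemens"), ("Contracts", "SIEMENS AKTIENGESELLSCHAFT")]
groups = defaultdict(list)
for system, name in records:
    groups[canonical_key(name)].append(system)
print(dict(groups))  # {'siemens': ['CRM', 'ERP', 'Contracts']}
```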
Currency. How old is the data? A product recommendation model trained on last year's purchase data will recommend last year's products. A compliance model trained on pre-2024 regulatory data will miss EU AI Act requirements. Define the maximum acceptable data age for each AI use case and measure against it.
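Measuring currency means comparing each source's last update against a per-use-case limit. The following is a minimal sketch. The use-case names, limits, and source names are hypothetical placeholders, and timestamps are assumed to be timezone-aware UTC.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# Hypothetical limits; derive real ones from each use case's decision cycle.
MAX_AGE = {
    "monthly_forecast": timedelta(days=31),
    "product_recommendation": timedelta(days=7),
    "realtime_pricing": timedelta(minutes=5),
}


def stale_sources(last_updated: Dict[str, datetime], use_case: str,
                  now: Optional[datetime] = None) -> List[str]:
    """Return the sources whose latest update is older than the use case allows."""
    now = now or datetime.now(timezone.utc)
    limit = MAX_AGE[use_case]
    return sorted(src for src, ts in last_updated.items() if now - ts > limit)


now = datetime(2025, 4, 1, tzinfo=timezone.utc)
sources = {
    "erp_orders": now - timedelta(days=2),
    "crm_accounts": now - timedelta(days=45),
    "price_feed": now - timedelta(minutes=20),
}
print(stale_sources(sources, "monthly_forecast", now))  # ['crm_accounts']
print(stale_sources(sources, "realtime_pricing", now))  # ['crm_accounts', 'erp_orders', 'price_feed']
```

The same sources can be fresh enough for one use case and stale for another, which is why the limit belongs to the use case rather than to the dataset.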
Accuracy. Does the data reflect reality? CRM data is notoriously inaccurate — contact information decays at 20 to 30 percent per year. Production data captured by sensors may have calibration drift. Financial data may have reconciliation gaps. The AI system inherits every inaccuracy in the underlying data and propagates it into decisions.
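The decay figure compounds if nobody refreshes the data. The calculation below assumes a constant annual rate r applied to whatever is still valid, which is a simplification. Under that assumption, the accurate share after t years is (1 - r)^t.

```python
def still_accurate(annual_decay: float, years: float) -> float:
    """Share of records still accurate after `years`, assuming a constant
    compounding decay rate and no refresh in between."""
    return (1.0 - annual_decay) ** years


for rate in (0.20, 0.30):
    shares = [round(still_accurate(rate, y), 2) for y in (1, 2, 3)]
    print(f"{rate:.0%} annual decay -> still accurate after 1-3 years: {shares}")
# 20% annual decay -> still accurate after 1-3 years: [0.8, 0.64, 0.51]
# 30% annual decay -> still accurate after 1-3 years: [0.7, 0.49, 0.34]
```

On these assumptions, a CRM left uncleaned for three years holds only a third to a half of its contact records in an accurate state.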
Structure. Unstructured data — free-text fields, scanned documents, email threads — requires preprocessing before AI can use it effectively. The preprocessing quality determines the AI quality. Poorly chunked documents produce poor RAG results. Inconsistently parsed PDFs produce noisy training data. Structure is where most enterprises underinvest.
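Chunking is one place where preprocessing quality shows directly. One common strategy packs whole paragraphs into chunks up to a character budget and carries the last paragraph forward as overlap, so no chunk starts or ends mid-paragraph. The sketch below assumes parsing has already produced clean text with blank lines between paragraphs. The function name, budget, and overlap are illustrative, and many pipelines count tokens instead of characters.

```python
def chunk_paragraphs(text: str, max_chars: int = 1200, overlap: int = 1) -> list:
    """Pack whole paragraphs into chunks of at most `max_chars` characters.

    The last `overlap` paragraphs of each chunk are repeated at the start of the
    next one for context. A single paragraph longer than `max_chars` becomes its
    own oversized chunk rather than being cut mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        if current and len("\n\n".join(current + [para])) > max_chars:
            chunks.append("\n\n".join(current))
            carried = current[-overlap:] if overlap > 0 else []
            # Drop the overlap if it would push the new chunk over budget.
            if len("\n\n".join(carried + [para])) > max_chars:
                carried = []
            current = carried + [para]
        else:
            current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


doc = "\n\n".join(["A" * 500, "B" * 500, "C" * 500])
for i, chunk in enumerate(chunk_paragraphs(doc)):
    print(i, len(chunk), chunk[0], chunk[-1])
```

The same function fed a badly parsed PDF, with paragraphs broken at every line ending, would produce chunks that split sentences arbitrarily. That is the sense in which preprocessing quality sets AI quality.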
The readiness threshold
Based on these dimensions, a practical readiness threshold for enterprise AI looks like this:
For fine-tuning projects: minimum 500 high-quality, consistently labelled examples with less than 10 percent label noise. Most enterprises need 2 to 4 weeks of data preparation before fine-tuning is viable.
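"Consistently labelled" can be checked before any training run. Two domain experts label the same sample independently, and Cohen's kappa measures their agreement after discounting the agreement they would reach by chance. The following is a minimal sketch of the standard two-rater formula. The sample labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two annotators."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n  # observed agreement
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both pick the same label independently.
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


expert_1 = ["defect", "defect", "ok", "ok", "ok", "defect", "ok", "ok"]
expert_2 = ["defect", "ok", "ok", "ok", "ok", "defect", "ok", "defect"]
print(f"kappa = {cohens_kappa(expert_1, expert_2):.2f}")  # kappa = 0.47
```

In this example the experts agree on 6 of 8 items (75 percent), but kappa is only about 0.47 once chance agreement is removed. That suggests the labelling guidelines need tightening before the data is used for fine-tuning.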
For RAG implementations: source documents must be current (updated within the relevant business cycle), structurally consistent (standard formats, clean parsing), and deduplicated. Expect to spend 60 percent of RAG project time on document preparation.
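Deduplication matters because near-identical versions of the same policy or manual compete for retrieval slots. The following is a minimal sketch of one classic approach: word shingles compared with Jaccard similarity. The 0.8 threshold and the shingle length are illustrative assumptions. The pairwise loop only suits small collections; larger corpora typically use MinHash with locality-sensitive hashing instead.

```python
from itertools import combinations


def shingles(text, k=5):
    """Set of k-word sequences; texts shorter than k become a single shingle."""
    words = text.casefold().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 1.0


def near_duplicates(docs, threshold=0.8, k=5):
    """Return (id_a, id_b, similarity) for document pairs at or above threshold."""
    sigs = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    pairs = []
    for a, b in combinations(sorted(sigs), 2):
        sim = jaccard(sigs[a], sigs[b])
        if sim >= threshold:
            pairs.append((a, b, round(sim, 3)))
    return pairs


docs = {
    "warranty_v1": "The warranty covers parts and labour for two years from delivery",
    "warranty_v2": "The warranty covers parts and labour for two years from delivery date",
    "travel_policy": "Travel expenses must be submitted within thirty days of the trip",
}
print(near_duplicates(docs))  # [('warranty_v1', 'warranty_v2', 0.875)]
```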
For analytics and prediction: minimum 80 percent completeness, entity consistency across source systems, and data currency within the decision cycle. For monthly forecasting, monthly-fresh data is sufficient. For real-time pricing, real-time data is required.
The practical path
The companies that succeed with AI do not wait for perfect data. They do three things.
First, they audit data quality before selecting AI use cases. The use cases you can pursue depend on the data you have, not the data you wish you had. A readiness assessment that maps data quality against potential use cases prevents the most common failure mode: choosing an AI initiative that the data cannot support.
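At its simplest, such a readiness map compares measured quality scores against each candidate use case's minimums. The sketch below works on assumed inputs. The scores, use-case names, and thresholds are hypothetical; only the 80 percent completeness figure comes from the readiness thresholds above.

```python
# Hypothetical audit results: share of records passing each check (0 to 1).
measured = {"completeness": 0.86, "consistency": 0.71, "currency": 0.95, "accuracy": 0.78}

# Hypothetical minimums per candidate use case.
requirements = {
    "monthly_demand_forecast": {"completeness": 0.80, "currency": 0.90},
    "cross_system_churn_model": {"completeness": 0.80, "consistency": 0.85},
    "contact_enrichment_agent": {"accuracy": 0.90},
}


def readiness(measured, requirements):
    """Map each use case to its unmet dimensions as {dimension: (measured, required)}.

    An empty dict means the current data can support the use case. A dimension
    that was never measured counts as 0.0, i.e. unmet.
    """
    return {
        use_case: {
            dim: (measured.get(dim, 0.0), need)
            for dim, need in needs.items()
            if measured.get(dim, 0.0) < need
        }
        for use_case, needs in requirements.items()
    }


for use_case, gaps in readiness(measured, requirements).items():
    print(use_case, "ready" if not gaps else f"needs work: {gaps}")
```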
Second, they invest in data quality as an AI prerequisite, not an afterthought. Data cleaning, entity resolution, document structuring: these are not glamorous, but they are the foundation. The RAG evidence points the same way, with document preparation delivering 3 to 5 times the accuracy gain of a model upgrade.
Third, they build data quality monitoring into their AI operations. Drift detection, completeness tracking, freshness alerts — these catch degradation before it reaches the business.
Run a diagnostic to assess your data readiness for AI. We evaluate your data across all five quality dimensions and identify which AI use cases your current data can support, and what preparation is needed for the rest.
References: Northcutt et al., "Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks," NeurIPS 2021 (Datasets and Benchmarks Track); NeurIPS 2023 Dataset Shift Benchmark; Stanford HELM Benchmark Group, "Document Quality Impact on RAG System Accuracy," 2025; Gartner, "Data Quality Market Survey," 2025 (20–30% annual contact decay rate).