Why Data Is the Biggest Barrier to AI Readiness — And What to Do About It
Overview
This is Part 2 of a three-part “Agent Readiness” series produced by the AI Daily Brief in collaboration with Super Intelligent, a firm specialising in AI readiness audits. The host, Nathaniel W. (contact: nw@bsuper.ai), interviews Nufar, Head of Research at Super Intelligent. The conversation draws on findings from thousands of surveys and agent readiness audits across a wide range of organisations. The central thesis is that data fragmentation, quality, and access — not cultural resistance or outdated technology — are the primary barriers preventing organisations from scaling AI agents. The episode argues for a middle-path strategy called “intentional opportunism” to navigate this challenge.
Source video URL: Not available (internal/podcast distribution)
Prerequisites
- Basic familiarity with AI agents and large language models (LLMs)
- General understanding of enterprise data architecture (e.g., CRMs, ERPs, data lakes)
- Awareness of common AI/ML development concepts: APIs, RAG (retrieval-augmented generation), evals
- Familiarity with low-code automation tools (e.g., Zapier, Make/Integromat) is helpful
- Some exposure to AI agent frameworks (e.g., OpenAI, Google ADK) is useful but not required
Main Points
1. Technical Readiness Scores Are Consistently the Lowest Dimension
- Across all agent readiness audits conducted by Super Intelligent, technical and data readiness scores are the lowest of all dimensions assessed.
- Organisational motivation, ideas, and FOMO consistently outpace willingness to fix underlying data and infrastructure.
- The gap between ambition and foundational capability is the core problem preventing AI from scaling beyond pilots.
2. Three Archetypes of Companies Getting It Wrong
- The Magpie: Chases shiny agent demos for marketing and social media while avoiding the unglamorous work of data cleanup, and ends up stuck in “pilot hell.”
- The Overwhelmed (Analysis Paralysis): Sees the full scope of legacy issues, becomes paralysed, and cannot move forward.
- The Monk (Mountaineer): Recognises what needs fixing and initiates long, comprehensive foundational data/infrastructure projects before attempting any agent use case. This approach is too slow, and projects of this kind were already failing before the GenAI era.
- None of these three archetypes achieves meaningful results.
3. The Recommended Framework: Intentional Opportunism
- A pragmatic blend of opportunistic, high-ROI early wins combined with a structured longer-term vision.
- The principle is start now, but start smart: don’t wait for a perfect data foundation, but don’t build randomly either.
- For the first one or two use cases, grant yourself a “free pass” on foundational data work — get the use cases out the door.
- In parallel, begin identifying areas where building reusable, foundational infrastructure has a clear and justified ROI.
- Over time, shift from mostly opportunistic to increasingly intentional and structured.
Action Plan:
- This quarter: Launch one or two agent projects chosen for both feasibility and visible value.
- Next year: Form a concrete roadmap addressing critical gaps, informed by what you learned.
- Aim to get approximately 70% of recommendations right — that alone puts you ahead of most organisations.
4. The Data Readiness Problem in Detail
- In every audited company, internal systems were found to be poorly connected to one another.
- The same entity (e.g., a customer or product) often appears under different names across multiple systems, making automated unification difficult or impossible.
- Two major sub-challenges:
  - Compliance and data privacy: Fear of data leakage leads many organisations to approve nothing, causing stagnation. Customer contracts may include explicit “do not use for AI” clauses.
  - Tribal knowledge: Critical business process know-how is held by a small number of individuals with no documentation, making it impossible to teach an agent how to execute those processes.
5. Five Recommendations for Data Readiness
- Use AI to solve data problems: Use LLMs to link data entities via natural language, clean and refine data, and build RAG and semantic similarity systems for data retrieval — capabilities that did not exist before GenAI.
- Document tribal knowledge via screen recording: Have subject matter experts narrate their work while recording their screen. Feed the recording into an LLM to generate a Standard Operating Procedure (SOP), then have the expert review it. This produces documentation in hours rather than weeks.
- Focus on the highest-ROI data sources: Identify the handful of foundational data sources most critical to agent use cases. Create dedicated, easily accessible solutions for those — potentially via a unified third-party vendor. Do not attempt to connect everything.
- Invest in data cleanup only where ROI is extremely large: Historical large-scale data projects have largely failed because they tried to do everything. Be selective; only clean or restructure data where the return clearly justifies the cost.
- Build new systems correctly from the start: If building new data sources or systems, architect them with agent access in mind from day one — accessible, logically organised, rich with metadata, and supported by structured SOPs.
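The entity-linking idea in the first recommendation can be sketched as follows. This is a minimal illustration, not the method used in the audits: the company names, the suffix list, and the 0.85 threshold are all hypothetical, and plain string similarity from the standard library stands in for the LLM or embedding-based similarity the recommendation describes.

```python
from difflib import SequenceMatcher
import re

# Hypothetical names for the same customers as they might appear in two systems.
CRM_NAMES = ["Acme Corporation", "Globex Ltd.", "Initech"]
ERP_NAMES = ["ACME Corp", "Initech Inc", "Umbrella Group"]

# Legal suffixes that add noise when comparing company names (illustrative list).
SUFFIXES = {"corporation", "corp", "ltd", "inc", "llc", "group"}


def normalise(name: str) -> str:
    """Lowercase, strip punctuation, and drop common legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    kept = [t for t in tokens if t not in SUFFIXES]
    return " ".join(kept or tokens)


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for embedding cosine similarity."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def link_entities(left, right, threshold=0.85):
    """Return (left_name, best_right_name, score) for matches at or above the threshold."""
    links = []
    for l_name in left:
        best = max(right, key=lambda r_name: similarity(l_name, r_name))
        score = similarity(l_name, best)
        if score >= threshold:
            links.append((l_name, best, round(score, 2)))
    return links


links = link_entities(CRM_NAMES, ERP_NAMES)
for crm_name, erp_name, score in links:
    print(f"{crm_name!r} <-> {erp_name!r} (score={score})")
```

In practice the `similarity` function is the part one would swap for an embedding model or an LLM judgement, and low-confidence matches would go to a human reviewer rather than being linked automatically.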
6. Security and Governance Non-Negotiables
- Define and enforce strict data access roles — not every agent or user should have read or write access to every system.
- Anonymise data where necessary, even if imperfect.
- Create a secure sandbox environment where employees can experiment within defined boundaries without risk of data leakage. This is the primary enabler for safe, broad experimentation.
- Centralised AI teams should control write access to sensitive systems (e.g., finance); individual teams can build more freely within those guardrails.
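A deny-by-default access policy for agents might look like the sketch below. The role names, system names, and grants are hypothetical and chosen only to mirror the split described above, where a central team holds write access to finance and individual teams build within narrower grants.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    READ = "read"
    WRITE = "write"


@dataclass
class AgentRole:
    name: str
    # Maps a system name to the set of access levels granted on it.
    grants: dict = field(default_factory=dict)

    def can(self, system: str, access: Access) -> bool:
        """Deny by default: only explicitly granted access is allowed."""
        return access in self.grants.get(system, frozenset())


def authorise(role: AgentRole, system: str, access: Access) -> None:
    """Raise PermissionError unless the role holds the requested access."""
    if not role.can(system, access):
        raise PermissionError(f"{role.name} may not {access.value} {system}")


# Hypothetical roles: a team-built agent and a centrally managed finance agent.
team_agent = AgentRole(
    "sales-team-agent",
    {"crm": {Access.READ, Access.WRITE}, "finance": {Access.READ}},
)
central_finance_agent = AgentRole(
    "central-finance-agent",
    {"finance": {Access.READ, Access.WRITE}},
)

authorise(central_finance_agent, "finance", Access.WRITE)  # allowed, returns None
try:
    authorise(team_agent, "finance", Access.WRITE)
except PermissionError as exc:
    print(f"blocked: {exc}")
```

Checking every tool call through a single function like `authorise` keeps the policy in one place, which makes it easier to audit than permissions scattered across individual agents.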
7. Key Technology Readiness Decisions (as Dials, Not Binary Choices)
Centralised vs. Decentralised Building
- Individual teams should be empowered to build agents for their own day-to-day needs — this power should not be taken away.
- A centralised AI team should handle shared, complex, or high-stakes problems that individual teams cannot address alone.
- Define a boundary condition based on value, complexity, data sensitivity, and how widespread the use case is.
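One way to make such a boundary condition explicit is a small scoring rule over the four factors named above. The 1 to 5 scales, the cutoffs, and the example use cases below are assumptions for illustration; the episode does not prescribe specific numbers.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    value: int             # 1 (low) to 5 (high) business value
    complexity: int        # 1 (simple) to 5 (very complex)
    data_sensitivity: int  # 1 (public) to 5 (highly sensitive)
    spread: int            # 1 (one team) to 5 (company-wide)


def owner(use_case: UseCase, sensitivity_cutoff: int = 4, total_cutoff: int = 14) -> str:
    """Route a use case to the central AI team or leave it with the requesting team.

    Assumed rule: sensitive data always goes central; otherwise the combined
    score decides.
    """
    if use_case.data_sensitivity >= sensitivity_cutoff:
        return "central"
    total = (use_case.value + use_case.complexity
             + use_case.data_sensitivity + use_case.spread)
    return "central" if total >= total_cutoff else "team"


# Hypothetical use cases.
notes = UseCase("meeting-notes summariser", 2, 1, 1, 2)
invoices = UseCase("invoice approval agent", 4, 4, 5, 3)
search = UseCase("company-wide knowledge search", 5, 4, 2, 5)

for case in (notes, invoices, search):
    print(f"{case.name}: {owner(case)}")
```

Treating the cutoffs as parameters reflects the framing of these decisions as dials: an organisation can loosen or tighten them as its governance matures.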
Point Solutions vs. Unified Platform
- Recommended architecture includes three tiers of horizontal agent-building platforms to serve all populations:
  - Prompt-based / no-code platforms (e.g., Relevance AI, Lindy) for less technical teams.
  - Low-code / automation platforms (e.g., n8n, Zapier, Make) for more flexibility and integration.
  - Full-code / developer frameworks (e.g., Google ADK, OpenAI packages) for developers needing full flexibility.
- For specific verticals with proven external solutions (legal, customer support, coding), buy and customise rather than compete with dedicated vendors.
- Build reusable internal utility services covering data access, tooling, monitoring, governance, and guardrails to serve all builders safely.
Build vs. Buy
- In practice, the answer is almost always build or adapt on top of something you buy (hybrid).
- Pragmatic rules:
  - If a tool covers ~80% of your use case, buy it rather than building.
  - If a major native business system (Salesforce, Workday, etc.) will inevitably build what you need, wait or use a temporary patch rather than building something that will be rendered obsolete.
  - Build only when no suitable solution exists or is in sight, or when building to your exact needs constitutes a genuine competitive advantage.
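The pragmatic rules above can be written down as a small decision function. This is a sketch under assumptions: the order in which the rules are checked, the `hybrid` fallback for partial coverage, and all parameter names are illustrative choices rather than something the episode specifies.

```python
BUY_THRESHOLD = 0.8  # the "~80% of your use case" rule of thumb


def build_or_buy(best_coverage: float,
                 native_vendor_will_ship: bool,
                 competitive_advantage: bool) -> str:
    """Suggest a sourcing strategy for an agent use case.

    best_coverage: share of the use case (0.0 to 1.0) covered by the best
        available tool; 0.0 means no suitable tool exists.
    native_vendor_will_ship: a core business system you already run is
        expected to deliver this capability.
    competitive_advantage: building to exact needs would be a genuine
        differentiator.
    """
    if not 0.0 <= best_coverage <= 1.0:
        raise ValueError("best_coverage must be between 0.0 and 1.0")
    if competitive_advantage:
        return "build"
    if best_coverage >= BUY_THRESHOLD:
        return "buy"
    if native_vendor_will_ship:
        return "wait_or_patch"
    if best_coverage == 0.0:
        return "build"
    # Partial coverage: buy and adapt on top, the common hybrid outcome.
    return "hybrid"


print(build_or_buy(0.85, False, False))
print(build_or_buy(0.40, True, False))
print(build_or_buy(0.50, False, False))
```

Most real cases are expected to land on `buy` or `hybrid`, which matches the observation that the answer is almost always to build or adapt on top of something bought.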
Velocity vs. Quality (Evals)
- Moving fast without rigorous testing is costly — the quickest road to value often requires slowing down for proper evaluation.
- Evals (datasets and methods for systematically testing AI systems) are among the highest-ROI investments an organisation can make.
- Most companies with strong evals treat them as proprietary — they do not share them publicly, signalling how valuable they are.
- Using agents to test other agents (AI-on-AI QA) is an emerging and promising practice to overcome the human bottleneck in testing.
- Always test in a safe environment before promoting to production.
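A minimal eval harness, in the sense used above, is just a dataset of cases, a scoring method, and a gate before production. In the sketch below the keyword check, the 0.9 promotion threshold, the test cases, and `toy_agent` (a placeholder for a real agent call) are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class EvalCase:
    prompt: str
    expected_keyword: str  # the answer must contain this to count as a pass


def run_evals(agent: Callable[[str], str], cases: Sequence[EvalCase]) -> float:
    """Return the fraction of cases whose answer contains the expected keyword."""
    if not cases:
        raise ValueError("at least one eval case is required")
    passed = sum(
        1 for case in cases
        if case.expected_keyword.lower() in agent(case.prompt).lower()
    )
    return passed / len(cases)


def ready_for_production(pass_rate: float, threshold: float = 0.9) -> bool:
    """Gate promotion on a minimum pass rate (the threshold is an assumed policy)."""
    return pass_rate >= threshold


def toy_agent(prompt: str) -> str:
    """Placeholder for a real agent call; answers from a tiny canned table."""
    canned = {
        "refund": "Refunds are processed within 14 days.",
        "invoice": "Invoices are sent monthly.",
    }
    for key, answer in canned.items():
        if key in prompt.lower():
            return answer
    return "I don't know."


CASES = [
    EvalCase("How long does a refund take?", "14 days"),
    EvalCase("When are invoices sent?", "monthly"),
    EvalCase("Who approves travel expenses?", "manager"),
]

pass_rate = run_evals(toy_agent, CASES)
print(f"pass rate: {pass_rate:.2f}, promote: {ready_for_production(pass_rate)}")
```

Keyword matching is the simplest possible scorer; the AI-on-AI QA practice mentioned above would replace it with a second model grading each answer, while the dataset and the promotion gate stay the same.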
8. The Role of MCP and Emerging Standards
- Model Context Protocol (MCP) allows discrete data sources to be wrapped in a standardised interface that agents can plug into quickly.
- MCP enables organisations to approach the data problem incrementally rather than needing to rationalise all data sources at once.
- However, MCP does not eliminate the need to clean and standardise data — it makes connection easier, not quality problems irrelevant.
- MCP is one of the key reasons 2026 is expected to be a year where foundational data investment becomes more tractable and widespread.
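To make the idea of wrapping a single data source concrete, the sketch below handles the two JSON-RPC methods that MCP defines for tools, `tools/list` and `tools/call`, over an in-memory customer table. It is a simplified illustration using only the standard library, not a compliant MCP server: it omits initialisation, transport, and capability negotiation, and the `get_customer` tool and the customer data are hypothetical. A real deployment would normally use an official MCP SDK.

```python
import json

# Hypothetical in-memory stand-in for one discrete data source.
CUSTOMERS = {"C-001": {"name": "Acme Corporation", "tier": "enterprise"}}

# Tool description in the shape returned by tools/list.
TOOLS = [{
    "name": "get_customer",
    "description": "Look up a customer record by ID.",
    "inputSchema": {
        "type": "object",
        "properties": {"customer_id": {"type": "string"}},
        "required": ["customer_id"],
    },
}]


def _error(request_id, code: int, message: str) -> dict:
    """Build a JSON-RPC 2.0 error response."""
    return {"jsonrpc": "2.0", "id": request_id,
            "error": {"code": code, "message": message}}


def handle(request: dict) -> dict:
    """Answer a tools/list or tools/call request for the customer source."""
    request_id = request.get("id")
    method = request.get("method")
    if method == "tools/list":
        return {"jsonrpc": "2.0", "id": request_id, "result": {"tools": TOOLS}}
    if method == "tools/call":
        params = request.get("params", {})
        if params.get("name") != "get_customer":
            return _error(request_id, -32602, "Unknown tool")
        customer_id = params.get("arguments", {}).get("customer_id")
        record = CUSTOMERS.get(customer_id)
        text = json.dumps(record) if record else f"No customer with ID {customer_id!r}"
        return {"jsonrpc": "2.0", "id": request_id,
                "result": {"content": [{"type": "text", "text": text}],
                           "isError": record is None}}
    return _error(request_id, -32601, "Method not found")


response = handle({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "get_customer", "arguments": {"customer_id": "C-001"}},
})
print(json.dumps(response, indent=2))
```

The wrapper returns whatever the underlying source holds, so duplicated or inconsistent records pass straight through to the agent, which is why the standard eases connection without removing the need for cleanup.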
9. 2026 Outlook: Foundations and Value Realisation
- Organisations are expected to shift from building for FOMO to building because they have evidence of value from 2024–2025 learnings.
- Pressure to demonstrate value will prevent organisations from doing only foundational work.
- “Context engineering” is emerging as a more precise and compelling framing than “data readiness.”
- Evals are expected to become a mainstream topic and capability — even among AI-building companies, over a quarter of senior leaders were unfamiliar with or not using evals in their own internal deployments.
Key Concepts
- Agent Readiness Audit: A structured assessment of an organisation’s cultural, data, technology, and use-case readiness to adopt AI agents at scale.
- Intentional Opportunism: A strategic posture that combines quick, high-ROI AI deployments with a deliberate, phased approach to building longer-term infrastructure.
- Pilot Hell: A condition where an organisation continuously builds proof-of-concept AI projects but never scales any of them to production or meaningful value.
- Analysis Paralysis: A state in which awareness of the full scope of problems prevents any action from being taken.
- RAG (Retrieval-Augmented Generation): A system architecture that retrieves relevant data from external sources and provides it as context to an LLM at inference time.
- SOP (Standard Operating Procedure): A documented, step-by-step description of how a business process is executed; a prerequisite for teaching an agent to perform that process.
- Tribal Knowledge: Undocumented process expertise held informally by a small number of individuals and not systematically recorded or accessible to others.
- MCP (Model Context Protocol): A standardised protocol for exposing a data source or tool through an interface that AI agents can connect to easily, enabling incremental data integration.
- Evals (Evaluations): Structured datasets and testing methodologies used to assess the accuracy, reliability, and behaviour of AI systems in defined scenarios.
- Horizontal Agent Platform: A general-purpose platform for building or deploying AI agents, applicable across many different use cases and teams (contrasted with vertical/domain-specific tools).
- Vertical Agent Solution: A pre-built AI agent product designed for a specific domain or industry (e.g., legal document review, customer support), typically from a specialist vendor.
- Context Engineering: An emerging term for the discipline of structuring, cleaning, and organising data and context such that AI agents can use it effectively — a more precise evolution of the concept of “data readiness.”
- Secure Sandbox: A controlled, isolated environment where employees can experiment with AI tools without risk of exposing or corrupting production data.
Summary
Nufar, Head of Research at Super Intelligent, presents findings from a large body of agent readiness audits showing that data fragmentation, quality, and access are consistently the most severe barriers to scaling AI agents in organisations, more limiting than cultural resistance or outdated technology. Three common failure archetypes are identified: the Magpie (chasing demos, avoiding infrastructure work), the Overwhelmed (paralysed by the scale of the problem), and the Monk (attempting exhaustive foundational projects before any use case work begins).
As an alternative, Nufar advocates “intentional opportunism”: launching carefully chosen, high-visibility early use cases immediately while gradually building reusable data and technology foundations in parallel. Practically, this means using AI itself to solve data quality and access problems, documenting tribal knowledge via LLM-assisted screen recordings, focusing cleanup investment on the highest-ROI data sources, and adopting a layered agent platform architecture that empowers both individual teams and centralised governance.
On the technology side, build-versus-buy decisions should default to buying or adapting when a solution covers roughly 80% of the need, with custom building reserved for genuine competitive differentiation. Evals are identified as one of the highest-ROI investments available and are expected to become a defining capability for organisations that successfully scale AI in 2026. The overarching message is that organisations should neither be driven by hype into shallow demonstrations nor paralysed into inaction by the scale of their data debt. Instead, they should move deliberately, incrementally, and with a clear line of sight to value.