Auditing AI: The New Challenge of Diligencing AI-Native Companies (2024–2025)

Why AI Companies Require a Different Diligence Framework

Between 2023 and 2025, AI-native companies (those whose core product is built on or around machine learning models) became one of the most active categories in M&A. Strategic acquirers in software, healthcare, financial services, and media acquired AI companies at valuations heavily weighted toward projected model performance and proprietary data assets.

The standard IT due diligence framework evaluates infrastructure stability, security posture, and technical debt in a codebase. These remain relevant. But for an AI company, the most material risks frequently sit in categories that do not appear on a traditional diligence checklist: the provenance of the models, the defensibility of the training data, the dependencies on third-party AI infrastructure, the true economics of compute, the regulatory exposure under emerging AI law, and the ongoing cost of keeping models performant as their inputs change.

Each of these can materially affect the acquisition thesis. And each of them requires specific documentation to assess — documentation that most AI companies are not accustomed to organizing for a diligence process.

The Six Categories of AI-Specific Diligence Risk

1. Model provenance and intellectual property

Who built the model, and under what terms? AI companies frequently fine-tune openly released foundation models (Llama, Mistral, Falcon) or build on top of proprietary API-accessed models (GPT-4, Claude, Gemini). Each approach creates a different IP position. License terms for openly released models vary by model and version, and some carry use restrictions, so the license of each base model has to be checked individually. Fine-tuning a model released under a non-commercial license and deploying it in a commercial product creates a licensing liability. Building a product entirely on top of a third-party API, with no proprietary model, creates a dependency risk that should be reflected in the acquisition price.
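The provenance screen described above can be sketched as a pass over a model inventory. This is a minimal illustration, not a legal test: the `ModelRecord` fields, the model names, and the set of licenses treated as non-commercial are all assumptions made for the example, and a real review would rest on counsel reading the actual license text.

```python
from dataclasses import dataclass

# Illustrative only: license identifiers this sketch treats as non-commercial.
# A real review would rely on counsel reading the actual license text.
NON_COMMERCIAL_LICENSES = {"CC-BY-NC-4.0", "CC-BY-NC-SA-4.0"}


@dataclass
class ModelRecord:
    name: str            # hypothetical internal model name
    access: str          # "self-hosted" (weights held by target) or "api"
    base_license: str    # license of the base weights; "" if undocumented
    in_production: bool  # True if deployed in a commercial product


def screen_model(record: ModelRecord) -> str:
    """Return a provenance finding for one model in the inventory."""
    if record.access == "api":
        return "dependency risk: product relies on a third-party API"
    if not record.base_license:
        return "gap: base model license undocumented"
    if record.in_production and record.base_license in NON_COMMERCIAL_LICENSES:
        return "licensing liability: non-commercial base model in commercial use"
    return "no flag from this screen"


# Hypothetical inventory assembled from the target's model documentation.
inventory = [
    ModelRecord("ranker-v3", "self-hosted", "Apache-2.0", True),
    ModelRecord("summarizer-v1", "self-hosted", "CC-BY-NC-4.0", True),
    ModelRecord("chat-assistant", "api", "", True),
    ModelRecord("classifier-v2", "self-hosted", "", False),
]

findings = {m.name: screen_model(m) for m in inventory}
for name, finding in findings.items():
    print(f"{name}: {finding}")
```

A screen like this only sorts models into buckets for follow-up. Undocumented licenses are reported as gaps rather than treated as clean.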

2. Training data rights

The training data underlying a model is increasingly subject to intellectual property claims. Multiple ongoing lawsuits in 2024–2025 are testing whether scraping publicly available web content for model training constitutes copyright infringement. Depending on deal structure, an acquirer may inherit whatever liability exists in the acquired company's training data. The diligence questions are: what was the source of the training data, what agreements govern its use, and what is the exposure if those agreements are challenged?
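One way to make the exposure concrete is to compute what share of the training corpus has documented usage rights. The sketch below assumes a hypothetical inventory of sources tagged with a rights status. The status labels, source names, and sizes are invented for illustration, and deciding which statuses count as "clear" is a legal judgment, not something the code settles.

```python
from collections import defaultdict

# Hypothetical rights categories; a real inventory would define its own.
CLEAR_STATUSES = {"licensed", "owned", "public_domain"}


def rights_breakdown(sources):
    """Map each rights status to its share of the total corpus.

    `sources` is a list of (source_name, rights_status, size) tuples, where
    size is any consistent unit (tokens, documents, bytes).
    """
    totals = defaultdict(int)
    for _name, status, size in sources:
        totals[status] += size
    corpus_size = sum(totals.values())
    if corpus_size == 0:
        raise ValueError("training data inventory is empty")
    return {status: size / corpus_size for status, size in totals.items()}


def clear_rights_share(sources):
    """Share of the corpus whose status is in CLEAR_STATUSES."""
    breakdown = rights_breakdown(sources)
    return sum(share for status, share in breakdown.items()
               if status in CLEAR_STATUSES)


# Invented example inventory (sizes in millions of tokens).
training_sources = [
    ("vendor_corpus", "licensed", 500),
    ("customer_logs", "owned", 200),
    ("web_crawl", "scraped", 250),
    ("legacy_dump", "unknown", 50),
]

breakdown = rights_breakdown(training_sources)
clear_share = clear_rights_share(training_sources)
print(f"share with clear rights: {clear_share:.0%}")
```

The remaining share (scraped and unknown sources in this example) is the portion the exposure analysis needs to focus on.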

3. Third-party API dependency and concentration risk

Many AI product companies are infrastructure-light: they build application layers on top of OpenAI, Anthropic, or Google APIs, with little proprietary model investment. This is a legitimate business model — but it creates concentration risk that significantly affects valuation. If the core product depends on a single external API, pricing changes, availability incidents, or policy changes at that provider directly affect the acquired business's ability to serve customers.
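Concentration can be quantified from the target's API spend by provider. The sketch below uses the top provider's share and a Herfindahl-style index (the sum of squared shares), which is a general concentration measure applied here as an assumption, not a method the framework prescribes. The provider names, spend figures, and the 0.5 threshold are hypothetical.

```python
def spend_shares(spend_by_provider):
    """Convert absolute spend per provider into shares of total spend."""
    total = sum(spend_by_provider.values())
    if total <= 0:
        raise ValueError("total spend must be positive")
    return {provider: spend / total
            for provider, spend in spend_by_provider.items()}


def herfindahl(shares):
    """Sum of squared shares: 1.0 means a single provider, lower is more diverse."""
    return sum(share ** 2 for share in shares.values())


def top_provider(shares):
    """Return (provider, share) for the largest dependency."""
    return max(shares.items(), key=lambda item: item[1])


# Invented annual API spend, in dollars.
annual_api_spend = {
    "provider_a": 80_000,
    "provider_b": 15_000,
    "provider_c": 5_000,
}

shares = spend_shares(annual_api_spend)
concentration = herfindahl(shares)
largest_name, largest_share = top_provider(shares)

# Hypothetical threshold: flag when one provider carries over half of spend.
single_provider_flag = largest_share > 0.5
print(f"{largest_name}: {largest_share:.0%} of spend, index {concentration:.3f}")
```

Spend share is only a proxy. A provider with a small share of spend can still be a single point of failure if the core feature runs on it, so the numbers need to be read alongside the architecture.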

4. Compute cost structure and margin sustainability

AI inference costs are variable, scale with usage, and can be substantially higher than traditional software operating costs. An AI company showing strong revenue growth may be growing on unit economics that do not scale, where the cost of serving each additional customer rises faster than the revenue that customer brings in. Reviewing the compute cost structure, the relationship between inference costs and gross margin, and the trajectory as usage scales is essential to validating the financial model.
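A small worked model shows how margin compression can be structural. It assumes flat per-customer pricing and a per-request inference cost. Every figure and parameter name below is invented for illustration, and a real model would use the target's billing and infrastructure data.

```python
def gross_margin(customers, price_per_customer, requests_per_customer,
                 cost_per_request, fixed_serving_cost=0.0):
    """Gross margin as a fraction of revenue for one billing period.

    Assumes flat per-customer pricing and inference cost that scales
    linearly with request volume (both are simplifying assumptions).
    """
    revenue = customers * price_per_customer
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    inference_cost = customers * requests_per_customer * cost_per_request
    return (revenue - inference_cost - fixed_serving_cost) / revenue


# Hypothetical scenario: price stays at $100 per customer per month while
# average usage per customer grows from 2,000 to 6,000 requests.
PRICE = 100.0
COST_PER_REQUEST = 0.02

usage_scenarios = [2_000, 4_000, 6_000]
margins = [
    gross_margin(customers=1_000, price_per_customer=PRICE,
                 requests_per_customer=requests,
                 cost_per_request=COST_PER_REQUEST)
    for requests in usage_scenarios
]

for requests, margin in zip(usage_scenarios, margins):
    print(f"{requests} requests per customer: gross margin {margin:.0%}")
```

In this made-up scenario, margin falls from about 60% to about 20% and then turns negative, even though revenue per customer never changes. That pattern, heavier usage under flat pricing, is the kind of structural compression the cost review is meant to surface.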

5. EU AI Act and regulatory exposure

The EU AI Act, which entered into force in August 2024 with obligations phasing in over the following years, imposes compliance obligations that depend on the risk classification of AI systems. High-risk AI systems (those used in hiring, credit scoring, healthcare, law enforcement, or critical infrastructure) face significant compliance requirements: conformity assessments, transparency obligations, human oversight requirements, and CE marking before EU market access. A target operating in regulated sectors may have compliance obligations that are not reflected in its operating model or cost structure.

6. Model performance drift and maintenance obligations

Unlike traditional software, an AI model's predictive performance typically degrades over time as the world it models changes, even though the model itself is unchanged. A model trained on 2022 data will tend to perform progressively less well on 2025 inputs without retraining. This creates a maintenance obligation (compute cost, data acquisition, engineering time) that is often not reflected in the target's cost structure or the acquirer's integration budget. Diligence should establish the retraining cadence, the cost of each training run, and the model performance monitoring infrastructure.
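The maintenance obligation can be turned into two simple numbers: an annualized retraining cost and a list of periods in which monitored performance fell below an agreed tolerance. The sketch below is an assumption-laden illustration. The cost components, the metric values, and the tolerance are hypothetical, and real monitoring would typically track more than one metric.

```python
def annual_retraining_cost(runs_per_year, compute_per_run,
                           data_per_run, engineering_per_run):
    """Annualized maintenance cost implied by the retraining cadence."""
    per_run = compute_per_run + data_per_run + engineering_per_run
    return runs_per_year * per_run


def drift_alerts(baseline_metric, monthly_metrics, tolerance):
    """Return the months where the metric fell more than `tolerance`
    below the baseline measured at deployment.

    `monthly_metrics` is a list of (month_label, metric_value) pairs.
    """
    return [month for month, value in monthly_metrics
            if baseline_metric - value > tolerance]


# Invented figures: quarterly retraining, costs in dollars per run.
maintenance_cost = annual_retraining_cost(
    runs_per_year=4,
    compute_per_run=50_000,
    data_per_run=10_000,
    engineering_per_run=20_000,
)

# Invented accuracy readings from a hypothetical monitoring dashboard.
accuracy_by_month = [
    ("2025-01", 0.89),
    ("2025-02", 0.88),
    ("2025-03", 0.85),
    ("2025-04", 0.82),
]
alerts = drift_alerts(baseline_metric=0.90,
                      monthly_metrics=accuracy_by_month,
                      tolerance=0.03)

print(f"annual retraining cost: ${maintenance_cost:,}")
print(f"months outside tolerance: {alerts}")
```

If the target cannot produce the inputs for either calculation, that absence is itself a finding: it suggests retraining cost is unbudgeted or performance is not being monitored.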

What Structured AI Diligence Looks Like

The diligence process for an AI company needs to generate specific documentation on each of these risk categories — not management summaries, but reviewed primary documents with findings and confidence assessments.

→ Model card and architecture documentation review: Parsing available model documentation to establish provenance, base model, fine-tuning approach, and licensing terms — with a flag on any license conditions that restrict commercial use.
→ Training data inventory and rights documentation: Reviewing data source documentation, data use agreements, and scraping practices to produce a training data IP risk assessment — identifying exposure categories and quantifying the proportion of training data with clear usage rights.
→ API dependency mapping: Documenting all third-party AI infrastructure dependencies — providers, API usage volumes, cost per call, contract terms, and switching costs — to produce a dependency risk score and alternative sourcing analysis.
→ Compute cost unit economics analysis: Extracting compute cost data from available infrastructure documentation and financial records to model gross margin at current and projected scale — flagging any cases where margin compression is a structural feature of the cost model.
→ EU AI Act risk classification: Assessing the target's use cases against the EU AI Act risk tiers — identifying any high-risk applications, compliance gaps, and the cost to achieve compliance as a deal input.
→ Model maintenance and drift review: Establishing the retraining cadence, the cost of each training run, and the performance monitoring infrastructure, to estimate the ongoing maintenance cost as a deal input.

The Investment Thesis Implication

Acquirers of AI companies in 2024–2025 are making thesis bets on proprietary model capability, unique training data, and defensible technical moats. Each of those thesis pillars can be validated or refuted through structured diligence — but only if the diligence process is designed to ask the right questions and examine the right documentation.

The acquirers that avoid post-close surprises are those whose diligence brief was explicit: model IP, data rights, compute economics, and regulatory exposure were primary workstreams, not afterthoughts on a traditional IT checklist.

The VEDEKON Framework includes AI-specific assessment criteria for each of these categories — designed to be activated when the target company's core technology asset is a trained model, a fine-tuned foundation model, or a product built on third-party AI infrastructure. The output is a structured risk assessment that maps directly to the investment thesis and deal model, not a qualitative summary that the Investment Committee cannot act on.