Semitora.

30 June 2026

Data readiness for RAG — a checklist before you deploy AI

Many enterprise AI projects don’t fail on the model; they fail on the data. Data readiness for RAG is the state where your documents are complete, clean, governed by permissions and kept up to date, so they are fit to be the knowledge base your AI answers from, with every answer traceable to a source. The model is rarely the edge; what matters is what you feed it. Walk this checklist of 25 questions across six areas before you build, not when the prototype is already hallucinating.

If you’re still working out what RAG is, start with RAG on company documents. Below we assume you know what it’s for and are asking the next question: is my data ready for it?

1. Sources and scope

Before you index anything, you need to know what you’re indexing and where it lives.

2. Permissions and sensitive data

This is the area that most often derails a project after the fact — and is the hardest to fix once you’re live.
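
One way to picture what "permissions" means in practice is a filter that runs between retrieval and the model, so a chunk the user may not read never reaches the prompt. The sketch below is a minimal illustration, not a reference design: the `Chunk` type, the `allowed_groups` field and the group names are all hypothetical, and it assumes access rights were copied from the source system at indexing time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    # Hypothetical ACL metadata, assumed to be copied from the source system.
    allowed_groups: frozenset


def filter_by_permission(chunks, user_groups):
    """Keep only the chunks that share at least one group with the user."""
    groups = set(user_groups)
    return [c for c in chunks if c.allowed_groups & groups]


chunks = [
    Chunk("hr-001", "Salary bands for 2026", frozenset({"hr"})),
    Chunk("kb-007", "How to reset your VPN token",
          frozenset({"hr", "engineering", "sales"})),
]

visible = filter_by_permission(chunks, {"engineering"})
print([c.doc_id for c in visible])  # ['kb-007']
```

If the source documents carry no usable access metadata, a filter like this has nothing to work with, which is why the question belongs on the checklist before indexing starts.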

3. Document quality and structure

The model is only as good as the chunk it gets. Garbage in, garbage in the citation.
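
A cheap first look at chunk quality is a pass that flags fragments too short to carry meaning and exact duplicates after whitespace and case are normalized. The following is a rough sketch under assumptions: the function name `audit_chunks` and the 40-character threshold are illustrative choices, not a standard, and real corpora usually need near-duplicate detection as well.

```python
import hashlib


def audit_chunks(chunks, min_chars=40):
    """Sort chunk indexes into 'too_short', 'duplicate' and 'ok' buckets."""
    seen = set()
    report = {"too_short": [], "duplicate": [], "ok": []}
    for i, text in enumerate(chunks):
        # Collapse whitespace and lowercase so trivial variants compare equal.
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if len(normalized) < min_chars:
            report["too_short"].append(i)
        elif digest in seen:
            report["duplicate"].append(i)
        else:
            report["ok"].append(i)
        seen.add(digest)
    return report


sample = [
    "Refunds are processed within 14 days of receiving the returned item.",
    "Page 3 of 12",
    "Refunds are processed  within 14 days of receiving the returned item.",
]
print(audit_chunks(sample))
# {'too_short': [1], 'duplicate': [2], 'ok': [0]}
```

A high share of short or duplicated chunks usually points back at the extraction step (headers, footers, page furniture) rather than at the documents themselves.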

4. Freshness and versioning

A knowledge base isn’t a one-day snapshot. Data that doesn’t refresh ages faster than you think.
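
Freshness can be checked mechanically if each document records when it last changed at the source and when it was last indexed. The sketch below assumes exactly that; the field names `modified_at` and `indexed_at`, the `find_stale` helper and the 90-day limit are hypothetical and should be replaced with whatever your sources actually expose.

```python
from datetime import datetime, timedelta, timezone


def find_stale(documents, now, max_age=timedelta(days=90)):
    """Return ids of documents changed since indexing or indexed too long ago."""
    stale = []
    for doc in documents:
        changed_since_index = doc["modified_at"] > doc["indexed_at"]
        too_old = now - doc["indexed_at"] > max_age
        if changed_since_index or too_old:
            stale.append(doc["doc_id"])
    return stale


utc = timezone.utc
now = datetime(2026, 6, 30, tzinfo=utc)
documents = [
    {"doc_id": "policy-travel",
     "modified_at": datetime(2026, 6, 1, tzinfo=utc),
     "indexed_at": datetime(2026, 5, 1, tzinfo=utc)},
    {"doc_id": "handbook",
     "modified_at": datetime(2025, 1, 10, tzinfo=utc),
     "indexed_at": datetime(2025, 2, 1, tzinfo=utc)},
    {"doc_id": "faq",
     "modified_at": datetime(2026, 4, 1, tzinfo=utc),
     "indexed_at": datetime(2026, 6, 15, tzinfo=utc)},
]
print(find_stale(documents, now))  # ['policy-travel', 'handbook']
```

The check only works when the source system reports a trustworthy modification time; if it does not, that gap is itself a finding for this section of the checklist.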

5. Tests and quality metrics

Without tests, “it works” is a hunch, not a fact. Building isn’t enough — you have to measure.
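
One common way to turn the hunch into a number is a small golden set of questions paired with the document that should answer each one, scored by how often that document appears in the top k retrieved results. This is a minimal sketch: `hit_rate_at_k` is a name chosen here, and `fake_retrieve` with its canned results stands in for a real retriever, so the figures it prints say nothing about any actual system.

```python
def hit_rate_at_k(golden_set, retrieve, k=5):
    """Share of questions whose expected document is in the top-k results."""
    if not golden_set:
        raise ValueError("golden_set must contain at least one question")
    hits = 0
    for question, expected_doc_id in golden_set:
        if expected_doc_id in retrieve(question)[:k]:
            hits += 1
    return hits / len(golden_set)


golden_set = [
    ("How many vacation days do I get?", "hr-leave-policy"),
    ("How do I reset my VPN token?", "it-vpn-guide"),
    ("What is the travel expense limit?", "fin-travel-policy"),
]

# Stand-in for a real retriever: canned, ranked document ids per question.
fake_index = {
    "How many vacation days do I get?": ["hr-leave-policy", "hr-onboarding"],
    "How do I reset my VPN token?": ["it-laptop-setup", "it-vpn-guide"],
    "What is the travel expense limit?": ["hr-onboarding", "it-vpn-guide"],
}


def fake_retrieve(question):
    return fake_index.get(question, [])


print(round(hit_rate_at_k(golden_set, fake_retrieve, k=2), 2))  # 0.67
```

Rerunning the same golden set after every change to chunking, embeddings or source data shows whether retrieval got better or worse, instead of leaving it to impression.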

6. Cost and maintenance

The most expensive part of GenAI is usually not inference but data engineering — and it doesn’t end at go-live.

In short

Data readiness for RAG is checked across six areas: sources and scope, permissions and sensitive data, document quality and structure, freshness and versioning, tests and quality metrics, and cost and maintenance. If you answer “I don’t know” to most of the questions, that isn’t a reason to drop AI; it’s the first phase of the project. The cheapest time to find out is before you build, not after.

What next

How we build RAG on company documents — with sources, on AWS — is on the RAG / knowledge bases page. Tidying data (ETL) and the knowledge base are a distinct delivery step for us, described in how we work. If you don’t know where to start, start with an audit: we’ll walk this checklist on your data.