RAG on company documents — when a chatbot starts answering from sources

A public chatbot answers from what it “remembers” from training. It can be confident and wrong at the same time — and it does not know your documents, your price list or your procedures. RAG changes that arrangement: the model stops guessing and starts answering from your knowledge base, with a pointer to the source. Here is how it works and what you need to prepare.

What RAG is

RAG stands for retrieval-augmented generation — generation supported by retrieval. Instead of asking the model “what do you know about this”, we first ask the knowledge base “which document passages are relevant here”, and only then ask the model to assemble an answer from them. Company knowledge stays outside the model, in a controlled set you can update without retraining anything.

How it works, step by step

First the documents go into the knowledge base: we split them into passages and turn each passage into a vector (an embedding), a numeric representation of its meaning. When a question arrives, we turn it into a vector the same way and search for the passages closest in meaning. That is semantic search: it matches on meaning rather than on shared words alone.
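
The indexing and search steps can be sketched in a few lines of Python. This is a toy illustration, not a production setup: toy_embed is a hypothetical stand-in that only counts words, so it captures word overlap rather than meaning, and a real system would call an embedding model at that point. The function names, the passage size and the sample documents are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def split_into_passages(text, max_words=40):
    """Split a document into passages of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text):
    """Stand-in for an embedding model (assumption): a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Turn {name: text} into a flat list of passages with their vectors."""
    index = []
    for name, text in documents.items():
        for number, passage in enumerate(split_into_passages(text)):
            index.append({
                "source": name,
                "passage_no": number,
                "text": passage,
                "vector": toy_embed(passage),
            })
    return index


def search(index, question, top_k=2):
    """Return up to top_k passages most similar to the question."""
    question_vector = toy_embed(question)
    scored = [(cosine(question_vector, entry["vector"]), entry) for entry in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for score, entry in scored[:top_k] if score > 0]


documents = {
    "returns.txt": "Customers can return a product within 30 days of delivery. "
                   "Refunds are paid to the original payment method.",
    "shipping.txt": "Standard shipping takes three working days. "
                    "Express shipping is delivered the next working day.",
}
index = build_index(documents)
hits = search(index, "How many days do I have to return a product?", top_k=1)
print(hits[0]["source"])
```

Each hit keeps its source name and passage number, which is what later makes a citation possible.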

The retrieved passages go to the model together with the question and a clear instruction: answer strictly from these, and say where they come from. The user gets an answer with a link to the source and can verify it in seconds.
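
One way to phrase that instruction is shown below. It is a minimal sketch, assuming each retrieved passage is a dictionary with hypothetical "source" and "text" keys; the exact wording of the prompt is an illustration, not a fixed recipe.

```python
def build_grounded_prompt(question, passages):
    """Assemble a prompt that restricts the model to numbered sources."""
    lines = [
        "Answer the question using only the sources below.",
        "Cite the source number in square brackets after each claim.",
        "If the sources do not contain the answer, reply exactly: I don't know.",
        "",
        "Sources:",
    ]
    # Number the passages so the model can cite them as [1], [2], ...
    for number, passage in enumerate(passages, start=1):
        lines.append(f"[{number}] ({passage['source']}) {passage['text']}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


passages = [
    {"source": "returns.txt", "text": "Customers can return a product within 30 days of delivery."},
]
print(build_grounded_prompt("How long do I have to return a product?", passages))
```

Because the sources are numbered, a citation such as [1] in the answer can be mapped back to the document name and shown to the user as a link.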

Where the knowledge base lives

We build the knowledge base on AWS, in Amazon Bedrock Knowledge Bases, the native RAG mechanism in that cloud. The index and the documents stay in your own cloud account, under your control. That matters beyond the technical side: personal data and trade secrets are not sent to public models, which also makes the GDPR side easier to handle.
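
For orientation, a query against such a knowledge base can look roughly like this from Python. This is a sketch under assumptions: it relies on the boto3 bedrock-agent-runtime client and its retrieve_and_generate call, the knowledge base ID and model ARN are placeholders you would supply, and the response fields read here should be checked against the current AWS documentation. The helper name ask_knowledge_base is ours, not part of any AWS API.

```python
def ask_knowledge_base(client, knowledge_base_id, model_arn, question):
    """Send a question to a Bedrock knowledge base; return (answer, source locations).

    client is expected to be boto3.client("bedrock-agent-runtime") or any
    object exposing the same retrieve_and_generate method.
    """
    response = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,  # placeholder value
                "modelArn": model_arn,                 # placeholder value
            },
        },
    )
    answer = response["output"]["text"]
    # Collect where each cited passage came from (for example an S3 URI).
    sources = []
    for citation in response.get("citations", []):
        for reference in citation.get("retrievedReferences", []):
            sources.append(reference.get("location", {}))
    return answer, sources


# Usage, assuming AWS credentials and an existing knowledge base:
#   import boto3
#   client = boto3.client("bedrock-agent-runtime")
#   answer, sources = ask_knowledge_base(client, "YOUR_KB_ID", "YOUR_MODEL_ARN", "What is the return period?")
```

Passing the client in as a parameter keeps the function easy to test without touching AWS.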

How RAG limits hallucinations

RAG alone does not guarantee truth — the discipline around it does. Three things make the difference:

Staying strictly on the sources. The model should answer only from the retrieved passages. When the base has no answer, it should say “I don’t know” rather than invent a plausible-sounding sentence.
Citations. Every answer points to the document and the place it came from. That turns “trust me” into “check for yourself”.
Evaluations. We measure answer quality automatically and repeatably, because the base and the questions change over time. Without that measurement, quality slips and nobody notices.
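
The third point can be made concrete with a small, repeatable check. The sketch below is an illustration under assumptions, not a description of our evaluation pipeline: the test-case format, the metric names and the answer_fn interface (a function that returns an answer and a list of cited sources) are all hypothetical.

```python
def evaluate(answer_fn, cases):
    """Score a RAG system on citation accuracy and on refusing when it should.

    Each case is {"question": str, "expected_source": str or None}.
    expected_source is None when the knowledge base has no answer.
    answer_fn(question) must return (answer_text, list_of_cited_sources).
    """
    answerable = [c for c in cases if c["expected_source"] is not None]
    unanswerable = [c for c in cases if c["expected_source"] is None]

    cited_correctly = 0
    for case in answerable:
        _, sources = answer_fn(case["question"])
        if case["expected_source"] in sources:
            cited_correctly += 1

    abstained_correctly = 0
    for case in unanswerable:
        answer, sources = answer_fn(case["question"])
        # A correct refusal says "I don't know" and cites nothing.
        if answer.strip().lower().startswith("i don't know") and not sources:
            abstained_correctly += 1

    return {
        "citation_accuracy": cited_correctly / len(answerable) if answerable else None,
        "abstention_accuracy": abstained_correctly / len(unanswerable) if unanswerable else None,
    }
```

Run against the same case set after every change to the documents, the prompt or the model, numbers like these show a regression before users do.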

We built exactly this setup for our own product, mojApteczka, in healthcare, where an “almost right” answer is dangerous. Strict RAG, citations and evaluations were not decoration there; they were the price of entry.

What it costs

The honest answer is: it depends. Cost is driven by preparing the sources (usually the largest piece of work), the size of the base, the number of integrations and the query volume. The cost of a single query can be very low when the architecture is sensible: matching the model to the task, caching, and monitoring cost per query keep scale from eating the margin.
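
To see how these levers interact, the arithmetic can be written down directly. The sketch below is a simplification: every number in it is an invented placeholder rather than a real price, and it assumes that a cached answer costs nothing and that only model tokens are billed, leaving out storage, retrieval and integration costs.

```python
def cost_per_query(input_tokens, output_tokens,
                   price_in_per_1k, price_out_per_1k,
                   cache_hit_rate=0.0):
    """Average model cost of one query, given token counts and prices per 1,000 tokens.

    cache_hit_rate is the share of queries answered from a cache; those are
    assumed to cost nothing (a simplification).
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    model_cost = (input_tokens / 1000) * price_in_per_1k \
        + (output_tokens / 1000) * price_out_per_1k
    return model_cost * (1.0 - cache_hit_rate)


# Placeholder figures, chosen only to show the calculation.
without_cache = cost_per_query(2000, 500, price_in_per_1k=0.001, price_out_per_1k=0.002)
with_cache = cost_per_query(2000, 500, price_in_per_1k=0.001, price_out_per_1k=0.002,
                            cache_hit_rate=0.5)
queries_per_month = 10_000
print(f"per query: {without_cache:.4f} vs {with_cache:.4f}")
print(f"per month: {without_cache * queries_per_month:.2f} vs {with_cache * queries_per_month:.2f}")
```

The input side is dominated by the retrieved passages, so the number and size of passages sent with each question is usually the first thing worth monitoring.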

When it is worth it, and when it is not

RAG makes sense when answers should come from your own content: product documentation, procedures, a support base, contracts, expert knowledge. When the questions are about general knowledge, or need live computation from transactional systems, other patterns can fit better — sometimes a plain integration rather than a knowledge base. That is why we start every conversation with what you actually want to achieve.

What’s next

The cheapest first step is checking whether your sources are RAG-ready — because they decide the quality, not the model itself. We describe how we deliver RAG on a dedicated page: RAG for business.