
Building a RAG Knowledge Base Your Team Will Actually Use

Peripher.AI·1 March 2025·3 min read

Why Most RAG Implementations Fail

Retrieval-Augmented Generation (RAG) is one of the most powerful tools in the AI stack. Feed it your documents, ask it questions, get accurate answers grounded in your actual data — no hallucinations, no generic responses.

In theory.

In practice, most RAG implementations fail for one of three reasons:

  1. Poor chunking — documents are split in ways that destroy context
  2. Weak retrieval — the wrong chunks are returned for a given query
  3. No one uses it — the interface is buried or awkward

We've built enough of these to know exactly where each failure mode lives — and how to prevent it.


Step 1 — Document Preparation

Before anything goes into the vector database, it needs to be clean and well-structured.

What we do:

  • Strip headers, footers, and page numbers from PDFs
  • Preserve semantic structure — headings indicate topic boundaries
  • Keep tables together — never split a table across chunks
  • Add metadata to every chunk: source document, section title, date

Chunk size: We typically use 512 tokens with a 64-token overlap. Smaller chunks give more precise retrieval; the overlap ensures context isn't lost at boundaries.
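As a minimal sketch of the overlap strategy above (token counts are approximated here by whitespace-separated words; in practice you would count real tokens, e.g. with tiktoken, and the metadata fields are the ones listed above):

```python
# Overlapping chunker with per-chunk metadata.
# chunk_size / overlap follow the 512 / 64 defaults discussed above.

def chunk_document(text, source, section, date, chunk_size=512, overlap=64):
    """Split text into overlapping chunks, attaching metadata to each."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        piece = " ".join(words[start:start + chunk_size])
        chunks.append({
            "text": piece,
            "metadata": {"source": source, "section": section, "date": date},
        })
        if start + chunk_size >= len(words):
            break  # last chunk reached the end of the document
    return chunks
```

Because each chunk starts 448 words after the previous one, the final 64 words of one chunk reappear at the start of the next, so a sentence straddling a boundary is never lost from both chunks.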


Step 2 — Embedding + Vector Storage

We use OpenAI text-embedding-3-small for most projects — fast, cheap, and accurate enough for business documents.

For storage, Supabase with pgvector is our default for projects that need to stay within a client's existing infrastructure; for larger scale, we reach for Pinecone or Weaviate.
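A sketch of the embed-and-insert step. The embedding function is injected so the batching logic is testable; in production it would wrap a call like `client.embeddings.create(model="text-embedding-3-small", input=batch)` from the OpenAI Python SDK, and the rows would go into a pgvector table (the `content`/`embedding` column names are our illustrative assumptions, not a fixed schema):

```python
# Batch texts, embed each batch, and build rows ready for insertion
# into a vector store (e.g. a Supabase/pgvector table).

def embed_batch(texts, embed_fn, batch_size=100):
    """Embed texts in batches; return one row dict per input text."""
    rows = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        vectors = embed_fn(batch)  # one vector per text in the batch
        rows.extend(
            {"content": t, "embedding": v} for t, v in zip(batch, vectors)
        )
    return rows
```

Batching matters because the embeddings endpoint accepts many inputs per request; embedding one chunk at a time is the slowest and most expensive way to index a large document set.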


Step 3 — Retrieval Strategy

Naive vector search returns the most similar chunks by cosine distance. It works, but it misses a lot.

We layer two additional strategies on top:

Hybrid search — combine vector similarity with BM25 keyword search. Keyword search catches exact matches (product codes, names, acronyms) that embedding search often misses.
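One common way to merge the two result lists without calibrating their raw scores against each other is reciprocal rank fusion (RRF) — a sketch, assuming each search returns chunk ids in ranked order:

```python
# Reciprocal rank fusion: each list contributes 1 / (k + rank) per id,
# so an id ranked highly by either search floats to the top. k = 60 is
# the conventional damping constant from the RRF literature.

def rrf_fuse(vector_ids, keyword_ids, k=60):
    """Fuse two ranked id lists; returns ids best-first."""
    scores = {}
    for ranking in (vector_ids, keyword_ids):
        for rank, chunk_id in enumerate(ranking):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

An id that appears in both lists (like a product code matched exactly by BM25 and semantically by the embedding) gets credit from both rankings, which is exactly the behaviour hybrid search is after.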

Reranking — after retrieving the top 20 candidates, a cross-encoder reranker (Cohere Rerank or a local model) scores each chunk against the query and returns only the top 5. This dramatically improves precision.
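The rerank step itself is simple once the scoring model is in place — a sketch with the scorer injected (in production, `score_fn` would call Cohere Rerank or a local cross-encoder; the toy scorer in the usage below is only for illustration):

```python
# Score each retrieved candidate against the query with a
# cross-encoder, then keep only the highest-scoring top_n chunks.

def rerank(query, candidates, score_fn, top_n=5):
    """Return the top_n candidate chunks by cross-encoder score."""
    scored = [(score_fn(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_n]]
```

The retrieve-20-then-keep-5 pattern works because the cross-encoder reads the query and chunk together, so it is far more precise than cosine similarity — but also far too slow to run over the whole corpus, which is why it only sees the shortlist.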


Step 4 — The Interface

The best RAG system in the world fails if nobody opens it.

We build chat interfaces that:

  • Live where your team already works (Slack, Teams, or a simple web app)
  • Show source citations so users can verify answers
  • Fall back gracefully when the knowledge base doesn't have an answer
  • Log queries so you can identify gaps in coverage
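The citation and fallback behaviour above can be sketched as a response builder (the confidence threshold and dict shape are illustrative assumptions, not a fixed API):

```python
# Build the final response: always attach citations, and fall back to
# an honest "not found" when retrieval confidence is too low to trust.

def build_response(answer, citations, top_score, threshold=0.3):
    """Return an answer with citations, or a graceful fallback."""
    if top_score < threshold or not citations:
        return {
            "answer": "I couldn't find an answer to this in the knowledge base.",
            "citations": [],
        }
    return {"answer": answer, "citations": citations}
```

Refusing to answer below the threshold is what keeps trust high: users quickly learn whether the bot guesses, and a bot that guesses stops getting opened.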

Results We've Seen

A recent deployment for a professional services firm reduced the time to answer internal policy questions from 15 minutes of searching to under 30 seconds — with cited sources the team could verify.

Adoption was at 85% within two weeks because the answers were accurate and the interface was Slack — where the team already spent their day.


Building a knowledge base for your team? Let's talk →

Ready to automate?

Book a free 30-min discovery call.

We'll identify your biggest automation opportunity — no obligation.