
Building a RAG Knowledge Base Your Team Will Actually Use

Peripher.AI·1 March 2025·3 min read

Why Most RAG Implementations Fail

Retrieval-Augmented Generation (RAG) is one of the most powerful tools in the AI stack. Feed it your documents, ask it questions, get accurate answers grounded in your actual data — no hallucinations, no generic responses.

In theory.

In practice, most RAG implementations fail for one of three reasons:

  1. Poor chunking — documents are split in ways that destroy context
  2. Weak retrieval — the wrong chunks are returned for a given query
  3. No one uses it — the interface is buried or awkward

We've built enough of these to know exactly where each failure mode lives — and how to prevent it.


Step 1 — Document Preparation

Before anything goes into the vector database, it needs to be clean and well-structured.

What we do:

  • Strip headers, footers, and page numbers from PDFs
  • Preserve semantic structure — headings indicate topic boundaries
  • Keep tables together — never split a table across chunks
  • Add metadata to every chunk: source document, section title, date

Chunk size: We typically use 512 tokens with a 64-token overlap. Smaller chunks give more precise retrieval; the overlap ensures context isn't lost at boundaries.
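As a minimal sketch of the overlap strategy above (token counts are approximated here by whitespace-separated words; in practice you would count real tokens, e.g. with tiktoken, and the metadata fields are the ones listed above):

```python
# Overlapping chunker with per-chunk metadata.
# chunk_size / overlap follow the 512 / 64 defaults discussed above.

def chunk_document(text, source, section, date, chunk_size=512, overlap=64):
    """Split text into overlapping chunks, attaching metadata to each."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        piece = " ".join(words[start:start + chunk_size])
        chunks.append({
            "text": piece,
            "metadata": {"source": source, "section": section, "date": date},
        })
        if start + chunk_size >= len(words):
            break  # last chunk reached the end of the document
    return chunks
```

Because each chunk starts 448 words after the previous one, the final 64 words of one chunk reappear at the start of the next, so a sentence straddling a boundary is never lost from both chunks.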


Step 2 — Embedding + Vector Storage

We use OpenAI text-embedding-3-small for most projects — fast, cheap, and accurate enough for business documents.

For storage, Supabase with pgvector is our default for projects that need to stay within a client's existing infrastructure; for larger scale, we reach for Pinecone or Weaviate.
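A sketch of the embed-and-insert step. The embedding function is injected so the batching logic is testable; in production it would wrap a call like `client.embeddings.create(model="text-embedding-3-small", input=batch)` from the OpenAI Python SDK, and the rows would go into a pgvector table (the `content`/`embedding` column names are our illustrative assumptions, not a fixed schema):

```python
# Batch texts, embed each batch, and build rows ready for insertion
# into a vector store (e.g. a Supabase/pgvector table).

def embed_batch(texts, embed_fn, batch_size=100):
    """Embed texts in batches; return one row dict per input text."""
    rows = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        vectors = embed_fn(batch)  # one vector per text in the batch
        rows.extend(
            {"content": t, "embedding": v} for t, v in zip(batch, vectors)
        )
    return rows
```

Batching matters because the embeddings endpoint accepts many inputs per request; embedding one chunk at a time is the slowest and most expensive way to index a large document set.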


Step 3 — Retrieval Strategy

Naive vector search returns the most similar chunks by cosine distance. It works, but it misses a lot.

We layer two additional strategies on top:

Hybrid search — combine vector similarity with BM25 keyword search. Keyword search catches exact matches (product codes, names, acronyms) that embedding search often misses.
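One common way to merge the two result lists without calibrating their raw scores against each other is reciprocal rank fusion (RRF) — a sketch, assuming each search returns chunk ids in ranked order:

```python
# Reciprocal rank fusion: each list contributes 1 / (k + rank) per id,
# so an id ranked highly by either search floats to the top. k = 60 is
# the conventional damping constant from the RRF literature.

def rrf_fuse(vector_ids, keyword_ids, k=60):
    """Fuse two ranked id lists; returns ids best-first."""
    scores = {}
    for ranking in (vector_ids, keyword_ids):
        for rank, chunk_id in enumerate(ranking):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

An id that appears in both lists (like a product code matched exactly by BM25 and semantically by the embedding) gets credit from both rankings, which is exactly the behaviour hybrid search is after.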

Reranking — after retrieving the top 20 candidates, a cross-encoder reranker (Cohere Rerank or a local model) scores each chunk against the query and returns only the top 5. This dramatically improves precision.
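The rerank step itself is simple once the scoring model is in place — a sketch with the scorer injected (in production, `score_fn` would call Cohere Rerank or a local cross-encoder; the toy scorer in the usage below is only for illustration):

```python
# Score each retrieved candidate against the query with a
# cross-encoder, then keep only the highest-scoring top_n chunks.

def rerank(query, candidates, score_fn, top_n=5):
    """Return the top_n candidate chunks by cross-encoder score."""
    scored = [(score_fn(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_n]]
```

The retrieve-20-then-keep-5 pattern works because the cross-encoder reads the query and chunk together, so it is far more precise than cosine similarity — but also far too slow to run over the whole corpus, which is why it only sees the shortlist.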


Step 4 — The Interface

The best RAG system in the world fails if nobody opens it.

We build chat interfaces that:

  • Live where your team already works (Slack, Teams, or a simple web app)
  • Show source citations so users can verify answers
  • Fall back gracefully when the knowledge base doesn't have an answer
  • Log queries so you can identify gaps in coverage
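The citation and fallback behaviour above can be sketched as a response builder (the confidence threshold and dict shape are illustrative assumptions, not a fixed API):

```python
# Build the final response: always attach citations, and fall back to
# an honest "not found" when retrieval confidence is too low to trust.

def build_response(answer, citations, top_score, threshold=0.3):
    """Return an answer with citations, or a graceful fallback."""
    if top_score < threshold or not citations:
        return {
            "answer": "I couldn't find an answer to this in the knowledge base.",
            "citations": [],
        }
    return {"answer": answer, "citations": citations}
```

Refusing to answer below the threshold is what keeps trust high: users quickly learn whether the bot guesses, and a bot that guesses stops getting opened.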

Results We've Seen

A recent deployment for a professional services firm reduced the time to answer internal policy questions from 15 minutes of searching to under 30 seconds — with cited sources the team could verify.

Adoption was at 85% within two weeks because the answers were accurate and the interface was Slack — where the team already spent their day.


Building a knowledge base for your team? Let's talk →

Ready to automate?

Book a free 30-min discovery call.

We'll identify your biggest automation opportunity — no obligation.