RegIQ: Building Regulatory Intelligence for East African Banks

2025 | RAG, LangChain, API, Docker, Chrome Extension

The problem

Compliance officers at Rwandan banks spend 15–25 minutes per question manually searching National Bank of Rwanda (BNR) directive PDFs. The corpus spans 26 directives issued over 10 years, and no indexed, searchable system existed for it. Every lookup meant downloading a PDF, running Ctrl+F, reading the surrounding context, and cross-referencing other directives.

What I built

RegIQ is a RAG pipeline over the full National Bank of Rwanda directive corpus: 26 directives, chunked at article level, with directive number, article title, chapter, and page preserved as metadata on every chunk.
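To make the article-level chunking concrete, here is a minimal sketch in Python. It assumes headings of the form "Article 12: Title" and "CHAPTER II: ..."; those patterns, the `Chunk` class, the `chunk_by_article` function, and the metadata key names are illustrative assumptions, not the project's actual code.

```python
import re
from dataclasses import dataclass

# Assumed heading formats; real directives may need different patterns.
ARTICLE_RE = re.compile(r"^Article[ \t]+(\d+)[ \t]*[:.][ \t]*(.+)$", re.MULTILINE)
CHAPTER_RE = re.compile(r"^CHAPTER[ \t]+.+$", re.MULTILINE)


@dataclass
class Chunk:
    text: str
    metadata: dict


def chunk_by_article(pages: list[tuple[int, str]], directive_number: str) -> list[Chunk]:
    """Split a directive into one chunk per article.

    pages: (page_number, page_text) tuples in reading order.
    """
    # Concatenate pages, remembering the offset at which each page starts.
    full_text = ""
    page_starts = []
    for page_number, page_text in pages:
        page_starts.append((len(full_text), page_number))
        full_text += page_text + "\n"

    articles = list(ARTICLE_RE.finditer(full_text))
    chapters = list(CHAPTER_RE.finditer(full_text))

    chunks = []
    for i, match in enumerate(articles):
        start = match.start()
        end = articles[i + 1].start() if i + 1 < len(articles) else len(full_text)
        # Do not let the next chapter heading leak into this article's text.
        for chapter in chapters:
            if start < chapter.start() < end:
                end = chapter.start()
                break
        # The article belongs to the last chapter heading that precedes it.
        chapter_title = None
        for chapter in chapters:
            if chapter.start() < start:
                chapter_title = chapter.group(0).strip()
        # The article is attributed to the page on which it begins.
        page = None
        for offset, number in page_starts:
            if offset <= start:
                page = number
        chunks.append(Chunk(
            text=full_text[start:end].strip(),
            metadata={
                "directive_number": directive_number,
                "article_number": int(match.group(1)),
                "article_title": match.group(2).strip(),
                "chapter": chapter_title,
                "page": page,
            },
        ))
    return chunks
```

Because each chunk starts at an article heading and ends before the next one, no article is ever split across chunks, and the metadata travels with the text into the vector store.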

Scanned PDFs (the majority of African central bank documents) are handled via GPT-4o Vision OCR with table extraction. Clean PDFs use PyMuPDF with layout-aware parsing.
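The routing between the two extraction paths can be sketched as a simple text-density check. This is an assumed heuristic, not necessarily how RegIQ decides: the function name `needs_ocr` and both thresholds are hypothetical, and the per-page strings are taken to be whatever a text extractor such as PyMuPDF's `page.get_text()` returns.

```python
def needs_ocr(page_texts: list[str],
              min_chars_per_page: int = 50,
              max_sparse_ratio: float = 0.5) -> bool:
    """Guess whether a PDF is scanned, from its extracted text layer.

    A scanned PDF usually yields little or no text per page, so if more than
    max_sparse_ratio of the pages fall below min_chars_per_page characters,
    the document is routed to OCR. Both thresholds are illustrative.
    """
    if not page_texts:
        # No pages with a text layer at all: treat as scanned.
        return True
    sparse_pages = sum(
        1 for text in page_texts if len(text.strip()) < min_chars_per_page
    )
    return sparse_pages / len(page_texts) > max_sparse_ratio
```

A document that fails this check would go through the layout-aware text path; one that passes would be sent to the vision OCR step.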

Architecture

The pipeline runs in three stages: ingestion (PDF → chunks → ChromaDB), retrieval (hybrid BM25 + vector search), and generation (GPT-4o with citation-aware prompting).
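One common way to combine a BM25 ranking with a vector-search ranking is reciprocal rank fusion (RRF). The sketch below assumes RRF as the fusion step; the text above does not say which method RegIQ uses, and the function name and chunk ids are made up for illustration.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk ids into one ranking.

    Each chunk scores 1 / (k + rank) for every list it appears in, with
    ranks starting at 1. k = 60 is the conventional default for RRF.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk id for determinism.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


bm25_ranking = ["a", "b", "c"]    # hypothetical keyword-search results
vector_ranking = ["b", "d", "a"]  # hypothetical embedding-search results
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
# fused == ["b", "a", "d", "c"]
```

RRF only needs ranks, not raw scores, which avoids having to normalize BM25 scores against cosine similarities.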

The Chrome extension intercepts the user's query, hits the FastAPI endpoint, and renders the answer inline with directive citations and confidence scores.

Design decisions

Article-level chunking (not page-level) was the critical choice. Regulatory language is precise, and splitting mid-article destroys the context a provision depends on. This change alone improved retrieval relevance from ~60% to ~91% on the eval set.
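The retrieval relevance figure can be read as a hit rate over the eval questions. The sketch below assumes that definition (a question counts as a hit if any of its top-k retrieved chunks is labeled relevant); the exact metric and cutoff used for RegIQ are not stated here, and the function and argument names are hypothetical.

```python
def retrieval_relevance(retrieved: dict[str, list[str]],
                        relevant: dict[str, set[str]],
                        k: int = 5) -> float:
    """Fraction of questions with at least one relevant chunk in the top k.

    retrieved: question id -> ranked chunk ids returned by the retriever.
    relevant:  question id -> chunk ids a reviewer labeled as relevant.
    """
    if not relevant:
        return 0.0
    hits = 0
    for question_id, gold_chunks in relevant.items():
        top_k = retrieved.get(question_id, [])[:k]
        if any(chunk_id in gold_chunks for chunk_id in top_k):
            hits += 1
    return hits / len(relevant)
```

Running the same question set against page-level and article-level indexes with a function like this gives a like-for-like comparison of the two chunking strategies.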

Metadata filtering lets users scope a query to a specific directive year or number. The model then only sees chunks from the directives in scope, which reduces the risk of answers that cite or blend in the wrong directive.
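As a rough illustration of scoping by metadata, the sketch below filters candidate chunks in plain Python before ranking. The metadata keys `directive_number` and `year` and the function name are assumptions; in a ChromaDB-backed setup the same constraint would typically be passed as a `where` filter on the query rather than applied by hand.

```python
from typing import Optional


def filter_chunks(chunks: list[dict],
                  directive_number: Optional[str] = None,
                  year: Optional[int] = None) -> list[dict]:
    """Keep only chunks whose metadata matches every filter that is set.

    Each chunk is a dict with a "metadata" dict; a filter left as None
    is ignored.
    """
    selected = []
    for chunk in chunks:
        metadata = chunk.get("metadata", {})
        if directive_number is not None and metadata.get("directive_number") != directive_number:
            continue
        if year is not None and metadata.get("year") != year:
            continue
        selected.append(chunk)
    return selected


# Hypothetical chunks; the directive numbers are placeholders.
corpus = [
    {"text": "Article 1 ...", "metadata": {"directive_number": "A", "year": 2018}},
    {"text": "Article 4 ...", "metadata": {"directive_number": "B", "year": 2021}},
]
scoped = filter_chunks(corpus, year=2021)  # only the second chunk remains
```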

Results

Time per question dropped from 15–25 minutes to under 30 seconds. Tested with 3 compliance officers at a Rwandan commercial bank over 2 weeks.

91% retrieval relevance on 50-question eval set
26 directives indexed, ~4,200 chunks
Chrome extension deployed internally

Links

GitHub Repository