The problem
Compliance officers at Rwandan banks spend 15–25 minutes per question manually searching National Bank of Rwanda (BNR) directive PDFs. The corpus spans 26 directives issued over 10 years, and no indexed, searchable system existed for it. Every lookup meant downloading a PDF, running Ctrl+F, reading the surrounding context, and cross-referencing other directives.
What I built
RegIQ is a RAG pipeline over the full National Bank of Rwanda directive corpus: 26 directives, chunked at article level, with directive number, article title, chapter, and page preserved as metadata on every chunk.
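A minimal sketch of article-level chunking with metadata, not the actual RegIQ code. It assumes headings look like "Article 3: Title" and "CHAPTER II: ..." at the start of a line; the regexes, the `Chunk` fields, and the `chunk_by_article` name are all hypothetical, and real directives may need different patterns.

```python
import re
from dataclasses import dataclass

# Assumed heading formats; adjust to the real directive layout.
CHAPTER_RE = re.compile(r"^\s*chapter\s+\S+", re.IGNORECASE)
ARTICLE_RE = re.compile(r"^\s*article\s+(\d+)\s*[:.\-]\s*(.+)$", re.IGNORECASE)


@dataclass
class Chunk:
    directive_number: str
    chapter: str
    article_number: int
    article_title: str
    page: int  # page on which the article heading appears
    text: str


def chunk_by_article(pages, directive_number):
    """Split a directive into one chunk per article.

    `pages` is a list of page texts in order. An article that runs
    across a page break stays in a single chunk. Text before the
    first article heading (e.g. a preamble) is dropped in this sketch.
    """
    chunks = []
    chapter = ""
    current = None
    for page_no, page_text in enumerate(pages, start=1):
        for line in page_text.splitlines():
            article = ARTICLE_RE.match(line)
            if CHAPTER_RE.match(line):
                chapter = line.strip()
            elif article:
                if current is not None:
                    chunks.append(current)
                current = Chunk(
                    directive_number=directive_number,
                    chapter=chapter,
                    article_number=int(article.group(1)),
                    article_title=article.group(2).strip(),
                    page=page_no,
                    text="",
                )
            elif current is not None:
                current.text += line + "\n"
    if current is not None:
        chunks.append(current)
    for chunk in chunks:
        chunk.text = chunk.text.strip()
    return chunks
```

Each `Chunk` can then be stored with its fields as metadata, so that an answer can cite the directive, article, and page it came from.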
Scanned PDFs (the majority of African central bank documents) are handled via GPT-4o Vision OCR with table extraction. Clean PDFs use PyMuPDF with layout-aware parsing.
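How a page gets routed to OCR or to direct text extraction is not spelled out above. One plausible heuristic, sketched here as an assumption, is to treat a page as scanned when its embedded text layer is nearly empty. The function names and the 50-character threshold are hypothetical.

```python
def needs_ocr(page_text: str, min_chars: int = 50) -> bool:
    """Return True if the embedded text layer looks empty or near-empty.

    `page_text` is whatever the text extractor returned for the page
    (with PyMuPDF, this would be the result of `page.get_text()`).
    """
    visible = sum(1 for ch in page_text if not ch.isspace())
    return visible < min_chars


def route_pages(page_texts, min_chars: int = 50):
    """Label each page 'ocr' (scanned) or 'text' (clean text layer)."""
    return ["ocr" if needs_ocr(t, min_chars) else "text" for t in page_texts]
```

A per-page decision, rather than per-document, also covers mixed PDFs where only some pages are scans.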
Architecture
The pipeline runs in three stages: ingestion (PDF → chunks → ChromaDB), retrieval (hybrid BM25 + vector search), and generation (GPT-4o with citation-aware prompting).
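The description above does not say how the BM25 and vector results are combined. Reciprocal rank fusion (RRF) is one common option, sketched here as an assumption and not as the method RegIQ necessarily uses. The inputs are ranked lists of chunk IDs from each retriever; `k=60` is the conventional default for the smoothing constant.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k: int = 60):
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each chunk scores 1 / (k + rank) in every list it appears in,
    and the scores are summed. Ties are broken by chunk ID so the
    output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Usage with made-up chunk IDs:
bm25_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

RRF only needs ranks, so it avoids having to normalize BM25 scores against cosine similarities, which live on different scales.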
The Chrome extension intercepts the user's query, hits the FastAPI endpoint, and renders the answer inline with directive citations and confidence scores.
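For the generation stage, a citation-aware prompt could be assembled along the following lines. This is a hypothetical template, not the prompt RegIQ actually sends: the wording, the `build_prompt` name, and the dictionary keys are assumptions that mirror the metadata fields described earlier.

```python
def build_prompt(question: str, chunks) -> str:
    """Build a prompt that asks the model to cite numbered sources.

    `chunks` is a list of dicts with the (assumed) keys
    'directive_number', 'article_number', 'page', and 'text'.
    """
    sources = []
    for i, chunk in enumerate(chunks, start=1):
        label = (
            f"[{i}] Directive {chunk['directive_number']}, "
            f"Article {chunk['article_number']}, p. {chunk['page']}"
        )
        sources.append(f"{label}\n{chunk['text']}")
    context = "\n\n".join(sources)
    return (
        "Answer using only the sources below. Cite each claim with its "
        "bracketed source number. If the sources do not answer the "
        "question, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
```

Numbering the sources lets the client map each bracketed citation in the answer back to a directive, article, and page.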
Design decisions
Article-level chunking (rather than page-level) was the critical choice. Regulatory language is precise, and splitting mid-article destroys context. This change alone improved retrieval relevance from ~60% to ~91% on the 50-question eval set.
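Metadata filtering lets users scope a query to a specific directive year or directive number. Restricting retrieval to the relevant directives keeps unrelated articles out of the context, which reduces the risk of hallucinated or misattributed citations.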
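A sketch of how such a scope could be turned into a ChromaDB `where` filter. The metadata keys `directive_year` and `directive_number` and the `build_where` helper are assumptions for illustration; the `$eq` and `$and` operators are part of Chroma's filter syntax.

```python
from typing import Optional


def build_where(
    year: Optional[int] = None,
    directive_number: Optional[str] = None,
) -> Optional[dict]:
    """Build a Chroma-style metadata filter from optional scopes.

    Returns None when no scope is given, a single clause for one
    scope, and an `$and` of both clauses when both are given
    (`$and` expects at least two clauses).
    """
    clauses = []
    if year is not None:
        clauses.append({"directive_year": {"$eq": year}})
    if directive_number is not None:
        clauses.append({"directive_number": {"$eq": directive_number}})
    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}
```

The result can be passed as the `where` argument of a Chroma collection query so that only chunks from the selected directive or year are candidates for retrieval.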
Results
Query time dropped from 15–25 minutes to under 30 seconds, at least a 30x reduction. Tested with three compliance officers at a Rwandan commercial bank over two weeks.
- 91% retrieval relevance on 50-question eval set
- 26 directives indexed, ~4,200 chunks
- Chrome extension deployed internally
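The write-up does not define how "retrieval relevance" was scored. One common way to measure it is hit rate at k: the share of eval questions for which at least one relevant chunk appears in the top k results. The sketch below shows that metric as an assumption, not necessarily the scoring used for the figures above; the function name and data shapes are hypothetical.

```python
def hit_rate_at_k(results, relevant, k: int = 5) -> float:
    """Fraction of questions with a relevant chunk in the top k.

    `results` maps question ID -> ranked list of retrieved chunk IDs.
    `relevant` maps question ID -> set of chunk IDs judged relevant.
    """
    if not results:
        raise ValueError("results must contain at least one question")
    hits = 0
    for question_id, retrieved in results.items():
        gold = relevant.get(question_id, set())
        if gold & set(retrieved[:k]):
            hits += 1
    return hits / len(results)
```

Running the same eval set against page-level and article-level indexes with a fixed k gives a like-for-like comparison of the two chunking strategies.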