ClaimLens
Case Study • RAG Systems • Retrieval Engineering
Most RAG systems fail on real-world documents.
ClaimLens solves this by replacing naive chunking with deterministic clause-level retrieval — built for insurance policies where precision isn't optional.
Introduction
ClaimLens is a production-oriented Retrieval-Augmented Generation (RAG) system designed for insurance policy analysis, where accuracy and traceability are critical.
Traditional RAG pipelines often rely on heuristic chunking and loosely grounded outputs, which can lead to inconsistent retrieval and hallucinations. In domains like insurance, where decisions depend on precise clauses, this becomes a major limitation.
This project focuses on treating retrieval and reasoning as structured, deterministic systems rather than black-box pipelines, ensuring that every output is grounded, traceable, and evaluable.
Problem & Motivation
Most RAG tutorials suggest a simple pipeline: chunk documents, embed them, retrieve by similarity, and generate answers with an LLM. This works well for clean text but breaks down on real-world documents like insurance policies.
Insurance PDFs are structurally complex, with inconsistent numbering, repeated headings, annexures, and noisy formatting. Naive token-based chunking ignores these structures, often splitting clauses incorrectly or missing important context entirely.
The core problem wasn't retrieval — it was structure.
To address this, I designed a deterministic clause parser that:
• Detects multiple clause formats (numbered, Roman-numeral, alphabetic, definitions)
• Assigns canonical IDs to each clause for traceability
• Enforces fail-fast behavior to avoid silent parsing errors
This ensures that each retrieval unit maps directly to a real legal clause, improving both retrieval accuracy and interpretability.
The parser is not perfect; multi-column layouts and inconsistent formatting remain challenging. Even so, it significantly outperforms naive chunking, and the accompanying evaluation framework makes it possible to improve performance iteratively.
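The clause parser described above can be sketched as a small pattern-matching loop. The patterns and the `parse_clauses` helper below are illustrative, not the actual ClaimLens implementation; a production parser needs many more clause variants and disambiguation (e.g. "(i)" read as alphabetic vs. Roman).

```python
import re

# Hypothetical patterns for the clause formats named above.
CLAUSE_PATTERNS = [
    ("numbered",   re.compile(r"^(\d+(?:\.\d+)*)[.)]\s+(.*)")),
    ("roman",      re.compile(r"^([IVXLC]+)[.)]\s+(.*)")),
    ("alphabetic", re.compile(r"^\(([a-z])\)\s+(.*)")),
    ("definition", re.compile(r'^"([^"]+)"\s+means\s+(.*)')),
]

def parse_clauses(lines):
    """Split policy lines into clauses with canonical IDs; any line that
    matches no known format raises immediately (fail-fast)."""
    clauses = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        for kind, pattern in CLAUSE_PATTERNS:
            match = pattern.match(stripped)
            if match:
                clauses.append({"id": f"{kind}:{match.group(1)}",
                                "text": match.group(2)})
                break
        else:
            # Never silently drop or mis-chunk a clause.
            raise ValueError(f"unrecognized clause format: {stripped!r}")
    return clauses
```

The `for/else` raise is the fail-fast behavior: a formatting variant the parser has never seen halts ingestion rather than producing a corrupted retrieval unit.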
Overview
ClaimLens is designed as a structured retrieval system rather than a naive RAG pipeline.
The system enforces:
• Deterministic parsing for stable retrieval units
• Canonical identifiers for traceability
• Evaluation-driven design for measurable performance
The goal is to move from "LLM-generated answers" to reliable, reproducible decision support.
System Architecture
The architecture is designed to separate concerns across ingestion, retrieval, ranking, and reasoning, ensuring each component is independently optimizable and testable.
• Ingestion → Page-level document loading
• Clause Splitter → Deterministic clause extraction
• Retriever → Dense retrieval (FAISS)
• Reranker → Cross-Encoder ranking refinement
• Reasoner → LLM with strict schema validation
• Pipeline → End-to-end orchestration
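The separation of concerns above can be sketched as independently swappable stages. The stubs below are illustrative only: keyword overlap stands in for FAISS dense retrieval, a pass-through stands in for the cross-encoder, and the stage names are not the actual ClaimLens API.

```python
def ingest(pages):
    # Page-level loading: drop empty pages, normalize whitespace.
    return [p.strip() for p in pages if p.strip()]

def split_clauses(pages):
    # Deterministic clause extraction (stubbed as one clause per page).
    return [{"id": f"C{i}", "text": t} for i, t in enumerate(pages, 1)]

def retrieve(clauses, query, k=40):
    # Stand-in for FAISS dense retrieval: keyword-overlap scoring.
    words = query.lower().split()
    return sorted(clauses,
                  key=lambda c: -sum(w in c["text"].lower() for w in words))[:k]

def rerank(candidates, query, top=5):
    # Stand-in for the cross-encoder refinement step.
    return candidates[:top]

def reason(evidence, query):
    # Stand-in for the schema-validated LLM call.
    return {"answer": "(LLM answer placeholder)",
            "citations": [c["id"] for c in evidence]}

def pipeline(pages, query):
    # End-to-end orchestration: each stage is testable in isolation.
    clauses = split_clauses(ingest(pages))
    return reason(rerank(retrieve(clauses, query), query), query)
```

Because each stage only consumes the previous stage's output, any one of them can be swapped for a real implementation without touching the others.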
Architecture
The two diagrams below translate the system description into a product view and an execution flow, making it easier to see how ClaimLens moves from a policy question to a grounded answer.
Diagram 01
ClaimLens System Surface
A high-level product view showing how the experience layer, service layer, and retrieval/reasoning engine work together.
Experience Layer
Portfolio UI / user-facing interactions
Service Layer
API orchestration and request handling
ClaimLens Engine
Clause parsing, retrieval, reranking, reasoning
Data Foundation
Model Runtime
Diagram 02
ClaimLens Retrieval and Reasoning Flow
The query is normalized, routed through retrieval, and only then passed into a constrained reasoning layer for a grounded final answer.
Coverage Query
User asks about claim eligibility or policy terms
Query Builder
Transforms the request into retrieval-friendly intent
Pipeline Orchestrator
Coordinates retrieval, reranking, and answer assembly
Retrieval Lane
Reasoning Lane
Clause Evidence
Top-ranked passages retained for answer generation
Validation Gate
Pydantic schema and retry logic enforce structure
Structured Answer
Grounded response with confidence and citations
Design Constraints
• High precision required for legal clause interpretation
• Inconsistent document structures across insurers
• Need for traceable and explainable outputs
• Minimizing hallucinations in LLM reasoning
Key Engineering Decisions
Deterministic Clause Parsing
Moved from token-based chunking to deterministic clause parsing to ensure retrieval operates on semantically meaningful and stable units, improving both recall and interpretability.
Canonical Clause IDs
Token-based chunks lacked identity across runs, making evaluation inconsistent. Introduced canonical clause identifiers so that retrieval experiments are reproducible and traceable across different queries and document versions.
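One way to realize such identifiers is to derive them from document identity, clause numbering, and normalized clause text rather than chunk position. The scheme below is illustrative only; the case study does not specify the real ID format.

```python
import hashlib

def canonical_id(doc_id: str, clause_number: str, text: str) -> str:
    """Stable across runs and re-parses: whitespace and casing changes
    do not alter the ID, but any substantive text edit does."""
    normalized = " ".join(text.split()).lower()
    digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:8]
    return f"{doc_id}/{clause_number}#{digest}"
```

Including a content digest means a silently edited clause gets a new ID, so stale evaluation labels fail loudly instead of matching the wrong text.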
Fail-Fast Design
Silent failures in LLM pipelines produce unreliable outputs that are difficult to debug. Applied fail-fast validation with explicit error handling at each stage, ensuring that failures surface immediately and prevent cascading issues downstream.
Retrieval Pipeline
• Dense Retrieval (FAISS + BGE embeddings)
• Top-K = 40 candidate generation
• Cross-Encoder reranking → Top 5
• Eliminated manual hybrid weighting
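The two-stage funnel above (dense top-40, cross-encoder top-5) can be sketched with plain cosine similarity standing in for FAISS + BGE, and a precomputed score lookup standing in for the cross-encoder call; both substitutions are assumptions for the sake of a self-contained example.

```python
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def two_stage_retrieve(query_vec, clause_vecs, cross_score, k=40, top=5):
    """Stage 1: dense top-k by embedding similarity (FAISS in production).
    Stage 2: cross-encoder scores reorder only the k survivors; just the
    top few reach the reasoner. cross_score(i) stands in for scoring
    clause i against the query with a cross-encoder."""
    candidates = sorted(range(len(clause_vecs)),
                        key=lambda i: -cosine(query_vec, clause_vecs[i]))[:k]
    return sorted(candidates, key=lambda i: -cross_score(i))[:top]
```

The design point is the asymmetry: the cheap dense stage scores every clause, while the expensive cross-encoder only ever sees k candidates, which keeps latency bounded.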
Reasoning & Validation
• Strict JSON schema enforcement (Pydantic)
• Citation grounding constraints
• Retry mechanism on validation failure
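The validation gate can be sketched as follows, using stdlib JSON checks in place of the actual Pydantic model; the field names (`answer`, `confidence`, `citations`) are assumptions, not the real schema.

```python
import json

# Expected shape of the LLM's output (illustrative schema).
REQUIRED = {"answer": str, "confidence": float, "citations": list}

def validate(raw):
    """Parse and type-check the LLM's JSON output, enforcing the
    citation-grounding constraint."""
    data = json.loads(raw)
    for field, typ in REQUIRED.items():
        if not isinstance(data.get(field), typ):
            raise ValueError(f"field {field!r} missing or wrong type")
    if not data["citations"]:
        raise ValueError("answer must cite at least one clause")
    return data

def reason_with_retry(call_llm, prompt, max_attempts=3):
    """Re-prompt on validation failure; after bounded retries, fail
    loudly rather than pass malformed output downstream."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return validate(call_llm(prompt))
        except (ValueError, json.JSONDecodeError) as exc:
            last_error = exc
    raise RuntimeError(f"validation failed after {max_attempts} attempts: {last_error}")
```

The bounded retry is what turns an unreliable generator into a fail-fast component: either a schema-conforming answer comes back, or the pipeline surfaces an explicit error.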
What Makes This Different
Typical RAG
• Token-based chunking
• Weak evaluation
• Hallucination prone
ClaimLens
• Deterministic clause parsing
• Canonical IDs
• Strict validation
Evaluation
Evaluation was treated as a first-class component rather than an afterthought.
Metrics such as Recall@20 and MRR were used to measure retrieval effectiveness, ensuring that relevant clauses are consistently surfaced before reasoning.
This enabled iterative improvements in retrieval quality instead of relying on subjective output inspection.
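Both metrics reduce to a few lines; here `ranked` is the retriever's ordered list of clause IDs and `relevant` the labeled gold clauses for a query.

```python
def recall_at_k(ranked, relevant, k=20):
    """Fraction of relevant clause IDs that appear in the top-k results."""
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

def mrr(queries):
    """Mean reciprocal rank of the first relevant clause per query.
    queries: list of (ranked_ids, relevant_ids) pairs."""
    total = 0.0
    for ranked, relevant in queries:
        rank = next((i + 1 for i, cid in enumerate(ranked) if cid in relevant), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(queries)
```

Because clauses carry canonical IDs, these set comparisons are exact; with positional token chunks the same gold labels would drift between runs.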
Recall@20: 0.93
MRR: 0.89
Trade-offs
• Deterministic parsing increases complexity but improves consistency
• Cross-encoder reranking improves accuracy at the cost of latency
• Strict validation reduces flexibility but ensures reliability
Challenges
• Handling inconsistent clause structures across insurers
• Reducing noise from dense retrieval
• Enforcing strict schema validation on LLM outputs
Future Improvements
• Adaptive retrieval based on query intent
• Learning-to-rank for dynamic reranking optimization
• Feedback loop for continuous evaluation improvement
• Integration with LangGraph for agentic workflows
What I Learned
• Retrieval quality is the primary bottleneck in RAG systems
• Evaluation is essential for iterative improvement
• Structure and constraints improve LLM reliability more than prompt tuning
Key Insight
Reliable RAG systems are not achieved by better prompts, but by designing retrieval and reasoning as structured, deterministic pipelines with measurable performance.