Live demoOpen source

LP Diligence Agent

A multi-step agent that reads private-equity fund quarterly reports and runs a nine-item LP diligence checklist

Every LP and fund-of-funds manager reads hundreds of GP quarterly reports per quarter. Most of the work is locating the same nine or ten figures inside differently structured documents. This prototype performs that extraction in under a minute per report, with citation-required answers and an explicit refusal mode when data is missing or redacted.

Try the live demo View source on GitHub

PythonFastAPIAnthropic ClaudeMCPsqlite-vecRAGEval harnessNext.js

The checklist

01NAV change drivers

02Capital called (period + cumulative)

03Distributions and DPI

04IRR / TVPI / DPI vs. prior periods

05Top holdings movement

06Unfunded commitment

07Fee and expense anomalies

08Key-person and GP events

09Valuation methodology and Level 1/2/3 hierarchy

How it works

Ingest and chunk

Documents land as PDFs (PSERS quarterly reports) or HTML (SEC 10-Qs). A section detector splits each into named sections with page anchors. A deterministic sentence-aware chunker turns the text into ~500-token windows with overlap.

Retrieve and synthesize

Each checklist item is its own retrieval + LLM call. Local sentence-transformers embeddings and sqlite-vec cosine search surface the top-8 chunks per query. Anthropic Claude synthesizes a citation-tagged answer with prompt caching on the system prompt.

Refuse honestly

The system prompt requires a citation for every numeric claim and forces the LLM to return 'data not available' when retrieved excerpts can't support an answer. Each response is tagged high, medium, or refused, and the refusals are evaluated as a first-class metric.

Cross-document behavior

The agent runs the same checklist against documents that disclose different things. PSERS quarterly reports redact fee terms and don't discuss valuation methodology; SEC 10-Qs disclose both. The interesting signal isn't that the agent answers when data exists. It's that it refuses correctly when it doesn't.

Checklist item

PSERS 2017

Blackstone PES 2025

Fee and expense anomalies

refused

Fee detail redacted in source

high

1.25% Management Fee, $22.0M Q1 2025 (vs $5.7M Q1 2024); flagged July 2024 waiver expiration

Valuation methodology

refused

Not disclosed in PSERS quarterly format

medium

ASC 820, income approach (DCF) primary; Level III $6.39B vs $4.82B; WACC 9.1-31.5%

Key-person events

refused

Correctly refused: no narrative section

refused

Correctly refused: no key-person disclosure

Eval

20-question golden set, judge-scored

A hand-curated golden set covers all five documents and all nine checklist items, including questions targeting redacted or out-of-scope content (which the agent must refuse to score well). Faithfulness, context recall, and context precision are scored by Claude Haiku as judge. Numbers below are published verbatim from the latest run, including the worst-performing rows. Tuning to beat the metrics is out of scope for v1; the next iteration would add chunk re-ranking and per-section query rewriting to lift retrieval precision.

0.78

Faithfulness

Claims supported by citations

0.80

Refusal correctness

16/20 refusal decisions matched

0.80

Keyword match

Expected anchors appeared in answer

0.62

Context recall

Relevant chunks were retrieved

0.60

Context precision

Of retrieved, share that was relevant

4.2s

Avg latency

Per checklist item, end-to-end

Corpus

Five publicly available documents. Three consecutive quarters of PSERS Hamilton Lane quarterly reports (2017), FOIA-released via the Pennsylvania Joint State Government Commission Act 5 archive, the closest public analog to GP-to-LP quarterly communications, with redacted fee terms that exercise the agent's refusal mode by design. Two Blackstone Private Equity Strategies Fund 10-Q filings (Q1 and Q3 2025) cover what the PSERS reports redact: fund-level fee detail and the Level 1/2/3 fair value hierarchy.

Five documents is enough to demonstrate both single-document retrieval and cross-document behavior without overstating coverage. A production deployment would replace the hand-curated corpus with the firm's existing fund-master tables and document store.

Guardrails

Citation required

Every numeric claim must be followed by a bracketed citation label matching a retrieved chunk. The LLM cannot answer from general knowledge.

Refusal as a first-class metric

The eval penalizes both false refusals and false confidences. Refusing when data exists is as bad as fabricating when it doesn't.

Audit trail

Every API and MCP call returns the retrieved chunks, the citation labels cited, the model used, and input/output token counts.

Local-first ingestion

Document parsing and embedding run locally. Only synthesis calls touch a hosted LLM. The synthesis provider is configurable per environment.

Engineering

Python backend with FastAPI + sqlite-vec + sentence-transformers, Anthropic Claude for synthesis, Haiku as the eval judge. The same tools are exposed via an MCP server (usable directly from Claude Desktop) and a Next.js frontend. ~600 chunks across the five-document corpus. The agent uses prompt caching on the system prompt so repeated checklist runs amortize most of the input cost. A checklist run on Anthropic Sonnet 4.6 costs about $0.13 end-to-end.