Back
Live demoOpen source

LP Diligence Agent

A multi-step agent that reads private-equity fund quarterly reports and runs a nine-item LP diligence checklist

Every LP and fund-of-funds manager reads hundreds of GP quarterly reports per quarter. Most of the work is locating the same nine or ten figures inside differently structured documents. This prototype performs that extraction in under a minute per report, with citation-required answers and an explicit refusal mode when data is missing or redacted.

PythonFastAPIAnthropic ClaudeMCPsqlite-vecRAGEval harnessNext.js

The checklist

01NAV change drivers
02Capital called (period + cumulative)
03Distributions and DPI
04IRR / TVPI / DPI vs. prior periods
05Top holdings movement
06Unfunded commitment
07Fee and expense anomalies
08Key-person and GP events
09Valuation methodology and Level 1/2/3 hierarchy

How it works

01
Ingest and chunk

Documents land as PDFs (PSERS quarterly reports) or HTML (SEC 10-Qs). A section detector splits each into named sections with page anchors. A deterministic sentence-aware chunker turns the text into ~500-token windows with overlap.

02
Retrieve and synthesize

Each checklist item is its own retrieval + LLM call. Local sentence-transformers embeddings and sqlite-vec cosine search surface the top-8 chunks per query. Anthropic Claude synthesizes a citation-tagged answer with prompt caching on the system prompt.

03
Refuse honestly

The system prompt requires a citation for every numeric claim and forces the LLM to return 'data not available' when retrieved excerpts can't support an answer. Each response is tagged high, medium, or refused — and the refusals are evaluated as a first-class metric.

Cross-document behavior

The agent runs the same checklist against documents that disclose different things. PSERS quarterly reports redact fee terms and don't discuss valuation methodology; SEC 10-Qs disclose both. The interesting signal isn't that the agent answers when data exists — it's that it refuses correctly when it doesn't.

Checklist item
PSERS 2017
Blackstone PES 2025
Fee and expense anomalies
refused
Fee detail redacted in source
high
1.25% Management Fee, $22.0M Q1 2025 (vs $5.7M Q1 2024); flagged July 2024 waiver expiration
Valuation methodology
refused
Not disclosed in PSERS quarterly format
medium
ASC 820, income approach (DCF) primary; Level III $6.39B vs $4.82B; WACC 9.1-31.5%
Key-person events
refused
Correctly refused — no narrative section
refused
Correctly refused — no key-person disclosure

Eval

20-question golden set, judge-scored

A hand-curated golden set covers all five documents and all nine checklist items, including questions targeting redacted or out-of-scope content (which the agent must refuse to score well). Faithfulness, context recall, and context precision are scored by Claude Haiku as judge. Numbers below are published verbatim from the latest run, including the worst-performing rows. Tuning to beat the metrics is out of scope for v1; the next iteration would add chunk re-ranking and per-section query rewriting to lift retrieval precision.

0.78
Faithfulness
Claims supported by citations
0.80
Refusal correctness
16/20 refusal decisions matched
0.80
Keyword match
Expected anchors appeared in answer
0.62
Context recall
Relevant chunks were retrieved
0.60
Context precision
Of retrieved, share that was relevant
4.2s
Avg latency
Per checklist item, end-to-end

Corpus

Five publicly available documents. Three consecutive quarters of PSERS Hamilton Lane quarterly reports (2017), FOIA-released via the Pennsylvania Joint State Government Commission Act 5 archive — the closest public analog to GP-to-LP quarterly communications, with redacted fee terms that exercise the agent's refusal mode by design. Two Blackstone Private Equity Strategies Fund 10-Q filings (Q1 and Q3 2025) cover what the PSERS reports redact: fund-level fee detail and the Level 1/2/3 fair value hierarchy.

Five documents is enough to demonstrate both single-document retrieval and cross-document behavior without overstating coverage. A production deployment would replace the hand-curated corpus with the firm's existing fund-master tables and document store.

Guardrails

Citation required

Every numeric claim must be followed by a bracketed citation label matching a retrieved chunk. The LLM cannot answer from general knowledge.

Refusal as a first-class metric

The eval penalizes both false refusals and false confidences. Refusing when data exists is as bad as fabricating when it doesn't.

Audit trail

Every API and MCP call returns the retrieved chunks, the citation labels cited, the model used, and input/output token counts.

Local-first ingestion

Document parsing and embedding run locally. Only synthesis calls touch a hosted LLM. The synthesis provider is configurable per environment.

Engineering

Python backend with FastAPI + sqlite-vec + sentence-transformers, Anthropic Claude for synthesis, Haiku as the eval judge. The same tools are exposed via an MCP server (usable directly from Claude Desktop) and a Next.js frontend. ~600 chunks across the five-document corpus. The agent uses prompt caching on the system prompt so repeated checklist runs amortize most of the input cost. A checklist run on Anthropic Sonnet 4.6 costs about $0.13 end-to-end.