05

Multimodal document pipelines

Document and image processing pipelines on Claude Vision and GPT-4V, with Gemini added recently for cost reasons. Handles the first pass of compliance reviews that used to sit with analysts.

Claude Vision · GPT-4V · Gemini · OCR · Compliance review automation
Use case: First-pass compliance review
Originally done by: Human analysts

The problem

Compliance review meant analysts opening PDFs and scanning for specific patterns. Hundreds a day. The patterns were teachable but rote. The work was the kind that makes good analysts quit.

The shape

A pipeline that takes scanned and digital documents, OCRs where needed, hands them to a vision model with a structured extraction prompt, and produces a JSON report flagged with the patterns found. The analyst reviews the JSON, not the PDF. Spot-checking a report is much faster than reading a document in full, so each analyst gets through far more documents.
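A minimal sketch of that shape, not the production code. The names `Document`, `build_report`, and the report fields are hypothetical, and the OCR and extraction steps are passed in as plain callables so the flow can be shown without assuming any particular vendor API.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Document:
    """Hypothetical input record: one document entering the pipeline."""
    doc_id: str
    pages: list[bytes]            # page images (scans or rendered PDF pages)
    text: Optional[str] = None    # embedded text for digital docs, None for scans


def build_report(
    doc: Document,
    ocr: Callable[[list[bytes]], str],
    extract: Callable[[str, list[bytes]], list[dict]],
) -> str:
    """OCR only when there is no embedded text, then extract and emit JSON."""
    ocr_used = doc.text is None
    text = ocr(doc.pages) if ocr_used else doc.text
    # `extract` stands in for the vision-model call with the structured prompt.
    findings = extract(text, doc.pages)
    report = {
        "doc_id": doc.doc_id,
        "ocr_used": ocr_used,
        "flagged": bool(findings),
        "findings": findings,
    }
    return json.dumps(report, indent=2)


# Stand-ins for the real OCR engine and model call (assumptions for the demo).
def fake_ocr(pages: list[bytes]) -> str:
    return "Signed: ____ Date: 2021-03-04"


def fake_extract(text: str, pages: list[bytes]) -> list[dict]:
    if "____" in text:
        return [{"pattern": "missing_signature", "page": 1, "quote": "Signed: ____"}]
    return []


scanned = Document(doc_id="doc-001", pages=[b"<page image bytes>"])
print(build_report(scanned, fake_ocr, fake_extract))
```

The analyst-facing artifact is the JSON string; everything upstream of it can change (model, OCR engine) without changing what the analyst reviews.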

Key decisions

What broke

Early prompts asked for “anything suspicious.” That returned everything. The current prompts enumerate the patterns by name, give examples of each, and require a citation for every flag. False positives dropped and recall held.
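One way the enumerated prompt could look, as a sketch under stated assumptions. The pattern names and examples below are invented for illustration, and the `keep_cited` check (dropping any finding whose quote does not appear in the document text) is an assumption about how "require citation" might be enforced, not a description of the actual pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """Hypothetical pattern definition: a name, what it means, one example."""
    name: str
    description: str
    example: str


def build_prompt(patterns: list[Pattern]) -> str:
    """Build an extraction prompt that names each pattern instead of asking
    for 'anything suspicious'."""
    lines = [
        "Review the document for the patterns listed below and no others.",
        "For every finding, return the pattern name, the page number,",
        "and a verbatim quote from the document that supports it.",
        "If a pattern does not occur, do not report it.",
        "",
    ]
    for p in patterns:
        lines.append(f"- {p.name}: {p.description}")
        lines.append(f"  Example: {p.example}")
    return "\n".join(lines)


def keep_cited(findings: list[dict], patterns: list[Pattern], doc_text: str) -> list[dict]:
    """Keep only findings that name a known pattern and quote text that is
    actually present in the document."""
    known = {p.name for p in patterns}
    kept = []
    for f in findings:
        quote = (f.get("quote") or "").strip()
        if f.get("pattern") in known and quote and quote in doc_text:
            kept.append(f)
    return kept


# Illustrative patterns only; the real list is not given in this write-up.
PATTERNS = [
    Pattern("missing_signature", "A signature line left blank.", "Signed: ____"),
    Pattern("backdated_entry", "An entry dated before the form was issued.",
            "Form issued 2021-05-01, entry dated 2021-03-04"),
]

print(build_prompt(PATTERNS))
```

Rejecting uncited or unknown-pattern findings after the model responds means an over-eager answer costs a dropped row, not an analyst's time.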
