
How Capture processes documents

When you submit a document to Capture, it passes through a multi-stage pipeline before landing in your review queue. This guide explains each stage and what the AI extracts.

The first stage, triage, classifies the document and checks that it's suitable for processing.

The AI determines:

  • Document type: Is this a bill (accounts payable) or a sales invoice (accounts receivable)?
  • Document validity: Is this a genuine financial document?
  • Rotation: Is the document upside down or rotated?
  • Multiple documents: Does the image contain more than one document?

If the document can’t be classified, it’s marked as Needs classification for you to set the type manually. If multiple documents are detected, it’s marked as Needs manual split.

The second stage, extraction, is the core of the pipeline: the AI reads the document and extracts structured data.

Header fields:

  • Supplier or customer name
  • Invoice number
  • Issue date and due date
  • Currency

Financial data:

  • Net amount, tax amount, and total amount
  • Line items with description, quantity, unit amount, and line amount
  • Tax rate and account code for each line item

The AI provides reasoning for each extracted field, explaining where it found the data on the document and why it chose specific values. This reasoning is preserved and available during review.
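One way to picture the extracted data, with its per-field reasoning, is as a set of nested records. The minimal Python sketch below is illustrative only: the class and attribute names (`ExtractedField`, `LineItem`, `ExtractedDocument`) are hypothetical and are not part of Capture's API.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class ExtractedField(Generic[T]):
    """A value read from the document, plus the AI's explanation of where it came from."""
    value: Optional[T]
    reasoning: str = ""


@dataclass
class LineItem:
    description: ExtractedField[str]
    quantity: ExtractedField[Decimal]
    unit_amount: ExtractedField[Decimal]
    line_amount: ExtractedField[Decimal]
    tax_rate: ExtractedField[Decimal]
    account_code: ExtractedField[str]


@dataclass
class ExtractedDocument:
    # Header fields
    contact_name: ExtractedField[str]  # supplier (bill) or customer (sales invoice)
    invoice_number: ExtractedField[str]
    issue_date: ExtractedField[date]
    due_date: ExtractedField[date]
    currency: ExtractedField[str]
    # Financial data
    net_amount: ExtractedField[Decimal]
    tax_amount: ExtractedField[Decimal]
    total_amount: ExtractedField[Decimal]
    line_items: list[LineItem] = field(default_factory=list)
```

Keeping the reasoning next to each value, rather than in a separate log, means it travels with the data into the review screen.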

Extracted amounts are checked for arithmetic consistency:

  • Do line item amounts add up correctly?
  • Does the sum of line items match the header totals (net, tax, total)?

Validation uses a small tolerance to account for rounding differences. Mismatches are flagged for your attention during review but don’t prevent the document from being processed.
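The checks above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Capture's implementation: the 0.01 tolerance is a placeholder (the guide only says "a small tolerance"), line amounts are assumed to be tax-exclusive, and the names `Line` and `validate_amounts` are hypothetical. Mismatches are returned as warnings rather than raised, mirroring the fact that they don't block processing.

```python
from decimal import Decimal
from typing import NamedTuple

# Assumed tolerance: the guide only says "a small tolerance", so 0.01 is illustrative.
TOLERANCE = Decimal("0.01")


class Line(NamedTuple):
    quantity: Decimal
    unit_amount: Decimal
    line_amount: Decimal  # assumed to be the tax-exclusive (net) amount
    tax_rate: Decimal     # percentage, e.g. Decimal("20") for 20%


def close_enough(a: Decimal, b: Decimal, tolerance: Decimal = TOLERANCE) -> bool:
    return abs(a - b) <= tolerance


def validate_amounts(lines: list[Line], net: Decimal, tax: Decimal, total: Decimal,
                     tolerance: Decimal = TOLERANCE) -> list[str]:
    """Return warnings for review; an empty list means the amounts are consistent."""
    warnings: list[str] = []

    # Do the line item amounts add up? (quantity x unit amount = line amount)
    for i, line in enumerate(lines, start=1):
        expected = line.quantity * line.unit_amount
        if not close_enough(expected, line.line_amount, tolerance):
            warnings.append(
                f"Line {i}: {line.quantity} x {line.unit_amount} = {expected}, "
                f"not {line.line_amount}"
            )

    # Does the sum of line items match the header totals?
    line_net = sum((line.line_amount for line in lines), Decimal("0"))
    line_tax = sum((line.line_amount * line.tax_rate / 100 for line in lines), Decimal("0"))
    if not close_enough(line_net, net, tolerance):
        warnings.append(f"Line items sum to {line_net}, but the net amount is {net}")
    if not close_enough(line_tax, tax, tolerance):
        warnings.append(f"Line item tax sums to {line_tax}, but the tax amount is {tax}")
    if not close_enough(net + tax, total, tolerance):
        warnings.append(f"Net {net} + tax {tax} does not equal the total {total}")
    return warnings
```

The sketch uses `Decimal` rather than `float` so that currency arithmetic isn't distorted by binary floating-point rounding; the tolerance then only has to absorb rounding done on the document itself.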

The extracted supplier or customer name is matched against your existing contacts:

  1. Exact match: A case-insensitive comparison against existing contact names
  2. Fuzzy match: Capture looks for a close match to allow for different ways of writing the same name, such as Amazon, Amazon.co.uk, and Amazon.com
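A two-step matcher like this could be sketched as follows. The guide doesn't say which fuzzy-matching technique Capture uses, so this sketch swaps in Python's standard `difflib.SequenceMatcher` after a simple normalisation step. The list of trailing "noise" tokens, the 0.85 threshold, and the function names `normalise` and `match_contact` are all assumptions for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Assumed "noise" tokens dropped from the end of a name before fuzzy comparison,
# e.g. web domains and company suffixes. Capture's actual rules are not documented.
TRAILING_NOISE = {"com", "co", "uk", "net", "org", "ltd", "limited", "inc", "llc", "plc"}


def normalise(name: str) -> str:
    """Lower-case, replace punctuation with spaces, and drop trailing noise tokens."""
    tokens = re.sub(r"[\W_]+", " ", name.casefold()).split()
    while len(tokens) > 1 and tokens[-1] in TRAILING_NOISE:
        tokens.pop()
    return " ".join(tokens)


def match_contact(extracted: str, contacts: list[str],
                  threshold: float = 0.85) -> Optional[str]:
    """Return the best matching existing contact name, or None if nothing is close."""
    # 1. Exact match: case-insensitive comparison against existing names.
    wanted = extracted.casefold()
    for contact in contacts:
        if contact.casefold() == wanted:
            return contact

    # 2. Fuzzy match: compare normalised names and keep the highest score.
    target = normalise(extracted)
    best, best_score = None, 0.0
    for contact in contacts:
        score = SequenceMatcher(None, target, normalise(contact)).ratio()
        if score > best_score:
            best, best_score = contact, score
    return best if best_score >= threshold else None
```

Trying the exact match first keeps the common case cheap and unambiguous; the fuzzy step only runs when no contact name matches outright.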

The final stage sets the document’s review status based on the processing outcome:

  • Needs review: Everything processed successfully, ready for you to check
  • Needs classification: Document type couldn’t be determined
  • Needs manual split: Multiple documents detected in one image
  • Triage failed: Document was rejected during triage (not a financial document)
  • Extraction failed: AI couldn’t extract data from the document
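The status rules above can be read as a single decision function. In this hypothetical Python sketch, `ReviewStatus`, `PipelineOutcome`, and `assign_status` are illustrative names, and the order in which the checks run is an assumption: the guide lists the possible outcomes but not their precedence. Arithmetic mismatches from validation deliberately don't appear here, since they are flagged during review rather than changing the status.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReviewStatus(Enum):
    NEEDS_REVIEW = "Needs review"
    NEEDS_CLASSIFICATION = "Needs classification"
    NEEDS_MANUAL_SPLIT = "Needs manual split"
    TRIAGE_FAILED = "Triage failed"
    EXTRACTION_FAILED = "Extraction failed"


@dataclass
class PipelineOutcome:
    is_financial_document: bool   # triage: is this a genuine financial document?
    document_type: Optional[str]  # "bill", "sales_invoice", or None if unclassified
    multiple_documents: bool      # triage: more than one document in the image?
    extraction_succeeded: bool    # extraction: did the AI return usable data?


def assign_status(outcome: PipelineOutcome) -> ReviewStatus:
    # Assumed precedence: rejection first, then split, then classification,
    # then extraction. Validation warnings never change the status.
    if not outcome.is_financial_document:
        return ReviewStatus.TRIAGE_FAILED
    if outcome.multiple_documents:
        return ReviewStatus.NEEDS_MANUAL_SPLIT
    if outcome.document_type is None:
        return ReviewStatus.NEEDS_CLASSIFICATION
    if not outcome.extraction_succeeded:
        return ReviewStatus.EXTRACTION_FAILED
    return ReviewStatus.NEEDS_REVIEW
```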

If any stage encounters a temporary error (such as a service being briefly unavailable), the error is translated into a clear message explaining what happened.

Here’s a complete list of the data Capture extracts from each document:

| Field | Description |
|---|---|
| Supplier/customer name | The name of the company or individual on the document |
| Invoice number | The invoice or receipt reference number |
| Issue date | When the document was issued |
| Due date | When payment is due |
| Currency | The currency code (e.g., GBP, USD, EUR) |
| Net amount | Total before tax |
| Tax amount | Total tax |
| Total amount | Grand total including tax |
| Line items | Individual items with description, quantity, unit amount, line amount, tax rate, and account code |

Most documents are processed within a few minutes of submission. Processing time can vary depending on:

  • Document complexity (number of line items)
  • Image quality and clarity
  • Current queue volume