Labelled financial document packs for QA, training, and fraud testing.
Use de-identified originals and synthetic documents with field-level ground truth, bounding boxes, scanned variants, and fraud labels. Test your document pipeline before real submissions reach production.
Original + synthetic
real (de-identified) and fictitious documents
20+
financial document types
Field-level
labels + bounding boxes
The value isn't the documents. It's the ground truth beside them.
The document
Original (personal data removed) or fictitious — a real-looking PDF or image.
Ground truth
Every field as structured CSV + JSONL — name, ABN, totals, dates.
Bounding boxes
Per-field coordinates for OCR and layout-model training.
Class labels
Document type, genuine-vs-tampered, and AI-vs-human, per document.
Scanned variant
A photographed/scanned copy — rotation, compression, noise — to mirror real capture.
Manifest + datasheet
Provenance, PII method, label schema, and class balance for the whole set.
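As an illustration of how field-level ground truth can be used, the sketch below scores a pipeline's extracted values against one JSONL record. The record layout (`doc_id`, `doc_type`, `fields`, `value`, `bbox`) and the field names are assumptions made for this example; the actual label schema is the one recorded in each set's datasheet.

```python
import json

# Hypothetical ground-truth line. Keys and field names are assumptions for
# illustration; the real schema is documented in the set's datasheet.
GROUND_TRUTH_JSONL = """\
{"doc_id": "sample-001", "doc_type": "payslip", "fields": {"gross_earnings": {"value": "2940.00", "bbox": [72, 310, 160, 326]}, "net_payment": {"value": "2234.00", "bbox": [72, 340, 160, 356]}, "super": {"value": "338.10", "bbox": [72, 370, 150, 386]}}}
"""


def load_ground_truth(jsonl_text):
    """Parse JSONL text into a dict keyed by doc_id."""
    records = {}
    for line in jsonl_text.splitlines():
        if line.strip():
            record = json.loads(line)
            records[record["doc_id"]] = record
    return records


def score_fields(truth_record, extracted):
    """Return {field_name: True/False} using exact string match on values."""
    return {
        name: extracted.get(name) == field["value"]
        for name, field in truth_record["fields"].items()
    }


truth = load_ground_truth(GROUND_TRUTH_JSONL)

# Stand-in for your pipeline's output; "super" has two digits transposed.
pipeline_output = {
    "gross_earnings": "2940.00",
    "net_payment": "2234.00",
    "super": "383.10",
}

results = score_fields(truth["sample-001"], pipeline_output)
accuracy = sum(results.values()) / len(results)
print(results)
print(f"field accuracy: {accuracy:.2f}")
```

Exact string matching is the strictest choice. A real harness would likely normalise currency and date formats before comparing.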
See what a pack looks like.
Rendered samples with fictional data. Every document is marked as a synthetic sample. Production packs add the ground truth, bounding boxes, and a scanned variant.
Harbourline Advisory Pty Ltd
ABN 12 345 678 901 · Pay advice
Hours Paid
38
Gross Earnings
$2,940.00
Net Payment
$2,234.00
Super
$338.10
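The sample figures above are internally consistent, which is the kind of cross-field check a QA pipeline might run. A minimal sketch, using only the values shown in the sample (the variable names are illustrative):

```python
from decimal import Decimal

# Values copied from the sample pay advice above.
gross = Decimal("2940.00")
net = Decimal("2234.00")
super_amount = Decimal("338.10")

# Super as a fraction of gross earnings.
super_ratio = super_amount / gross

# Amount between gross and net (tax and any other deductions).
withheld = gross - net

# Basic plausibility checks a pipeline could apply to extracted fields.
checks = {
    "net_not_above_gross": net <= gross,
    "super_is_positive": super_amount > 0,
}

print(f"super / gross = {super_ratio}")
print(f"gross - net   = {withheld}")
print(checks)
```

Here super works out to 11.5% of gross, and the gap between gross and net is $706.00. `Decimal` is used instead of `float` so currency arithmetic stays exact.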
Financial documents we cover.
Each type is available as synthetic, de-identified, or both. Don't see what you need? It's a commission away.
Income & employment
- Payslips
- Employment letters
- Employment contracts
- PAYG summaries
Banking
- Bank statements
- Credit-card statements
- Transaction histories
Tax & government
- Tax returns
- ATO assessment notices
- BAS
- Centrelink statements
Trade & expenses
- Tax invoices
- Receipts
- Quotes
Lending & property
- Loan applications
- Mortgage packs
- Rental ledgers
- Tenancy agreements
Wealth & business
- Superannuation statements
- P&L statements
- Balance sheets
Generated, labelled, and de-identified in one pass.
01
Source
Original documents collected with rights, and fictitious sets generated from reference layouts — at a class balance you control.
02
De-identify
Personal data is removed from every original document. The PII itself is never provided.
03
Label
Type, fields, bounding boxes, and genuine / tampered / AI flags, produced in the same pass.
04
Package
Each set ships with a datasheet, a manifest, and a single-buyer licence.
Start with a sample. Scale to a library.
Free sample
2 submission packs
Free
First look. Review the schema and the datasheet.
Request free sample
QA Sprint Pack
10 packs + red-flag summary + 30-min handover
AUD $2,500
Pipeline QA. Vendor evaluation.
Request sprint pack
Production library
100+ submission packs
Contact for quote
Production regression suite. Internal QA at scale.
Contact us
Training library
1,000+ packs · train / val / test splits
Contact for quote
ML model fine-tuning at scale.
Contact us
Prices in AUD. Libraries are quoted by volume and label depth; ask about a quarterly refresh for ongoing QA and training.
Built by a document-verification team.
Real-looking documents
Realistic layouts and value ranges, with visual variety across a set — not the same template ten times.
Ground truth included
CSV / JSONL field values and bounding boxes ship with every document, ready for training.
Scanned variants
Photographed copies with rotation, compression, and noise — so models learn real-world capture.
Reproducible by seed
Sets are regenerable from a seed, so a regression suite stays stable run to run.
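To show why a seed keeps a regression suite stable, here is a minimal sketch of seeded generation. It is not the generator used to build these sets; the function name, field names, and value ranges are all hypothetical.

```python
import random


def generate_payslip_values(seed, count):
    """Produce `count` fictitious payslip value sets from a seed.

    Hypothetical illustration only: field names and ranges are assumptions.
    """
    # A private Random instance, so results depend only on the seed and
    # not on any other use of the global random module.
    rng = random.Random(seed)
    docs = []
    for i in range(count):
        hours = rng.choice([20, 30, 38, 40])
        rate_cents = rng.randrange(2500, 9000)  # hourly rate in cents
        docs.append({
            "doc_id": f"synthetic-{i:04d}",
            "hours_paid": hours,
            "gross_cents": hours * rate_cents,
        })
    return docs


first_run = generate_payslip_values(seed=42, count=3)
second_run = generate_payslip_values(seed=42, count=3)

# Same seed, same documents: expected values in a test suite do not drift.
print(first_run == second_run)
```

Working in integer cents avoids floating-point rounding, so regenerated values compare equal exactly.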
Personal data removed
PII is stripped from every original and never provided. Licensed to you for internal use — not for resale or redistribution.
Direct delivery
Signed download, no third-party data brokers in the middle.
Off the shelf, to spec, or contribute.
Catalog
Off-the-shelf datasets by document type — original (de-identified) and synthetic, ready to download. Start with a free sample.
Request a free sample
Custom
A bespoke set to your spec — synthetic-to-spec, or bring your own corpus and we de-identify, label, and augment it under a processing agreement.
Commission a set
Contribute documents safely
Contribute your own financial documents and earn a reward. You grant usage rights at submission; we de-identify everything before it ships.
Contribute documents
No personal data ships. Ever.
Original and synthetic
Real, de-identified documents and fictitious generated sets, side by side — the mix is the value.
Personal data removed
Every original has its PII stripped before it ships, then re-scanned and reviewed. The personal data is never provided.
Documented in a datasheet
Every set records its provenance, PII method, label schema, and class balance.
Licensed, not resold
Datasets are licensed to you for internal use — they cannot be reused, resold, or redistributed.
Original (de-identified) and synthetic documents are licensed for internal software testing and model training only. They are not genuine documents, must not be used as identity, income, or financial evidence or for lending, underwriting, claims, regulatory, or legal purposes, and may not be resold, redistributed, or shared beyond the licensed organisation.
Try the libraries — start with two packs, free.
No credit card. Same-day delivery.