Stipple Datasets

Labelled financial document packs for QA, training, and fraud testing.

Use de-identified originals and synthetic documents with field-level ground truth, bounding boxes, scanned variants, and fraud labels. Test your document pipeline before real submissions reach production.

Original + synthetic

real (de-identified) and fictitious documents

20+

financial document types

Field-level

labels + bounding boxes

What ships in every pack

The value isn't the documents. It's the ground truth beside them.

The document

Original (personal data removed) or fictitious — a real-looking PDF or image.

Ground truth

Every field as structured CSV + JSONL — name, ABN, totals, dates.

Bounding boxes

Per-field coordinates for OCR and layout-model training.

Class labels

Document type, genuine-vs-tampered, and AI-vs-human, per document.

Scanned variant

A photographed/scanned copy — rotation, compression, noise — to mirror real capture.

Manifest + datasheet

Provenance, PII method, label schema, and class balance for the whole set.
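
As a rough illustration of how the ground truth can drive pipeline QA, the sketch below scores a pipeline's extracted fields against one JSONL record. The record layout (`doc_id`, `fields`) and the field names are assumptions made for this example; the actual label schema is the one recorded in each set's datasheet.

```python
import json

# Hypothetical ground-truth line; the real schema is defined in the datasheet.
GROUND_TRUTH_JSONL = '''{"doc_id": "payslip-0001", "fields": {"employer_name": "Harbourline Advisory Pty Ltd", "abn": "12 345 678 901", "gross": "2940.00", "net": "2234.00"}}
'''


def load_ground_truth(jsonl_text):
    """Parse JSONL text into {doc_id: {field_name: value}}."""
    records = {}
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        records[record["doc_id"]] = record["fields"]
    return records


def normalise(value):
    """Compare values ignoring case and all whitespace."""
    return "".join(str(value).split()).lower()


def score_fields(truth, extracted):
    """Return (accuracy, names of mismatched or missing fields) for one document."""
    mismatches = [
        name
        for name, expected in truth.items()
        if normalise(extracted.get(name, "")) != normalise(expected)
    ]
    accuracy = 1 - len(mismatches) / len(truth)
    return accuracy, mismatches


truth = load_ground_truth(GROUND_TRUTH_JSONL)["payslip-0001"]

# Stand-in for your pipeline's output: the ABN spacing differs, the net is misread.
pipeline_output = {
    "employer_name": "Harbourline Advisory Pty Ltd",
    "abn": "12345678901",
    "gross": "2940.00",
    "net": "2243.00",
}

accuracy, mismatches = score_fields(truth, pipeline_output)
print(accuracy, mismatches)
```

Whitespace-insensitive comparison is a design choice for this sketch: it treats `12 345 678 901` and `12345678901` as the same ABN. A stricter QA run might compare raw strings instead.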

Sample documents

See what a pack looks like.

Rendered samples with fictional data. Every document is marked as a synthetic sample. Production packs add the ground truth, bounding boxes, and a scanned variant.

Income & employment · Payslip (AU)

Harbourline Advisory Pty Ltd

ABN 12 345 678 901 · Pay advice

Synthetic sample

Hours Paid

38

Gross Earnings

$2,940.00

Net Payment

$2,234.00

Super

$338.10

Employee: Jordan Avery
Address: 14 Marina Way, Brookvale NSW 2100
Period ending: 04 Jun 2026
Date paid: 05 Jun 2026

Earnings                        Rate   This pay
Permanent ordinary hours (38)  77.37   2,940.00
PAYG withholding                        −706.00
Superannuation (SGC 11.5%)               338.10
Net payment                            2,234.00

Bank payments
Acct 062xxx ****1290: $2,234.00

Leave (annual)
Accrued: 12.67 h · Remaining: 151.99 h
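
The figures on the sample payslip are internally consistent, which is the kind of property a tamper check can test. The sketch below is a minimal example of such a check, using the sample's own values; the function name and the choice of rules are assumptions for illustration, not a description of any production detector. It leaves out hours × rate because the displayed rate appears to be rounded (38 × 77.37 = 2,940.06).

```python
from decimal import Decimal, ROUND_HALF_UP


def check_payslip(gross, payg, net, super_amount, super_rate):
    """Return a list of inconsistencies found on a payslip (empty if none)."""
    problems = []

    # Rule 1: net payment should equal gross earnings minus PAYG withholding.
    if gross - payg != net:
        problems.append(f"net {net} != gross {gross} - PAYG {payg}")

    # Rule 2: super should equal gross x super rate, rounded to the cent.
    expected_super = (gross * super_rate).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    if expected_super != super_amount:
        problems.append(f"super {super_amount} != expected {expected_super}")

    return problems


# Values taken from the sample payslip above.
sample = dict(
    gross=Decimal("2940.00"),
    payg=Decimal("706.00"),
    net=Decimal("2234.00"),
    super_amount=Decimal("338.10"),
    super_rate=Decimal("0.115"),
)
print(check_payslip(**sample))

# A copy with an edited net payment fails rule 1.
tampered = dict(sample, net=Decimal("2434.00"))
print(check_payslip(**tampered))
```

`Decimal` is used rather than `float` so that cent-level comparisons are exact.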

The catalog

Financial documents we cover.

Each type is available as synthetic, de-identified, or both. Don't see what you need? It's a commission away.

Income & employment

  • Payslips
  • Employment letters
  • Employment contracts
  • PAYG summaries

Banking

  • Bank statements
  • Credit-card statements
  • Transaction histories

Tax & government

  • Tax returns
  • ATO assessment notices
  • BAS
  • Centrelink statements

Trade & expenses

  • Tax invoices
  • Receipts
  • Quotes

Lending & property

  • Loan applications
  • Mortgage packs
  • Rental ledgers
  • Tenancy agreements

Wealth & business

  • Superannuation statements
  • P&L statements
  • Balance sheets

How it's made

Generated, labelled, and de-identified in one pass.

01

Source

Original documents collected with rights, and fictitious sets generated from reference layouts — at a class balance you control.

02

De-identify

Personal data is removed from every original document. The PII itself is never provided.

03

Label

Type, fields, bounding boxes, and genuine / tampered / AI flags, produced in the same pass.

04

Package

Each set ships with a datasheet, a manifest, and a single-buyer licence.

Pricing

Start with a sample. Scale to a library.

Free sample

2 submission packs

Free

First look. Review the schema and the datasheet.

Request free sample

Most popular

QA Sprint Pack

10 packs + red-flag summary + 30-min handover

AUD $2,500

Pipeline QA. Vendor evaluation.

Request sprint pack

Production library

100+ submission packs

Contact for quote

Production regression suite. Internal QA at scale.

Contact us

Training library

1,000+ packs · train / val / test splits

Contact for quote

ML model fine-tuning at scale.

Contact us

Prices in AUD. Libraries are quoted by volume and label depth; ask about a quarterly refresh for ongoing QA and training.

Why Stipple datasets

Built by a document-verification team.

Real-looking documents

Realistic layouts and value ranges, with visual variety across a set — not the same template ten times.

Ground truth included

CSV / JSONL field values and bounding boxes ship with every document, ready for training.
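
One common way to use ground-truth boxes is to score a layout model's predictions by intersection over union (IoU). The sketch below assumes boxes are given as `(x0, y0, x1, y1)` pixel coordinates with the origin at the top left; that layout is an assumption for this example, so check the datasheet's label schema for the actual coordinate convention.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b

    # Width and height of the overlap, clamped to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = inter_w * inter_h

    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Hypothetical ground-truth and predicted boxes for one field.
truth_box = (100, 200, 300, 240)
predicted_box = (110, 200, 300, 240)

score = iou(truth_box, predicted_box)
print(round(score, 2))  # 0.95
```

A threshold on this score (0.5 is a common starting point) decides whether a predicted field location counts as a match.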

Scanned variants

Photographed copies with rotation, compression, and noise — so models learn real-world capture.

Reproducible by seed

Sets are regenerable from a seed, so a regression suite stays stable run to run.
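
To show why a seed keeps a regression suite stable, the sketch below generates fictitious payslip values from a seeded generator. It is a toy stand-in, not the generator behind these datasets; the function name, fields, and value ranges are all made up for illustration.

```python
import random


def generate_payslip_values(seed, count):
    """Generate `count` fictitious payslip value sets, determined by `seed`."""
    rng = random.Random(seed)  # local generator, unaffected by global random state
    rows = []
    for i in range(count):
        hours = rng.choice([20, 30, 38, 40])
        rate_cents = rng.randint(2500, 9000)  # hourly rate in cents
        rows.append(
            {
                "doc_id": f"payslip-{i:04d}",
                "hours": hours,
                "gross_cents": hours * rate_cents,
            }
        )
    return rows


first_run = generate_payslip_values(seed=42, count=5)
second_run = generate_payslip_values(seed=42, count=5)

# The same seed reproduces the same set, so expected test outputs stay valid.
print(first_run == second_run)  # True
```

Using a dedicated `random.Random` instance, rather than the module-level functions, keeps the output independent of any other code that draws random numbers in the same process.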

Personal data removed

PII is stripped from every original and never provided. Licensed to you for internal use — not for resale or redistribution.

Direct delivery

Signed download, no third-party data brokers in the middle.

Three ways in

Off the shelf, to spec, or contribute.

Catalog

Off-the-shelf datasets by document type — original (de-identified) and synthetic, ready to download. Start with a free sample.

Request a free sample

Custom

A bespoke set to your spec — synthetic-to-spec, or bring your own corpus and we de-identify, label, and augment it under a processing agreement.

Commission a set

Contribute documents safely

Contribute your own financial documents and earn a reward. You grant usage rights at submission; we de-identify everything before it ships.

Contribute documents

Original and synthetic. Personal data removed.

No personal data ships. Ever.

Original and synthetic

Real, de-identified documents and fictitious generated sets, side by side — the mix is the value.

Personal data removed

Every original has its PII stripped, then is re-scanned and reviewed before it ships. The personal data is never provided.

Documented in a datasheet

Every set records its provenance, PII method, label schema, and class balance.

Licensed, not resold

Datasets are licensed to you for internal use — they cannot be reused, resold, or redistributed.

Original (de-identified) and synthetic documents are licensed for internal software testing and model training only. They are not genuine documents, must not be used as identity, income, or financial evidence or for lending, underwriting, claims, regulatory, or legal purposes, and may not be resold, redistributed, or shared beyond the licensed organisation.

Try the libraries — start with two packs, free.

No credit card. Same-day delivery.