Technical Reference

Overview

This page provides structural and modelling context for the Syntheticr trial dataset. It is intended to support technical users processing the data within AML systems, rules engines, or machine learning workflows.

For detailed field-level definitions, refer to the Data Dictionary (XLSX) linked below.

Download Data Dictionary (XLSX) →

Dataset Overview

The Syntheticr trial dataset represents a single synthetic financial institution operating within a broader financial ecosystem.

The dataset spans 24 months of activity and includes:

Customer profiles (individual and business)
Customer-related parties
Financial transactions
Risk intelligence
AML alerts

The dataset is ISO 20022-aligned and structured to reflect operational banking environments.

Temporal Structure

The 24 months are divided into the following overlapping periods:

Months 1–6: Behavioural calibration period (no alerts generated)
Months 1–18: Risk intelligence available
Months 19–24: Unlabelled “greenfield” period

The final 6 months contain no risk intelligence. This is intentional and designed to support unbiased evaluation and model validation.

Risk intelligence data should not be assumed to exist beyond month 18.
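A minimal sketch of how the period boundaries above might be encoded. It assumes each record can be reduced to a 1-based integer month. The `month` field and the helper names are hypothetical, and the Data Dictionary defines the actual date fields. The sketch keeps risk-intelligence months separate from greenfield months so that the final six months can be held out for evaluation.

```python
# Period boundaries from the temporal structure above (1-based months).
CALIBRATION_END = 6    # months 1-6: behavioural calibration, no alerts generated
RISK_INTEL_END = 18    # months 1-18: risk intelligence available
DATASET_END = 24       # months 19-24: unlabelled "greenfield" period


def month_flags(month: int) -> dict:
    """Describe which temporal properties apply to a 1-based dataset month."""
    if not 1 <= month <= DATASET_END:
        raise ValueError(f"month must be between 1 and {DATASET_END}, got {month}")
    return {
        "calibration": month <= CALIBRATION_END,
        "risk_intelligence_available": month <= RISK_INTEL_END,
        "greenfield": month > RISK_INTEL_END,
    }


def split_by_period(records: list) -> tuple:
    """Split records into the risk-intelligence window (months 1-18)
    and the greenfield window (months 19-24).

    Assumes each record is a dict with a hypothetical integer "month" key.
    """
    labelled, greenfield = [], []
    for record in records:
        if month_flags(record["month"])["greenfield"]:
            greenfield.append(record)
        else:
            labelled.append(record)
    return labelled, greenfield
```

Because the periods overlap, flags are returned independently. A month can be both a calibration month and a risk-intelligence month.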

Key Join Principles

When joining tables, always include bank_id in join conditions.

Core identifiers include:

entity_id (customer-level identifier)
transaction_id
alert_id (where applicable)

Transactions link to entities via entity_id.

Risk intelligence and AML alerts link to entities using entity_id.

Related party relationships link entities to other entities via defined relationship types.

Ensure join logic reflects the multi-table nature of the dataset rather than assuming a flattened structure.
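The sketch below illustrates the bank_id rule using Python's built-in sqlite3 module. The table names, the customer_type and amount columns, and the sample rows are hypothetical. Only bank_id, entity_id and transaction_id come from this page. For illustration, it assumes that an entity_id value is only unique within a bank. Under that assumption, a join on entity_id alone can duplicate rows.

```python
import sqlite3

# Hypothetical schema for illustration only; the real table and column
# names are defined in the Data Dictionary.
SCHEMA = """
CREATE TABLE customers (bank_id TEXT, entity_id TEXT, customer_type TEXT);
CREATE TABLE transactions (bank_id TEXT, transaction_id TEXT, entity_id TEXT, amount REAL);
"""

# Join on bank_id AND entity_id, as recommended above.
JOIN_WITH_BANK = """
SELECT t.transaction_id, t.bank_id, c.customer_type
FROM transactions AS t
JOIN customers AS c
  ON c.bank_id = t.bank_id
 AND c.entity_id = t.entity_id
ORDER BY t.transaction_id
"""

# The same join without bank_id, shown only for comparison.
JOIN_WITHOUT_BANK = """
SELECT t.transaction_id, t.bank_id, c.customer_type
FROM transactions AS t
JOIN customers AS c
  ON c.entity_id = t.entity_id
ORDER BY t.transaction_id
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database where entity_id 'E1' exists at two banks."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [("BANK_A", "E1", "individual"), ("BANK_B", "E1", "business")],
    )
    conn.execute("INSERT INTO transactions VALUES ('BANK_A', 'T1', 'E1', 250.0)")
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(conn.execute(JOIN_WITH_BANK).fetchall())     # one row for T1
    print(conn.execute(JOIN_WITHOUT_BANK).fetchall())  # T1 appears twice
```

With bank_id in the join, the single transaction matches one customer. Without it, the transaction is silently duplicated across both banks.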

Entity Layer

The customer profile tables contain:

KYC information
Customer type (individual or business)
Risk level indicators
Geographic and behavioural attributes

Business entities may include:

Directors
UBO/PSC relationships
Related party linkages

A customer's risk level does not imply criminality. Risk levels are behavioural signals within the ecosystem.

Transaction Layer

Transaction tables represent a range of payment types, including:

Domestic transfers
International transfers
Card activity
Cash activity
Standing orders and direct debits

Transaction volume varies significantly across entities. This is intentional and reflects realistic customer behaviour distributions.

Detection performance may vary across transaction types and volume bands.
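One way to check how detection performance varies with volume is to band entities by transaction count and compute the detection rate for each band. This is a sketch only. The band edges, the helper names and the input shapes are assumptions and are not part of the dataset specification. The sketch uses (bank_id, entity_id) keys and supplies ground truth as a set.

```python
from collections import Counter, defaultdict

# Hypothetical band edges: transactions per entity over the whole period.
BANDS = [(0, 10, "low"), (10, 100, "medium"), (100, float("inf"), "high")]


def volume_band(count: int) -> str:
    """Map a transaction count to a named volume band."""
    for lower, upper, name in BANDS:
        if lower <= count < upper:
            return name
    raise ValueError(f"no band covers count {count}")


def detection_rate_by_band(transactions, criminal_keys, detected_keys):
    """Detection rate of criminal entities within each volume band.

    transactions: iterable of (bank_id, entity_id) keys, one per transaction.
    criminal_keys: set of ground-truth criminal entity keys.
    detected_keys: set of entity keys flagged by the system under test.
    """
    counts = Counter(transactions)
    totals = defaultdict(int)
    hits = defaultdict(int)
    for key in criminal_keys:
        band = volume_band(counts.get(key, 0))
        totals[band] += 1
        if key in detected_keys:
            hits[band] += 1
    # Only bands that contain at least one criminal entity are reported.
    return {band: hits[band] / totals[band] for band in totals}
```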

Risk Intelligence and Alerts

Risk intelligence includes:

Alerts
SAR indicators
Exit markers

Risk intelligence contains both true positives and false positives. This is deliberate and designed to reflect real-world signal-to-noise conditions.

SAR filing does not automatically result in exit.

Because exits are processed over an operational close-out window, an entity may still show transaction activity after its exit marker.
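The sketch below separates post-exit activity from other transactions. This lets it be reviewed in the context of close-out windows instead of being treated as unexplained behaviour. The record keys (date, transaction_id) and the shape of exit_dates are hypothetical. Map them to the fields in the Data Dictionary.

```python
from datetime import date


def post_exit_transactions(transactions, exit_dates):
    """Return IDs of transactions dated after the entity's exit marker.

    transactions: list of dicts with hypothetical keys "bank_id",
        "entity_id", "transaction_id" and "date" (a datetime.date).
    exit_dates: dict mapping (bank_id, entity_id) -> exit date.
    """
    flagged = []
    for txn in transactions:
        exit_date = exit_dates.get((txn["bank_id"], txn["entity_id"]))
        # Entities without an exit marker are never flagged.
        if exit_date is not None and txn["date"] > exit_date:
            flagged.append(txn["transaction_id"])
    return flagged
```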

Risk intelligence should not be treated as a training label without understanding its structure and temporal limits.
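A cautious way to use risk intelligence in training is to keep each signal type per entity rather than collapsing everything into a single positive label. It also helps to reject any record dated after month 18. The sketch below assumes hypothetical signal and month fields. Signal values such as "alert", "sar" and "exit" are illustrative.

```python
RISK_INTEL_END = 18  # risk intelligence is only expected up to month 18


def signals_by_entity(risk_records):
    """Collect risk-intelligence signal types per (bank_id, entity_id).

    risk_records: iterable of dicts with hypothetical keys
    "bank_id", "entity_id", "signal" and integer "month".
    """
    signals = {}
    for rec in risk_records:
        if rec["month"] > RISK_INTEL_END:
            # Fail loudly instead of silently using data outside the window.
            raise ValueError(
                f"risk intelligence found in month {rec['month']}, after month {RISK_INTEL_END}"
            )
        key = (rec["bank_id"], rec["entity_id"])
        signals.setdefault(key, set()).add(rec["signal"])
    return signals
```

Keeping signals separate lets downstream code decide, for example, whether an alert on its own should count as a positive. This matters because risk intelligence contains false positives and a SAR does not imply exit.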

Network and Related Parties

The dataset includes:

Multi-entity networks
Cross-institutional networks
UBO/PSC links
Shared directors
Shared addresses
Family relationships

Network detection performance is a core dimension of the Syntheticr scorecard.

Users should consider both entity-level detection and network-level detection when evaluating system behaviour.
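Network-level analysis typically starts by grouping linked entities into networks. The sketch below uses union-find (disjoint sets) over related-party links. Several parts are assumptions for illustration: the node shape, the helper names, and the network-level detection definition, under which a network counts as detected if any member is detected. These may differ from how the Syntheticr scorecard counts networks.

```python
def network_components(links):
    """Group entities into connected networks with union-find.

    links: iterable of (node_a, node_b) pairs; each node is a hypothetical
    (bank_id, entity_id) tuple from related-party records such as shared
    directors, shared addresses or UBO/PSC links.
    Returns one set of nodes per connected network.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in links:
        union(a, b)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


def network_detection_rate(networks, detected_keys):
    """Share of networks with at least one detected member (assumed definition)."""
    if not networks:
        return 0.0
    detected = sum(1 for members in networks if members & detected_keys)
    return detected / len(networks)
```

Nodes are keyed by (bank_id, entity_id) so that cross-institutional links can join entities held at different banks.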

Scorecard Context

The Syntheticr scorecard evaluates detection performance against fully known ground truth.

Key performance concepts:

Detection rate: proportion of criminal entities detected
Precision: proportion of alerts that correctly identify criminal entities
Detection grade: entity-level grading based on detection rate

Interpret detection rate alongside precision. A high detection rate achieved through over-alerting, or high precision achieved through under-detection, does not indicate strong detection capability.
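The detection rate and precision ratios can be computed as in the following sketch. It assumes alerts are (alert_id, bank_id, entity_id) tuples and that ground truth is a set of (bank_id, entity_id) keys for criminal entities. This is one reading of the definitions. The scorecard's exact counting rules are not stated here, for example whether repeated alerts on one entity are counted separately. Detection grade thresholds are not reproduced.

```python
def scorecard_metrics(alerts, criminal_keys):
    """Detection rate and precision as described above (one possible reading).

    alerts: list of (alert_id, bank_id, entity_id) tuples.
    criminal_keys: set of ground-truth (bank_id, entity_id) keys.
    """
    alerted_keys = {(bank_id, entity_id) for _, bank_id, entity_id in alerts}
    # Detection rate: share of criminal entities with at least one alert.
    detection_rate = (
        len(alerted_keys & criminal_keys) / len(criminal_keys) if criminal_keys else 0.0
    )
    # Precision: share of alerts that point at a criminal entity.
    correct_alerts = sum(
        1 for _, bank_id, entity_id in alerts if (bank_id, entity_id) in criminal_keys
    )
    precision = correct_alerts / len(alerts) if alerts else 0.0
    return {"detection_rate": detection_rate, "precision": precision}
```

For example, take three alerts where two are on the same criminal entity and one is on a non-criminal entity, against two criminal entities in total. This gives a detection rate of 0.5 and a precision of 2/3.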

The scorecard measures detection capability only. It does not assess investigative workflow quality.

Data Dictionary

For complete field-level definitions and table structures:

Download Data Dictionary (XLSX) →

If you encounter ambiguity in table structure or joins, contact hello@syntheticr.ai.