Technical Reference

Overview

This page provides structural and modelling context for the Syntheticr trial dataset. It is intended to support technical users processing the data within AML systems, rules engines, or machine learning workflows.

For detailed field-level definitions, refer to the Data Dictionary (XLSX) linked below.

Download Data Dictionary (XLSX) →

Dataset Overview

The Syntheticr trial dataset represents a single synthetic financial institution operating within a broader financial ecosystem.

The dataset spans 24 months of activity and includes:

  • Customer profiles (individual and business)

  • Customer-related parties

  • Financial transactions

  • Risk intelligence

  • AML alerts

The dataset is ISO 20022-aligned and structured to reflect operational banking environments.

Temporal Structure

The dataset spans 24 months.

  • Months 1–6: Behavioural calibration period (no alerts generated)

  • Months 1–18: Risk intelligence available

  • Months 19–24: Unlabelled “greenfield” period

The final 6 months (months 19–24) contain no risk intelligence. This is intentional and designed to support unbiased evaluation and model validation. Do not assume risk intelligence exists beyond month 18, and treat its absence in this period as unlabelled rather than as a negative outcome.
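The period boundaries above can be encoded as a small helper so that downstream code does not treat greenfield months as labelled. This is a minimal sketch: the function names, the `dataset_start` argument, and the assumption that month 1 begins on the first day of a calendar month are illustrative and do not come from the Data Dictionary.

```python
from datetime import date

# Month boundaries taken from the Temporal Structure list.
CALIBRATION_END = 6   # months 1-6: behavioural calibration, no alerts generated
RISK_INTEL_END = 18   # months 1-18: risk intelligence available
DATASET_END = 24      # months 19-24: unlabelled "greenfield" period


def month_index(day: date, dataset_start: date) -> int:
    """Return the 1-based dataset month that `day` falls in.

    Assumes `dataset_start` is the first day of month 1 (hypothetical input).
    """
    return (day.year - dataset_start.year) * 12 + (day.month - dataset_start.month) + 1


def describe_month(month: int) -> dict:
    """Describe which temporal period a dataset month belongs to."""
    if not 1 <= month <= DATASET_END:
        raise ValueError(f"month {month} is outside the 24-month dataset")
    return {
        "calibration": month <= CALIBRATION_END,
        "risk_intelligence_available": month <= RISK_INTEL_END,
        "greenfield": month > RISK_INTEL_END,
    }


# Hypothetical start date, for illustration only.
start = date(2022, 1, 1)
info = describe_month(month_index(date(2023, 7, 3), start))  # month 19, greenfield
```

Keeping the boundaries as named constants makes it easy to exclude the calibration period or the greenfield period consistently across feature and label pipelines.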

Key Join Principles

When joining tables, always include bank_id in join conditions.

Core identifiers include:

  • entity_id (customer-level identifier)

  • transaction_id

  • alert_id (where applicable)

Transactions link to entities via entity_id.

Risk intelligence and AML alerts link to entities using entity_id.

Related party relationships link entities to other entities via defined relationship types.

Ensure join logic reflects the multi-table nature of the dataset rather than assuming a flattened structure.
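The join rule can be illustrated with a composite key of `bank_id` and `entity_id`. The sketch below uses plain Python dictionaries and made-up rows. The `customer_type` and `amount` fields and the function name are hypothetical, and the example assumes that the same `entity_id` value could appear under more than one `bank_id`, which is why joining on `entity_id` alone would be unsafe.

```python
def join_transactions_to_entities(transactions, entities):
    """Join transactions to entities on (bank_id, entity_id).

    Returns (joined_rows, unmatched_transactions) so that nothing is
    dropped silently.
    """
    by_key = {(e["bank_id"], e["entity_id"]): e for e in entities}
    joined, unmatched = [], []
    for tx in transactions:
        entity = by_key.get((tx["bank_id"], tx["entity_id"]))
        if entity is None:
            unmatched.append(tx)
        else:
            joined.append({**entity, **tx})
    return joined, unmatched


# Illustrative rows only; real column names are in the Data Dictionary.
entities = [
    {"bank_id": "B1", "entity_id": "E1", "customer_type": "individual"},
    {"bank_id": "B2", "entity_id": "E1", "customer_type": "business"},
]
transactions = [
    {"bank_id": "B1", "entity_id": "E1", "transaction_id": "T1", "amount": 120.0},
    {"bank_id": "B2", "entity_id": "E1", "transaction_id": "T2", "amount": 950.0},
    {"bank_id": "B1", "entity_id": "E9", "transaction_id": "T3", "amount": 15.0},
]

joined, unmatched = join_transactions_to_entities(transactions, entities)
```

Here `T1` and `T2` resolve to different customers even though both carry `entity_id` `E1`, and `T3` is reported as unmatched rather than discarded.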

Entity Layer

The customer profile tables contain:

  • KYC information

  • Customer type (individual or business)

  • Risk level indicators

  • Geographic and behavioural attributes

Business entities may include:

  • Directors

  • UBO/PSC relationships

  • Related party linkages

Customer risk levels are behavioural signals within the ecosystem. A high risk level does not by itself indicate criminality.

Transaction Layer

Transaction tables represent a range of payment types, including:

  • Domestic transfers

  • International transfers

  • Card activity

  • Cash activity

  • Standing orders and direct debits

Transaction volume varies significantly across entities. This is intentional and reflects realistic customer behaviour distributions.

Detection performance may vary across transaction types and volume bands.

Risk Intelligence and Alerts

Risk intelligence includes:

  • Alerts

  • SAR indicators

  • Exit markers

Risk intelligence contains both true positives and false positives. This is deliberate and designed to reflect real-world signal-to-noise conditions.

SAR filing does not automatically result in exit.

Transaction activity can continue after an exit marker during the operational close-out window, so post-exit transactions are expected in the data.

Risk intelligence should not be treated as a training label without understanding its structure and temporal limits.
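Because transactions can appear after an exit marker, it may help to separate them before computing features or labels. The sketch below partitions transactions around a per-entity exit date. The `booking_date` field, the shape of `exit_dates`, and the function name are hypothetical; consult the Data Dictionary for the actual column names.

```python
from datetime import date


def split_by_exit(transactions, exit_dates):
    """Partition transactions into (pre_exit, post_exit) lists.

    `exit_dates` maps (bank_id, entity_id) to the date of the exit marker.
    Entities without an exit marker have all their transactions in pre_exit.
    """
    pre_exit, post_exit = [], []
    for tx in transactions:
        exit_day = exit_dates.get((tx["bank_id"], tx["entity_id"]))
        if exit_day is not None and tx["booking_date"] > exit_day:
            post_exit.append(tx)
        else:
            pre_exit.append(tx)
    return pre_exit, post_exit


# Illustrative rows only.
exit_dates = {("B1", "E1"): date(2023, 3, 1)}
transactions = [
    {"bank_id": "B1", "entity_id": "E1", "transaction_id": "T1", "booking_date": date(2023, 2, 20)},
    {"bank_id": "B1", "entity_id": "E1", "transaction_id": "T2", "booking_date": date(2023, 3, 5)},
    {"bank_id": "B1", "entity_id": "E2", "transaction_id": "T3", "booking_date": date(2023, 3, 5)},
]

pre_exit, post_exit = split_by_exit(transactions, exit_dates)
```

Whether close-out activity should be kept, down-weighted, or excluded depends on the use case; the split only makes that choice explicit.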

Network and Related Parties

The dataset includes:

  • Multi-entity networks

  • Cross-institutional networks

  • UBO/PSC links

  • Shared directors

  • Shared addresses

  • Family relationships

Network detection performance is a core dimension of the Syntheticr scorecard.

Users should consider both entity-level detection and network-level detection when evaluating system behaviour.
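One way to move from entity-level to network-level analysis is to group entities that are connected through related-party links. The sketch below finds connected components with a union-find structure. It is an illustration only: the input is a hypothetical list of link pairs, and the way the Syntheticr scorecard defines a network is not specified here and may differ.

```python
from collections import defaultdict


def find_networks(links):
    """Group nodes into connected components.

    `links` is an iterable of (node_a, node_b) pairs. Nodes can be any
    hashable value, for example (bank_id, entity_id) tuples.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


# Illustrative links, e.g. derived from shared directors or shared addresses.
links = [
    (("B1", "E1"), ("B1", "E2")),
    (("B1", "E2"), ("B2", "E7")),
    (("B1", "E4"), ("B1", "E5")),
]
networks = find_networks(links)
```

With these links, the first three entities form one network that crosses institutions and the last two form a second, separate network.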

Scorecard Context

The Syntheticr scorecard evaluates detection performance against fully known ground truth.

Key performance concepts:

  • Detection rate: proportion of criminal entities detected

  • Precision: proportion of alerts that correctly identify criminal entities

  • Detection grade: entity-level grading based on detection rate

Interpret detection rate alongside precision. A high detection rate achieved by over-alerting lowers precision, while high precision on very few alerts can hide under-detection.

The scorecard measures detection capability only. It does not assess investigative workflow quality.
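The two ratio metrics defined above can be computed directly from a set of known criminal entities and a list of alerts. This is a minimal sketch, not the scorecard's implementation: the function names are hypothetical, each alert is assumed to reference exactly one entity, repeated alerts on the same entity are counted separately for precision, and returning 0.0 for an empty denominator is an arbitrary choice.

```python
def detection_rate(criminal_entities, alerted_entities):
    """Proportion of criminal entities that received at least one alert."""
    criminal = set(criminal_entities)
    if not criminal:
        return 0.0  # arbitrary convention for an empty ground-truth set
    return len(criminal & set(alerted_entities)) / len(criminal)


def precision(criminal_entities, alerted_entities):
    """Proportion of alerts that point at a criminal entity."""
    alerts = list(alerted_entities)
    if not alerts:
        return 0.0  # arbitrary convention when no alerts were raised
    criminal = set(criminal_entities)
    return sum(1 for entity in alerts if entity in criminal) / len(alerts)


# Illustrative values only.
criminal = {"E1", "E2", "E3", "E4"}
alerts = ["E1", "E1", "E2", "E9"]  # one entry per alert

rate = detection_rate(criminal, alerts)  # 2 of 4 criminal entities detected -> 0.5
prec = precision(criminal, alerts)       # 3 of 4 alerts hit a criminal entity -> 0.75
```

Reporting both numbers together shows the trade-off: alerting on every entity would push the detection rate to 1.0 while driving precision down.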

Data Dictionary

For complete field-level definitions and table structures:

Download Data Dictionary (XLSX) →

If you encounter ambiguity in table structure or joins, contact hello@syntheticr.ai.