Why production data is the wrong foundation for AML testing

Most Anti-Money Laundering (AML) testing and training relies on production data. That data was not created for testing, validation, or performance measurement. It is legally sensitive, difficult to access, historically biased, and severely label-scarce: confirmed money laundering is rare, because most laundering is never detected, let alone proven.

As a result, production data produces weak and often misleading evidence about AML system performance. This problem becomes more acute as institutions adopt machine learning and AI-based detection methods. Synthetic data with known ground truth has emerged as the only practical way to test AML systems objectively, repeatably, and without production risk.

Production data was not designed for AML testing

Production transaction data exists to run the business and meet operational obligations. It was not designed to support controlled testing, comparison, or benchmarking.

There are four structural limitations that make production data a poor foundation for AML testing.

Access and sharing constraints - Real AML data is highly sensitive. Privacy, confidentiality, and governance requirements restrict who can access it, where it can be used, and whether it can be shared externally. As a result, there are effectively no publicly available, real-world AML datasets suitable for benchmarking or independent evaluation. This is why academic and industry research has relied on synthetic benchmarks to compare AML detection methods.

Lack of reliable ground truth - Operational outcomes such as alerts, case decisions, and SAR filings are not equivalent to confirmed money laundering. Many laundering events are never detected. Many alerts do not correspond to criminal activity. Production data therefore lacks reliable labels for true positives and true negatives, which undermines objective measurement.

Bias and feedback effects - Production data reflects historic policies, rule sets, investigation capacity, and prior detection decisions. These factors shape what is labelled, investigated, and learned. When models are trained or evaluated on this data, existing biases and blind spots are reinforced rather than exposed.

Weak evidence for critical decisions - When the data foundation is unstable, it becomes difficult to answer basic questions with confidence. Did a model change actually improve detection? Is one system better than another? Did a tuning exercise introduce new risks? Production data rarely provides clear answers.

The UK Financial Conduct Authority (FCA) has highlighted data access as a key barrier to innovation in AML detection. This is a central reason why controlled testing environments and synthetic datasets have been explored through initiatives such as the Digital Sandbox and related research programmes.

AI makes AML performance harder to measure, not easier

Machine learning and AI models introduce additional complexity into AML testing.

These models are sensitive to data distributions, historical decisions, and feedback loops. Small changes in customer behaviour, product mix, or investigation practices can materially affect apparent performance. Aggregate metrics can improve even when detection capability does not.

AI systems can also optimise against incomplete or biased signals. This creates the risk that models appear to improve while overfitting to historic patterns or degrading performance in less visible segments or typologies.
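
As a rough illustration of this risk, the sketch below uses invented per-typology counts for an "old" and a "new" model version. The typology names and figures are hypothetical; the point is only that an aggregate recall number can rise while detection of one typology quietly collapses.

```python
# Illustrative only: hypothetical counts showing how an aggregate metric can
# improve while detection of a specific typology degrades.

# Per typology: (laundering cases caught, laundering cases present)
old_model = {"structuring": (80, 100), "trade_based": (40, 100), "mule_networks": (30, 100)}
new_model = {"structuring": (95, 100), "trade_based": (70, 100), "mule_networks": (10, 100)}

def overall_recall(results):
    caught = sum(tp for tp, _ in results.values())
    total = sum(n for _, n in results.values())
    return caught / total

print("old overall recall:", overall_recall(old_model))  # 0.50
print("new overall recall:", overall_recall(new_model))  # ~0.58 -- looks like an improvement

for typology, (old_tp, n) in old_model.items():
    new_tp, _ = new_model[typology]
    print(f"{typology}: {old_tp / n:.2f} -> {new_tp / n:.2f}")
# mule_networks falls from 0.30 to 0.10; the aggregate figure hides the regression.
```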

For these reasons, regulators and innovation bodies have emphasised the need for controlled experimentation. In public commentary on AI in financial services, the FCA has described the use of synthetic datasets derived from real-world money laundering cases to support testing and evaluation of AML detection tools.

Operational outcomes hide real performance gaps

Transaction monitoring environments are characterised by high alert volumes and significant noise. False positives dominate operational output in many systems. While reported rates vary by institution and configuration, the practical effect is consistent: analysts spend most of their time clearing alerts that do not correspond to laundering activity.

From a testing perspective, this matters because operational throughput is not a proxy for detection quality. When outcomes are driven by noise and proxy signals, production data obscures true system behaviour. Performance gaps remain hidden, and improvement claims are difficult to validate.
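
To see why throughput says little about detection, consider the hypothetical comparison below. The alert counts and case numbers are invented; the key observation is that recall can only be computed when the total number of laundering cases is known, which is exactly the ground truth production data does not provide.

```python
# Illustrative only: operational throughput vs. detection quality, with invented figures.

# Assume (for the sake of the example) 200 genuine laundering cases in the period.
# In production this number is unknown -- that is the point.
total_laundering_cases = 200

systems = {
    "System A": {"alerts": 100_000, "true_hits": 40},   # high volume, looks busy
    "System B": {"alerts": 20_000,  "true_hits": 120},  # quieter, catches more
}

for name, s in systems.items():
    precision = s["true_hits"] / s["alerts"]
    recall = s["true_hits"] / total_laundering_cases
    print(f"{name}: alerts={s['alerts']:,}  alert precision={precision:.2%}  recall={recall:.0%}")

# Only the alert and clearance figures are visible operationally. The recall column
# depends on total_laundering_cases, which production data cannot supply.
```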

Why synthetic data enables objective AML testing

Synthetic data is valuable for AML testing because it can be engineered for measurement.

Known ground truth - Synthetic datasets embed defined money laundering activity across transactions, customers, and networks. This makes it possible to measure true positives, false negatives, and typology coverage directly; a short sketch after this list shows the idea.

Controlled scenarios - Testing scenarios can be designed to reflect specific laundering methodologies, network structures, customer segments, and edge cases. This reveals failure modes that production data rarely exposes.

Repeatability and comparison - The same dataset can be reused to compare systems, validate changes, and track performance over time. This supports like-for-like evaluation and defensible benchmarking.

Safe collaboration - Because synthetic data does not contain real customer information, it can be used across teams, vendors, and consultancies without exposing sensitive production records.
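
As a minimal sketch of the first and third points above, assume a tiny synthetic dataset in which every transaction carries a ground-truth typology label ("none" for clean traffic) and each system under test contributes an alert flag. The schema, names, and values are invented for illustration; a real benchmark dataset is far richer.

```python
# Minimal sketch (hypothetical schema): with embedded ground truth, typology-level
# recall can be computed directly and the same dataset reused to compare systems.
import pandas as pd

data = pd.DataFrame({
    "txn_id":   [1, 2, 3, 4, 5, 6],
    "typology": ["structuring", "structuring", "none", "mule_network", "none", "mule_network"],
    "system_a": [True, False, True, False, False, True],
    "system_b": [True, True, False, False, False, False],
})

def typology_recall(df, alert_col):
    laundering = df[df["typology"] != "none"]
    # Share of labelled laundering transactions the system alerted on, per typology.
    return laundering.groupby("typology")[alert_col].mean()

for system in ["system_a", "system_b"]:
    print(system)
    print(typology_recall(data, system))
```

Because the labels are part of the dataset rather than inferred from operational outcomes, the same calculation can be rerun unchanged after every rule or model change, which is what makes like-for-like comparison possible.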

Academic benchmark datasets such as SynthAML were created for exactly this reason. Real AML data cannot be shared publicly, and labels are incomplete. Synthetic benchmarks make evaluation possible where production data cannot.

What Syntheticr provides

Syntheticr replaces production data as the basis for AML testing and training. The platform combines:

  • Synthetic datasets covering transactions, customer profiles, and risk intelligence, with embedded money laundering activity and known ground truth

  • Objective performance scorecards that measure detection capability directly and show where systems underperform, and why

  • Workflows for repeatable testing, used for one-off assessments or embedded into ongoing improvement cycles

Syntheticr is designed to answer a simple question with confidence: how well does an AML system actually detect money laundering, and where does it fail?

FAQs

To learn more about Syntheticr and how it is used to test and evaluate AML systems without production data, see our FAQs.

References and further reading

FCA Digital Sandbox (overview)

FCA Digital Sandbox – Authorised Push Payment (APP) synthetic data page

FCA speech: “AI: Flipping the coin in financial services”

FCA research article: “Exploring Synthetic Data Validation – privacy, utility and fidelity”

Alan Turing Institute project: “Synthetic Data for Anti-Money Laundering”

Alan Turing Institute news post about the AML synthetic data project

SynthAML paper (PDF): Nature Scientific Data (2023) “SynthAML: a synthetic data set to benchmark anti-money laundering methods”

ArXiv paper: “Realistic Synthetic Financial Transactions for Anti-Money Laundering Models”
