How much did the Arup deepfake case cost?

USD $25.6M (HK$200M) across 15 transfers to five Hong Kong bank accounts in February 2024. None of the funds were recovered as of Arup's public disclosure in May 2024. The case is the canonical reference for deepfake-enabled whaling and the largest publicly-disclosed deepfake-vishing single-event loss to date.

What stops a deepfake-vishing attack?

Procedural defences only. The technical countermeasures against deepfake voice (audio-watermarking, voice-authentication, anti-deepfake AI detection) are immature and inconsistent in practice. The only meaningful defence is a hard organisational rule that out-of-pattern wire requests are verified by an out-of-band channel (phone call to a known number, in-person confirmation, hardware-token approval) that the attacker cannot simultaneously control.

Did the Ferrari and WPP cases succeed?

No. Ferrari, September 2024 (CEO Vigna impersonation): the targeted executive asked a verification question about a recently recommended book that the attacker could not answer. WPP, May 2024 (CEO Read impersonation via WhatsApp video): the targeted executive triggered an out-of-band confirmation procedure. Both attempts failed because the targeted employees recognised that in-channel verification was insufficient and switched to out-of-channel verification.

Filing 07.02.00 | Filed 27 APR 2026 | Classification Public | Status Open

Deepfake vishing: the $25M Arup case and the procedural-defence imperative

Q: What is deepfake vishing?

Deepfake vishing combines voice-cloning technology with traditional voice-phishing (phone-based social engineering). The attacker uses 60 seconds or more of source audio to produce a real-time voice clone of an executive, then calls a target employee impersonating that executive to request a high-value action (typically a wire transfer).

Q: What is the Q1 2025 wave?

Aggregating publicly-disclosed deepfake-call whaling losses across Q1 2025 produces a conservative floor of $200M+ in defender losses, drawn from SEC 8-K filings, court records, and major-incident press disclosures. The true figure is materially higher because most settled losses are not disclosed publicly.

Q: How much does a voice clone cost the attacker?

Sub-$10 per month for consumer-grade hosted services that produce a real-time clone from 60 seconds of source audio. Free for open-weight models that run locally on a gaming-tier GPU. The toolchain cost is approximately five orders of magnitude below the typical attack outcome value, which means the attacker economics are heavily favourable.

Deepfake vishing produced its first eight-figure single-event loss at Arup Hong Kong in February 2024 (USD $25.6M, 15 transfers, zero recovery). The Q1 2025 wave produced $200M+ in publicly-reported losses across the disclosed cohort. The technical countermeasures remain immature; procedural defence is the only meaningful control.

Exhibit A

The category as it exists in 2026

Deepfake vishing emerged as a distinct attack category through 2023-2024 and became the dominant high-impact whaling pattern in 2025. The category combines voice-cloning technology (which became commodity-priced through 2023) with traditional voice-phishing techniques (which have existed for decades). The combination produces an attack that defeats the legacy in-channel verification habits (recognising a familiar voice, calling back the inbound number, confirming on a video call) that organisations relied on for two decades.

The pattern works as follows. The attacker harvests public audio of a target executive (earnings calls, conference recordings, podcast appearances, corporate-communications videos). Using a current-generation voice-cloning model, the attacker produces a real-time voice clone capable of synthesising the executive's voice from arbitrary text input. The attacker calls a finance-team or treasury-team employee, impersonating the executive, and requests a high-value action: typically a confidential wire transfer, a banking-detail change, or a sensitive document release. The targeted employee, hearing what sounds like a familiar voice with familiar vocal mannerisms, may attempt to verify the request via the channel under attack (calling the number that called them, joining a video conference with the apparent executive). The verification then appears to succeed because the attacker controls the verification channel as well as the request channel.

The Arup Hong Kong case of February 2024 is the canonical reference for the category. The Q1 2025 wave produced $200M+ in publicly-reported losses across the disclosed cohort. The true figure is materially higher because most settled deepfake-vishing losses are not disclosed publicly. The trajectory through 2025-2026 has been one of continued growth in both incident count and per-event loss, with the technical countermeasures remaining structurally behind the offensive capability. [Arup public statement May 2024 + aggregated Q1 2025 SEC 8-K filings + Keepnet 2025 State of Vishing surge data]

Exhibit B

The Arup Hong Kong case in detail

CASE FILE

In late January 2024, an Arup employee in the firm's Hong Kong office received an email purportedly from the CFO requesting a confidential transaction. The employee was suspicious because confidential-transaction language is itself a known phishing lure. To verify, the employee joined a video conference call where the CFO, two other senior leaders, and several colleagues were present and visibly authorised the transaction. Over the following weeks the employee initiated 15 transfers totalling approximately HK$200M (USD $25.6M) to five Hong Kong bank accounts. Every other participant on the video call had been a deepfake reconstruction built from publicly-available video and audio of the executives.

The technical anatomy is now well understood: attackers harvested public video of the executives from earnings calls, conference appearances, and corporate communications, used a current-generation face-swap and voice-clone toolchain to build the deepfake participants, and ran the video call live with a synthesised script tailored to the target's verification expectations. The toolchain cost was estimated by independent analysts at under $5,000 in compute and software, against a $25.6M attack outcome. The unit economics are unfavourable to defenders by a factor of approximately 5,000:1.
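The ratio quoted above can be checked with a few lines of arithmetic. The sketch below uses only the two figures from this section; the function name is illustrative, and because the toolchain cost is given as an upper bound ("under $5,000"), the computed ratio should be read as a lower bound.

```python
import math


def loss_to_cost_ratio(loss_usd: float, toolchain_cost_usd: float) -> float:
    """Return how many dollars of loss each dollar of attacker spend produced."""
    if toolchain_cost_usd <= 0:
        raise ValueError("toolchain cost must be positive")
    return loss_usd / toolchain_cost_usd


# Figures quoted in this section (cost is an upper bound, so the ratio is a floor).
ARUP_LOSS_USD = 25_600_000
TOOLCHAIN_COST_USD = 5_000

ratio = loss_to_cost_ratio(ARUP_LOSS_USD, TOOLCHAIN_COST_USD)
orders_of_magnitude = math.log10(ratio)

print(f"loss-to-cost ratio: {ratio:,.0f}:1")              # 5,120:1
print(f"orders of magnitude: {orders_of_magnitude:.1f}")  # 3.7
```

On these figures the ratio is 5,120:1, which is where the "approximately 5,000:1" shorthand comes from.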

The defender lesson from Arup is not that the employee was negligent; the employee did exactly what training would recommend, which is to verify the request via a video channel. The lesson is that in-channel verification (joining a Teams or Zoom call to confirm) is no longer a sufficient control because the attacker can simultaneously control both the request channel and the verification channel. Verification has to operate through an entirely different channel that the attacker cannot simultaneously control, such as an in-person confirmation, a callback to a phone number held in the HR system (not the number in the email or any other inbound channel), or a hardware-token-signed approval workflow. [HK Police press release Feb 2024 + Arup public statement May 2024 + independent toolchain-cost analyses]
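One way to make the rule concrete is to ask, for every verification step, who chose the endpoint that was contacted. The sketch below is a minimal illustration of that idea, not a reference implementation: the type names, enum members, and the `may_release` function are all hypothetical, and a real payment workflow would carry far more state.

```python
from dataclasses import dataclass
from enum import Enum


class EndpointSource(Enum):
    """Where the contact point used for verification came from (illustrative)."""
    SUPPLIED_BY_REQUESTER = "supplied_by_requester"  # number, link, or invite that arrived with the request
    SYSTEM_OF_RECORD = "system_of_record"            # e.g. an HR directory entry that predates the request
    PHYSICAL_PRESENCE = "physical_presence"          # in-person confirmation
    PRE_ENROLLED_TOKEN = "pre_enrolled_token"        # hardware token enrolled before the request


@dataclass(frozen=True)
class Verification:
    method: str                      # free-text label, e.g. "video call"
    endpoint_source: EndpointSource


def is_out_of_band(v: Verification) -> bool:
    """A step counts only if the requester did not choose the endpoint."""
    return v.endpoint_source is not EndpointSource.SUPPLIED_BY_REQUESTER


def may_release(verifications: list[Verification]) -> bool:
    """Release requires at least one verification the requester could not steer."""
    return any(is_out_of_band(v) for v in verifications)


# The Arup pattern: a video call joined via an invite that came with the request.
arup_style = [Verification("video call", EndpointSource.SUPPLIED_BY_REQUESTER)]
# The control described above: a callback to the number held in the HR system.
callback = [Verification("phone callback", EndpointSource.SYSTEM_OF_RECORD)]

print(may_release(arup_style))  # False
print(may_release(callback))    # True
```

The design choice worth noting is that the check ignores how convincing the call was. It looks only at the provenance of the contact point, which is the one property a deepfake cannot improve.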

Exhibit C

The Q1 2025 wave: $200M+ in disclosed losses

The Q1 2025 deepfake-vishing wave produced $200M+ in publicly-disclosed defender losses, drawn from SEC 8-K filings (under Item 1.05 cybersecurity-incident disclosure), court records of attempted recoveries, and major-incident press disclosures. The disclosed cases cluster in three loss patterns.

Pattern	Typical destination	Per-event loss band	Recovery posture
APAC subsidiary to HK/UAE wire	Hong Kong, UAE	$8M-$30M	Near zero (no IC3 jurisdiction)
Intra-EU SEPA fast transfer	EU member state	$1M-$5M	Single-digit % (irreversible scheme)
US domestic 72h-exploit wire	US bank	$3M-$15M	~30% gross with FFKC; near zero without

The true total wave figure is materially higher than the $200M public floor because the disclosure threshold for SEC Item 1.05 is the materiality bar, which many of the smaller losses do not cross for larger issuers, and because private-company and non-US losses are not subject to equivalent public-disclosure requirements. Independent law-enforcement and IR-vendor estimates place the true Q1 2025 wave at $500M-$1B in aggregate defender loss; the precise figure is unrecoverable from public sources but the order of magnitude is well-supported. [Aggregated public 2025 Q1 disclosures + Chainalysis 2025 deepfake-fraud annex + independent IR-vendor analyses]
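The gap between the public floor and the independent estimates can be expressed as an implied disclosure share. The sketch below is simple arithmetic on the figures quoted above; the function name is illustrative, and it treats the $200M floor as an exact value, which overstates the precision of the underlying data.

```python
def disclosed_share(disclosed_usd: float, estimated_total_usd: float) -> float:
    """Fraction of the estimated total loss that is visible in public disclosures."""
    if estimated_total_usd <= 0:
        raise ValueError("estimated total must be positive")
    return disclosed_usd / estimated_total_usd


DISCLOSED_FLOOR_USD = 200_000_000                         # public floor quoted above
ESTIMATED_TOTAL_RANGE_USD = (500_000_000, 1_000_000_000)  # independent estimate range quoted above

for estimate in ESTIMATED_TOTAL_RANGE_USD:
    share = disclosed_share(DISCLOSED_FLOOR_USD, estimate)
    print(f"estimated total ${estimate / 1e6:,.0f}M -> disclosed share {share:.0%}")
```

On these figures, public disclosures capture roughly 20 to 40 percent of the estimated aggregate loss.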

Exhibit D

The voice-clone commoditisation timeline

The technical inflection that drove the 2024-2025 deepfake-vishing surge was the commoditisation of voice-cloning tools through 2022-2024. Pre-2022, voice cloning required either a large source-audio corpus (hours, not minutes) or a professionally-trained voice imitator. Either route was operationally expensive, which limited the attack to very high-value targets.

From 2022, the release of commodity voice-cloning tools (the hosted ElevenLabs service, several open-weight Hugging Face-hosted alternatives, and various derivative tooling) compressed the source-audio requirement to approximately 60 seconds while improving output quality. The 2023 release of real-time voice-conversion tooling (which produces the clone live during a phone call rather than as a pre-recorded message) added the operational capability needed for interactive social-engineering attacks. By 2024 the toolchain was commodity-priced at sub-$10 per month for consumer-grade hosted services and free for self-hosted open-weight models running on a gaming-tier GPU.

The attacker can in practice produce a usable deepfake-voice clone of any executive at any public company in under one hour at zero meaningful marginal cost. Source audio is universally available through earnings calls, conference recordings, podcast appearances, and corporate communications. The attacker's only meaningful constraint is the operational overhead of running the call itself, which has also been reduced through AI-driven script-following and conversation-management tooling. The result is that the deepfake-vishing attack class is now structurally accessible to attackers at every skill level, not just nation-state or top-tier criminal operators. [ENISA AI Threat Landscape 2024 + voice-cloning toolchain analyses 2022-2024]

Exhibit E

The Ferrari and WPP procedural defences that worked

Two publicly-disclosed cases through 2024 demonstrate that procedural defences can defeat deepfake-vishing attacks. The Ferrari case of September 2024 involved an attacker impersonating CEO Benedetto Vigna in deepfake-voice calls to a senior Ferrari executive and instructing that executive to assist with a confidential acquisition. The Ferrari executive became suspicious and asked a verification question about a book the CEO had recently recommended. The attacker could not answer correctly because the personal context was not available in the public-corpus training data, and the attempt failed.

The WPP case of May 2024 involved CEO Mark Read disclosing publicly that attackers had cloned his image and voice for a WhatsApp video call attempting to extract confidential information from a senior WPP executive. The attempt failed because the targeted executive triggered an out-of-band confirmation procedure rather than verifying within the channel under attack. WPP's public disclosure was unusually candid and provided the industry with a working reference for the WhatsApp delivery channel and the procedural defence that worked against it.

The shared lesson from Ferrari and WPP is that procedural defences which operate outside the channel under attack do work. A verification question requiring personal context the attacker cannot OSINT, a callback to a phone number held in the HR system rather than the inbound channel, an in-person confirmation requirement for high-value actions: each of these defences works regardless of how convincing the deepfake is, because they bypass the deepfake's strength (in-channel verisimilitude) entirely. The procedural-defence approach is the only meaningful control category in 2026 because the technical countermeasures remain immature. [Ferrari Sept 2024 public reporting + WPP May 2024 disclosure + procedural-defence pattern analyses]

Exhibit F

Why technical countermeasures remain immature

Several technical-countermeasure categories have been proposed and partially deployed to address deepfake vishing. Each has limitations that prevent it from being a meaningful primary defence.

Voice authentication (voice biometrics)

Limitation: Vulnerable to high-quality voice clones. The 2024 generation of voice-cloning tools defeats most consumer-grade voice-authentication systems and many enterprise-grade systems.

Audio-watermarking on legitimate corporate audio

Limitation: Requires industry-wide adoption to be useful. Currently deployed by a small number of major media companies. Does not help against impersonation of an executive whose legitimate audio is not watermarked.

Anti-deepfake AI detection

Limitation: Detection-versus-generation arms race favours generation. Current detection accuracy against current generation models is approximately 70-85 percent under research conditions and much lower in production. Detection lag behind generation is approximately 6-12 months.

Liveness detection on video calls

Limitation: Defeated by current-generation real-time face-swap tooling combined with a live operator behind the deepfake avatar. Adds friction but does not provide a reliable verification signal.

STIR/SHAKEN voice-call authentication

Limitation: Authenticates the calling-number origin, not the speaker identity. A legitimate caller-number can be paired with a deepfake voice. Useful as one signal but does not address the speaker-identity question.

The pragmatic implication is that the procedural-defence pillar is the only mature control category in 2026 and likely through at least 2027-2028. Organisations should not delay procedural-control implementation while waiting for technical countermeasures to mature, because the technical countermeasures are not on a clear timeline to mature. [NIST AI Risk Management Framework 1.0 + ENISA AI Threat Landscape 2024 + academic deepfake-detection literature]
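The 70-85 percent detection figure quoted above for anti-deepfake AI detection shows why a detector cannot serve as the primary control once an attacker makes repeated attempts. The sketch below rests on two simplifying assumptions that are not from the source: the quoted accuracy is treated as the per-call probability of catching a deepfake, and calls are treated as independent. The function name is illustrative.

```python
def p_at_least_one_miss(detection_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent deepfake calls evades detection."""
    if not 0.0 <= detection_rate <= 1.0:
        raise ValueError("detection_rate must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # All calls are caught with probability detection_rate ** attempts.
    return 1.0 - detection_rate ** attempts


# Research-condition range quoted above; production rates are described as much lower.
for rate in (0.70, 0.85):
    for attempts in (1, 5, 10):
        miss = p_at_least_one_miss(rate, attempts)
        print(f"detection rate {rate:.0%}, {attempts:>2} calls -> P(at least one miss) = {miss:.0%}")
```

Under these assumptions, even the upper research-condition figure leaves a substantial chance that one call in a short campaign gets through, which is consistent with treating detection as a supporting signal rather than a gate.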

Exhibit G

Frequently filed questions

ON RECORD

What is deepfake vishing?

Combines voice-cloning technology with traditional voice-phishing. Attacker uses 60 seconds of source audio to produce a real-time voice clone of an executive, then calls a target employee impersonating that executive.

How much did the Arup case cost?

USD $25.6M (HK$200M) across 15 transfers to five Hong Kong bank accounts in February 2024. Zero recovery. Canonical reference for deepfake-enabled whaling.

What is the Q1 2025 wave?

$200M+ in publicly-reported losses across the disclosed cohort, drawn from SEC 8-K filings, court records, and press disclosures. True total estimated at $500M-$1B in aggregate.

How much does a voice clone cost the attacker?

Sub-$10 per month for consumer-grade hosted services. Free for self-hosted open-weight models on a gaming-tier GPU. Five orders of magnitude below typical attack outcome value.

Did Ferrari and WPP defences work?

Yes, both. Ferrari executive asked a verification question requiring personal context the attacker could not OSINT. WPP executive triggered out-of-band confirmation procedure. Both attempts failed because the targeted employees stepped outside the channel under attack.

What stops deepfake vishing?

Procedural defences only. Hard organisational rules that out-of-pattern wire requests are verified by an out-of-band channel (phone call to known HR-system number, in-person confirmation, hardware-token approval) that the attacker cannot simultaneously control.

Are technical countermeasures effective?

No, not as primary defence. Voice biometrics, anti-deepfake AI detection, audio-watermarking, and liveness detection all have material limitations. The procedural-defence pillar is the only mature control category in 2026.

Exhibit H