Disrupting Adversarial AI: An Overview of Google’s Framework for Evaluating Emerging Cyberattack Capabilities of AI


Introduction

As technology continues to advance at an exponential rate, traditional methods of attack and defense are quickly outpaced by new tools and techniques. The latest example is the use of AI by both attackers and defenders. AI has the potential to reduce the complexity of previously advanced attacks, putting them within reach of a much larger population of adversaries. It also acts as a force multiplier for APTs, allowing them to operate more efficiently and effectively, and therefore making them more dangerous and difficult to defend against.

The team from Google DeepMind recently released a paper describing this advancement and the need for a new framework that integrates AI threat evaluation with existing kill chain frameworks. This article provides a brief overview of that effort. I suggest reading the paper in full to understand the studies and methodologies used to develop the framework.


Reasoning for a New Framework

According to the team, current methods of identifying where AI can be leveraged by an attacker are “ad-hoc, lacking systematic analysis of attack phases and guidance on targeted defenses,”¹ and the paper aims to address this by “(1) examining the end-to-end attack chain, (2) identifying gaps in AI threat evaluation, and (3) helping defenders prioritize targeted mitigations and conduct AI-enabled adversary emulation for red teaming.”¹

As I previously mentioned, AI significantly increases the accessibility of sophisticated attack methods and increases the efficiency and effectiveness of already sophisticated threat actors. The Google team has identified these AI risks as well and refers to them as capability uplift and throughput uplift, respectively. They also identify a third: the ability of autonomous systems to create “new threats via automated reconnaissance, social engineering, and autonomous cyber agents, boosting attack effectiveness and discretion.”¹ These three risks encapsulate their primary argument for the framework, or what they refer to as The Cost Collapse Argument:

To bridge the gap between AI evaluations and actionable defense insights, we must consider how advanced AI could fundamentally alter cyberattack economics. We argue the primary risk of frontier AI in cyber is its potential to drastically reduce costs for attack stages historically expensive, time-consuming, or requiring high sophistication.¹

Further, the team explains that:

Current evaluations, though valuable for measuring specific capabilities, often lack the context needed to inform defenses regarding how AI might impact the cost of executing attack patterns. Bridging this gap between identifying AI-related risks and empowering defenders with actionable insights is the central challenge this paper addresses.¹

The framework aims to systematically evaluate AI cyberattack capabilities across the end-to-end attack chain, inform AI-enabled adversary emulation, help identify gaps in AI threat evaluation, and provide defenders insights on where to target and prioritize defenses.¹


The Framework Methodology

The framework’s development resulted in seven representative cyberattack chain archetypes and “a new AI cyber capability benchmark with 50 challenges across the attack chain, covering intelligence gathering, operational security, vulnerability exploitation, and malware development.”¹ The result is a methodology that estimates AI’s ability to reduce the costs of attacks and determines where in the attack chain its use can be disrupted.

The framework was developed in four phases:

  • Stage 1: Curating a Basket of Representative Attack Chains

  • Stage 2: Bottleneck Analysis Across Representative Attack Chains

  • Stage 3: Devising Targeted Cybersecurity Model Evaluations

  • Stage 4: Evaluation Execution and Aggregated Cost Differential Scores

In the first stage, the team “analyzed over 12,000 real-world instances of AI use attempts in cyberattacks and utilized a large dataset of cyber incidents from Google’s Threat Intelligence Group and Mandiant” to obtain “the breadth and depth of the threat landscape.”¹

In the second stage, the team conducted an analysis of “bottlenecks,” points in the attack chain that present significant hurdles for an adversary and indicate potential opportunities for disruption.¹ In this phase they subjectively quantified the costs to an attacker at each phase of an attack and identified “critical phases in the attack lifecycle most susceptible to AI influence.”¹ The bottlenecks collected in the study can be found in Appendix A of the paper.
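To make the idea concrete, here is a minimal Python sketch of a bottleneck analysis over a single attack chain. The phases, cost estimates, and weighting below are hypothetical placeholders of my own, not the paper’s data or scoring method.

```python
# Minimal sketch of bottleneck analysis over a single attack chain.
# Phases, cost estimates, and weights are hypothetical placeholders,
# not the paper's actual data or methodology.

from dataclasses import dataclass

@dataclass
class Phase:
    name: str
    time_days: float     # subjective estimate of attacker time spent
    expertise: int       # required skill level, 1 (low) to 5 (high)
    tooling_cost: float  # rough tooling/infrastructure cost, arbitrary units

def phase_cost(p: Phase) -> float:
    """Collapse the subjective estimates into a single comparable cost value."""
    return p.time_days + 2.0 * p.expertise + 0.5 * p.tooling_cost

phishing_chain = [
    Phase("Target reconnaissance",    time_days=5,  expertise=2, tooling_cost=1),
    Phase("Lure and pretext creation", time_days=3,  expertise=4, tooling_cost=1),
    Phase("Payload development",      time_days=10, expertise=5, tooling_cost=4),
    Phase("Delivery and evasion",     time_days=2,  expertise=3, tooling_cost=2),
]

# The most expensive phases are the candidate bottlenecks: the points where
# disruption (or AI assistance to the attacker) matters most.
for phase in sorted(phishing_chain, key=phase_cost, reverse=True):
    print(f"{phase.name:26s} estimated cost: {phase_cost(phase):5.1f}")
```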

In the third stage the team created evaluations for measuring an AI’s ability to reduce the costs associated with each bottleneck.¹ This entailed simulating real-world conditions and quantifying cost reduction metrics such as time to completion, success rate, and capability level required.¹

In the fourth stage the team executed the evaluations to “assess an AI model’s potential cost impact across the representative attack chains” and “provide a ‘cost differential score’ for the model, capturing its potential to amplify offensive cyber capabilities.”¹ A higher score indicates an area requiring higher mitigation prioritization.¹
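Purely as an illustration of how such an aggregation could work, the sketch below combines hypothetical per-bottleneck evaluation results (time to completion, success rate, and required capability level) into a single score. The metric names, weights, and numbers are assumptions for demonstration, not the paper’s actual scoring methodology.

```python
# Rough illustration of aggregating per-bottleneck evaluation results into a
# single "cost differential" style score. Metric names, weights, and values
# are invented for demonstration and do not come from the paper.

def bottleneck_uplift(baseline: dict, with_ai: dict) -> float:
    """Relative cost reduction when an AI model assists with one bottleneck."""
    time_gain = 1.0 - with_ai["hours"] / baseline["hours"]
    success_gain = with_ai["success_rate"] - baseline["success_rate"]
    skill_gain = (baseline["skill_level"] - with_ai["skill_level"]) / baseline["skill_level"]
    # Equal weighting is an arbitrary choice for illustration.
    return (time_gain + success_gain + skill_gain) / 3.0

evaluations = {
    "Reconnaissance automation": (
        {"hours": 40, "success_rate": 0.50, "skill_level": 3},   # human baseline
        {"hours": 6,  "success_rate": 0.70, "skill_level": 1},   # AI-assisted
    ),
    "Exploit development": (
        {"hours": 120, "success_rate": 0.20, "skill_level": 5},
        {"hours": 90,  "success_rate": 0.25, "skill_level": 4},
    ),
}

scores = {name: bottleneck_uplift(base, ai) for name, (base, ai) in evaluations.items()}
cost_differential = sum(scores.values()) / len(scores)

for name, score in scores.items():
    print(f"{name:26s} uplift: {score:+.2f}")
print(f"Aggregated cost differential score: {cost_differential:+.2f}")
```

In this toy version, a bottleneck where AI sharply cuts time and required skill contributes much more to the aggregate score than one where it offers only marginal help, which mirrors the intuition that a higher score flags areas to prioritize for mitigation.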

Using their collected knowledge, the team applied the following criteria:

  • Prevalence: Prioritizing attack types frequently observed in real-world incidents.

  • Severity: Considering potential impact (financial loss, operational disruption, reputational damage, data breach sensitivity).

  • Likelihood to Benefit from AI: Prioritizing attack types where AI could offer substantial "capability" or "throughput uplift," informed by real-world AI misuse data and capability evaluations. The team focused on stages historically bottlenecked by human ingenuity, time, or specialized skills, evaluating AI’s potential to automate or augment them.

From this, the seven attack chain archetypes were derived:

  • Phishing

  • Malware

  • Denial-of-Service (DoS)

  • Man-in-the-Middle (MitM)

  • SQL Injection

  • Zero-Day Attack

  • Cross-Site Scripting (XSS)

The archetypes are the basis for applying the bottleneck analysis and evaluation methodologies and are based on “real-world patterns relevant to emerging AI capabilities.”¹
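As a rough sketch of how the archetypes might be represented for that purpose, the snippet below maps each archetype to a sequence of phases that could then feed the bottleneck analysis. The phases are generic kill-chain style placeholders of my own, not the paper’s actual attack chains.

```python
# Sketch of representing the seven archetypes as attack chains that bottleneck
# analysis and evaluations can be applied to. The listed phases are generic
# placeholders, not the paper's exact chains.

ARCHETYPE_CHAINS = {
    "Phishing":             ["reconnaissance", "lure creation", "delivery", "credential harvesting"],
    "Malware":              ["development", "obfuscation", "delivery", "command and control"],
    "Denial-of-Service":    ["target profiling", "infrastructure acquisition", "traffic generation"],
    "Man-in-the-Middle":    ["network positioning", "interception", "session manipulation"],
    "SQL Injection":        ["endpoint discovery", "payload crafting", "data exfiltration"],
    "Zero-Day Attack":      ["vulnerability research", "exploit development", "weaponization"],
    "Cross-Site Scripting": ["input surface mapping", "payload crafting", "session theft"],
}

# Each phase of each archetype becomes a candidate bottleneck to evaluate.
for archetype, phases in ARCHETYPE_CHAINS.items():
    print(f"{archetype}: {' -> '.join(phases)}")
```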

The paper includes figures representing the concepts described above, which are omitted here because of potential copyright restrictions, as well as technical details of the methodologies, which are omitted for brevity.


The Framework in Action

Finally, the framework assists defenders in:

  • Threat Coverage Gap Assessment

  • Development and Deployment of Targeted Mitigations

  • Grounding AI-enabled Adversary Emulation

  • Benchmarking Defenses

One thing to clarify is that the paper does not suggest controls to combat AI at any specific point in the kill chain. Instead, it provides a framework for continuously evaluating real-world AI-enabled attacks and for identifying the phase of an attack where there is the most potential to reduce its effectiveness. In practice, the framework lends itself well to a heatmap model for highlighting this, as shown in Figure 12 and Figure 13 of the paper.
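Those figures are in the paper itself; purely to illustrate the heatmap idea, the sketch below plots invented cost differential scores per archetype and attack phase using matplotlib. The phase labels, archetype abbreviations, and values are assumptions for demonstration only.

```python
# Illustrative heatmap of hypothetical cost differential scores per archetype
# and attack phase. The values are randomly generated for demonstration and do
# not reproduce the paper's Figures 12 and 13.

import matplotlib.pyplot as plt
import numpy as np

archetypes = ["Phishing", "Malware", "DoS", "MitM", "SQLi", "Zero-Day", "XSS"]
phases = ["Recon", "Weaponization", "Delivery", "Exploitation", "Actions on Objectives"]

rng = np.random.default_rng(seed=7)
scores = rng.uniform(0.0, 1.0, size=(len(archetypes), len(phases)))  # placeholder scores

fig, ax = plt.subplots(figsize=(8, 5))
im = ax.imshow(scores, cmap="Reds", vmin=0.0, vmax=1.0)

ax.set_xticks(range(len(phases)))
ax.set_xticklabels(phases, rotation=30, ha="right")
ax.set_yticks(range(len(archetypes)))
ax.set_yticklabels(archetypes)
ax.set_title("Hypothetical AI cost differential by archetype and attack phase")
fig.colorbar(im, ax=ax, label="Cost differential score (higher = prioritize defenses)")
fig.tight_layout()
plt.show()
```

A view like this makes it easy to spot, at a glance, which archetype and phase combinations would warrant the most defensive attention if their scores were high.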


Summary

Attackers are constantly seeking novel ways to become more effective and efficient, and AI poses a significant risk because of its ability to assist them. The Google DeepMind team has taken a major step toward empowering defenders with a solid, standardized approach to weighing the risk in real-world scenarios, allowing for the identification of weak points in threat actors’ ability to leverage AI.


References

¹ Google DeepMind. Frontier Safety Framework 2.0, 2025.
URL: https://deepmind.google/discover/blog/updating-the-frontier-safety-framework/


Daily Cuppa

Today’s cup of tea is Organic Earl Grey provided by Equal Exchange. Fair trade, organic, and delicious with a comforting aroma like the scent of well-read books and pipe tobacco.


If you enjoyed this article feel free to buy the author a cup of tea.
