
On Evaluating Stateful Defence Models against Query-Based Black-Box Attacks

  • Norwegian University of Science and Technology
  • Covatic Limited

Research output: Contribution to conference › Paper › peer-review

Abstract

Stateful Defence Models (SDMs) aim to detect the process of adversarial example generation during the query stage. Although they are not designed to counter zero-query attacks, they have shown varying levels of success against query-based black-box attacks. Recently, several SDMs have claimed 100% robustness against query-based attacks, an extraordinary assertion that requires thorough evaluation. In this work, we show that such defences exhibit both shared and system-specific weaknesses. Exposing these vulnerabilities requires following a standard set of evaluation strategies, which we propose in our paper.
Furthermore, we show that these vulnerabilities are amplified under DazzlePatch, a novel patch attack that replaces the borders of the input during the query phase to minimize detection while perturbing the central patch using standard query-based attacks. To ensure compliance with the ℓ∞ threat model, the attack restores the original borders in the final iteration, yielding a valid adversarial example within the permissible perturbation budget. Our results demonstrate a substantial reduction in detection rates and a corresponding increase in attack success rates across multiple SDMs. We then show that incorporating input randomisation, such as Random-Resized Cropping (RRC), significantly enhances SDM robustness, reducing attack success rates by up to 26.5%. These findings suggest that while current SDMs are vulnerable to tailored adaptive attacks, integrating them with additional defence mechanisms may offer improved resilience.
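
The border-swap idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`central_mask`, `make_query`, `finalize`), the fixed border width, the decoy border image, and the assumption that pixels lie in [0, 1] are all hypothetical choices made for illustration. The sketch only shows how a query could carry replaced borders while the final example restores the original borders and stays inside an ℓ∞ ball of radius `eps`.

```python
import numpy as np

def central_mask(height, width, border):
    """Boolean (H, W) mask: True on the central patch, False on the border."""
    mask = np.zeros((height, width), dtype=bool)
    mask[border:height - border, border:width - border] = True
    return mask

def make_query(x_adv, decoy, mask):
    """Query-phase input (assumed form): keep the current central patch,
    replace the border pixels with those of a decoy image."""
    return np.where(mask[..., None], x_adv, decoy)

def finalize(x_orig, x_adv, mask, eps):
    """Final example: restore the original borders and project the central
    patch onto the l_inf ball of radius eps around the original image."""
    centre = np.clip(x_adv, x_orig - eps, x_orig + eps)
    out = np.where(mask[..., None], centre, x_orig)
    return np.clip(out, 0.0, 1.0)  # assumes pixel values in [0, 1]

# Hypothetical usage with random data standing in for real images.
rng = np.random.default_rng(0)
x_orig = rng.random((8, 8, 3))
decoy = rng.random((8, 8, 3))
mask = central_mask(8, 8, border=2)
x_adv = x_orig + rng.normal(scale=0.1, size=x_orig.shape)  # stand-in for attack updates
query = make_query(x_adv, decoy, mask)
x_final = finalize(x_orig, x_adv, mask, eps=0.05)
```

In this sketch the query sent to the model differs from the original image arbitrarily on the border, whereas `x_final` differs from it only on the central patch and by at most `eps` per pixel, which is what makes the final example valid under the ℓ∞ budget.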
Original language: English
Publication status: Published (VoR) - 1 Jun 2026

Funding

Funders
Covatic Limited
