Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/154587
Title: Adversarial robustness of deep reinforcement learning
Authors: Qu, Xinghua
Keywords: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Issue Date: 2021
Publisher: Nanyang Technological University
Source: Qu, X. (2021). Adversarial robustness of deep reinforcement learning. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/154587
Abstract: Over the past decades, advances in deep reinforcement learning (DRL) have demonstrated that deep neural network (DNN) policies can be trained to prescribe near-optimal actions in many complex tasks. Unfortunately, DNN policies have been shown to be vulnerable to adversarial perturbations of the input states, which creates obstacles for the real-world deployment of RL agents, especially in security-sensitive tasks. Various adversarial attacks have been proposed to understand this vulnerability, along with corresponding defense approaches to resist them. Although some achievements and interesting findings have been reported, existing adversarial attacks are considered less realistic because of the strong assumptions they rely on, such as white-box policy access and a full-state adversary setting. Moreover, existing adversarial defense approaches are largely built on adversarial training, an adversary-dependent defense that is also considered less realistic in the wild. Given these research gaps, investigating more realistic procedures for evaluating adversarial robustness (i.e., through adversarial attacks) and designing robust policies accordingly have become significant but underdeveloped topics in DRL. In this dissertation, firstly, we propose a minimalistic attack in Chapter 3 by taking a more restrictive view towards adversary generation, with the goal of unveiling the limits of a DRL model's vulnerability.
To this end, we define three key settings: (1) black-box policy access: where the attacker only has access to the input (state) and output (action probability) of an RL policy; (2) fractional-state adversary: where only several pixels are perturbed, with the extreme case being a single-pixel adversary; and (3) tactically-chanced attack: where only significant frames are tactically chosen to be attacked. We formulate the adversarial attack by accommodating the three key settings, and explore their potency on six Atari games by examining four fully trained state-of-the-art policies. In Breakout, for example, we surprisingly find that: (i) all policies showcase significant performance degradation when merely 0.01% of the input state is modified, and (ii) the policy trained by DQN is totally deceived by perturbing only 1% of the frames. Secondly, considering the computational complexity of the minimalistic attacks (in Chapter 3) caused by treating every frame in isolation, Chapter 4 presents the first study of how transferability across frames can be exploited to boost the creation of minimal yet powerful attacks in image-based RL. In doing so, we introduce three types of frame-correlation transfers (i.e., anterior case transfer, random projection based transfer, and principal components based transfer) with varying degrees of computational complexity in generating adversaries via a genetic algorithm. We empirically demonstrate the trade-off between the complexity and potency of the transfer mechanism by exploring four fully trained state-of-the-art policies on six Atari games. Our frame-correlation transfers dramatically speed up attack generation compared to existing methods, often significantly reducing the required computation time, thus shedding light on the real threat of real-time attacks in RL.
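The black-box and single-pixel settings described above can be illustrated with a minimal sketch. It is not the thesis's method: it swaps in plain random search for the search procedure, uses a hypothetical linear-softmax `toy_policy` in place of a trained Atari policy, and leaves out the tactical selection of frames. The names `toy_policy` and `single_pixel_attack` and all sizes are assumptions made for illustration.

```python
import numpy as np

def toy_policy(state, weights):
    """Hypothetical stand-in policy: softmax over a linear map of the flattened state."""
    logits = weights @ state.ravel()
    logits = logits - logits.max()  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def single_pixel_attack(policy, state, n_trials=500, seed=0):
    """Random search for ONE pixel change that lowers the probability of the
    action the policy picks on the clean state.

    Black-box: the attacker only calls policy(state) -> action probabilities.
    Random search is an assumption of this sketch, not the thesis's optimizer.
    """
    rng = np.random.default_rng(seed)
    clean_probs = policy(state)
    clean_action = int(np.argmax(clean_probs))
    best_state, best_prob = state.copy(), clean_probs[clean_action]
    for _ in range(n_trials):
        candidate = state.copy()
        # Pick one pixel position and one new intensity in [0, 1].
        idx = tuple(rng.integers(0, dim) for dim in state.shape)
        candidate[idx] = rng.uniform(0.0, 1.0)
        prob = policy(candidate)[clean_action]
        if prob < best_prob:
            best_state, best_prob = candidate, prob
    return best_state, clean_action, float(best_prob)

# Usage on an 8x8 toy "frame" with 4 actions (sizes are arbitrary).
rng = np.random.default_rng(1)
weights = rng.normal(size=(4, 8 * 8))
state = rng.uniform(size=(8, 8))
policy = lambda s: toy_policy(s, weights)
adv_state, clean_action, adv_prob = single_pixel_attack(policy, state)
changed = int(np.sum(adv_state != state))  # number of pixels that differ
```

Because each candidate is built from the clean state, the returned frame differs from the original in at most one pixel, which mirrors the extreme case of the fractional-state adversary.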
Last but not least, to alleviate the vulnerability of DRL, in a later chapter we propose an adversary-agnostic defense approach to increase the robustness of existing DRL policies. In particular, to increase the robustness of DRL policies, previous approaches assume that knowledge of the adversaries can be added to the training process so that the policy generalizes to the corresponding perturbed observations. However, such an assumption not only makes the robustness improvement more expensive, but may also leave a model less effective against other kinds of attacks in the wild. In contrast, we propose an adversary-agnostic robust DRL paradigm that does not require learning from adversaries. To this end, we first show theoretically that robustness can indeed be achieved independently of the adversaries in a policy distillation setting. Motivated by this finding, we propose a new policy distillation loss with two terms: 1) a prescription gap maximization loss that simultaneously maximizes the likelihood of the action selected by the teacher policy and the entropy over the remaining actions; 2) a corresponding Jacobian regularization loss that minimizes the magnitude of the gradient with respect to the input state. The theoretical analysis shows that our distillation loss is guaranteed to increase the prescription gap and the adversarial robustness. Furthermore, experiments on five Atari games verify the superiority of our approach in boosting adversarial robustness compared to other state-of-the-art methods. Most importantly, we hope this dissertation will provide a useful starting point for both the evaluation and the improvement of adversarial robustness in DRL.
URI: https://hdl.handle.net/10356/154587
DOI: 10.32657/10356/154587
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses
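The two-term distillation loss described in the abstract can be sketched as follows. This is only a rough illustration under stated assumptions: the abstract does not give the exact formulas, so the renormalized entropy over the non-teacher actions, the choice to penalize the input gradient of the first term, the finite-difference gradient, the weight `lam`, and the linear-softmax student are all assumptions of this sketch rather than the thesis's definitions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def pgm_loss(probs, teacher_action, eps=1e-12):
    """Assumed prescription-gap term: negative log-likelihood of the teacher's
    action minus the entropy of the (renormalized) remaining actions."""
    nll = -np.log(probs[teacher_action] + eps)
    rest = np.delete(probs, teacher_action)
    rest = rest / (rest.sum() + eps)
    entropy = -np.sum(rest * np.log(rest + eps))
    return nll - entropy

def input_gradient(loss_fn, state, h=1e-5):
    """Central finite-difference gradient of a scalar loss w.r.t. the state."""
    grad = np.zeros_like(state)
    for i in range(state.size):
        step = np.zeros_like(state)
        step.flat[i] = h
        grad.flat[i] = (loss_fn(state + step) - loss_fn(state - step)) / (2 * h)
    return grad

def distillation_loss(student_weights, state, teacher_action, lam=0.1):
    """Assumed total loss: PGM term plus lam * squared norm of its input gradient."""
    def loss_at(s):
        return pgm_loss(softmax(student_weights @ s), teacher_action)
    grad = input_gradient(loss_at, state)
    return loss_at(state) + lam * np.sum(grad ** 2)

# Usage with a hypothetical 4-action student on a 6-dimensional state.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))
s = rng.uniform(size=6)
total = distillation_loss(W, s, teacher_action=0)
```

In this sketch the first term is smallest when the student puts most of its mass on the teacher's action and spreads the rest evenly, while the second term discourages the loss from changing sharply under small changes to the input state.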

###### Files in This Item:
| File | Description | Size | Format |
| --- | --- | --- | --- |
| Thesis_XinghuaQU_Signed.pdf | | 3.18 MB | Adobe PDF |


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.