Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/182357
Title: | Scalable, generalizable, and offline methods for imperfect-information extensive-form games | Authors: | Li, Shuxin | Keywords: | Computer and Information Science | Issue Date: | 2025 | Publisher: | Nanyang Technological University | Source: | Li, S. (2025). Scalable, generalizable, and offline methods for imperfect-information extensive-form games. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/182357 | Abstract: | Imperfect-information extensive-form games (IIEFGs) offer a versatile framework for modeling interactions between multiple players in stochastic and imperfect-information scenarios. Due to their superior modeling capabilities, IIEFGs can be applied in a wide range of fields, including economics and computer science. Given their broad application, IIEFGs have become a vital area of research in game theory. There are numerous algorithms developed to solve IIEFGs, ranging from traditional programming-based to advanced deep learning-based methods. Although these algorithms have been proven successful, they still suffer from scalability and generalizability issues when solving large-scale games. Furthermore, these algorithms require continuous interaction with the game environment, which impedes the applications in many real-world scenarios where the environment may be impractical. To mitigate these issues, this thesis is devoted to improving the scalability and generalizability and extending offline learning to the IIEFGs setting. Our first emphasis is on improving the scalability of existing algorithms to solve IIEFGs. We propose two scalable algorithms: CFR-MIX and PSRO with pre-trained strategies which are based on the counterfactual regret minimization (CFR) and policy space response oracle (PSRO) frameworks, respectively. CFR-MIX is designed to address the exponential combinatorial action space problem in a special type of IIEFGs known as team-adversary games. In the CFR-MIX algorithm, we first propose a new strategy representation that represents a joint action strategy using the individual strategies of all agents, maintaining cooperation between agents through a consistency relationship. To compute the equilibrium with the new strategy representation under the CFR framework, we transform the strategy consistency relationship into a consistency relationship between cumulative regret values. We then introduce a novel decomposition method for cumulative regret values to ensure this consistency. CFR-MIX algorithm employs a mixing layer to implement the decomposition method by estimating cumulative regret values of joint actions as a non-linear combination of the cumulative regret values of individual actions. Experimental results show that CFR-MIX significantly outperforms existing algorithms. PSRO with pre-trained strategies method is designed to solve the long-term decision-making issue in large-scale pursuit-evasion games (PEGs), the special type of two-player zero-sum IIEFGs. This algorithm integrates the pre-training and fine-tuning paradigm into the PSRO framework. Specifically, we first pre-train the pursuer's policy base model against many different strategies of the evader. Then we proceed with the PSRO loop and fine-tune the pre-trained policy to attain the pursuer's best responses. Empirical evaluation shows that our approach significantly outperforms baselines in terms of speed and scalability. The next focus is on enhancing the generalizability of existing algorithms for solving PEGs. We propose a generalizable framework, Grasper, designed to generate pursuer policies tailored to specific PEGs. Grasper features a novel architecture with two components: a graph neural network (GNN) to encode PEGs into hidden vectors and a hypernetwork to generate pursuer policies based on these hidden vectors. We develop a three-stage training pipeline involving a pre-pretraining stage for training the GNN through the GraphMAE algorithm, a pre-training stage for training the hypernetwork by utilizing heuristic-guided multi-task learning, and a fine-tuning stage for fine-tuning the pursuer base model generated by the hypernetwork to compute the best response strategy. Experimental results demonstrate that Grasper outperforms baselines in both solution quality and generalizability. The final emphasis is on applying offline learning to game solving. We introduce the offline equilibrium finding (Offline EF) paradigm, which computes equilibrium strategies from offline datasets. Then we construct diverse datasets encompassing a range of games using several methods. These datasets serve as a basis for evaluating the performance of offline algorithms. Next, we design a novel framework, BOMB, which integrates behavior cloning and model-based methods along with a novel parameter estimation method. The model-based method adapts any online EF algorithm to the offline setting. Then, our theoretical analysis provides performance guarantees of BOMB across various datasets. Experiments demonstrate BOMB’s superior efficiency over offline RL methods in computing equilibrium strategies. To conclude, this doctoral thesis addresses key challenges in solving imperfect-information extensive-form games, focusing on scalability, generalizability, and offline learning. Through innovative algorithms like CFR-MIX, PSRO with pre-trained strategies, Grasper, and BOMB, it advances the state of the art, offering efficient and practical solutions for both theoretical and real-world applications. | URI: | https://hdl.handle.net/10356/182357 | DOI: | 10.32657/10356/182357 | Schools: | College of Computing and Data Science | Rights: | This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). | Fulltext Permission: | open | Fulltext Availability: | With Fulltext |
Appears in Collections: | CCDS Theses |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
PhD_Thesis__Shuxin.pdf | 6.87 MB | Adobe PDF | View/Open |
Page view(s)
117
Updated on Mar 16, 2025
Download(s) 50
90
Updated on Mar 16, 2025
Google ScholarTM
Check
Altmetric
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.