Please use this identifier to cite or link to this item:
Title: Large scale strategic decision making in multi-agent systems
Authors: Chen, Haipeng
Keywords: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Issue Date: 2018
Source: Chen, H. (2018). Large scale strategic decision making in multi-agent systems. Doctoral thesis, Nanyang Technological University, Singapore.
Abstract: With the rapid development over the recent years, artificial intelligence (AI) technologies (e.g., machine learning, data mining and multi-agent systems) are becoming more and more popular among large companies and governments. In both academy and industry, there exist vast research opportunities in the interdisciplinary areas of AI and traditional research (e.g., game theory, resource allocation, operations research and transportation). This thesis is targeted at engineering solutions methodologies across the range of possible multi-agent system requirements. The range coverage was achieved by considering specific significant representative problems such as the airlines industry, cloud computing and public transportation. More specifically, when making strategic decisions, the utility of one agent is usually affected by the decisions of multiple other agents. For example, the market share of an airline is not only affected by its own pricing strategy, but also by its competitors’ pricing strategies. The large scale of the problem domain, the complex interactions among multiple agents, together with the uncertainties and dynamism from the environment, make this class of problems extremely challenging. To this end, this thesis investigates large scale strategic decision making in multi-agent systems, provides analysis over the problems and propose several solution algorithms to assist the agents in making intelligent decisions. We start with investigating a relatively simple scenario where the strategies of other agents are fixed. This scenario is motivated by a practical problem where an airline wants to maximize its profit by optimally deciding its flight frequencies over its multiple routes, subject to certain budget constraints. In making optimal flight frequency strategies, there are two key issues. First, we need to solve a prediction problem, i.e., given a certain flight frequency over a route, what will be the profit of that route? Second, using the profit prediction as a sub-routine, we need to solve a resource allocation problem, i.e., given a cost budget constraint, how to allocate different flight frequencies to different routes, so that the overall profit of an airline of maximized? To solve this problem, we propose the MAP (Maximizing Airline Profits) architecture, which consists of a novel Ensemble Forecasting (MAP-EF) approach and two frequency allocation algorithms, i.e., Bilevel Branch and Bound (MAP-BBB) and MAP-Greedy (MAP-G). Experimental results show that by using MAP, airlines can increase profits by a significant margin We then extend our framework to scenarios where there are interactions among the multiple agents, and the utility of one agent is dependent on the other agents’ strategies. First, we extend our MAP architecture where all the airlines are strategic in deciding their flight frequencies. This problem is formulated as competitive game, and we propose the ITER-Freq algorithm to compute the Nash equilibrium strategies of the competing airlines. We also provide extensive analysis (both empirically and theoretically) on the existence of Nash equilibrium as well as the performance of the algorithm. Second, we consider another practical, and yet more complex scenario in the cloud computing industry, where a private (secondary) cloud wants to enhance its service quality by 1) outsourcing workload to public (primary) clouds via vertical federation or 2) sharing resources with other secondary clouds through horizontal federation. We analyse the interrelated workload factoring and federation formation among secondary clouds, while providing scalable algorithms to assist them to optimally select partners and outsource workload. We use a game theoretic approach to model the federation formation of clouds as a coalition game with externalities. The results also show that compared with the two common practices, secondary clouds can decrease service delay penalty by around 11% with the proposed VHCF (Vertical and Horizontal Cloud Federation) framework. Thrid, we consider sequential decision making in dynamic environments. We first focus on a dynamic tolling problem in a traffic network, which is motivated by deficiencies of Singapore’s current static Electronic Toll Collection (ETC, also called Electronic Road Pricing in Singapore). We propose a novel dynamic ETC (DyETC) scheme which adjusts tolls to traffic conditions in realtime. The DyETC problem is formulated as a Markov decision process (MDP), the solution of which is very challenging due to its high dimension state and action spaces. Due to the complexity of the formulated MDP, existing methods cannot be applied to our problem. Therefore, we develop a novel algorithm, PG- (Policy Gradient method with Beta distribution based policy functions), which belongs to the family of reinforcement learning algorithms Finally, we study a scenario where self-interested agents make optimal sequential decisions in complex multiagent systems, and apply our designed algorithm on a representative example via a game scenario on the Minecraft platform. As an open problem for many years, various characteristics, such as complex interactions among agents, uncertainties, sequential decision making and limited learning trials all make it extremely challenging to find effective strategies. We make two key innovations to this class of problems. One key innovation is a generalized agent type hypothesis framework to identify the behavior model of the other agents, which is demonstrated to be robust to observation uncertainty. On top of that, a second key innovation is a novel Q-learning approach to learn effective policies against each type of the collaborating agents. Various ideas are proposed to adapt traditional Q-learning to handle complexities, including state-action abstraction to reduce problem scale, a warm start approach using human reasoning for addressing limited learning trials, and an active greedy strategy to balance exploitation-exploration. Applying our proposed solution algorithms to the Microsoft Malmo Collaborative AI Challenge (MCAC) in 2017, our agent HogRider won the first place among 81 teams from 26 countries.
DOI: 10.32657/10356/74968
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections:IGS Theses

Files in This Item:
File Description SizeFormat 
thesis_HP_final.pdfMain article8.32 MBAdobe PDFThumbnail

Google ScholarTM




Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.