Please use this identifier to cite or link to this item:
Title: Profit-maximizing sequential task allocation to a team of selfish agents with deep reinforcement learning
Authors: Zhang, Shizhuo
Keywords: Science::Mathematics
Issue Date: 2022
Publisher: Nanyang Technological University
Source: Zhang, S. (2022). Profit-maximizing sequential task allocation to a team of selfish agents with deep reinforcement learning. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/157056
Abstract: We study the problem of sequential task allocation among selfish agents through the lens of dynamic mechanism design. In this game, the manager has to maximize its own utility in the face of a random team of selfish agents. The problem assumes a discrete-time setting in which each time step comprises two sub-procedures: 1) contracting, where the manager offers payments to ask each agent to pursue certain goals and the agents decide whether they are satisfied; and 2) acting. The complication of this setup is that, although reporting is involved as in traditional mechanism design settings, truthful revelation of hidden information is impossible. Moreover, the agents act in a high-dimensional space, which makes it harder to impose proper assumptions and devise optimization algorithms. We therefore leverage the power of deep reinforcement learning. The manager must model the agents' hidden information to make correct decisions, but this renders the learning problem non-Markovian, complicating the application of reinforcement learning algorithms. We propose a framework that tackles this historical dependency by leveraging the strong representation-learning capability of deep learning together with gradient-based multi-task updates, allowing the RL-based manager to act in a Markovian latent space. We further propose a successor-representation-based intrinsic reward to encourage strategic exploration. Empirical studies across various game settings demonstrate the effectiveness of the proposed framework.
URI: https://hdl.handle.net/10356/157056
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
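The successor-representation-based intrinsic reward mentioned in the abstract can be illustrated with a minimal tabular sketch. This is a hypothetical illustration of the general technique, not the thesis's deep-learning implementation; the chain environment, state count, learning rate, and the inverse-norm form of the bonus are all illustrative assumptions.

```python
import numpy as np

n_states = 5
gamma = 0.95   # discount factor (assumed)
alpha = 0.1    # SR learning rate (assumed)
M = np.zeros((n_states, n_states))  # tabular successor representation

def sr_update(s, s_next):
    # TD update: M(s, .) <- M(s, .) + alpha * [1_s + gamma * M(s', .) - M(s, .)]
    onehot = np.eye(n_states)[s]
    M[s] += alpha * (onehot + gamma * M[s_next] - M[s])

def intrinsic_reward(s):
    # A smaller SR-row norm signals a less-visited state,
    # so it receives a larger exploration bonus.
    return 1.0 / (np.linalg.norm(M[s]) + 1e-8)

# Toy random walk on a chain to populate the SR.
rng = np.random.default_rng(0)
s = 0
for _ in range(1000):
    s_next = min(max(s + rng.choice([-1, 1]), 0), n_states - 1)
    sr_update(s, s_next)
    s = s_next
```

In a deep RL setting this tabular matrix would be replaced by learned successor features, with the bonus added to the manager's extrinsic reward during training.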
Appears in Collections: SPMS Student Reports (FYP/IA/PA/PI)
Updated on May 18, 2022
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.