Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/176294
Title: | Curriculum learning improves compositionality of reinforcement learning agent across concept classes |
Authors: | Lin, Zijun |
Keywords: | Computer and Information Science |
Issue Date: | 2024 |
Publisher: | Nanyang Technological University |
Source: | Lin, Z. (2024). Curriculum learning improves compositionality of reinforcement learning agent across concept classes. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/176294 |
Abstract: | The compositional structure afforded by language allows humans to decompose complex phrases and map them to novel visual concepts, demonstrating flexible intelligence. Although several algorithms can demonstrate compositionality, they do not give us insight into how humans learn to compose concept classes to ground visual cues. To study this multi-modal learning problem, we created a 3-dimensional environment in which a reinforcement learning agent has to navigate to a location specified by a natural language phrase (instruction). The instruction is composed of nouns, attributes, and additionally determiners or prepositions. This visual grounding task increases the compositional complexity for reinforcement learning agents, as navigating to the blue cubes above some red spheres will not be rewarded when the instruction is to navigate to “some blue cubes below the red sphere”. First, we demonstrate that reinforcement learning agents can ground determiner concepts to visual scenes but struggle to ground the more complex preposition concepts. Second, we show that curriculum learning, a strategy employed by humans, improves concept learning efficiency by reducing the total number of training episodes needed to achieve a certain performance criterion by 15% in the determiner environment. Moreover, it enables the agents to learn the preposition concepts. Lastly, we establish that agents trained on determiner or preposition concepts can decompose held-out test instructions and rapidly map their navigation policies to unseen visual object combinations. We also compare various text encoders to see whether they facilitate the agents’ training. To conclude, our results show that multi-modal reinforcement learning agents can achieve compositional understanding of complex concept classes, and demonstrate the effectiveness of human-like learning strategies in improving the learning efficiency of artificial systems. |
URI: | https://hdl.handle.net/10356/176294 |
Schools: | School of Electrical and Electronic Engineering |
Fulltext Permission: | restricted |
Fulltext Availability: | With Fulltext |
Appears in Collections: | EEE Student Reports (FYP/IA/PA/PI) |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Lin Zijun -- Final Report.pdf (Restricted Access) | | 9.25 MB | Adobe PDF |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.