Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/148803
Title: Investigating sim-to-real transfer for reinforcement learning-based robotic manipulation
Authors: Cheng, Jason Kuan Yong
Keywords: Engineering::Electrical and electronic engineering
Issue Date: 2021
Publisher: Nanyang Technological University
Source: Cheng, J. K. Y. (2021). Investigating sim-to-real transfer for reinforcement learning-based robotic manipulation. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/148803
Project: B3224-201
Abstract: In this project, model-free Deep Reinforcement Learning (DRL) algorithms were implemented to solve complex robotic environments, covering both low-dimensional and high-dimensional robotic tasks. Low-dimensional tasks have state inputs that are low-dimensional vectors of values such as robotic arm joint angles, positions, and velocities. High-dimensional tasks have state inputs that are images, with camera views of the environment from various angles. The low-dimensional robotic environments are the CartPole Continuous, Hopper, Half-Cheetah, and Ant Bullet environments, using PyBullet (Coumans and Bai, 2016–2019) as the back-end physics engine for the robotic simulator. The high-dimensional robotic manipulation tasks are Open Box, Close Box, Pick-up Cup, and Scoop with Spatula, from the RLBench (James et al., 2020) task implementations. From the results of the experiments, off-policy algorithms such as Deep Deterministic Policy Gradients (DDPG) and Twin-Delayed Deep Deterministic Policy Gradients (TD3) outperformed the other algorithms on low-dimensional tasks because they learn from experience replay, giving them superior sample efficiency compared to on-policy algorithms such as Trust Region Policy Optimisation (TRPO) and Proximal Policy Optimisation (PPO). For the high-dimensional environments, only the Option-Critic algorithm was able to solve some of the environments, such as Open Box and Close Box. Off-policy algorithms did not perform well because of the high memory cost of holding images in experience replay, so the agent could not learn effectively from the replay buffer. On-policy algorithms were also unable to learn well in the high-dimensional environments, as they could not generalise from the sparse reward signals. No algorithm implemented was able to solve the more complex manipulation tasks, such as Scoop with Spatula and Pick-up Cup, because the agent was unable to reach the sparse reward through random exploration and therefore had no learning signal.
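(Illustrative aside, not part of the report: a minimal experience replay buffer sketch, assuming a hypothetical 1e6-transition capacity and 128x128x3 image observations, shows both why off-policy methods such as DDPG/TD3 can reuse data for better sample efficiency and why image-based tasks strain replay memory, as discussed in the abstract.)

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples.

    Off-policy agents such as DDPG/TD3 repeatedly sample mini-batches from
    this buffer, reusing each transition many times; on-policy methods
    (TRPO/PPO) discard data after each update, which is the sample-efficiency
    gap described in the abstract. Capacity and batch size are illustrative
    assumptions, not values from the report.
    """

    def __init__(self, capacity=1_000_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=256):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.asarray, zip(*batch))
        return states, actions, rewards, next_states, dones


# Rough memory estimate (assumed 128x128x3 uint8 frames): storing 1e6
# transitions with both state and next_state images needs about
# 2 * 1e6 * 128 * 128 * 3 bytes ~= 98 GB, versus well under 1 GB for
# 1e6 low-dimensional joint-angle/velocity vectors -- the memory constraint
# the abstract attributes to off-policy methods on image-based tasks.
```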
URI: https://hdl.handle.net/10356/148803
Schools: School of Electrical and Electronic Engineering 
Organisations: Institute of High Performance Computing
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections:EEE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: NTU_FYP_Report.pdf (Restricted Access)
Size: 5.1 MB
Format: Adobe PDF

Page view(s): 319 (updated on Mar 25, 2025)
Download(s): 8 (updated on Mar 25, 2025)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.